For the past year and change I've been working with dkg and Linus Nordberg on Certificate Transparency Gossip. I'll assume you're familiar with Certificate Transparency (you can read more about it here.) The point of CT Gossip is to detect Certificate Transparency logs that have misbehaved (either accidentally, maliciously, or by having been compromised.)
The CT Gossip spec is large, and complicated - perhaps too complicated to be fully implemented! This blog post is not about an overview of the specification, but rather about a nuanced problem we faced during the development - and why we made the decision we made. I'll take this problem largely into the abstract - focusing on the difficulty of providing protections against an intelligent adversary with statistics on their side. I won't reframe the problem or go back to the drawing board here. I imagine someone will want to, and we can have that debate. But right now I want to focus on the problem directly in front of us.
In several points of the Gossip protocol an entity will have a bucket of items. We will call the entity the 'server' for simplicity - this is not always the case, but even when it is the web browser (a client), we can model it as a server. So the server has a bucket of items and a client (who will be our adversary) can request items from the bucket.
The server will respond with a selection of items of its choosing - which items and how many to respond with are choices the server makes. The server also chooses to delete items from the bucket at a time and by a policy of the server's choosing.
What's in the bucket? Well by and large they are innocuous items. But when an adversary performs an attack - evidence of that attack is placed into the bucket. The goal of the adversary is to 'flush' the evidence out of the bucket such that it is not sent to any legitimate clients, and is only sent to the adversary (who will of course delete the evidence of their attack.) Besides requesting items from the bucket, the attacker can place (innocuous) items into the bucket, causing the bucket to require more storage space.
The adversary can create any number of Sybils (or fake identities) - so there's no point in the server trying to track who they send an item to in an effort to send it to a diversity of requestors. We assume this approach will always fail, as the adversary can simply create false identities on different network segments.
Similarly, it's not clear how to distinguish normal client queries from an adversary performing a flushing attack. So we don't make an effort to do so.
Our goal is to define policies for the 'Release' Algorithm (aka 'which items from the bucket do I send') and the 'Deletion' Algorithm (aka 'do I delete this item from the bucket') such that an attacker is unsure about whether or not a particular item (evidence of their attack) actually remains in the bucket - or if they have successfully flushed it.
This problem is tantalizingly close to existing problems that exist in mix networks. Perhaps the best treatment of the flushing attack, and how different mixing algorithms resist it, is From a Trickle to a Flood from 2002.
But as intimated - while the problem is close, it is not the same. In particular, when (most | deployed) mix networks release a message, they remove it from the server. They do not retain it and send a duplicate of it later. Whereas in our situation, that is absolutely the case. This difference is very important.
The second difference is the attacker's goal. With Mix Networks, the attacker's goal is not to censor or discard messages, but instead to track them. In our model, we do want to eliminate messages from the network.
Defining The Attacker
So we have defined the problem: Server has a bucket. Attacker wants to flush an item from the bucket. How can we make the attacker unsure if they've flushed it? But we haven't defined the capabilities of the attacker.
To start with, we assume the attacker knows the algorithm. The server will draw random numbers during it, but the probabilities that actions will be taken are fixed probabilities (or are determined by a known algorithm.)
If we don't place limits on the attacker, we can never win. For example, if the attacker is all-powerful it can just peek inside the bucket. If the attacker can send an infinite number of queries per second - infinity times any small number is still infinity.
So we define the costs and limits. An attacker's cost is time and queries. They need to complete an attack before sufficient clock time (literally meaning hours or days) elapses, and they need to complete the attack using less than a finite number of queries. This number of queries is actually chosen to be a function of clock time - we assume the attacker has infinite bandwidth and is only gated by how quickly they can generate queries. We also assume the attacker is able to control the network of the server for a limited period of time - meaning they can isolate the server from the internet and ensure the only queries it receives are the attacker's. (Not that the server knows this of course.)
The defender's cost is disk space. With infinite disk space, the defender can win - we must design a mechanism that allows the defender to win without using infinite disk space.
An attacker WINS if they can achieve ANY of these three objectives:
- Determine with certainty greater than 50% whether an item remains in the opponent's bucket while sending fewer than M queries to the opponent.
- Determine with certainty greater than 50% whether an item remains in the opponent's bucket before N amount of time has past
- Cause the defender to use more than O bytes of storage.
M is chosen to be a number of queries that we consider feasible for an attacker to do in a set period of time. N is chosen to be long enough that sustaining the attack represents undue political or technical burden on an adversary. O is chosen to be a disk space size large enough that client developers or server operators are scared off of deploying Gossip.
Those numbers are pretty well established, but what about N and O? Basically, we can only make a 'good guess' about these. For example, sustaining a BGP hijack of Twitter or Facebook's routes for more than a short period of time would be both noticeable and potentially damaging politically. TLS MITM attacks have, in the past, been confined to brief period of time. And O? How much disk space is too much? In both cases we'll have to evaluate things in terms of "I know it when I see it."
An Introduction to the Statistics We'll Need
Let's dive into the math and see, if we use the structure above, how we might design a defense that meets our 768-million mark.
It turns out, the statistics of this isn't that hard. We'll use a toy example first.
- When I query the server, it has a 10% chance of returning an object, if it has it - and it performs this 10% test for each item. (You'll note that one of the assumptions we make about the 'Retrieval Algorithm' is that is evaluates each item independently.)
Thanks to the wonder of statistics - if it never sends me the object, then is no way to be certain it does not have it. I could have just gotten really, really unlucky over those umpteen million queries.
But the probability of being that unlucky, of not receiving the object after N queries if the server has it - that can be calculated. I'll call this, colloquially, being 'confident' to a certain degree.
How many queries must I make to be 50% confident the server does not have an object? 75%? 90%?
- Assume the server has the item. The probability of not receiving the item after one query is 90%.
- After two queries: 90% x 90% or 81%. Successive multiplications yield the following:
- ~59% chance of not receiving the item after 5 queries
- ~35% chance of not receiving the item after 10 queries
The equation is a specific instance of the Binomial Probability Formula:
F(n) = nCr * p^r * q^(n-r) nCr is the 'n choose r' equation: n! / (r! * (n-r)!) p is the probability of the event happening (here .1) r is the number of desired outcomes (here it is 0 - we want no item to be returned) q is the probability of the event not happening (here 1 - .1 or .9) n is the number of trials
Our equations can be checked:22 queries to be 90% confident the server does not have the item.
Also worth noting is that equation can be thankfully simplified. Because r is 0, we only need to calculate q^(n) - which matches our initial thought process.
Going Back to the 768 Million
So here's what to do with this math: I can use this method to figure out what the probability of sending an item will need to be, to defend against an attacker using the definition of winning we define above. I want .50 = q^(768million). That is to say, I want, after 768 Million queries, an attacker to have a 50% confidence level that the item does not remain in the bucket.
Now it just so happens that Wolfram Alpha can't solve the 768-millionth root of .5, but it can solve the 76.896 millionth root of .5 so we'll go with that. It's .99999999098591.
That is to say, to achieve the 50% confidence interval the probability of sending an item from the bucket needs to be about .00000009%.
Do you see a problem here? One problem is that I never actually defined the defender having the goal of ever sending an item! At this probability, an item has a 50% of being sent after about 50 million requests. I don't know how long it takes Google to reach the number of visits - but realistically this means the 'evidence of attack' would just never get shared.
So.... Send it more frequently?
This math, sending it so infrequently, would surely represent the end game. In the beginning, surely we would send the item more frequently, and then the more we send it, the less often we would send it. We could imagine it as a graph:
| | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x +-------------------------------------------------------------------------------
But the problem, remember, is not just figuring out when to send the item, but also when to delete it.
Consider Deleting After Sending?
Let's imagine a simple deletion algorithm.
- The server will 'roll for deletion' after sending the item to a client who requests it.
- The likelihood of deletion shall be 1%.
Now recall in the beginning, after an item is newly placed into the bucket, it shall be sent with high probability. Let's fix this probability at a lowly 40%, and say this probability applies for the first 500 times it is sent. What is the probability that an item has been deleted by the 500th response? It is 99%. And how many queries are needed on average by the attacker to have the item returned 500 times at 40% probability of sending? It is (thanks to some trial and error) 1249.
What this means is that an attacker who sends on average 1249 queries in the beginning (right after the evidence of the attack goes into the bucket) can be supremely confident that the item has been deleted.
Then, the attacker sends more queries - but far fewer than the 768-million figure. If the item is not returned in short order, the attacker can be very confident that the item was deleted. This is because at the top of that curve, the likelihood of receiving the item quickly is very good. When the item doesn't appear quickly, it's either because the attacker hit a .000000001% chance of being unlucky - or it's because the item was deleted.
'Rolling for deletion' after an item is sent is a poor strategy - it doesn't work when we want to send the item regurally.
A Deletion Algorithm That May Work
- The server will 'roll for deletion' every hour, and the odds of deleting an item are... we'll say 5%.
We can use the Binomial Probability Formula, again, to calculate how likely we are to delete the item after so many hours. It's 1 minus the probability of the deletion not occurring, which is .95num_hours
- 51% chance of deletion after 14 hours.
- 71% chance of deletion after 24 hours
- 91% chance of deletion after 48 hours
If we use a rough yardstick of 'Two Days' for the attacker's timeframe (with deletion rolls once an hour) to yield a 50% confidence level, the equation becomes .50 = q^48 or a 1.4% chance of deletion.
But What About Uncertainty!
If you're following along closely, you may have realized a flaw with the notion of "1.4% chance of deletion every hour." While it's true that after 2 days the probability an item is deleted is 50%, an attacker will be able to know if it has been deleted or not!
This is because the attacker is sending tons of queries, and we already determined that trying to keep the attacker in the dark about whether an item is 'in the bucket' requires such a low probability of sending the item that it's infeasible. So the attacker will know whether or not the item is in the bucket, and there's a 50% chance (that the attacker cannot influence) of it being deleted after two days.
This not ideal. But it seems to the best tradeoff we can make. The attacker will know whether or not the evidence has been erased, but can do nothing to encourage it to be erased. They merely must wait it out.
But what About Disk Space?
So far what we've determined is:
- A deletion algorithm that is based on how often the server sends the item won't work.
- A deletion algorithm that is based on time seems like it will work...
But we haven't determined how much disk will be used by this algorithm. To calculate this number, we must look at the broader CT and CT Gossip ecosystem.
We store two types of data STHs, and [SCTs+Cert Chains]. These are stored by both a Web Browser and Web Server. STHs and SCTs are multiplied by the number of trusted logs in the ecosystem, which we'll place at '20'. We'll make the following size assumptions:
- The size of a SCT is ~120 bytes.
- The size of a STH is ~250 bytes.
- A certificate chain is 5KB.
- But a disk sector is 4KB, so everything is 4KB, except for the chain which is 8KB. (Note that this is 'naive storage'. It doesn't include any associated counters or metadata which would increase size, nor does it include more efficient storage mechanisms which would decrease size.)
A server's SCT Store will be limited by the number of certificates issued for the domains it is authoritative for multiplied by the number of logs it trusts. Let's be conservative and say 10,000 certs. ((10000 SCTs * 4 Kb * 20 logs) + (10000 Cert Chains * 8kb)) / 1024 Kb/Mb = 860MB. That's a high number but it's not impossible for a server.
A server's STH store could in theory store every active STH out there. We limit Gossip to STHs in the past week, and STHs are issued on average once an hour. This would be (20 logs * 7 days * 24 hours * 4 Kb) / 1024 Kb/Mb = 13.1MB and that's quite reasonable.
On the client side, a client's STH store would be the same: 13.1MB.
Its SCT store is another story though. First, there is no time limit for how long I may store a SCT. Secondly, I store SCTs (and cert chains) for all sites I visit. Let's say the user has visited 10000 sites, each of which have 3 different certificates with 10 SCTs each. That's ((10000 Sites * 3 Cert Chains * 8 Kb) + (10000 Sites * 3 Certificates * 10 SCTs * 4 Kb)) / 1024 Kb/Mb) / 1024 Mb/Gb = 1.4 GB. On a client, that's clearly an unacceptable amount of data.
Deleting Data From the Client
So what we want to solve is the disk-space-on-the-client problem. If we can solve that we may have a workable solution. A client whose SCT Store is filling up can do one, or more, of the following (plus other proposals I haven't enumerated):
- Delete data that's already been sent
- Delete new, incoming data (freeze the state)
- Delete the oldest data
- Delete data randomly
I argue a mix of the the first and last is the best. Let's rule out the middle two right away. These are purely deterministic behavior. If I want to 'hide' a piece of evidence, I could either send it, then fill up the cache to flush it, or flood the cache to fill it up and prevent it being added.
On its face, deleting data at random seems like a surefire recipe for failure - an attacker performs an attack (which places the evidence item in the bucket), then floods the bucket with new items. Once the bucket if full, the probability of the the evidence item being deleted rises with each new item placed in. (With a 30,0000 item cache, the odds of evicting a particular item is 50% after 51,000 queries - 30,000 queries to fill it and 21,000 to have a 50% chance of flushing it.) These numbers are far short of 768-million query figure we wish to protect ourselves against.
Deleting data that's already been sent is a good optimization, but does not solve the problem - if an attacker is flooding a cache, all of the data will be unsent.
We seem to be sunk. In fact - we were unable to come to a generic fix for this attack. The best we can do it make a few recommendations that make the attack slightly more difficult to carry out.
- Aggressively attempt Inclusion Proof Resolution for SCTs in the cache. If the SCT is resolved, discard the SCT and save the STH. If this particular SCT is not resolved, but others are, save this SCT. If all SCT resolution fails, take no special action.
- Prioritize deleting SCTs that have already been sent to the server. If a SCT has been sent to the server, it means it has been sent over a connection that excludes that SCT. If it was a legit SCT, all is well (it's been reported). If it was a malicious SCT - either it's been reported to the legitimate server (and ideally will be identified) or it's been reported to an illegitimate server necessitating a second, illegitimate SCT we have in our cache.
- In the future, it may be possible for servers to supply SCTs with Inclusion Proofs to recent STHs; this would allow clients to discard data more aggressively.
The final recommendation is therefore:
- Servers and Clients will each store valid STHs without bound. The size needed for this is a factor of the number of logs and validity window (which is one week). The final size is manageable, under 20MB with naive storage.
- Servers will store SCTs and Certificate Chains without bound. The size needed for this is a factor of the number of certificates issued for domains the server is authoritative for, and the number of logs. The final size is manageable for most servers (under 1GB with naive storage) and can be reduced by whitelisting certain certificates/SCTs to discard.
- Clients will store SCTs and Certificate Chains in a fixed-size cache of their choosing, employ strategies to make flushing attacks more difficult, but ultimately remain vulnerable to a persistent flushing attack.