Universal Email Encryption Specification
27 Nov 2013 12:00:23 EST

Last May, I was in Hong Kong for OpenITP's Circumvention Tech Summit - and I ended up taking an afternoon walk with none other than Daniel Kahn Gillmor. Over a 7 hour and I-have-no-idea-how-many-kilometer walk, we talked about a ton of things, until I eventually asked him "Why do you think we don't have email encryption?"

We talked about a lot of the hard problems of email encryption - problems that are difficult to solve while still being intuitive and easy for non-technical users, and not disrupting their preferred workflow. Things like Key Discovery and Key Authenticity, webmail, and moving keys between devices.

We were kind of beating around the bush, and finally I just said what we were both thinking: "Maybe providers should manage keys for most people." He agreed that this seemed like the best way to get wide adoption. (Remember, this was all pre-Snowden.)

We chatted some more, about key discovery in DNS (which would later be amended to HTTP), about encouraging email providers to do encryption on behalf of users, and more importantly (to us, as well as you I'm sure) - allowing users to manage their own keys and upload the public component.

What we saw was ubiquitous email encryption, where every email sent between major participating providers is encrypted. And in a large percentage of cases, it's encrypted provider-to-provider. But in a small percentage of cases - it's encrypted provider-to-end or end-to-end. We feel that if email encryption really was ubiquitous, the clients we have now (Enigmail, S/MIME in Outlook and so on) would be developed and promoted to first class citizens, and things like browser extension-based email encryption would be fleshed out considerably. So while the early adopters (the people who use email encryption today) would use the painful tools we have now - there'd be a huge investment in tool development, and .01% of users would grow to .1%, and then 1%, and maybe to 10%.

We laid out as many pieces as we could, specifying a key discovery mechanism that doesn't leak metadata, signaling information to learn if a provider (and user) has opted in to email encryption while blocking downgrade attacks, a report-only mode to prevent a flag day, a minimal level of key authenticity that can optionally be increased on a provider basis, failure reporting, enrollment steps for the many different ways email accounts are created on the web, and some suggestions for encrypted message format.

And then a few months later, a man named Edward Snowden came into our lives.

The PRISM revelations showed widespread complicity on the part of centralized email providers, but more importantly they revealed a broad overreach of the government, and a disturbing trend towards rule by secret law.

We still like this protocol, even post-Snowden. An encrypted email solution that requires end-to-end encryption, with no provision for an email provider to read the mail, is unlikely to be deployed by large corporations that have regulatory monitoring and reporting requirements - industries like large Law Firms, Financials, and Healthcare - plus all businesses that have to support E-Discovery and Data Loss Prevention. You may not like those things (and you may be morally opposed to them), but they are what companies require or have to live with. Those organizations could try to meet some of these requirements under an end-to-end encrypted e-mail scheme (for example, by operating key escrow services), but having direct cleartext access to their users' mail is technically much simpler. By making these use cases a standard, and making the feature as visible to mail users as https is to people who browse the web, we hoped to get large companies on board and have them share the initial development and deployment cost. We aimed for ubiquitous email encryption - business-to-business, between work email accounts and personal accounts. Yet another fragmented internet, where only a few of our contacts supported encryption, was no more interesting than the status quo.

But although we like it, the current situation in the US and the requirements placed upon (and cooperation of) large companies like Google and Verizon means that granting the provider a centralized place of trust in email encryption is a non-starter. And as a complicating factor, the thing the government has been most interested in has been metadata - the very thing that is afforded the least protection under the law and simultaneously the most difficult to protect in a point-to-point protocol. There are efforts to fight this technically (like Tor), but we feel the legal atmosphere must change as well as the technical infrastructure.

We're posting our specification and supporting documents online for people to refer to, in the public domain. It's over on GitHub. Email encryption is hard, and when you start thinking about all the corner cases (like automatically putting mailing list footers into a signed message) - it gets harder. We're hedging our bets. We hope that the legal atmosphere changes. Barring that, we hope this document and its appendices help other people look at the problem and make progress where we got stalled.

Oh yeah - what's it called? Well, when we walked around in Hong Kong, we were calling it "STEED-Like", after Werner Koch's STEED Proposal, which we drew inspiration from. When we realized how much we deviated from it, we dubbed it UEE for Universal Email Encryption - with the intention of finding a better name when we released it. But that day never came. So until we have a legal environment where this might make sense... pronounce it like wheeeeeeeee, but with a you in the front. YOU-EEEEEEEEEEEEEEEEEEEEEEEE!

The biggest argument we've seen against this proposal is that StartTLS (TLS-encrypted SMTP links between providers) gets you almost the same thing, for most users, with way less work. We love StartTLS and want to see it working way better than it does now. But we think that just getting to widespread email encryption (even if some or even most keys are provider-managed) would spur the development and smoothing out of client-based encryption, which would in turn let more people opt in to managing their own keys, getting true end-to-end security not possible with StartTLS.

Evernote, and Privacy Preserving Applications
14 Nov 2013 23:09:23 EST

I'd like to take a moment to talk about privacy preserving by default.

I don't intend for this to be a rant about current commercial decisions - instead I'd like it to be praise of what I think (and hope) is great design, and use it to try and set an example that other people can follow. I was talking with a friend recently, and he talked about how, ultimately, most people want personalization, they want ease of use, they want features from services, and they're willing to give up their privacy in order to get those features. I don't disagree with either of his points - I agree with them entirely. But I challenge the assumption that getting those features, getting the ease of use, requires giving up their data to a third party. And I instead pose the question: "If you can get the exact same feature, provided in a privacy preserving way - say, by computing it locally on your phone, as opposed to bulk-shipping your data out to a third party and having them analyze it on their servers - I think everyone would prefer to get it the privacy preserving way. So why not do that?"

Let's talk about a heart attack I had recently.

I use Evernote. Flame me, whatever, I'm not using it for work or for sensitive things, I'm using it for gift ideas I see in stores and simple things. If there was something I could run myself that had a shiny mobile app and a web UI, I'd use that, but there isn't, so let's move on. I took a photo while I was at THREADS today, and when I went to add it in Evernote I got this screen:

Note, I recreated this with a quick shot of my laptop

How, in the hell, did it know I was at THREADS?

Was it doing some sort of geolocation combined with local events? I was so disturbed by this I searched for it: evernote smart title1. This led me to a blog post announcing the feature.

Now, when you create a new note and save it without giving the note a title, the app will assign a contextual title using calendar events, your location, note contents, and other information.

This is a great example of a totally legitimate, useful feature that most people (including myself) would like. Without it, I'm going to have to type what is likely to be a redundant title (as I'm only putting a few words to remind myself what I took a photo of), or have the title remain 'Untitled'. But as someone somewhat concerned about my privacy, it also filled me with dread. I knew there were two ways this was likely to be implemented. One would be to read my location and calendar locally, and generate a title. The other would be to bulk-ship my data up to their servers, analyze, and send back a pregenerated title. Let's see which they do.

I don't really intend for this to be an Android App Reversing Walkthrough, but I do want to cover what I had to go through to figure this out, because it's really not that hard and I think the community should be doing more of this to answer questions like "Hey, how the heck does [Flavor of the Week 'Secure' Message App] work?" So I'm going to skip the 'easy' parts, and dig into the more difficult reversing. I'll point to Intrepidus Group and your search engine for getting you past the part where you pull the APK off the device, and run it through dex2jar. At this point, we've got a pile of decompiled java files. Let's dig in.

$ grep -R title *

This yields 810 results. Way too broad, let's try another tactic.

$ grep -R "Picture from" *

This yields no results. This made me a little nervous, because if the title was generated locally, I'd expect that string fragment to be found somewhere. New tactic. This data came from my calendar, so let's look at calendar API calls. Searching for a few API calls, I found a folder called 'smart/noteworthy'. The feature is called 'smart titles' so this may be it. But before I spend a ton of time reading the code, I can do more 'quick, dirty, and coarse' approaches that may get me nothing, or may get me a jackpot.

In fact, I realized I was omitting a key debugging tool: running the application while tailing logs with adb logcat.

I/EN      (26873): [NewNoteFragment] - canAttachFile()::mimeType=image/*
I/EN      (26873): [NewNoteFragment] - canAttachFile()result=true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - generateAutoTitle()::title=null
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::5
D/EN      (26873): [a] - starting events query+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I/EN      (26873): [a] - Attachment()::uri=content://media/external/images/media/1764 mMimeType=image/jpeg type=0::mTitle=null
I/EN      (26873): [a] - findType()::type=4 mimeType=image/jpeg
I/EN      (26873): [a] - isSupportedUri()true
I/EN      (26873): [a] - Attachment()::mType=4mTitle=IMG_20131114_184957 mMetainfo=1 Mb mMimeType=image/jpeg
I/EN      (26873): [a] - Attachment()::mTitle=IMG_20131114_184957 mMetainfo=1 Mb
D/EN      (26873): [a] - events=1
D/EN      (26873): [ab] - isMultishotCameraAvailable: platform support = true Native library support = true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - generateAutoTitle()::title=null
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::6
D/EN      (26873): [NewNoteFragment] - getAddress-running
I/ActivityManager( 1681): Displayed com.evernote/.note.composer.NewNoteAloneActivity: +373ms
D/EN      (26873): [ab] - isMultishotCameraAvailable: platform support = true Native library support = true
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - generateAutoTitle()::title=Picture from THREADS
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - generateAutoTitle()::title=Picture from THREADS
I/EN      (26873): [NewNoteFragment] - mHandler()::handleMessage()::7
D/EN      (26873): [a] - generateAutoTitle()::title=Picture from THREADS @ Town, State
D/EN      (26873): [NewNoteFragment] - showHelpDialog()

Now that is what I'm looking for. I locate that method, and it has code fragments like:

localObject2 = paramContext.getString(2131165339);

If you're a little familiar with Android, you probably realize this is something like context.getString(R.string.YOUR_STRING); but now it's been turned into a constant. Let's trace it down.

$ grep -R 2131165339 *
jad-ed/com/evernote/android/multishotcamera/R$string.java:  public static int untitled_note = 2131165339;
jad-ed/com/evernote/note/composer/a.java:              localObject2 = paramContext.getString(2131165339);
jad-ed/com/evernote/note/composer/p.java:      paramString = paramContext.getString(2131165339);
jad-ed/com/evernote/provider/a.java:            str2 = this.b.getString(2131165339);
jad-ed/com/evernote/ui/NewNoteFragment.java:        str1 = this.bl.getString(2131165339);
jad-ed/com/evernote/ui/NewNoteFragment.java:      str = b(2131165339);
jad-ed/com/evernote/ui/QuickSaveFragment.java:        this.bm = b(2131165339);

$ grep -R untitled_note *
Binary file com.evernote-1.apk matches
jad-ed/com/evernote/android/multishotcamera/R$string.java:  public static int untitled_note = 2131165339;
Binary file unzipped/classes.dex matches
Binary file unzipped/resources.arsc matches

Frankly, I'm still not sure why this was tracked down to the exact human-readable resource string - but generally speaking, our goal in Reverse Engineering is to stay as broad as we can until we have to go deep. I traced down more of these constants, inlined them, and followed the trail.
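As an aside, you don't have to grep entirely blind for these constants. Android resource IDs are packed integers - 0xPPTTEEEE, with a package byte, a type index, and an entry index - so a few lines of Python (mine, not something from the decompilation) can sanity-check that a constant like 2131165339 really is an app resource before you chase it:

```python
def split_resource_id(res_id):
    """Unpack an Android resource ID into its 0xPPTTEEEE fields:
    package (0x7F = the app itself, 0x01 = the android framework),
    a type index assigned by aapt, and an entry index into resources.arsc."""
    package = (res_id >> 24) & 0xFF
    type_index = (res_id >> 16) & 0xFF
    entry = res_id & 0xFFFF
    return package, type_index, entry

pkg, typ, entry = split_resource_id(2131165339)
# 2131165339 == 0x7F07009B: package 0x7F, so it's one of the app's own
# resources and will resolve via the APK's resources.arsc
assert pkg == 0x7F
```

The type and entry indices only mean something relative to this particular APK's resource table, but confirming the 0x7F package byte tells you the string lives in the app, not the framework.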

//These statements were not in this order, just placing them together for brevity
localObject2 = paramContext.getString("auto_title_from_meeting_at_location", new Object[] { localObject2, str1, str2 });
localObject2 = paramContext.getString("untitled_note");
localObject2 = paramContext.getString("auto_title_from_meeting", new Object[] { localObject2, str1 });
localObject2 = paramContext.getString("auto_title_at_location", new Object[] { localObject2, str2 });

//Clearly str1 refers to the meeting name, and str2 the location
//Where do they come from?

str1 = b();
str2 = c();

private String b() //get meeting
    if ((this.m != null) && (this.m.length > 0))
        return this.m[0];
    return null;
private String c() //get location
    StringBuilder localStringBuilder = new StringBuilder("");
    if (this.a != null) //this.a is "public Address a;"
        String str1 = this.a.getLocality();
        boolean bool = TextUtils.isEmpty(str1);
        int i1 = 0;
        if (!bool)
            i1 = 1;
        String str2 = this.a.getAdminArea();
        if ((!TextUtils.isEmpty(str2)) && (!str2.equalsIgnoreCase(str1)))
            if (i1 != 0)
                localStringBuilder.append(", ");
    return localStringBuilder.toString().trim();

//Let's trace down this.m
public final void b(Bundle paramBundle)
    this.a = ((Address)paramBundle.getParcelable(q));
    this.m = paramBundle.getStringArray(p);

So we've figured out where the location-based part comes from. It's using a Geolocation API to grab that. Hunting down where the Meeting name came from is going to be much more difficult.

In fact, this is where I spent the bulk of my time. I did greps like grep -R ".b(" ../../, and when that was too coarse, grep -R "b(" ../../../../ | grep ";" | grep -v "," but I wasn't finding much. I decided to import it into Eclipse. Now clearly, this wasn't going to help me build it.

But I was hoping Eclipse would be able to build enough of it, and provide enough code navigation features to get a couple of hints out of it. And indeed, when I referenced all public calls of "b(Bundle paramBundle)", I did wind up with one.

Now at this point, I wasn't really getting much out of it. I read a lot of this code, and tried to figure out where things were going. I deciphered a lot more of the surrounding code, running down getString() calls and such. Like I said before - we stay broad until we have to go deep. I alternated between going deep, trying to outline what individual functions did while periodically stepping back and skimming the 2-3 surrounding classes.

Eventually, I was confused enough to take a step back. You see, I'm working with decompiled Java code - this is not what the original developers wrote. It's what a tool has translated from bytecode back into Java. It's a bit spaghetti-like, it's a bit wrong. In fact, one function was actually marked as undecompilable:

// ERROR //
  private static String[] b(Context paramContext, String paramString)
    // Byte code:
    //   0: aconst_null
    //   1: astore_2
    //   2: aload_0

So with the experience that only comes from having done this before and understanding the limitations of one's tools, I stepped back even further. I needed to redo this decompilation. Fortunately, there are other Java decompilers out there. And using a second one, I was able to get a successful decompilation of the previously-undecompilable b() function.

    private static String[] b(Context context, String s1)
        Cursor cursor;
        ContentResolver contentresolver;
        cursor = null;
        contentresolver = context.getContentResolver();
        Cursor cursor2 = contentresolver.query(Uri.parse("content://com.android.calendar/calendars"), new String[] {
        }, s1, null, null);
        Cursor cursor1 = cursor2;
        if(cursor1 == null) goto _L2; else goto _L1
        String as[] = new String[cursor1.getCount()];
        int i1 = 0;
        if(!cursor1.moveToNext()) goto _L4; else goto _L3
        as[i1] = cursor1.getString(0);
          goto _L5
        if(cursor1 != null)
        as = null;
        return as;

This seems obvious in retrospect, but it's only by comparing the decompilations in detail that I saw just how wrong the first one was. Seemingly useless and unreachable code suddenly transformed into meaningful control flow statements. (Protip: compilers almost never emit unreachable code.) The calendar event clearly comes from the calendar, locally, on note creation.

Okay, let's step back again. I suspected Evernote might be doing something really cruddy like sending all my calendar events to their server so they can serve these note titles. I'm fairly certain that is not the case. I have not reversed Evernote in its entirety and I am not saying they are not doing something very shady. They may well be. But for this single feature I looked at, I don't think they are.2

But ultimately, I come back to the question I posed in the beginning: "If you can provide an awesome feature, and do it in a privacy preserving way, as opposed to a 'do the computation on our servers' approach, why not do that?" And I'll add: why not advertise that? In the age of legal liability for privacy violations and consumer interest in privacy - now compounded even further by Snowden - why not differentiate and advertise on technical constraints for privacy, in addition to making a sleek and awesome app and service? Tell people "Hey, we don't just take your privacy 'seriously' like everyone else, we provide our features on your phone so we never see the data."

Another great example of a complete start-up idea: geolocation based notes. I would love, and pay, for an App that let me put down groceries on a shopping list, and it'd remind me when I go in the grocery store. Let me put a marker "When I drive by this point in the road, at this time of day, remind me to pull over and put the clothes I've been trying to donate for a month in the donation bin."3 But all the apps I'm aware of that do this either don't work well, or send all your data (including location) to the server. This could run on the phone, there's no reason why it couldn't. I'll do your monetization strategy one better - sell me little bluetooth or NFC thingies I put at my front door, car, wallet, whatever. Let me make a note like "If I leave the house, remind me to grab the bills I need to mail" or "If I get in the car, and I don't have my wallet, freak out." Or go turn Paul Wouters's privacy preserving Google Latitude-like location-sharing into an app. There's a lot of ideas here.

I'll talk about one more example. RedPhone is an app that lets you make encrypted phone calls to people who also have RedPhone. But because it's annoying to have to manually choose to use RedPhone, plus the problem of knowing which of your contacts have RedPhone to begin with - RedPhone will prompt you to upgrade your call to an encrypted call if the person you're calling also uses RedPhone. How does it know the other person uses RedPhone? Well, it could A) send all your contacts to the server and tell you which people have RedPhone4 B) send all the people who have RedPhone to you, or C) do something way sexier. What it does is send down a bloom filter that allows you to ask if any individual number has RedPhone, but doesn't give you the entire list of RedPhone users, nor send your contacts to the server.5
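To make the idea concrete, here's a toy Python sketch of the general bloom-filter approach - not RedPhone's actual code, and the sizes and hash construction are made up for illustration. The server builds the filter from registered numbers and ships only the bit array to the phone; the phone tests its contacts locally:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter. Membership tests can false-positive
    (rarely), but never false-negative - which is fine here, since a
    wrong 'yes' just means one failed upgrade prompt."""
    def __init__(self, size_bits=1024, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k independent bit positions by salting a hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Server side: add every registered number, send `bits` to clients.
registered = BloomFilter()
registered.add("+15551234567")

# Client side: check a contact without it ever leaving the phone.
print("+15551234567" in registered)  # True
```

The phone never uploads its address book, and the bit array doesn't enumerate the user list - exactly the trade the post describes (imperfect, per footnote 5, but good given the constraints).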

That's what I'd like to see more of. I'd like to see novel apps selling an innovative product that people want, not necessarily selling privacy - but still developed in a privacy-preserving way. I believe it can be done - call me naive, but I believe you can build an awesome, innovative app that fills a niche and is privacy preserving - not privacy preserving as its selling point with half-baked features added elsewhere6.

And also, to close up with Evernote, I think it'd be awesome if Evernote came out and confirmed that their Smart Title Feature, and their app in general, does not send all your contacts, calendar events, or anything up to their servers except the notes you create.

I recognize there are constraints to doing what I describe: battery life, computation speed, backgrounding, etc etc. I view these in the same vein as other engineering problems - they can be overcome with ingenuity, challenging assumptions, testing, and hard work.

1 Flame me again for using Google, but DuckDuckGo's search results just aren't as good on this query.
2 I can't stress this enough. I don't know if Evernote is doing something scummy that I didn't uncover in the literally-one-hour I spent on this. Their privacy policy says things like "we automatically gather non-personally identifiable information that indicates your association with one of our business partners", "Any Content you add to your account", and "The geographic area where you use your computer and mobile devices (as indicated by an IP address or similar identifier)". However, it doesn't say "We take all your data."
3 I've had a bag in my car for over a month. Sell me this app, please.
4 I think, but am not sure, this is what SnapChat does.
5 I'm aware this design isn't perfect, but it's pretty good given the objectives and constraints.
6 While apps like Silent Circle and Wickr are coming close, I think apps should remember to build an awesome useable product first, and make the privacy preserving part supplemental as opposed to the primary selling point.

Update: I got a response back from Evernote!

Shortly after posting this article, I got an email from a nice guy named Grant at Evernote, who gave me permission to post:

Hi Tom,

I read your blog post about privacy preserving applications and Evernote. I can confirm that the Smart Title feature, and the app in general, does not send all your contacts, calendar events, or anything else to Evernote servers except for the content of the notes you create.

You might find Evernote’s Three Laws of Data Protection interesting–specifically the second law: http://blog.evernote.com/blog/2011/03/24/evernotes-three-laws-of-data-protection/

Having an employee (Grant works as an Engineer, not in 'Public Relations') reach out to a random blog author is, in my opinion, a good sign of a straightforward and honest company. And I quite like the second law.

Everything you put into Evernote is private by default. We never look at it, analyze it, share it, use it to target ads, data mine it, etc.–unless you specifically ask us to do one of these things. Our business model does not depend on "monetizing" your data in any way. Rather, it depends on building trust and providing a great service that more and more people choose to pay for.

So props to Evernote. :)

Funniest Exchange Ever on TLS Mailing List
06 Nov 2013 16:33:23 EST

Background: there's this huge problem where TLS ClientHellos that exceed 255 bytes result in hangs for certain hardware (like some F5 hardware). Hangs are horrible because the only thing you can do is have a timeout and reconnect – super slow. So we're trying to add extensions (like ALPN for SPDY) and new ciphersuites, all while keeping the size under 255 bytes. Someone asks "Hey, how come this happens at all?" Someone from F5 responds...


Xiaoyong Wu X.Wu@f5.com via ietf.org 
It is a little bit more calculation than that and related to some historic reasons, aka SSLv2.

For SSL records, the SSLv3 and TLS ClientHello headers are as follows:

| 22 | version major | version minor | length high bits | length low bits |

If this is interpreted as an SSLv2 header, it will be considered as a 3 byte header:
| v2 header b0 | v2 header b1 | v2 header b2 | message type |

The value for Client Hello message type is SSLV2_MT_CLIENTHELLO which is 1.
When there is an SSLv3/TLS client-hello of length 256 - 511 bytes, this is ambiguous as "message 
type" is 1 or it is the "length high bits" to be 1.

Our implementation before the patch was to prefer SSLv2 and thus the issue.

As I am explaining this in detail, I would say that another work around on this would be making a 
client hello that exceeds 512 in length.
Adam Langley via ietf.org 
On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu  wrote:
> As I am explaining this in detail, I would say that another work around on this would be making a 
> client hello that exceeds 512 in length.

^^^ Holy crap. I wish I had known that sooner. That might solve the issue.


Yoav Nir via ietf.org 
On Nov 6, 2013, at 10:03 AM, Adam Langley  wrote:
> On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu  wrote:
>> As I am explaining this in detail, I would say that another work around on this would be making a 
>> client hello that exceeds 512 in length.
> ^^^ Holy crap. I wish I had known that sooner. That might solve the issue.

Time to standardize the "jpeg-of-cat" extension for TLS.
Dr Stephen Henson lists@drh-consultancy.co.uk via ietf.org 

On 06/11/2013 18:03, Adam Langley wrote:
> On Wed, Nov 6, 2013 at 1:00 PM, Xiaoyong Wu  wrote:
>> As I am explaining this in detail, I would say that another work around on this would be making a 
>> client hello that exceeds 512 in length.
> ^^^ Holy crap. I wish I had known that sooner. That might solve the issue.

Just did a quick test with OpenSSL on a couple of known "hang" machines. Seems
to work.


The thread is here. Obviously it'll take a lot of testing to figure out if this works reliably, but I think a lot of people are cautiously excited.
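Xiaoyong Wu's explanation is easy to verify on paper. Here's a minimal Python sketch (mine, not from the thread) of the misparse: build a TLS ClientHello record header, then read the same bytes the way a legacy SSLv2 parser would:

```python
def tls_record_header(handshake_len):
    """SSLv3/TLS record header for a ClientHello:
    type=22 (handshake), version 3.1 (TLS 1.0), 2-byte big-endian length."""
    return bytes([22, 3, 1]) + handshake_len.to_bytes(2, "big")

def sslv2_message_type(record):
    """Read the same bytes as a legacy SSLv2 parser would.
    SSLv2: if the high bit of byte 0 is set, a 2-byte record header
    precedes the message type; otherwise a 3-byte header (with a
    padding byte) does - and byte 0 here is 22, so high bit clear."""
    if record[0] & 0x80:
        return record[2]
    return record[3]

# A 256-511 byte handshake puts a 1 in the length's high byte, which the
# SSLv2 parser reads as SSLV2_MT_CLIENTHELLO (= 1) - hence the hang.
assert sslv2_message_type(tls_record_header(300)) == 1  # ambiguous!
assert sslv2_message_type(tls_record_header(200)) == 0  # < 256: unambiguous
assert sslv2_message_type(tls_record_header(600)) == 2  # > 511: not a v2 hello
```

That last line is Wu's workaround in miniature: pad the ClientHello past 511 bytes and the fake "message type" stops being 1, so the broken SSLv2-preferring path is never taken.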

Open Technology Fund Audit Report
21 Oct 2013 12:37:23 EST

Over the past year, iSEC Partners has worked with the Open Technology Fund on several of their supported projects, and I've been extremely fortunate to have a finger, arm, or whole body in each of the audits. Most of them were as an Account Manager (just helping arrange the audit between the project and some of our extremely talented consultants) but I also got to roll up my sleeves and pick on a couple myself.

If you haven't heard of OTF, they fund projects that develop open and accessible technologies promoting human rights and open societies. Some of the projects they support that we've been able to work on are Open Whisper Systems' RedPhone and TextSecure, Commotion, and GlobaLeaks, among others.

I also got to work on a followup of the Liberation Technology Auditing Guidelines I authored in the beginning of the year. In conjunction with the audits iSEC performed, I also helped OTF perform a review of their audit process. The goal of this review was to take a look at the breadth, scope, and coverage of security audits performed on OTF funded applications to date. I aimed to identify the strengths and shortcomings in OTF's current process and provide recommendations to improve the breadth of coverage and to derive greater value in the future. The report is (hopefully) applicable to both OTF and other funding agencies in the Liberation Technology and Civil Society communities, and iSEC and I hope this work inspires more development and more integration between security professionals and project teams. OTF has published this review over on their website where you can take a look.

About the Tor/NSA Slide Decks
7 Oct 2013 09:02:23 EST

Unless you've been living under a rock for the past weekend, you heard about several documents published by The Guardian and The Washington Post that are (likely) from the NSA, explaining how they deal with Tor. I wanted to take a look and analyze them from a technical standpoint.

The good news from these documents is that Tor is doing pretty well. There's a host of quotes that confirm what we hoped: that while there are a lot of techniques the NSA uses to attack Tor, they don't have a complete break. Quotes like "We will never be able to de-anonymize all Tor users all the time" and that they have had "no success de-anonymizing a user in response to a request/on-demand". [0] We shouldn't take these as gospel, but they're a good indicator.

Now something else to underscore in the documents that were released, and in the DNI statement, is that bad people use Tor too. Tor is a technology, no different from Google Earth or guns - it can be used for good or bad. It's not surprising or disappointing to me that the NSA and GCHQ are analyzing Tor.

But from a threat modeling perspective - there's no difference between the NSA/GCHQ and China or Iran. They're all well-funded adversaries who can operate over the entire Internet and have co-opted national ISPs and inter-network links to monitor traffic. But from my perspective, the NSA is pretty smart, and they have unmatched resources. If they want to target someone, they're going to be able to do so - it's only a matter of putting effort into it. It's impossible to completely secure an entire ecosystem against them. But you can harden it. The documents we've seen say that they have operating concerns[1], and Schneier says the following:

The most valuable exploits are saved for the most important targets. Low-value exploits are run against technically sophisticated targets where the chance of detection is high. TAO maintains a library of exploits, each based on a different vulnerability in a system. Different exploits are authorized against different targets, depending on the value of the target, the target's technical sophistication, the value of the exploit, and other considerations.[4]

What this means to me is that by hardening Tor, we're ensuring that the attacks the NSA does run against it (which no one would be able to stop completely) will only be run against the highest value targets - real bad guys. The more difficult it is to attack, the higher the value of a successful exploit the NSA develops - that means the exploit is conserved until there is a worthwhile enough target. The NSA still goes after bad guys, while all state-funded intelligence agencies have a significantly harder time de-anonymizing or mass-attacking Tor users. That's speculation of course, but I think it makes sense.

That all said - let's talk about some technical details.


The NSA says they're interested in fingerprinting Tor users from non-Tor users[1]. They describe techniques that fingerprint the Exit Node -> Webserver connection, and techniques on the User -> Entry Node connection.

They mention that Tor Browser Bundle's buildID is 0, which does not match Firefox's. The buildID is a javascript property - to fingerprint with it, you need to send javascript to the client to execute. (TBB's behavior was recently changed, for those trying it at home.) But the NSA sitting on a website wondering if a visitor is from Tor doesn't make any sense. All Tor exit nodes are public. In most cases (unless you chain Tor to a VPN), the exit IP will be an exit node, and you can fingerprint on that. Changing TBB's buildID to match Firefox may eliminate that one, single fingerprint - but in the majority of cases you don't need it.
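To make that point concrete, here's a minimal sketch of exit-based fingerprinting: since every exit node is public, a website can tell a Tor visitor apart from everyone else with a simple set lookup - no javascript or buildID tricks required. The bulk exit list URL is Tor Project's real service, but treat the fetch details (one IP per line, '#' comments) as an assumption about its current format.

```python
import urllib.request

def fetch_exit_ips(url="https://check.torproject.org/torbulkexitlist"):
    # Download the public list of Tor exit IPs, one address per line.
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode().splitlines()
    return {line.strip() for line in lines if line and not line.startswith("#")}

def is_tor_visitor(client_ip, exit_ips):
    # The whole "fingerprint": is the visitor's source IP a known exit?
    return client_ip in exit_ips
```

A server operator would refresh the list periodically and check each request's source address against it.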

What about the other direction - watching a user and seeing if they're connecting to Tor? Well, most of the time, users connect to publicly known Tor nodes. A small percentage of the time, they'll connect to unknown Tor bridges. Those bridges will then connect to known Tor nodes, and the bridges are distinguishable from a user/Tor client because they accept connections. So while a Globally Passive Adversary could enumerate all bridges, the NSA is mostly-but-not-entirely-global. They can enumerate all bridges within their sphere of monitoring, but if they're monitoring a single target outside their sphere, that target may connect to Tor without them being sure it's Tor.

Well, without being sure it's Tor based solely on the source and destination IPs. There are several tricks they note in [2] that let them distinguish the traffic using Deep Packet Inspection. Those include a fixed TLS certificate lifetime, the Issuer and Subject names in the TLS certificate, and the DH modulus. I believe, but am not sure, that some of these have been changed recently in Tor, or are in the process of being redesigned - I need to follow up on that.
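To make the DPI idea concrete, here's a sketch of one such check on the TLS subject name. The random-hostname pattern is my assumption about Tor's older throwaway certificates (something like www.<random>.net), not a detail quoted from the documents - a real classifier would combine this with the certificate lifetime and DH modulus checks they describe.

```python
import re

# Assumed pattern: a short random-looking label under www.*.com/.net and
# nothing else in the DN. Tor's certificate naming has changed over time,
# so treat this as illustrative only. Note it will also flag ordinary
# sites whose hostnames happen to fit - DPI fingerprints are heuristics.
TOR_DN = re.compile(r"^CN=www\.[a-z2-7]{8,20}\.(?:com|net)$")

def looks_like_tor_cert(subject_dn: str) -> bool:
    return bool(TOR_DN.match(subject_dn))
```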

Another thing the documents go into length about is "staining" a target so they are distinguishable from all other individuals[3]. This "involves writing a unique marker (or stain) onto a target machine". The most obvious technique would be putting a unique string in the target's browser's User Agent. The string would be visible in HTTP traffic - which matches the description that the stain is visible in "passive capture logs".

However, the User Agent has a somewhat high risk of detection. While it's not that common for someone to look at their own User Agent using any of the many free tools out there, I wouldn't consider it unusual - especially if you were concerned with how trackable you were. Also, not to read too closely into a single sentence, but the documents do say that the "stain is visible in passively collected SIGINT and is stamped into every packet". "Every packet" - not just HTTP traffic.

If you wanted to be especially tricky, you could put a marker into something much more subtle - like TCP sequence numbers, IP flags, or IP identification fields. Someone had proffered the idea that a particularly subtle backdoor would be replacing the system-wide Random Number Generator for Windows to DUAL_EC_DRBG using a registry hack.
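As a purely illustrative sketch of how subtle such a stain could be, here's a 16-bit marker written into the IPv4 Identification field, with the header checksum recomputed so the packet still validates. This is my own toy, not anything described in the documents.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    # Standard IPv4 header checksum: one's complement of the one's
    # complement sum of all 16-bit words.
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def stamp_ip_id(header: bytes, marker: int) -> bytes:
    # Write the stain into bytes 4-5 (the Identification field) and fix
    # up the checksum (bytes 10-11) so nothing looks malformed.
    h = bytearray(header)
    struct.pack_into("!H", h, 4, marker)
    struct.pack_into("!H", h, 10, 0)
    struct.pack_into("!H", h, 10, ipv4_checksum(bytes(h)))
    return bytes(h)

def read_stain(header: bytes) -> int:
    # The passive collector's side: read the marker back out.
    return struct.unpack_from("!H", header, 4)[0]
```

To a casual observer this is just a normal, varying IP ID - exactly the "visible in passive capture logs" property the documents describe.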

Something else to note is that the NSA is aware of obfsproxy, Tor's obfuscated transport to avoid nation-state DPI blocking, and also the tool Psiphon. Tor and Psiphon both try to hide as other protocols (SSL and SSH, respectively). According to the leaked documents, they use a seed and verifier protocol that the NSA has analyzed[2]. I'm not terribly familiar with the technical details there, so the notes in the documents may make more sense after I've looked at those implementations.

Agencies Run Nodes

Yup, Intelligence Agencies do run nodes. It's been long suspected, and Tor is explicitly architected to defend against malicious nodes - so this isn't a doomsday breakthrough. Furthermore, the documents even state that they didn't make many operational gains by running them. According to Runa, the NSA never ran exit nodes.

What I said in my @_defcon_ talk is still true: the NSA never ran Tor relays from their own networks, they used Amazon Web Services instead.

— Runa A. Sandvik (@runasand) October 4, 2013

The Tor relays that the NSA ran between 2007 and 2013 were NEVER exit relays (flags given to these relays were fast, running, and valid).

— Runa A. Sandvik (@runasand) October 4, 2013

Correction of https://t.co/U5v7krwZH0: the NSA #Tor relays were only running between 2012-02-22 and 2012-02-28.

— Runa A. Sandvik (@runasand) October 4, 2013

Something I'm going to hit upon in the conclusion is how we shouldn't assume that these documents represent everything the NSA is doing. As pointed out by my partner in a conversation on this - it would be entirely possible for them to slowly run more and more nodes until they were running a sizable percentage of the Tor network. Even though I'm a strident supporter of anonymous and pseudonymous contributions, it's still a worthwhile exercise to measure what percentage of exit, guard, and path probabilities can be attributed to node operators who are known to the community. Nodes like NoiseTor's, or torservers.net are easy, but also nodes whose operators have a public name tied into the web of trust. If we assume the NSA would want to stay as anonymous and deniable as possible in an endeavor to become more than a negligible percentage of the network - tracking those percentages could at least alert us to a shrinking percentage of nodes being run by people 'unlikely-to-be-intelligence-agencies'.
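The measurement suggested above is straightforward to sketch. Given consensus-style records of relay selection weight plus a flag for whether the operator is known to the community, the tracked quantity is just a weighted fraction. The record format here is invented for illustration; a real version would parse the consensus and weight exits, guards, and path probabilities separately.

```python
def known_operator_fraction(relays):
    """relays: list of (selection_weight, operator_known: bool).
    Returns the fraction of total selection weight attributable to
    operators the community can name (NoiseTor, torservers.net, etc.)."""
    total = sum(w for w, _ in relays)
    known = sum(w for w, is_known in relays if is_known)
    return known / total if total else 0.0
```

A steadily shrinking fraction over time would be the alert condition described above.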

This is especially true because in [0 slide 21] they put forward several experiments they're interested in performing on the nodes that they do run:

  1. Deploying code to aid with circuit reconstruction
  2. Packet Timing attacks
  3. Shaping traffic flows
  4. Deny/degrade/disrupt comms to certain sites

Operational Security

Something the documents hit upon is what they term EPICFAIL - mistakes made by users that 'de-anonymize' them. It's certainly the case that some of the things they mention lead to actionable de-anonymization. Specifically, they mention some cookies persisting between Tor and non-Tor sessions (like Doubleclick, the ubiquitous ad cookie) and using unique identifiers, such as email and web forum names.

If your goal is to use Tor anonymously, those practices are quite poor. But something they probably need to be reminded of is that not everyone uses Tor to be anonymous. Lots of people log into their email accounts and Facebook over Tor - they're not trying to be anonymous. They're trying to prevent snooping and secure their web browsing against corporate proxies, their ISP, and national monitoring systems; to bypass censorship; or to disguise their point of origin.

So - poor OpSec leads to a loss of anonymity. But if you laughed at me because you saw me log into my email account over Tor - you missed the point.

Hidden Services

According to the documents, the NSA had made no significant effort to attack Hidden Services. However, their goals were to distinguish Hidden Services from normal Tor clients and to harvest .onion addresses. I have a feeling the latter is going to be considerably easier when you can grep every single packet capture looking for .onions.
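To illustrate how trivial that harvesting is, here's a sketch that pulls v2-style .onion addresses (16 base32 characters) out of raw capture data:

```python
import re

# A v2 onion address is 16 characters from the base32 alphabet (a-z, 2-7)
# followed by ".onion" - easy pickings in any plaintext capture.
ONION = re.compile(rb"\b([a-z2-7]{16}\.onion)\b")

def harvest_onions(capture: bytes):
    # Deduplicate and sort the addresses found in a blob of captured data.
    return sorted(set(ONION.findall(capture)))
```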

But just because the NSA hadn't focused much on Hidden Services by the time the slides were made doesn't mean others haven't. Weinmann et al. authored an explosive paper this year on Hidden Services, where they are able to enumerate all Hidden Service addresses, measure the popularity of a Hidden Service, and in some cases, de-anonymize a HS. There isn't a much bigger break against HS than these results - if the NSA hadn't thought of this before Ralf, I bet they kicked themselves when the paper came out.

And what's more - the FBI had two high profile takedowns of Hidden Services - Freedom Hosting and Silk Road. While Silk Road appears to be a result of detective work finding the operator, and then following him to the server, I've not seen an explanation for how the FBI located or exploited Freedom Hosting.

Overall - Hidden Services need a lot of love. They need redesigning, reimplementing, and redeploying. If you're relying on Hidden Services for strong anonymity, that's not the best choice. But whether you are or not - if you're doing something illegal and high-profile enough, you can expect law enforcement to be following up sooner or later.

Timing De-Anonymization

This is another truly clever attack. The technique relies on "[sending] packets back to the client that are detectable by passive accesses to find client IPs for Tor users" using a "Timing Pattern" [0 slide 13]. This doesn't seem like that difficult of an attack - the only wrinkle is that Tor splits and chunks packets into 512-byte cells on the wire.

If you're in control of the webserver the user is contacting (or sit between the webserver and the exit node), the way that I'd implement this is by changing the IP packet timing to be extremely uncommon. Imagine sending one 400-byte packet, waiting 5 seconds, sending two 400-byte packets, waiting 5 seconds, sending three 400-byte packets, and so on. What this will look like to the user is receiving one 512-byte Tor cell, then a few seconds later, two 512-byte Tor cells, and so on. While the website load may seem slow, it'd be almost impossible to see this attack in action unless you were performing packet captures and looking for irregular timing. (Another technique might ignore/override the TCP Congestion Window, or something else - there are several ways you could implement this.)
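The staircase pattern described above is easy to sketch from both ends: the sender emits bursts of 1, 2, 3, ... packets with a fixed gap, and the passive observer near the client groups cell arrival times into bursts and checks for the same staircase. This is my toy reconstruction of the idea, not the actual GCHQ technique.

```python
def burst_schedule(n_bursts, gap=5.0):
    """Sender side: (send_time, packets_in_burst) pairs - 1, 2, 3, ..."""
    return [(i * gap, i + 1) for i in range(n_bursts)]

def arrivals_match(timestamps, n_bursts, gap=5.0):
    """Observer side: group arrival times into bursts (separated by more
    than half the gap) and check they form the 1, 2, 3, ... staircase."""
    bursts = [[timestamps[0]]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > gap / 2:
            bursts.append([])
        bursts[-1].append(cur)
    sizes = [len(b) for b in bursts]
    return sizes == list(range(1, n_bursts + 1))
```

Any client whose inbound cells match the injected staircase is a de-anonymization candidate.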

Two other things are worth noting. First, the slides say "GCHQ has research paper and demonstrated capability in the lab". It's possible this attack has graduated out of the lab and is now being run live - that would be concerning, because this is potentially an attack that could perform mass de-anonymization of Tor users. Second, it's extremely difficult to counter. The simplest countermeasures (adding random delays, cover traffic, and padding) can generally be defeated with repeated observations. That said - repeated observations are not always possible in the real world. I think a worthwhile research paper would be to implement some or all of these countermeasures, perform the attack, and measure what type of security margins you can gain.

There is also the question "Can we expand to other owned nodes?" These 'owned nodes' may be webservers they've compromised, Tor nodes they control, Quantum servers - it's not clear.

End-to-End Traffic Confirmation

Of all our academic papers and threat modeling - this is the one we may have feared the most. The NSA is probably the closest thing to a Global Passive Adversary we have - they're able to monitor large amounts of the Internet infrastructure, log it, and review it. They lay out the plan for an End-to-End Traffic Confirmation attack in [0 slide 6]: "look for connections to Tor, from the target's suspected country, near time of target's activity". They're performing it in the reverse of how I generally think of it: instead of trying to figure out what a particular user is doing, they see activity on a website and try to figure out which user is performing it.

There's no indication of the specifics of how they perform the query: they say they look for "connections to Tor", but does that mean single-hop directory downloads, circuit creation, initial conversation, or something else? Do they take into account packet timings? Traffic size? All of these things could help refine the attack.
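As a toy sketch of how the reversed query might work, here's the simplest version: given timestamps of the target site's activity and each candidate's Tor connection times, score candidates by co-occurrence within a window. The window size and scoring are my assumptions; the slides don't say what the NSA actually refines on.

```python
def confirmation_scores(site_activity, client_connections, window=30.0):
    """site_activity: list of timestamps of activity on the watched site.
    client_connections: dict of client -> list of Tor connection times.
    Returns client -> number of activity events with a nearby connection."""
    scores = {}
    for client, conns in client_connections.items():
        hits = sum(
            1 for t in site_activity
            if any(abs(t - c) <= window for c in conns)
        )
        scores[client] = hits
    return scores
```

A consistently high scorer across many observations becomes a "candidate selector" in the slides' terminology.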

According to the slides, GCHQ has a working version dubbed QUICKANT. The NSA has a version as well that "produced no obvious candidate selectors". The NSA's goals were to figure out if QUICKANT was working for GCHQ, and to continue testing the NSA's version using profiles of "consistent, random and heavy user" - if we assume they don't like the Oxford comma, that's three profiles: a regular, consistent connection to a server, a random connection, and a heavy Tor user.

How do you frustrate End-to-End Confirmation attacks? Well, the bad news is that in a Low Latency Onion Routing network - you don't. Ultimately it's going to be a losing proposition, so most of the time you don't try, and instead focus on other tasks. Just like "Timing De-Anonymization" above (which itself is a form of End-to-End Confirmation), it'd be worth investigating random padding, random delays, and cover traffic to see how much of a security margin you can buy.

Cookie De-Anonymization

This is a neat trick I hadn't thought of. They apparently have servers around the Internet dubbed "Quantum" servers that perform attacks at critical routing points. One of the things they do with these servers is perform Man-in-the-Middle attacks on connections. The slides[0] describe an attack dubbed QUANTUMCOOKIE that will detect a request to a specific website, and respond with a redirection to Hotmail or Yahoo or a similar site. The client receives the redirect and will respond with any browser cookies they have for Hotmail or Yahoo. (Slides end, Speculation begins:) The NSA would then hop over to their PRISM interface for Hotmail or Yahoo, query the unique cookie identifier and try and run down the lead.

Now the thing that surprises me the most about this attack is not how clever it is (it's pretty clever though) - it's how risky it is. Let's imagine how it would be implemented. Because they're trying to de-anonymize a user, and because they're hijacking a connection to a specific resource - they don't know what user they're targeting. They just want to de-anonymize anyone who accesses, say, example.com. So already, they're sending this redirection to indiscriminate users who might detect it. Next, the slides say "We detect the GET request and respond with a redirect to Hotmail and Yahoo!". You can't send a 300-level redirect to two sites, but if I was implementing the attack, I'd want to go for the lowest detection probability. The way I'd implement that is by doing a true Man-in-the-Middle and very subtly adding a single element reference to Hotmail, Yahoo, and wherever else. The browser will request that element and send along the cookie. However, two detection points remain: a) if you use a javascript or css element, you risk the target blocking it and being alerted by NoScript and b) if the website uses Content Security Policy, you will need to remove that header also. Both points can be overcome - but the more complicated the attack, the riskier it is.
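The lower-detection variant I describe above can be sketched in a few lines: instead of a visible redirect, the man-in-the-middle quietly adds one sub-resource reference per cookie domain, so the browser volunteers its cookies. The element URLs here are illustrative, not from the slides.

```python
# Hypothetical beacon elements - one per cookie domain we want the
# browser to volunteer cookies for. Tiny images are less conspicuous
# than a redirect, and NoScript won't block plain <img> tags.
TRACKER_ELEMENTS = (
    '<img src="https://login.live.com/favicon.ico" width="1" height="1">'
    '<img src="https://mail.yahoo.com/favicon.ico" width="1" height="1">'
)

def inject_cookie_beacons(html: str) -> str:
    # Insert just before </body> so the page still renders normally;
    # a real attack would also have to strip any CSP header.
    if "</body>" in html:
        return html.replace("</body>", TRACKER_ELEMENTS + "</body>", 1)
    return html + TRACKER_ELEMENTS
```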


Finally - let's talk about the exploits mentioned in the slides.

Initial Clientside Exploitation

Let's first touch on a few obvious points. They mention that their standard exploits don't work against Tor Browser Bundle, and imply this may be because they are Flash-based. [0 slide 16, 1] But as Chrome and other browsers block older Flash versions, and click-to-play becomes more standard, any agency interested in exploitation would need to migrate to 'pure' browser-based exploits, which is what [1] indicates. [1] talks about two pure-Firefox exploits, including one that works on TBB based on FF-10-ESR.

This exploit was apparently a type confusion in E4X that enabled code execution via "the CTypes module" (which may be js-ctypes, but I'm not sure). [1] They mention that they can't distinguish the Operating System, Firefox version, or 32/64 bitness "until [they're] on the box" but that "that's okay" - which seems very strange to me because every attacker I know would just detect all those properties in javascript and send the correct payload. Does the NSA have some sort of cross-OS payload that pulls down a correct stager? Seems unlikely - I'll chalk this up to reading too much into semi-technical bullet points in a PowerPoint deck.

This vulnerability was fixed in FF-ESR-17, but the FBI's recent exploitation of a FF-ESR-17 bug (fixed in a point release most users had not upgraded to) shows that the current version of FF is just as easily exploited. These attacks show the urgency of hardening Firefox and creating a smooth update mechanism. The Tor Project is concerned about automatic updates (as they create a significant amount of liability in the signing key and the possibility of compelled updates) - but I think that could be overcome through multi-signing and distributing trust across jurisdictions. Automatic updates are critical to deploy.

Hardening Firefox is also critical. If anyone writes XML in javascript I don't think I want to visit their website anyway. This isn't an exhaustive list, but some of the things I'd look at for hardening Firefox would be:

  1. Sandboxing - a broad category, but Chrome's sandboxing model makes exploitation significantly more difficult.
  2. The JIT Compiler
  3. All third party dependencies - review what they're used for, what percentage of the imported library is actually used, what its security history is, and whether they can be shrunk, removed, or disabled. Just getting an inventory of these with descriptions and explanations of use will help guide security decisions.
  4. Obscure and little-used features - especially in media and CSS parsing. See if these features can be disabled in Tor Browser Bundle to reduce the attack surface, or blocked until whitelisted by NoScript. The E4X feature is a fantastic example of this. Little-used Web Codecs would be another.
  5. Alternate scheme support - looking at about:config with the filter "network.protocol-handler" it looks like there are active protocol handlers for snews, nntp, news, ms-windows-store (wtf?), and mailto. I think those first four can probably be disabled for 99% of users.

For what it's worth, [2 slide 46] mentions that Tails (the Tor live-DVD distribution) adds severe exploitation annoyances. Tails would prevent an attacker from remaining persistent on the machine.

Random Musings on Operations

There's a strange slide with some internal references in one of the presentations. It took me a couple read-throughs, but then something clicked as a possible explanation. This is speculation, but judge for yourself if you think it makes sense.

Apparently they tested their callback mechanism (the piece of code that will phone home to their server) and while it worked on normal Firefox, it didn't work on Tor Browser Bundle. It "gained execution" but "didn't receive FINKDIFFERENT", and that it was "defeated by prefilter hash".

When running a clientside exploitation test for a client, we'll limit our exploits from being served to random people outside their corporate network, usually using IP address filtering. If you're not in the netblock we expect - you don't get the exploit. For us, this is to avoid hacking anyone who hasn't asked for it - for the NSA it's (probably) to limit the chance of detection, and (hopefully) for legal reasons.

If their generic exploitation engine (which Schneier says is called FOXACID[4]) was built in a similar way, they would load the target's IP address into it, and have a mechanism that blocks exploitation unless the payload was accessed by that IP address. Tor would break that - there's the target IP address (which they may read from the machine), and there's the exit node's IP address which is likely read by the server sending the exploit.
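Since this is already speculation, here's the prefilter logic sketched out: the exploit server only fires when the requesting IP matches the tasked target, and a Tor exit node's IP never will. The function and field names are mine, not from the documents.

```python
def should_serve_exploit(request_ip: str, tasked_target_ips: set) -> bool:
    # Prefilter: only fire on the IP loaded at tasking time. Against a
    # Tor user, the server sees the exit node's IP, not the target's, so
    # the exploit is never served - matching "defeated by prefilter hash".
    return request_ip in tasked_target_ips
```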

I don't know what EGGI or FIDI are, but that's my interpretation of this slide.

Further Clientside Exploitation

There is also a section on dirty tricks that can be played if you've gained access to the user's machine. [0 slide 17] mentions "given CNE access to a target computer" - I interpret that to mean "given Computer Network Exploitation access" - that is "if we're controlling their box already, but we don't want to do something that may alert the user (like copying all their traffic to our IP), how can we subtly backdoor them". This is pretty far outside Tor's threat model - if an attacker is running an application as the same user or as root/Administrator, there's not much a desktop operating system does to prevent one application (like a rootkit) from manipulating a second application (like Tor). But it's definitely worth examining their ideas:

  1. Alter path selection to use exit nodes they control or monitor closely - this could be done by modifying or replacing the Tor binaries on disk, or while it is running in memory
  2. Slide them onto an entire other Tor network - also possible to do on disk or in memory, all you'd need to do would be replace the hardcoded directory servers and signing keys
  3. Stain their user agent or traffic so they can identify it on the way out (as mentioned above, they say they can do this already)
  4. Have the computer call out and let them know where it is (which is what the FBI did in the Freedom Hosting situation)

Web Server Exploitation

Again, given a presence on the webserver (they control or have exploited it), they talk about sending the traffic out of the webserver in such a way that the Tor traffic going to the client is distinguishable. ([0 slide 18] and the Timing De-Anonymization section above)

Mass Node Exploitation

The fairly popular theory of "I bet the NSA has just exploited all the Tor nodes" seems to be partially debunked in [0 slide 19]. They explain "Probably not. Legal and technical challenges."

Tor Disruption

A popular topic in the slides is disrupting and denying access to Tor. My theory is that if they can make Tor suck for their targets, the targets are likely to give up on Tor and use a mechanism that's easier to surveil and exploit. [0 slide 20] talks about degrading access to a web server they control if it's accessed through Tor. It also mentions controlling an entire network and denying/degrading/disrupting the Tor experience on it.

Wide scale network disruption is put forward in [0 slide 22]. One specific technique they mention is advertising high bandwidth but actually performing very slowly. This is similar to a trick used by [Weinmann13], so the Tor Project is both monitoring for wide scale disruption of this type and combatting it through design considerations.
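To see why false bandwidth advertisement matters, consider that a client weighting path selection naively by advertised bandwidth hands a lying relay traffic in proportion to its lie. This is a simplification of Tor's actual path selection (which uses measured consensus weights partly to blunt exactly this), sketched here with an invented record format:

```python
def selection_probabilities(advertised):
    """advertised: dict of relay -> advertised bandwidth. Returns relay ->
    probability of being chosen under naive bandwidth weighting."""
    total = sum(advertised.values())
    return {relay: bw / total for relay, bw in advertised.items()}
```

A relay advertising 9x the honest relay's bandwidth attracts 90% of circuits, then services them slowly - degrading the network at almost no cost.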


The concluding slide of [0] mentions "Tor Stinks... But It Could be Worse". A "Critical mass of targets do use Tor. Scaring them away from Tor might be counterproductive".

Tor is technology, and can be used for good or bad - just like everything else. Helping people commit crimes is something no one wants to do, but everyone does - whether it's by just unknowingly giving a fugitive directions or selling someone a gun they use to commit a crime. Tor is a powerful force for good in the world - it helps drive change in repressive regimes, helps law enforcement (no, really), and helps scores of people protect themselves online. It's kind of naive and selfish, but I hope the bad guys are scared away, while we make Tor more secure for everyone else.

Something that's worth noting is that this is a lot of analysis and speculation based on some slide decks that have uncertain provenance (I don't think they're lying, but they may not be telling the whole truth), uncertain dates of authorship, and may be omitting more sensitively classified information. This is a definite peek at a playbook - but it's not necessarily the whole playbook. We should keep that in mind and not base all of our actions off these particular slide decks. But it's a good place to start and a good opportunity to reevaluate our progress.

[0] http://media.encrypted.cc/files/nsa/tor-stinks.pdf
[1] http://media.encrypted.cc/files/nsa/egotisticalgiraffe-wapo.pdf
[1.5] http://media.encrypted.cc/files/nsa/egotisticalgiraffe-guardian.pdf
[2] http://media.encrypted.cc/files/nsa/advanced-os-multi-hop.pdf
[3] http://media.encrypted.cc/files/nsa/mullenize-28redacted-29.pdf
[4] http://www.theguardian.com/world/2013/oct/04/tor-attacks-nsa-users-online-anonymity
[Weinmann13] http://www.ieee-security.org/TC/SP2013/papers/4977a080.pdf