CRLSets
On 27-Oct-2014 01:58:23:
Since none of the other SSL certificate revocation mechanisms actually work, Google invented their own for Chrome, they call it CRLSets. Their conclusion was that if the existing mechanisms didn't work, maybe they could design one that works in a more limited scope. The limited scope they chose was EV certificates, ideally only those revoked for actual security reasons. I think that within that scope they succeeded, although others certainly disagree.
I was curious exactly what was being pushed out to Chrome, so I decided to dig a little deeper. If you don't mind installing Go, then Adam Langley released a tool for pulling the latest set. I certainly don't mind installing Go, but what better way to see what the data is like than porting Adam's tool to say Python? I've never done the kind of crypto work I need to verify the file in Python, might as well go learn it.
What does a CRLSet look like? (I'll skip over the authentication and integrity checking, its interesting to code that sort of thing in python for the first time, but its not that interesting to talk about) Well it has a sequence number (currently 1882) and an expiration date that looks to be about 4 days in the future. From what I've seen so far its generated twice a day right at around noon and midnight in the Bay AreaExclusion Zone. The file contains a set of certificate serial numbers categorized by the SHA-256 hash of the "Subject Public Key Info" field of the issuing certificate autthority, basically the RSA public key (although there are DSA and ECDSA certificate authorities out there). Packed in with the metadata (it was added after the fact) is an explicit list of blocked certificates, against listed by the SHA-256 hash of their SPKI.
So there you have it, thats a CRLSet. You'll notice something though, the limited scope of the CRLSet means that only 54 CAs are included. Given that many of those are likely to be intermediate certificates, just how many CAs do we really have in the file? What if you were setting up an important website and wanted to be *SURE* that you used a CA that was included? Well Google doesn't officially disclose the participating CAs, they are also upfront about the fact that some of the revocation information they crawl is non-public. I don't think they really intend for the list of CAs in the CRLset to be a secret though, they just aren't volunteering the information either.
So now that I'm done figuring out what is in a CRLSet, now it is time to figure out what is in a CRLSet. So there are currently 54 CAs in the file and 8 explicitly blocked certificates. The latter bit is faster to talk about, because with the exception of the 2 well known entries in that list (cloudflarechallenge.com and revoked.grc.com) we are probably out of luck finding out what those are. The list of CAs on the other hand is probably knowable. For starters, I think it is pretty unlikely that Google would include a CA that Chrome wouldn't trust, which means we can start by extracting the root certificate store from a few operating systems.
Well my listing of about 200 root CAs only got me 13 entries out of 54. :( Turns out very few CAs directly issue end user certificates off of their root certificate anymore, instead they issue themselves (or subsidiaries) intermediate CA certificates and issue end user certificates from those. So we need a good source of intermediate CA data, which is generally not included with the operating system. Some of the CAs have nice and easy to download bundle files, which helps a litttle, but still didn't get much. Since end users with chained certificates are supposed to be serving up the intermediate certs as part of the SSL connection, I could just scan the internet on port 443, but that would take a while.
Notice how I said supposed to up there? Yeah, some servers are misconfigured and don't share the intermediate certificates which means their end user certificate can't be verified. Apparently Mozilla caches these intermediate CAs indefinitely when it receives them, presumably to help it complete the certificate chains when it finds these bad servers. Even better, we easily can extract this from the internal certificate store! I now have nearly 600 CA certificates which gets me 36 out of 54, not bad for a little scripting work. CA hash mapping. I'll keep updating that as new SPKI hashes show up in the CRLSets and as I collect new intermediate certificates.
The names of the CAs in and of itself isn't terribly interesting, its exactly what Google said it was. The CAs are mostly EV and High-Assurance intermediate CAs. So what was the point? I'll admit, mostly because as far as I can tell no one has done this yet. Dumping the list let's everyone know who the star bellied sneetches are in the eyes of Google. The explicitly blocked SPKIs would be more interesting, but guessing those just isn't possible. I suspect they all correspond to high profile revocations and with some research I might be able to identify one or two others. Perhaps any researchers doing long term monitoring of revocations would be interested.