-
-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making Cert-to-CRL bindings fully static #7094
Comments
Implementation plan:
This bug tracks all stages of this implementation plan. |
Add a new "revokedCertificates" table to the database schema. This table is similar to the existing "certificateStatus" table in many ways, but the idea is that it will only have rows added to it when certificates are revoked, not when they're issued. Thus, it will grow many orders of magnitude slower than the certificateStatus table does. Eventually, it will replace that table entirely. The one column that revokedCertificates adds is the new "ShardIdx" column, which is the CRL shard in which the revoked certificate will appear. This way we can assign certificates to CRL shards at the time they are revoked, and guarantee that they will never move to a different shard even if we change the number of shards we produce. This will eventually allow us to put CRL URLs directly into our certificates, replacing OCSP URLs. Add new logic to the SA's RevokeCertificate and UpdateRevokedCertificate methods to handle this new table. If these methods receive a request which specifies a CRL shard (our CRL shards are 1-indexed, so shard 0 does not exist), then they will ensure that the new revocation status is written into both the certificateStatus and revokedCertificates tables. This logic will not function until the RA is updated to take advantage of it, so it is not a risk for it to appear in Boulder before the new table has been created. Also add new logic to the SA's GetRevokedCertificates method. Similar to the above, this reads from the new table if the ShardIdx field is supplied in the request message. This code will not operate until the crl-updater is updated to include this field. We will not perform this update for a minimum of 100 days after this code is deployed, to ensure that all unexpired revoked certificates are present in the revokedCertificates table. Part of #7094
IN-9706 has been file to track creating the new table in prod. I've begun work on the RA code which cannot land until after the table actually exists. |
Add a new clock argument to the test-only ThrowAwayCert function, and use that clock to generate reasonable notBefore and notAfter timestamps in the resulting throwaway test cert. This is necessary to easily test functions which rely on the expiration timestamp of the certificate, such as upcoming work about computing CRL shards. Part of #7094
Make crl.GetChunkAtTIme an exported function, so that it can be used by the RA in a future PR without making that PR bigger and harder to review. Also move it from being a method to an independent function which takes two new arguments to compensate for the loss of its receiver. Also move some tests from batch_test to updater_test, where they should have been in the first place. Part of #7094
We've always known that this project would involve touching the CA's CRL issuance code. However, I incorrectly believed that that work would be able to be put off until it was time to actually include CRL URLs in our certs. Comments on #7133 have made it clear that no, that work should actually happen now. I'm tracking that work in #7159, which blocks this bug. |
Move the CRL issuance logic -- building an x509.RevocationList template, populating it with correctly-built extensions, linting it, and actually signing it -- out of the //ca package and into the //issuance package. This means that the CA's CRL code no longer needs to be able to reach inside the issuance package to access its issuers and certificates (and those fields will be able to be made private after the same is done for OCSP issuance). Additionally, improve the configuration of CRL issuance, create additional checks on CRL's ThisUpdate and NextUpdate fields, and make it possible for a CRL to contain two IssuingDistributionPoint URIs so that we can migrate to shorter addresses. IN-10045 tracks the corresponding production changes. Fixes #7159 Part of #7296 Part of #7294 Part of #7094 Part of #7100
Move the CRL issuance logic -- building an x509.RevocationList template, populating it with correctly-built extensions, linting it, and actually signing it -- out of the //ca package and into the //issuance package. This means that the CA's CRL code no longer needs to be able to reach inside the issuance package to access its issuers and certificates (and those fields will be able to be made private after the same is done for OCSP issuance). Additionally, improve the configuration of CRL issuance, create additional checks on CRL's ThisUpdate and NextUpdate fields, and make it possible for a CRL to contain two IssuingDistributionPoint URIs so that we can migrate to shorter addresses. IN-10045 tracks the corresponding production changes. Fixes letsencrypt#7159 Part of letsencrypt#7296 Part of letsencrypt#7294 Part of letsencrypt#7094 Part of letsencrypt#7100
This is not a hard computer science problem, but it is a hard engineering problem. Deciding the separation of responsibilities between the RA and the SA, encapsulating logic, preventing config from being duplicated between too many different services, etc. Current plan:
|
How does this interact with certificates issued under the current scheme that uses temporal sharding? Proposal: at CRL generation time, check for certificates by both issuance time and shardIdx. Specifically, modify GetRevokedCerts ( Lines 1062 to 1068 in 7137605
Once that's deployed, modify crl-updater to unconditionally pass ShardIdx in its call to GetRevokedCerts. Then add the code to recognize embedded CRL URLs at revocation time and write to Then add the code to generate embedded CRL URLs at issuance time. |
I believe that plan would naively result in some certificates being included in multiple CRL shards -- they'd appear in one shard due to temporal sharding, and appear in a different shard due to the RevokedCerts table (if they were revoked after the RA starts supplying a shard number at revocation time). The original plan was to have the RA supply a shard number, then wait for all certs that were revoked without a shard number to expire (so we have 100% coverage), then flip the flag. This would cause certs to move between shards, but they'd never be duplicated. And it would take 3 months to bake. I think that your idea works, and works much faster, at the cost of temporarily inflating our CRL sizes. I'm not personally aware of any compliance issue with a cert appearing in multiple shards of a sharded CRL, as long as the data in each entry is identical. |
Yes, it would definitely result in most revoked certificate showing up in two shards - its temporal shard and its explicitly assigned shard. I also think this is okay. If we want to avoid that, another option: when assigning explicit shards (at issuance time), assign them from a range of numbers that don't overlap with the temporal sharding range at all. This would require temporarily doubling the number of shards we publish in CCADB. This has the advantage that it might be a little more obvious to debug. "This shard number is > 2000, so we expect it to be filled with certificates that had an explicit shard assigned at issuance time." |
I think simply inflating our current shards is the simpler and better solution. I'm doing some digging now, but so far nothing has popped up to indicate this would be a compliance issue. |
#7918 is out for review. Of course, having made that change and updated the tests, I'm now wondering: perhaps it would be better to expose a new SA RPC: Any possible memory bloat would be taken by the crl-updater (which already has some memory bloat), not the SA. Also, it would let crl-updater get clever about when the "last" temporal shard is - that is, the last span of time that covers certificates without a CRLDP. |
I do like the idea of having crl-updater take the memory bloat, rather than the SA. It could be done even without the new method: |
I'm choosing to do it with the new method, because the set of fields required for |
The current plan has these parts:
|
In RA.RevokedCertificate, if the certificate being revoked has a crlDistributionPoints extension, parse the URL and pass the appropriate shard to the SA. This required some changes to the `admin` tool. When a malformed certificate is revoked, we don't have a parsed copy of the certificate to extract a CRL URL from. So, specifically when a malformed certificate is being revoked, allow specifying a CRL shard. Because different certificates will have different shards, require one-at-a-time revocation for malformed certificates. To support that refactoring, move the serial-cleaning functionality earlier in the `admin` tool's flow. Also, split out one of the cases handled by the `revokeCertificate` helper in the RA. For admin malformed revocations, we need to accept a human-specified ShardIdx, so call the SA directly in that case (and skip stat increment since admin revocations aren't useful for metrics). This allows `revokeCertificate` to be a more helpful helper, by extracting serial, issuer ID, and CRL shard automatically from an `*x509.Certificate`. Note: we don't yet issue certificates with the crlDistributionPoints extension, so this code will not be active until we start doing so. Part of #7094.
The SA had some logic (not yet in use) to return revoked certificates either by temporal sharding (if `req.ShardIdx` is zero) or by explicit sharding (if `req.ShardIdx` is nonzero). This PR splits the function into two. The existing `GetRevokedCerts` always does temporal sharding. The new `GetRevokedCertsByShard` always does explicit sharding. Eventually only `GetRevokedCertsByShard` will be necessary. This change was discussed in #7094 (comment) and is a precursor to having the crl-updater call both methods, so we can merge the results when generating CRLs.
For #7974 I needed the CA to know at issuance time how many CRL shards it has. Right now, the config for number of CRL shards is in the crl-updater, and it's global - all CAs have the same number of shards. In the CA, it made sense to configure the number of CRL shards as part of But that does mean the CA can have different numbers of shards configured for different issuers. Perhaps even more importantly, the crl-updater can disagree with the CA about how many shards there are. One possible solution: each time the crl-updater calls UpdateCRL, it sends metadata containing the current number of shards. If the issuer's config disagrees, the CA errors. This would cause the CRL updater to error out CRL generation; we could detect that with metrics monitoring (for some new metric), or with end-to-end monitoring that CRLs are being updated promptly. Or we could cause the crl-updater to crash on this specific kind of error. |
One problem with the current design: during the transition period between temporally sharded certificates and explicitly sharded certificates, the explicitly sharded certificates will typically show up on two CRLs: their explicit shard and their temporal shard (which is typically different). Proposal: Use the CA's
This has some nice advantages. It allows us to avoid duplicating entries, even across different shards. Conceptually it allows a cleaner separation of certificates: they are either temporally or explicitly sharded, based on how they were issued. Operationally it makes it easy to see whether a given serial is temporally sharded or explicitly sharded. |
Today, our certs are assorted into CRL shards at the time those shards are generated. The decision of which shard a cert will end up in is based on: the number of shards, the configured "chunk width", and the cert's notAfter timestamp. This is good, and prevents us from moving certs between shards unnecessarily.
However, it is not good enough for us to be able to include CRL URLs directly in our certs. In order to do that, the mapping needs to be fully static.
#7007 was a necessary prerequisite for this work.
Our design is as follows:
Phase 1: Populating the revokedCertificates Table
We create a new database table with the following schema:
This table has all of the information necessary to look up revoked certificates quickly by shard, and all of the information necessary to populate a Revoked Certificate entry in a CRL. All columns are NOT NULL because rows will only be inserted when a certificate is actually revoked, and all of this information is available.
We augment the SA’s RevokeCertificate method to take a new parameter, the CRL shard to which the certificate belongs. If it receives a request which contains a shard index, it writes a row to the new revokedCertificates table in addition to its current update to the certificateStatus table.
We augment the RA with the same CRL bucketing configuration parameters as the crl-updater has, specifically NumShards and ShardWidth, which control which shard a certificate falls into. We also add to the RA a version of the shard computation code which can quickly determine the correct shard number given the config values above and a certificate’s notAfter date. When the RA processes a revocation request, it will compute the shard index for that certificate, and include it in the request to the SA, triggering the codepath described above.
Phase 2: Using the revokedCertificates Table
When Phase 1 has been deployed for more than 90 days, and we are sure that all unexpired revoked certificates have rows in the revokedCertificates table, we can begin reading from it instead of reading from the certificateStatus table.
We will augment the SA’s GetRevokedCerts method to take a new parameter, the CRL shard which is being queried. If it receives a request which contains a shard index, it will query the revokedCertificates table instead of the certificateStatus table, and will include the shard index as an additional WHERE clause in the query.
We then greatly simplify crl-updater’s query mechanism. Rather than using its configuration parameters to compute complex sets of ExpiresBefore and ExpiresAfter dates that contribute to each shard, it will instead send a single query per shard, including that shard ID in the request to the SA. The request will still include ExpiresAfter and ExpiresBefore timestamps, because the crl-updater is the only entity which knows how far back our look-back needs to go and what timestamp is being used as the cutoff for this CRL.
The text was updated successfully, but these errors were encountered: