Implement interface for invalidation after POST/PUT #1

kornelski · 2016-05-31T19:33:59Z

A cache MUST invalidate the effective Request URI (Section 5.5 of [RFC7230]) as well as the URI(s) in the Location and Content-Location response header fields (if present) when a non-error status code is received in response to an unsafe request method.

However, a cache MUST NOT invalidate a URI from a Location or Content-Location response header field if the host part of that URI differs from the host part in the effective request URI (Section 5.5 of [RFC7230]). This helps prevent denial-of-service attacks.

A cache MUST invalidate the effective request URI (Section 5.5 of [RFC7230]) when it receives a non-error response to a request with a method whose safety is unknown.

http://httpwg.org/specs/rfc7234.html#invalidation

goofballLogic · 2017-01-31T17:35:59Z

As far as I am aware, the prototcol for this is simply:

Expose an invalidation() method which returns:

If the Method is one of "GET", "HEAD", "OPTIONS", "TRACE", return null
Return an object with path, host and calculated full url to invalidate.

Does it need to be any more complex than that?

kornelski · 2017-01-31T18:52:35Z

That sounds about right.

I haven't thought through revalidation yet, so I wasn't sure if these would overlap.

goofballLogic · 2017-01-31T19:45:52Z

Spitballing:

A. CachePolicy would expose a method validation() which returns:

If the cached response can not be revalidated, return null [stop]
Return an object with props indicating
i. the pre-requisites for revalidation
ii. the necessary host, path, headers and calculated full URL for the validation request

B. CachePolicy would expose a method freshen( validationResponse ) which returns a RefreshPolicy object exposing:

validators - array of validators with which to select responses to freshen (if the new response contains strong/weak validators)
updateable( cachedResponse ) which returns true if the specified response should be selected for update
responseHeaders( cachedResponse ) which returns the updated, filtered set of response headers after update (following rules http://httpwg.org/specs/rfc7234.html#rfc.section.4.3.4 )

goofballLogic · 2017-01-31T19:49:39Z

If the above is acceptable then we would expect:

A.1. -> call invalidation()
B.2. "updateable" returns false -> call invalidation()

Which, to my mind makes these two concerns (invalidation, revalidation) related, but not overlapping.

kornelski · 2017-01-31T20:11:56Z

On higher level I'm using this API:

const response = await cache.get(request, modifiedRequest => {
    return http.makeRequest(modifiedRequest);
});

where the callback gets request with added headers for revalidation. Returns promise with updated response. The response is then used to update the cached request.

In such API invalidation is an implementation detail. However, I'm not sure if it's not too inflexible, e.g. it doesn't allow stating preference for last-modified over etag.

goofballLogic · 2017-01-31T20:16:39Z

ok but the freshening rules specify that you may need to update multiple cached responses.

If the new response contains a strong validator (see Section 2.1 of [RFC7232]), then that strong validator identifies the selected representation for update. All of the stored responses with the same strong validator are selected. If none of the stored responses contain the same strong validator, then the cache MUST NOT use the new response to update any stored responses.

kornelski · 2017-01-31T20:34:32Z

Good point. Am I correct thinking that even strong validators are per URL (not per host)?

Since there's satisfiesRequestWithoutRevalidation, let's try one with revalidation:

const x = cache.satisfiesRequest(request, options);

I'm not sure what it should return. It needs to contain revalidation request, but returning null on success is icky, so maybe {satisfies: false, request: {…}}.

or

const result = cache.evaluate(request, options);
if (result.satisfies) {
   return result.responseHeaders();
}
const response = await fetch(result.revalidationRequest());
result.update(response); 
if (result.satisfies) {
   return result.responseHeaders();
} else {
   // dunno?
}

edit: nah, it's terrible.

goofballLogic · 2017-02-01T11:45:44Z

It would seem that strong validators can be per URL - i can't find a limitation on this in the spec. Furthermore, the main use for freshening multiple responses seems to be related to multiple responses for the same resource but with different header values (other than those specified by the Vary restrictions).

Essentially, there are four possible outcomes when evaluating a request against a cached response:

The cached response is fresh
The cached response is stale but the request allows it using e.g. "Cache-Control: max-stale=1000"
The cached response is stale and must be validated
The cached response is stale and cannot be validated

(1) and (2) are covered by satisfiesRequestWithoutRevalidation

I think your second design has some merit - perhaps if we start with a method validationRequest so we can at least do:

if (cachePolicy.satisfiesRequestWithoutRevalidation(request, options)) {
    // proceed with cached response
} else {
   const validationRequest = cachePolicy.validationRequest(); // handles (3)
   if(validationRequest) { 
      const validationResponse = await fetch(validationRequest);
      // mechanism to freshen tbd
   } else {
      // mechanism to invalidate tbd
   }
}

kornelski · 2017-02-01T12:35:39Z

Yup, that makes sense.

I'd like to hide from users as much logic as possible, so validationRequest could return the original request, even if it doesn't revalidate.

if (cachePolicy.satisfiesRequestWithoutRevalidation(request, options)) {
    return [cachePolicy.responseHeaders(), cachedBody];
} else {
   const validationRequest = cachePolicy.validationRequest();
   const validationResponse = await fetch(validationRequest);

   // equivalent of cachePolicy = new CachePolicy(validationRequest, validationResponse); ?
   cachePolicy.update(validationResponse);

   // So here the cache only needs to know whether to keep the old body or use the new one
    return [cachePolicy.responseHeaders(), cachedOrNewBody];
}

kornelski · 2017-02-01T12:38:04Z

const validationRequest = cachePolicy.needsRequest(request);
if (validationRequest) {
   const validationResponse = await fetch(validationRequest);

   if (cachePolicy.update(validationResponse)) {
       cachedBody = validationResponse.body;
   }
}
return [cachePolicy.responseHeaders(), cachedBody];

kornelski · 2017-02-01T12:51:21Z

Oh, validationRequest() needs to know all cached representations to build If-None-Match, and then on 304 response will have to help choose the right one.

goofballLogic · 2017-02-01T13:09:21Z

An efficient public cache would need to know about all such representations, yes. But, I'm not sure that means this library needs to work on the basis of more than one. Perhaps that could be a later refinement?

I think it would be enough to make the validation request on behalf of 1 response, and then help in selecting responses which can be freshened.

goofballLogic · 2017-02-01T15:13:25Z

e.g. starting simple: goofballLogic@c18bbba

goofballLogic · 2017-02-01T15:40:49Z

Rules for forming the validation request appear to be very simple (complexity comes with sub-range requests which I note you don't support): goofballLogic@26339d9

goofballLogic · 2017-02-01T15:49:47Z

allowing revalidation via HEAD: goofballLogic@fbfb403

goofballLogic · 2017-02-01T16:55:08Z

Ok, here's a compromise proposal for processing response from the revalidation request:

freshen() method on CachePolicy, a factory method returning:
i. new (freshened or replaced) CachePolicy and Response
ii. criteria for selecting other responses to update
iii. callback to freshen other CachePolicy and responses.

const validationResponse = await fetch(validationRequest);
const {
    validCachePolicy, // freshened (or replaced) cache policy
    validBody, // body to use after validation
    updateCriteria, // object hash with props to match
    freshen, // (selectedCachePolicy, selectedResponseBody) => { validCachePolicy, validBody }
} = cachePolicy.freshen(validationResponse);

kornelski · 2017-02-01T20:25:27Z

Your implementation looks good.

Keeping CachePolicy immutable and returning a new one sounds good.

Currently the policy doesn't store bodies, and I think it shouldn't (I JSON-stringify it in a DB column, but store bodies as files on disk). So it can't return old validBody. But maybe it doesn't have to? For users the logic may be simple - if the validation response has a body, use it.

How would criteria look like?

goofballLogic · 2017-02-01T20:35:51Z

you are right - will rethink the body part a little
criteria - i think a simple has of props to match would suffice (b/c usually it'll just be an etag along with the url and host ).
perhaps:

{ url: '/my-resource', headers: { host: 'www.test.com', etag: '"123456789"' } }

kornelski · 2017-02-01T22:21:43Z

Criteria looks like it's exposing details and shifting hard work to the caller. And what about weak etags? What if the header had whitespace? or host had a default port?

So perhaps that should be a function that takes a request (or cachePolicy?) and does the matching (and maybe even updating).

goofballLogic · 2017-02-02T10:46:55Z

Agreed. the criteria concept was to allow efficient querying of databases, but, as you say, it makes correct usage more complex for the user. I'll have a stab at this today.

kornelski · 2017-02-02T14:28:39Z

Interesting point about databases. I was initially planning to expose some kind of "primary key" built from Vary headers in order to know which ones to keep and which ones are redundant.

This concept could be extended to all requests, so on high level it could be "these are the tags/keys for the request" when you cache it, and then "delete these tags/keys" when you invalidate.

goofballLogic · 2017-02-02T15:02:39Z

For most cases, it would probably be sufficient to query your database by url and method, then run each of the matching CachePolicies through our matching function to determine whether or not it's selected for refresh. The only case where i can see this being problematic is for URLs like "/" "/favicon.ico" etc. which might have a very large number of matches.

Your concept of a primary key is starting to sound more appealing. Would it include method, URL and host as well as the Varying headers?

Actually, new thought: in order to do the "weak-validator" work, our function probably needs to evaluate potential CachePolicies as a set (not one-by-one). This is in order to select "the most recent of those matching stored responses".

So, we can't really apply freshening logic to a single CachePolicy at all (unless by implication that there are no others).

Would the following work instead:

const validationResponse = await fetch(validationRequest);
/*
   selectablePolicies: array of retrieved CachePolicy which may be freshened
  (excluding the current cachePolicy?)
*/
const cacheSelectors = cachePolicy.selectors(validationResponse);
const selectorPolicies = await Promise.all( cacheSelectors.map( x => dbFetch.bySelector( x ) ) ); 
const selectablePolicies = [].concat.apply([],selectorPolicies);
/*
   freshened: [ { validCachePolicy, newResponseBody } ] (newResponseBody may be undefined)
*/
const freshened = cachePolicy.freshen(validationResponse, selectablePolicies);

this api could also be used by people who are comfortable that there will be no matching of multiple CachePolicy by just skipping the selectors step:

const freshened = cachePolicy.freshen(validationResponse);

goofballLogic · 2017-02-02T19:50:52Z

Preliminary work: Identify strong and/or weak validators for a response: goofballLogic@57e8e7a

goofballLogic · 2017-02-03T13:53:08Z

I hope it's ok - i'm going to go ahead an implement an algorithm for the cache key based on http://httpwg.org/specs/rfc7234.html#rfc.section.4.1 and http://httpwg.org/specs/rfc7234.html#freshening.responses (the preliminary work mentioned above).

I need a working implementation rather urgently for another project...

goofballLogic · 2017-02-03T17:08:35Z

Ok, here's a sample cachekey:

[ 1, 'http://www.w3c.org:80/Protocols/rfc2616/rfc2616-sec14.html', [ [ 'moon-phase', 'sun', 'weather' ], [ '', 'shining', 'nice,bright' ] ], { etag: '"123456789"' }, { 'last-modified': 'Tue, 15 Nov 1994 12:45:56 GMT' } ]

Code and tests show more here:

goofballLogic@3a5e6c8

If this cachekey were to be persisted in a relational database, you might consider storing it as 5 distinct columns:

v	url	vary	strong	weak	body
1	http://www.w3c.org:80/Protocols/rfc2616/rfc2616-sec14.html	[ [ 'moon-phase', 'sun', 'weather' ], [ '', 'shining', 'nice,bright' ] ]	{ etag: '"123456789"' }	{ 'last-modified': 'Tue, 15 Nov 1994 12:45:56 GMT' }	"<!DOCTYPE html>\n<html>. . ."

This would allow us to query the database using different combinations of validators along with primary (url) and secondary (vary) keys

goofballLogic · 2017-02-03T19:03:08Z

Generating selectors based on a validationResponse:

goofballLogic@7927178

kornelski · 2017-02-03T20:42:30Z

How is user supposed to know what to do with "strong" and "weak" columns?

kornelski · 2017-02-03T20:59:30Z

It's great that you're adding tests for everything.

The approach seems fine in general. I'm picky about implementation details, so I'll probably want to tweak a few things (code for getting weak and strong validators is "clever", and I'm suspicious of static methods, so I'll see if I can avoid them).

Currently I have limited time, so this will probably wait. I need to eventually improve my cache, so I'll merge the enhancements once I try them out.

goofballLogic · 2017-02-04T15:16:36Z

Ok. If you have detailed guidance about required changes, I'm happy to implement them - I'm completely ok with following your aesthetic and algorithmic preferences.

Alternatively, I can just submit a PR for what I've done and leave it for you to complete.

goofballLogic · 2017-02-04T18:35:14Z

Regarding "strong" or "weak" columns: the selector() call returns a selector which is always a cache key containing either a strong or a weak validator (per the spec). The user can then use this to compare against cache keys either in memory, in a relational store (such as shown) or in a nosql store. The selector specifies everything the user needs to query for cached policies which may be freshened or invalidated.

I'll put together a suggested markdown page showing usage, as this may help explain the overall design.

goofballLogic · 2017-02-04T22:39:34Z

Here's a start: https://github.com/goofballLogic/http-cache-semantics/blob/master/revalidation-invalidation.md:

Why validate/invalidate
Identifying a cached response
i. Detail (partially TBD)
Looking up a request in your cache
i. Secondary cache key matching
Validating a stale cache response
Handling a cache validation response

TBD:
6. Freshening cached response
7. Invalidation

kornelski · 2017-02-04T23:52:04Z

either a strong or a weak validator (per the spec).

BTW, I'm aiming for an API that doesn't require users to know the spec at all. I'm hoping the library can be used instead of reading the RFCs.

kornelski · 2017-02-05T00:43:34Z

I think the selector and key methods should not live on cachePolicy. The functionality probably should go to another class.

someServer.on(async req => {
    const matcher = CacheMatcher(req);
    const pkey = matcher.primaryKey(); // effective URL
    const cachePolicies = await db.get(pkey); // [CachePolicy, CachePolicy, …]
    const {satisfiedPolicy, revalidationRequest} = matcher.match(cachePolicies);
    if (satisfiedPolicy) {
       return fromCache();
    } else {
        const res = await fetch(revalidationRequest);
        …// still thinking
    }
});

kornelski · 2017-02-05T00:56:55Z

or another way

someServer.on(async req => {
    const matcher = CacheMatcher(req);
    const pkey = matcher.primaryKey(); // effective URL
    for row of await db.get_all("select * from cache where pkey = ?",[pkey]) {
        if (matcher.satisfiesWithoutRevalidation(row.cachePolicy)) {
            return row.cachedResponse;
        }
    }

    // now matcher has seen all responses, so it can build revalidation request with all etags
    const revalidationRequest = matcher.revalidationRequest();
    const res = await fetch(revalidationRequest);

    // I'm not sure about this. It'd probably need to instruct what to add/update and what to delete?
    const {cachePolicy, secondaryKeys} = matcher.update(res);

    // secondary key would be built from sorted vary headers
    for skey of secondaryKeys {
        db.exec("update cache set cachePolicy = ? where pkey = ? and skey = ?", [cachePolicy, pkey, skey]);
    }
});

goofballLogic · 2017-02-05T13:26:08Z

ok, i'll let you think it through in your own time. you have the makings of a good library here.

kornelski · 2017-03-23T13:17:53Z

I've published an update with a less ambitious API :)

Implement interface for invalidation after POST/PUT #1

Implement interface for invalidation after POST/PUT #1

Comments

kornelski commented May 31, 2016

goofballLogic commented Jan 31, 2017 • edited Loading

kornelski commented Jan 31, 2017

goofballLogic commented Jan 31, 2017 • edited Loading

goofballLogic commented Jan 31, 2017 • edited Loading

kornelski commented Jan 31, 2017 • edited Loading

goofballLogic commented Jan 31, 2017 • edited Loading

kornelski commented Jan 31, 2017 • edited Loading

goofballLogic commented Feb 1, 2017 • edited Loading

kornelski commented Feb 1, 2017

kornelski commented Feb 1, 2017

kornelski commented Feb 1, 2017

goofballLogic commented Feb 1, 2017 • edited Loading

goofballLogic commented Feb 1, 2017

goofballLogic commented Feb 1, 2017

goofballLogic commented Feb 1, 2017

goofballLogic commented Feb 1, 2017 • edited Loading

kornelski commented Feb 1, 2017

goofballLogic commented Feb 1, 2017

kornelski commented Feb 1, 2017 • edited Loading

goofballLogic commented Feb 2, 2017

kornelski commented Feb 2, 2017 • edited Loading

goofballLogic commented Feb 2, 2017 • edited Loading

goofballLogic commented Feb 2, 2017

goofballLogic commented Feb 3, 2017

goofballLogic commented Feb 3, 2017 • edited Loading

goofballLogic commented Feb 3, 2017

kornelski commented Feb 3, 2017

kornelski commented Feb 3, 2017

goofballLogic commented Feb 4, 2017

goofballLogic commented Feb 4, 2017 • edited Loading

goofballLogic commented Feb 4, 2017

kornelski commented Feb 4, 2017

kornelski commented Feb 5, 2017

kornelski commented Feb 5, 2017 • edited Loading

goofballLogic commented Feb 5, 2017

kornelski commented Mar 23, 2017

goofballLogic commented Jan 31, 2017 •

edited

Loading

goofballLogic commented Jan 31, 2017 •

edited

Loading

goofballLogic commented Jan 31, 2017 •

edited

Loading

kornelski commented Jan 31, 2017 •

edited

Loading

goofballLogic commented Jan 31, 2017 •

edited

Loading

kornelski commented Jan 31, 2017 •

edited

Loading

goofballLogic commented Feb 1, 2017 •

edited

Loading

goofballLogic commented Feb 1, 2017 •

edited

Loading

goofballLogic commented Feb 1, 2017 •

edited

Loading

kornelski commented Feb 1, 2017 •

edited

Loading

kornelski commented Feb 2, 2017 •

edited

Loading

goofballLogic commented Feb 2, 2017 •

edited

Loading

goofballLogic commented Feb 3, 2017 •

edited

Loading

goofballLogic commented Feb 4, 2017 •

edited

Loading

kornelski commented Feb 5, 2017 •

edited

Loading