Wrong search results due to stale request cache (in some rare situations) #94

d-maurer · 2020-02-14T09:45:32Z

I have just tracked down a difficult problem to UnIndex's request caching (details: "https://community.plone.org/t/potential-memory-corruption-during-migration-plone-4-2-5-2/11655/11"). This demonstrates that it is potentially dangerous. Looking at the code, it seems that UnIndex forgets to invalidate the cache when the index gets modified (and it would not be easy to get this done reliably as many derived classes override the modification methods and therefore may need to take care of the cache invalidation themselves); if this is really the case, then in a sequence of "search, modify index, search" the second search may give outdated results.

What is the use case for this request caching? For me, it seems quite rare that the same request issued the same subquery against the same index several times. If there is not convincing use case, should we not eliminate this feature?

The text was updated successfully, but these errors were encountered:

andbag · 2020-02-14T12:55:06Z

Under Plone there are countless catalog queries that use the same index including filters for different content listings. Without a request cache, the same intersection within the UnIndex would have to be created again and again within a page structure when the index is called repeatedly. In our experience, several 100ms can be saved during page generation for large sites. For example, the Plone indexes derived from UnIndex, allowedUsersAndRoles (KeywordIndex) and effective (DateRangeIndex), are logically ANDed with every query with every catalog call.

It would be sad if the performance improvement, whose implementation was work intensive, would be dropped for large Plone sites.

d-maurer · 2020-02-14T17:23:54Z

Andreas Gabriel wrote at 2020-2-14 04:55 -0800:

Under Plone there are countless catalog queries that use the same index including filters for different content listings. Without a request cache, the same intersection within the UnIndex would have to be created again and again within a page structure when the index is called repeatedly. In our experience, several 100ms can be saved during page generation for large sites. For example, the Plone indexes derived from UnIndex, allowedUsersAndRoles (KeywordIndex) and effective (DateRangeIndex), are logically ANDed with every query with every catalog call. It would be sad if the performance improvement, whose implementation was work intensive, would be dropped for large Plone sites.

In this case, the request caching should be made safer: * the cache is actually a catalog, not an index related cache. As a consequence, `PluginIndexes.cache` should be moved to `ZCatalog` and `UnIndex` obtain the cache to be used from its catalog. * for persistent catalogs, the cache key should include the oid and database in addition to the current information. This would prevent the wrong delivery of search results for foreign catalogs (as I have observed in a mass Plone portal migration script) * the catalog should invalidate its cache on "[un]catalogObject" to prevent the delivery of outdated cached search results when a request uses "search, modify catalog, search" like sequences.

d-maurer · 2020-02-15T08:29:23Z

Andreas Gabriel wrote at 2020-2-14 04:55 -0800:

... It would be sad if the performance improvement, whose implementation was work intensive, would be dropped for large Plone sites.

There is another, maybe more recent, `ZCatalog`optimization which might significantly impact the effectiveness of a request cache: trying to limit the size of intermediate results using a query plan. In order to allow an index to take advantage of a (usually small) intermediate result, it can be passed into the index search. Any index actively using this information (to work internally with small intermediate sets) cannot efficiently cache its results (depending on the passed in intermediate result) in the request (as the next time, the passed in intermediate result is almost surely different). Was this query plan optimization in effect, when you have observed the performance improvements (via request caching) you mentioned above? In a former post, I expressed doubts that "search, modify, search" is safe. Meanwhile, I have seen in the code that `UnIndex` may try to detect intermediate modifications by incorporating some "counter" into the cache key for the respective index. The idea is likely that index modifications change "counter". This may be fragile: `UnIndex` is designed to be subclassed and the subclasses often override index modifying methods; there is some risk that "counter" updates are lost in subclasses. In my original problem described in the plone community (link in my first post), the mass migration (Plone 4.2 --> 5.2) failed for more than 10 % of the portals due to wrong catalog search results from the request cache. I concede that working with different portals in the same request is an unusual use case - probably very rarely used outside scripts. Nevertheless, the quite high failure rate may indicate the "counter" based protection might not be very efficient. In the failure, I have analysed in detail, the query giving the wrong result has been `portal_type == 'Collection'`; it returned 2 results while the result should have been empty. This difference in the number of collections likely implies a different number of catalogued objects and thereby modifications of the "portal_type" index: in other words, the "counter"s should likely have been different.

andbag · 2020-02-17T21:47:41Z

There is another, maybe more recent, ZCatalogoptimization which
might significantly impact the effectiveness of a request cache:
trying to limit the size of intermediate results using a query plan.
In order to allow an index to take advantage of a (usually small)
intermediate result, it can be passed into the index search.
Any index actively using this information (to work internally
with small intermediate sets) cannot efficiently cache its results
(depending on the passed in intermediate result) in the request
(as the next time, the passed in intermediate result is almost
surely different).

I follow your arguments. However, the cache logic of unindex does cache its results before it is ANDed with the passed resultset. The cache is therefore independent of the passed resultset or the index order of the query plan. There are also conditions when the resultset is very small and the cache is not set for performance reasons.

d-maurer · 2020-02-19T05:39:26Z

Andreas Gabriel wrote at 2020-2-17 13:47 -0800:

> There is another, maybe more recent, `ZCatalog`optimization which might significantly impact the effectiveness of a request cache: trying to limit the size of intermediate results using a query plan. In order to allow an index to take advantage of a (usually small) intermediate result, it can be passed into the index search. Any index actively using this information (to work internally with small intermediate sets) cannot efficiently cache its results (depending on the passed in intermediate result) in the request (as the next time, the passed in intermediate result is almost surely different). I follow your arguments. However, the cache logic of unindex does cache its results before it is ANDed with the passed resultset. The cache is therefore independent of the passed resultset or the index order of the query plan. There are also conditions when the resultset is very small and the cache is not set for performance reasons.

But then, the index is not making active use of the optimization potential (it does not avoid the unnecessary large intermediate result); in fact, it is wasting time because the intersection is done (again) at the catalog level.

d-maurer added the question label Feb 14, 2020

andbag assigned hannosch Feb 14, 2020

This was referenced Mar 5, 2020

Use ffi.gc() on the ring nodes to avoid needing a weakref to PersistentPy zopefoundation/persistent#134

Merged

Rework request caching #95

Merged

d-maurer changed the title ~~Is request caching by UnIndex really useful?~~ Wrong search results due to stale request cache (in some rare situations) Mar 26, 2020

d-maurer added bug and removed question labels Mar 26, 2020

d-maurer mentioned this issue May 13, 2020

REQUEST entries "_catalogcache_myZCal_nb" #105

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong search results due to stale request cache (in some rare situations) #94

Wrong search results due to stale request cache (in some rare situations) #94

d-maurer commented Feb 14, 2020

andbag commented Feb 14, 2020

d-maurer commented Feb 14, 2020 via email

d-maurer commented Feb 15, 2020 via email

andbag commented Feb 17, 2020

d-maurer commented Feb 19, 2020 via email

Wrong search results due to stale request cache (in some rare situations) #94

Wrong search results due to stale request cache (in some rare situations) #94

Comments

d-maurer commented Feb 14, 2020

andbag commented Feb 14, 2020

d-maurer commented Feb 14, 2020 via email

d-maurer commented Feb 15, 2020 via email

andbag commented Feb 17, 2020

d-maurer commented Feb 19, 2020 via email