Wrong search results due to stale request cache (in some rare situations) #94

Open
d-maurer opened this issue Feb 14, 2020 · 5 comments

@d-maurer
Contributor

I have just traced a hard-to-diagnose problem to UnIndex's request caching (details: "https://community.plone.org/t/potential-memory-corruption-during-migration-plone-4-2-5-2/11655/11"). This shows that the caching is potentially dangerous. Looking at the code, UnIndex appears not to invalidate the cache when the index is modified. Doing so reliably would not be easy: many derived classes override the modification methods and would therefore have to handle cache invalidation themselves. If this is really the case, then in a sequence of "search, modify index, search" the second search may return outdated results.

What is the use case for this request caching? To me, it seems quite rare for the same request to issue the same subquery against the same index several times. If there is no convincing use case, should we not eliminate this feature?
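The "search, modify index, search" sequence can be illustrated with a minimal sketch. `ToyIndex`, its methods, and the plain-dict request cache are hypothetical stand-ins invented for this illustration; they are not the UnIndex API and only mimic the behaviour suspected above (a result cached per request, with no invalidation on modification).

```python
class ToyIndex:
    """Hypothetical stand-in for an index with a per-request result cache."""

    def __init__(self):
        self._index = {}  # value -> set of document ids

    def index_doc(self, docid, value):
        # Modifies the index but never touches any request cache,
        # which is the suspected problem.
        self._index.setdefault(value, set()).add(docid)

    def search(self, value, request_cache):
        key = ("ToyIndex", value)
        if key in request_cache:
            # Cache hit: returned without checking for later modifications.
            return request_cache[key]
        result = frozenset(self._index.get(value, ()))
        request_cache[key] = result
        return result


def demo():
    index = ToyIndex()
    index.index_doc(1, "news")
    request_cache = {}  # assumed to live for the duration of one request

    first = index.search("news", request_cache)   # search
    index.index_doc(2, "news")                    # modify index
    second = index.search("news", request_cache)  # search again, same request
    fresh = index.search("news", {})              # what a new request would see
    return first, second, fresh
```

In this sketch `second` still contains only document 1, while `fresh` contains both documents, so the second search within the same request is stale.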

@andbag
Member

andbag commented Feb 14, 2020

In Plone, countless catalog queries use the same indexes with the same filters for different content listings. Without a request cache, UnIndex would have to build the same intersection again and again while a page is assembled, each time the index is queried. In our experience, this saves several hundred milliseconds of page generation time on large sites. For example, the Plone indexes allowedRolesAndUsers (KeywordIndex) and effectiveRange (DateRangeIndex), both derived from UnIndex, are ANDed into every query on every catalog call.

It would be a pity if this performance improvement, which took considerable work to implement, were dropped for large Plone sites.
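A rough sketch of the effect described here, under stated assumptions: `CountingKeywordIndex` and `render_page` are hypothetical names made up for this illustration, and the "expensive" work is just a union of small sets. The sketch only counts how often the same filter is recomputed when several listings on one page apply it.

```python
class CountingKeywordIndex:
    """Hypothetical keyword index that counts how often it computes a result."""

    def __init__(self, name, mapping):
        self.name = name
        self._index = mapping  # keyword -> set of document ids
        self.computations = 0

    def _compute(self, keywords):
        # Stands in for the costly union over many keyword entries.
        self.computations += 1
        result = set()
        for keyword in keywords:
            result |= self._index.get(keyword, set())
        return frozenset(result)

    def search(self, keywords, request_cache=None):
        if request_cache is None:
            return self._compute(keywords)
        key = (self.name, tuple(sorted(keywords)))
        if key not in request_cache:
            request_cache[key] = self._compute(keywords)
        return request_cache[key]


def render_page(index, listings, request_cache=None):
    """Apply the same security filter once per listing on the page."""
    roles = ["Anonymous", "Member"]
    return [index.search(roles, request_cache) for _ in range(listings)]


def count_computations(use_cache, listings=5):
    index = CountingKeywordIndex(
        "security-filter", {"Anonymous": {1, 2}, "Member": {2, 3}}
    )
    render_page(index, listings, {} if use_cache else None)
    return index.computations
```

With five listings, `count_computations(False)` performs five computations and `count_computations(True)` performs one. The real saving depends on index size and query mix, which this toy does not model.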

@d-maurer
Contributor Author

d-maurer commented Feb 14, 2020 via email

@d-maurer
Contributor Author

d-maurer commented Feb 15, 2020 via email

@andbag
Member

andbag commented Feb 17, 2020

> There is another, maybe more recent, ZCatalog optimization which
> might significantly impact the effectiveness of a request cache:
> trying to limit the size of intermediate results using a query plan.
> In order to allow an index to take advantage of a (usually small)
> intermediate result, it can be passed into the index search.
> Any index actively using this information (to work internally
> with small intermediate sets) cannot efficiently cache its results
> (depending on the passed in intermediate result) in the request
> (as the next time, the passed in intermediate result is almost
> surely different).

I follow your arguments. However, UnIndex caches its result before that result is ANDed with the passed-in resultset. The cache is therefore independent of both the passed-in resultset and the index order in the query plan. There are also cases in which the resultset is very small and the cache is deliberately not populated, for performance reasons.
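The ordering described here, cache first and intersect afterwards, could be sketched roughly as follows. The function name, the dict-based index, and the cache key are assumptions made for illustration and do not reproduce the actual UnIndex code. The small-resultset shortcut mentioned above is not modelled.

```python
def search_with_resultset(index, value, request_cache, resultset=None):
    """Hypothetical sketch: cache the index's own result, then intersect.

    index         -- dict mapping value -> set of document ids (toy index)
    request_cache -- dict assumed to live for one request
    resultset     -- optional intermediate result passed in by the query plan
    """
    key = ("toy-index", value)
    raw = request_cache.get(key)
    if raw is None:
        # The cached value depends only on the index and the query value,
        # not on whatever resultset the caller passes in.
        raw = frozenset(index.get(value, ()))
        request_cache[key] = raw
    if resultset is None:
        return raw
    # The intersection happens after caching and is never stored.
    return raw & resultset
```

Two calls with different intermediate results then share a single cache entry:

```python
index = {"news": {1, 2, 3}}
cache = {}
a = search_with_resultset(index, "news", cache, frozenset({1, 2}))
b = search_with_resultset(index, "news", cache, frozenset({3, 4}))
```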

@d-maurer
Contributor Author

d-maurer commented Feb 19, 2020 via email

@d-maurer d-maurer changed the title from "Is request caching by UnIndex really useful?" to "Wrong search results due to stale request cache (in some rare situations)" on Mar 26, 2020
@d-maurer d-maurer added the bug label and removed the question label on Mar 26, 2020

3 participants