PSRSimpleCache - Performance issue #1035

Closed
norberttech opened this issue Apr 2, 2024 · 2 comments · Fixed by #1147

Comments

@norberttech
Member

Because of the internal index in PSRSimpleCache, every single time we want to add something to the cache we need to perform the following operations:

  • check if an index exists in the cache
  • read index from the cache (if it exists)
  • merge that index with new rows
  • write that index into the cache

Only after that can we put the actual rows into the cache. The biggest bottleneck is here:

https://github.com/flow-php/flow/blob/1.x/src/core/etl/src/Flow/ETL/ExternalSort/CacheExternalSort.php#L42-L44

All of the above cache operations are executed at least once per row, so 10k rows generate around 40k hits to cache storage. This problem does not exist with LocalFilesystemCache: instead of checking whether the index exists, it simply tries to create or open the index and then appends the new id to the end of it.
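To make the arithmetic concrete, here is a minimal Python sketch (not the Flow PHP code) that counts storage hits for the per-row index update described above and for an append-only index. `CountingCache`, `add_with_index`, and `count_hits` are hypothetical names used only for illustration, and an in-memory dict stands in for a real cache backend.

```python
class CountingCache:
    """In-memory stand-in for a key-value cache that counts every storage hit."""

    def __init__(self):
        self.store = {}
        self.hits = 0

    def has(self, key):
        self.hits += 1
        return key in self.store

    def get(self, key, default=None):
        self.hits += 1
        return self.store.get(key, default)

    def set(self, key, value):
        self.hits += 1
        self.store[key] = value


def add_with_index(cache, cache_id, row_key, row):
    """Per-row index maintenance: check, read, merge, write, then store the row."""
    index_key = f"{cache_id}.index"
    index = []
    if cache.has(index_key):            # hit: does the index exist?
        index = cache.get(index_key)    # hit: read the index
    index = index + [row_key]           # merge happens in memory, no hit
    cache.set(index_key, index)         # hit: write the whole index back
    cache.set(row_key, row)             # hit: write the actual row


def count_hits(n_rows):
    """Return (hits with a cached index, hits with an append-only index)."""
    indexed = CountingCache()
    for i in range(n_rows):
        add_with_index(indexed, "sort", f"row-{i}", {"id": i})

    # Append-only model (assumption): one append to an index file per row,
    # plus one write for the row itself.
    rows = CountingCache()
    index_file = []
    appends = 0
    for i in range(n_rows):
        index_file.append(f"row-{i}")
        appends += 1
        rows.set(f"row-{i}", {"id": i})

    return indexed.hits, appends + rows.hits


if __name__ == "__main__":
    print(count_hits(10_000))
```

Under these assumptions the indexed path costs 4 hits per row (3 for the very first row, when there is no index to read yet), while the append-only path costs 2 hits per row and never rereads the index.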

@norberttech
Member Author

This issue came up while I was working on #1034.

@norberttech
Member Author

Not fully resolved, but improved a lot by #1036, mostly by reducing the number of reads from and writes to the cache.
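One general way to cut the number of reads and writes is to update the index once per batch of rows instead of once per row. The Python sketch below illustrates that idea only; it is not taken from #1036, and `CountingCache`, `add_batch`, and `hits_for` are hypothetical names. It also assumes a `get` that accepts a default value (as PSR-16's `get` does), which makes a separate existence check unnecessary.

```python
class CountingCache:
    """In-memory stand-in for a key-value cache that counts every storage hit."""

    def __init__(self):
        self.store = {}
        self.hits = 0

    def get(self, key, default=None):
        self.hits += 1
        return self.store.get(key, default)

    def set(self, key, value):
        self.hits += 1
        self.store[key] = value


def add_batch(cache, cache_id, rows):
    """Store a batch of rows with one index read and one index write.

    rows maps row_key -> row.
    """
    index_key = f"{cache_id}.index"
    index = cache.get(index_key, [])            # single read, default replaces has()
    cache.set(index_key, index + list(rows))    # single write for the whole batch
    for row_key, row in rows.items():
        cache.set(row_key, row)                 # one write per row


def hits_for(n_rows, batch_size):
    """Count storage hits when n_rows are added in batches of batch_size."""
    cache = CountingCache()
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        batch = {f"row-{i}": {"id": i} for i in range(start, stop)}
        add_batch(cache, "sort", batch)
    return cache.hits


if __name__ == "__main__":
    print(hits_for(10_000, 1_000))
```

In this model the index overhead is 2 hits per batch rather than 3 per row, so the total is dominated by the row writes themselves.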
