Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable listings cache for HTTP filesystem #560

Closed
wants to merge 3 commits into from

Conversation

amotl
Copy link
Contributor

@amotl amotl commented Mar 6, 2021

Dear Martin,

we recognized your activity on the code base again and feared we would have missed some merge window already. So, we tried to hurry to bring in this patch I had slumbering on my workstation for some time already.

The patch will enable listings caching for the HTTP filesystem implementation. It is accompanied by appropriate tests and I will be happy if you could take the time to look into it and consider it for the next release if you like it. Please let us know about any amendments you would like to see.

With kind regards,
Andreas.

P.S.: After @gutzbenj submitted #507 the other day, this patch would be the last thing we would need for finally getting rid of the dbmfile-based cache we still use within Wetterdienst, see earthobservations/wetterdienst#243. Because of this, we haven't migrated the code base to filesystem_spec yet, but would be very keen doing so.

On the CI side, as the old cache implementation is apparently not thread-safe (earthobservations/wetterdienst#254 (comment)), this would hopefully enable us to run all tests in parallel using pytest-xdist, see earthobservations/wetterdienst#253. At the same time, it would be a tremendous improvement for our users and also for ourselves, as we would be able to remove some code from the code base and stand on the shoulders of filesystem_spec.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding! Just a small question.

fsspec/implementations/http.py Outdated Show resolved Hide resolved


def test_list_cache_with_expiry_time(server):
h = fsspec.filesystem("http", use_listings_cache=True, listings_expiry_time=30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could test these three parametrised cases to actually trigger the condition: have a short listing expiry time and sleep past it; list more than 5 paths; make a new instance with/without the instance cache (one should already have its dircache populated, the other not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Martin,

thank you very much for your suggestions, I tried to implement some ideas from them as good as possible. However, maybe I am still missing some details. If you see the tests should be improved once more, I am humbly asking for your guidance and to elaborate a bit more about the details I might have missed.

have a short listing expiry time and sleep past it

I just added b1f4cad in order to bring in your idea of verifying the actual contents of the directory cache beyond the expiration time.

make a new instance with/without the instance cache (one should already have its dircache populated, the other not)

Do you mean something along the lines of ca0a61e?

list more than 5 paths

I am not sure I am following here. The test is currently using the some response mock as the other tests are also using. Is there some mock which would yield more than 5 paths? Would it actually make any difference?

With kind regards,
Andreas.

@martindurant
Copy link
Member

(this PR should be easy to finish off, if you have a little time)

@martindurant
Copy link
Member

  • friendly nudge -

@amotl amotl force-pushed the http-listings-cache branch from 61c14eb to 2cd6946 Compare April 24, 2021 13:46
@amotl amotl force-pushed the http-listings-cache branch from 2cd6946 to ca0a61e Compare April 24, 2021 14:04
@amotl amotl force-pushed the http-listings-cache branch from ca0a61e to 5c7377e Compare April 24, 2021 14:23
@amotl
Copy link
Contributor Author

amotl commented Apr 24, 2021

Hi Martin,

friendly nudge

Thank you. Finally, I have been able to catch up on this. Please let me know about further suggestions for this patch.

On a side: On my workstation, I observed different hiccups where the test suite would stall completely when invoking pytest -vvvv -k test_http. On the other hand I, can't see the outcome on CI, because [1] tells us that it would have to be approved beforehand? Did you recently change something on the GHA CI settings on this topic?

With kind regards,
Andreas.

[1] https://github.com/intake/filesystem_spec/actions/runs/780684958

@martindurant
Copy link
Member

You seem to have managed to hit an async deadlock in this code. I don't see why, but this may be good news - since deadlocks have occasionally occurred elsewhere, and this will give us the ability to investigate.

@martindurant
Copy link
Member

I cannot push to your branch, but earthobservations#1 appears to fix the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants