
add http-sync #1177 (Open)

martindurant wants to merge 5 commits into master
Conversation

@martindurant (Member)

@philippjfr this was for you

@philippjfr

This is great thanks. I'll try to test this asap. Is there anything you have to enable to switch to the sync implementation?

@martindurant (Member, Author)

You will need to either instantiate the class explicitly yourself, or call fsspec.register_implementation for both "http" and "https" so that it displaces the standard async version when going through the standard API, which is what something like pd.read_csv does.
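For illustration, the registration step might look like this (a sketch only: the import path `fsspec.implementations.http_sync.HTTPFileSystem` is an assumed location for this PR's class, not confirmed by the diff):

```python
import fsspec

# Hypothetical module path for the sync implementation added in this PR.
SYNC_CLS = "fsspec.implementations.http_sync.HTTPFileSystem"

# clobber=True replaces the default (async) registration for these protocols.
fsspec.register_implementation("http", SYNC_CLS, clobber=True)
fsspec.register_implementation("https", SYNC_CLS, clobber=True)

# From here on, standard APIs that resolve URLs through fsspec
# (e.g. pd.read_csv("https://...")) would pick up the sync class.
```

Registration by string path is lazy, so the module is only imported when a filesystem for that protocol is first requested.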

Mostly copied from async

Question: should async FS use sync/requests stuff for things that don't multiplex?
We would be able to share code.
martindurant marked this pull request as ready for review, March 11, 2023 20:56
@martindurant (Member, Author)

Would appreciate someone knowledgeable to chime in: how to test the pyodide functionality here?

Also, should this be a separate package, or should it instead share code with the normal HTTPFileSystem, with which it of course has a lot in common?

Last question: I implement my own version of requests on top of JS here, but there are others out there. Could we use one of those, so that the CPython and JS versions of this FS look identical? I suspect not...

@martindurant (Member, Author)

Considering merging this as-is, but experimental and unsupported, so that it can be included in a release for wider testing.

@lobis (Contributor), Nov 13, 2023

Hello @martindurant, I think this PR would be enough so that fsspec can be used to get http files using pyodide, right? Are there any plans to merge this PR in the near future? We would be very interested in this feature and can provide testing. Thanks!

@martindurant (Member, Author)

Yes, I was of the opinion that this PR was at least ready for serious testing. Unfortunately, I don't really know how to test it within CI - binary blocking transfers must happen in a webworker (but pyscript has made running python code in a worker smoother recently). There are probably unfinished features. Also, it's ugly to copy so much code, but perhaps that's not important.

There has been some renewed interest in getting this in, and I would dearly love to demo a pydata flow (e.g., intake-fsspec-pandas, or even fastparquet) in the browser. Note that https://github.com/koenvo/pyodide-http is probably better established than my requests shim, however - we should at least look at how they do things and find the sharp edges.

@wachsylon

@martindurant I recently used this http-sync filesystem successfully with pyodide. Is this an outdated approach, especially with zarr v3? If not, can we somehow make it usable with the reference filesystem, and create something similar for an s3 filesystem? That would enhance pyodide support directly. Thanks a lot!

@martindurant (Member, Author)

Is this an outdated approach, especially with zarr v3?

Zarr v3 is async internally and runs an event loop on a thread, in a way similar to how fsspec does for async FSs. This is definitely not compatible with pyodide! However, there is an async interface to zarr groups/arrays too, so it's possible that #960 would work, or wrapping the implementation in this PR with AsyncFileSystemWrapper, but I am not sure.

I don't believe the zarr team has any thoughts about pyodide, I would raise an issue on zarr-python.

@martindurant (Member, Author)

That pyodide URL does load the interface, but no data. Can it be filled in with a public (no HTTP header needed) URL just to prove the technology stack?

@wachsylon

Well, the idea is that one can open any openly-accessible native zarr URL with this app. I am not sure about the cross-reference setting... If it is not working for you, I can provide an example with a preload. In the meantime, this example URL should work.

@martindurant (Member, Author), Jan 25, 2025

Says

    {
      detail: "Not Found"
    }

@wachsylon

Yes, but it is a consolidated zarr dataset. E.g. URL/.zmetadata works.
Anyway, you specify that URL in the zarr text input and click open (without options).

@martindurant (Member, Author)

Yes, but it is a consolidated zarr dataset. E.g. URL/.zmetadata works.

OK, now I follow you - it works nicely!

@wachsylon, Jan 27, 2025

So what would you recommend to proceed?

  1. Wrap this implementation in an AsyncFileSystemWrapper. This sounds rather easy. Do you have an example?
  2. Use the pyodide HTTP FS/aiohttp shim from #960? I fear that this may not work. When I use the http-sync implementation in the chunk URLs of the reference filesystem, the reference filesystem still uses async methods to get those chunks. I guess that would also happen if I replaced it with a js-filesystem.

For both cases, I would be very unsure about how to set it up right or even implement it.
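For what it's worth, the reference-filesystem mechanics in question can be sketched against the in-memory backend (a sketch only: `memory://` stands in for the js/sync HTTP backend, and the refs dict is a made-up example mapping keys to inline bytes or `[url, offset, length]` chunk pointers):

```python
import fsspec

# Stand-in target store; in the real setup this would be remote HTTP data.
mem = fsspec.filesystem("memory")
mem.pipe_file("/chunks.bin", b"0123456789")

refs = {
    ".zattrs": b"{}",                           # inline content
    "data/0.0": ["memory://chunks.bin", 2, 4],  # [url, offset, length]
}

# The reference FS resolves chunk keys via the backend named by
# remote_protocol; swapping in a sync or js http backend changes only that.
fs = fsspec.filesystem("reference", fo=refs, remote_protocol="memory")
print(fs.cat("data/0.0"))  # b"2345"
```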

@martindurant (Member, Author)

I think in the long run, zarr v3's async API coupled with option 2) is the way to go. Sync fetching of data with potentially very many chunks will not be scalable and will suffer from sequential latency (although it's questionable how many bytes we should load into the browser anyway!). I haven't tried it, but it shouldn't be too hard, even with a reference FS layer over js-http (with asynchronous=True). The top-level functions triggered by user interaction will need to be async too.
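The sequential-latency point can be illustrated with a toy model (the `fetch` coroutine is hypothetical, standing in for one HTTP round trip; no real I/O happens):

```python
import asyncio

async def fetch(key, latency=0.02):
    # Stand-in for one HTTP round trip taking roughly `latency` seconds.
    await asyncio.sleep(latency)
    return key

async def main():
    keys = [f"data/0.{i}" for i in range(20)]
    # Concurrent: total wall time is ~one latency, not 20 latencies.
    # A sync filesystem must pay the 20 latencies one after another.
    return await asyncio.gather(*(fetch(k) for k in keys))

chunks = asyncio.run(main())
print(len(chunks))  # 20
```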

I still think that the sync implementation here is useful, though, and your zarr2 example shows this nicely. It will also be useful for any pydata API that has no all-async code path, which is most of them (e.g., pd.read_parquet, intake reading a YAML catalog).
