Skip to content

Commit

Permalink
Exclude known non-FS kwargs in url_to_fs (#1316)
Browse files Browse the repository at this point in the history
  • Loading branch information
martindurant authored Jul 25, 2023
1 parent e64f0a1 commit 285094f
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions fsspec/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -360,6 +360,20 @@ def url_to_fs(url, **kwargs):
urlpath : str
The file-systems-specific URL for ``url``.
"""
# non-FS arguments that appear in fsspec.open()
# inspect could keep this in sync with open()'s signature
known_kwargs = {
"auto_mkdir",
"compression",
"encoding",
"errors",
"expand",
"mode",
"name_function",
"newline",
"num",
}
kwargs = {k: v for k, v in kwargs.items() if k not in known_kwargs}
chain = _un_chain(url, kwargs)
inkwargs = {}
# Reverse iterate the chain, creating a nested target_* structure
Expand Down

5 comments on commit 285094f

@will-moore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I think that this commit is somehow involved in an error we're seeing at ome/ome-zarr-py#304
At least when I comment out the code in this commit, then the error goes away (using test code based on that issue).

We're passing in auto_mkdir = True when we create the FSStore() at
https://github.com/ome/ome-zarr-py/blob/30f55fccb2528b69482139a1a7592c9d0ed5e2da/ome_zarr/format.py#L196
but that is stripped out here so it never reaches the LocalFileSystem() init at

def __init__(self, auto_mkdir=False, **kwargs):

Any idea what's going on, or if we need to do something different?
Thanks

@martindurant
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@will-moore : the purpose of url_to_fs is to make filesystem instances with the same input as open/open_files, but without actually writing or reading any files. The trouble is, open_files takes auto_mkdir and creates directories for the specific locations it expects to write to, and so separately does localFS.

It is possible that using auto_mkdirs in open_files is a mistake. Certainly, no tests rely on it except fsspec/tests/test_core.py::test_automkdir (which tests this very thing). However, we would like to remove it gracefully if so.
Another possibility is to have url_to_fs call open_files directly.

For a workaround, you can set auto_mkdir=True in the fsspec config for the localFS alone.

@martindurant
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--- a/fsspec/core.py
+++ b/fsspec/core.py
@@ -210,7 +210,6 @@ def open_files(
     num=1,
     protocol=None,
     newline=None,
-    auto_mkdir=True,
     expand=True,
     **kwargs,
 ):
@@ -249,9 +248,6 @@ def open_files(
     newline: bytes or None
         Used for line terminator in text mode. If None, uses system default;
         if blank, uses no translation.
-    auto_mkdir: bool (True)
-        If in write mode, this will ensure the target directory exists before
-        writing, by calling ``fs.mkdirs(exist_ok=True)``.
     expand: bool
     **kwargs: dict
         Extra options that make sense to a particular storage connection, e.g.
@@ -288,9 +284,6 @@ def open_files(
         protocol=protocol,
         expand=expand,
     )
-    if "r" not in mode and auto_mkdir:
-        parents = {fs._parent(path) for path in paths}
-        [fs.makedirs(parent, exist_ok=True) for parent in parents]
     return OpenFiles(
         [
             OpenFile(
@@ -363,7 +356,6 @@ def url_to_fs(url, **kwargs):
     # non-FS arguments that appear in fsspec.open()
     # inspect could keep this in sync with open()'s signature
     known_kwargs = {
-        "auto_mkdir",
         "compression",
         "encoding",
         "errors",

@will-moore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @martindurant - Thanks for your response...

I'm wondering if we are doing something wrong in ome-zarr-py, but to simplify my test case a bit I've tried to distill the commands we're using.
This is essentially what we're doing:

import zarr
from zarr.storage import FSStore
import numpy as np

store = FSStore("testFSstore.zarr", auto_mkdir = True, dimension_separator="/")
group = zarr.group(store = store)
group.create_dataset("myarray", data=np.ones((10, 10, 10)))

The auto_mkdir option worked previously in this usage.
I'm sure there's a better way to achieve the same thing, but we currently use the store in other places so we create it manually like that.

As I understand it, the config workaround would need creation of a json file at ~/.config/fsspec/conf.json for all users? I think this would be kinda painful for all users of ome-zarr-py.

cc @joshmoore

@martindurant
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.