Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong file loaded when building new SpatialData #147

Open
dawe opened this issue May 17, 2024 · 2 comments
Open

Wrong file loaded when building new SpatialData #147

dawe opened this issue May 17, 2024 · 2 comments

Comments

@dawe
Copy link

dawe commented May 17, 2024

I am working with some data with many .h5ad files stored in the same directory. Whenever I create a new SpatialData object I specify the path to the correct AnnData counts file.
I noticed that the file that is loaded is typically the wrong one and the first .h5ad in the directory is loaded instead. I believe this is because the first element of a list is considered here

anndata_path_checked = _check_path(

I guess the same happens for the barcodes that are specified in the line below

barcode_position_checked = _check_path(

The only workaround right now is to have multiple paths for different files.

@LucaMarconato
Copy link
Member

From a quick look I think the reason is what you described. @lillux could you have a look at this please?

@lillux
Copy link
Contributor

lillux commented May 28, 2024

@dawe @LucaMarconato This has been solved in #139, that also describe how the parsing behavior has been changed. Now the reader gives priority the file of which the path has been specified, instead of prioritizing the pattern matching.
I have just tested the above scenario with spatialdata-io version 0.1.3.dev117+g3c03009, at commit 3c03009, and an Exception is raised when multiple matching files are found.
Please @dawe let me know if you still see the problem with the latest version of spatialdata-io.

Implementation details on path resolution behavior

This is in part to describe implementation details, in part to get opinions in improving it.
Just to clarify why we take index [0] of _check_path() output in the 2 calls below:

anndata_path_checked = _check_path(
path=path, path_specific=anndata_path, pattern=patt_h5ad, key=DbitKeys.COUNTS_FILE # type: ignore
)[0]
barcode_position_checked = _check_path(
path=path, path_specific=barcode_position, pattern=patt_barcode, key=DbitKeys.BARCODE_POSITION # type: ignore
)[0]

In the actual implementation, _check_path() takes some arguments and return a tuple:

def _check_path(
path: Path,
pattern: Pattern[str],
key: DbitKeys,
path_specific: Optional[str | Path] = None,
optional_arg: bool = False,
) -> tuple[Union[Path, None], bool]:

Index [0] of the tuple can be a path or None.
Index [1] is always a bool that indicates that the path has been resolved. This is needed for downstream tasks with optional arguments (optional_arg = True).

Index [0] is always a path for mandatory arguments (optional_arg = False), like the .h5ad and barcode_list, and an Error is raised by _check_path() if the path is not resolved, or if multiple match are found when optional_arg = False.
We do not care about index [1] when optional_arg = False, so we do not save its value when checking for .h5ad and barcode_list.

Instead we care about index [1] when optional_arg = True, for example in the case of an optional image:

image_path_checked, hasimage = _check_path(
path=path, # type: ignore
path_specific=image_path,
pattern=patt_lowres,
key=DbitKeys.IMAGE_LOWRES_FILE,
optional_arg=True,
)

Here we unpack the tuple, and take in account both the values in the downstream tasks. When optional_arg = True and multiple match are found, a warning is printed and None is returned instead of a path, neglecting the optional argument while warning the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants