Wrong file loaded when building new SpatialData #147

dawe · 2024-05-17T15:29:37Z

I am working with a dataset that has many .h5ad files stored in the same directory. Whenever I create a new SpatialData object, I specify the path to the correct AnnData counts file.
However, the wrong file is typically loaded: the first .h5ad in the directory is used instead. I believe this is because the first element of a list is taken here:

spatialdata-io/src/spatialdata_io/readers/dbit.py

Line 282 in e3c53b5

anndata_path_checked = _check_path(

I guess the same happens for the barcodes that are specified in the line below

spatialdata-io/src/spatialdata_io/readers/dbit.py

Line 285 in e3c53b5

barcode_position_checked = _check_path(

The only workaround right now is to keep the different files under separate paths.


LucaMarconato · 2024-05-25T12:54:03Z

From a quick look I think the reason is what you described. @lillux could you have a look at this please?

lillux · 2024-05-28T12:11:35Z

@dawe @LucaMarconato This has been solved in #139, which also describes how the parsing behavior has changed. The reader now gives priority to the file whose path has been specified, instead of prioritizing pattern matching.
I have just tested the above scenario with spatialdata-io version 0.1.3.dev117+g3c03009, at commit 3c03009, and an exception is raised when multiple matching files are found.
@dawe, please let me know if you still see the problem with the latest version of spatialdata-io.

Implementation details on path resolution behavior

This comment partly describes implementation details and partly asks for opinions on how to improve them.
To clarify why we take index [0] of the _check_path() output in the two calls below:

spatialdata-io/src/spatialdata_io/readers/dbit.py

Lines 282 to 287 in 3c03009

    
    anndata_path_checked = _check_path(
        path=path, path_specific=anndata_path, pattern=patt_h5ad, key=DbitKeys.COUNTS_FILE  # type: ignore
    )[0]
    barcode_position_checked = _check_path(
        path=path, path_specific=barcode_position, pattern=patt_barcode, key=DbitKeys.BARCODE_POSITION  # type: ignore
    )[0]

In the actual implementation, _check_path() takes some arguments and returns a tuple:

spatialdata-io/src/spatialdata_io/readers/dbit.py

Lines 26 to 32 in 3c03009

    
    def _check_path(
        path: Path,
        pattern: Pattern[str],
        key: DbitKeys,
        path_specific: Optional[str | Path] = None,
        optional_arg: bool = False,
    ) -> tuple[Union[Path, None], bool]:

Index [0] of the tuple can be a path or None.
Index [1] is always a bool that indicates whether the path has been resolved. This is needed for downstream tasks with optional arguments (optional_arg = True).

Index [0] is always a path for mandatory arguments (optional_arg = False), like the .h5ad and barcode_list: in that case _check_path() raises an error if the path is not resolved or if multiple matches are found.
We do not care about index [1] when optional_arg = False, so we do not save its value when checking for the .h5ad and barcode_list.

We do care about index [1] when optional_arg = True, for example in the case of an optional image:

spatialdata-io/src/spatialdata_io/readers/dbit.py

Lines 288 to 294 in 3c03009

    
    image_path_checked, hasimage = _check_path(
        path=path,  # type: ignore
        path_specific=image_path,
        pattern=patt_lowres,
        key=DbitKeys.IMAGE_LOWRES_FILE,
        optional_arg=True,
    )

Here we unpack the tuple and take both values into account in the downstream tasks. When optional_arg = True and multiple matches are found, a warning is printed and None is returned instead of a path, so the optional argument is ignored and the user is warned.
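
To make the resolution rules above easier to discuss, here is a minimal sketch of the behavior as described in this comment. It is not the code in dbit.py: the name resolve_path is hypothetical, the key argument is omitted, and the handling of an invalid explicitly specified path is my assumption. The exception types are also only illustrative.

```python
from __future__ import annotations

import warnings
from pathlib import Path
from typing import Optional, Pattern, Union


def resolve_path(
    path: Path,
    pattern: Pattern[str],
    path_specific: Optional[Union[str, Path]] = None,
    optional_arg: bool = False,
) -> tuple[Optional[Path], bool]:
    """Hypothetical, simplified stand-in for _check_path() (not the library code).

    Returns (resolved_path_or_None, resolved_flag).
    """
    # An explicitly specified path takes priority over pattern matching.
    if path_specific is not None:
        candidate = Path(path_specific)
        if candidate.is_file():
            return candidate, True
        # Assumption: an invalid explicit path is an error for mandatory
        # arguments and a warning for optional ones.
        if not optional_arg:
            raise FileNotFoundError(f"{candidate} is not a valid file.")
        warnings.warn(f"{candidate} is not a valid file; ignoring it.")
        return None, False

    # No explicit path: fall back to matching file names in the directory.
    matches = sorted(p for p in path.iterdir() if pattern.search(p.name))
    if len(matches) == 1:
        return matches[0], True

    if not optional_arg:
        # Mandatory argument: zero or multiple matches is an error.
        raise ValueError(
            f"Expected exactly one match in {path}, found {len(matches)}."
        )

    # Optional argument: warn the user and neglect the argument.
    warnings.warn(
        f"Expected exactly one match in {path}, found {len(matches)}; ignoring."
    )
    return None, False
```

With this shape, a mandatory caller can safely write resolve_path(...)[0], because any unresolved or ambiguous case has already raised. An optional caller unpacks both values and branches on the bool.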
