The SpatialData object is not self-contained #710

Open
Felicie-Giraud-Sauveur opened this issue Sep 18, 2024 · 2 comments

Comments

@Felicie-Giraud-Sauveur

Hello,

I am contacting you about the “not self-contained” message when saving sdata to a new location.
Here is the example:

import spatialdata as sd
from spatialdata.datasets import blobs

sdata = blobs()
sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")

sdata = sd.read_zarr("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr")

sdata.table.obs['test'] = 'test'
sdata.table.obs.head()

sdata.write("/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr")

This outputs:

INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
         locations outside /Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr). Please see the
         documentation of `is_self_contained()` to understand the implications of working with SpatialData objects
         that are not self-contained.
INFO     The Zarr backing store has been changed from /Volumes/One
         Touch/MICS/data_HE2CellType/CT_DS/test_blobs.zarr the new file path: /Volumes/One
         Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr

In this case, if I completely delete test_blobs.zarr from my disk, could I lose information in test_2_blobs.zarr or run into problems afterwards? I am having trouble understanding the implications of being “not self-contained”.

Thanks in advance for your help!

@LucaMarconato
Member

LucaMarconato commented Oct 3, 2024

Hi, the problem here can be investigated by printing the sdata object after writing it. Here is an example (see the bottom part):

SpatialData object, with associated Zarr store: /Users/macbook/temp/test_blobs.zarr2
├── Images
│     ├── 'blobs_image': DataArray[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': DataTree[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': DataArray[yx] (512, 512)
│     └── 'blobs_multiscale_labels': DataTree[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
with the following Dask-backed elements not being self-contained:
    ▸ blobs_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_image
    ▸ blobs_multiscale_image: /Users/macbook/temp/test_blobs.zarr/images/blobs_multiscale_image
    ▸ blobs_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_labels
    ▸ blobs_multiscale_labels: /Users/macbook/temp/test_blobs.zarr/labels/blobs_multiscale_labels
    ▸ blobs_points: /Users/macbook/temp/test_blobs.zarr/points/blobs_points/points.parquet/part.0.parquet

Basically, the images, labels and points that have been read still refer to the old Zarr location. To fix this, you can simply read the object again from the new disk location.

  • I will try to think of a way to make this info message less obscure, maybe by asking the user to read the object again if they want a self-contained object.
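
The "write, then read again from the new location" fix described above can be sketched roughly as follows. This is only an illustration: `write_then_reload` and its `read_fn` parameter are hypothetical names introduced here, not part of spatialdata. The only library calls assumed are `SpatialData.write()` and `spatialdata.read_zarr()`, both of which appear in the original example.

```python
from pathlib import Path


def write_then_reload(sdata, new_path, read_fn=None):
    """Write `sdata` to `new_path`, then read it back from there.

    Hypothetical helper (not a spatialdata function). Re-reading means the
    lazily loaded images, labels and points of the returned object are
    backed by the new store instead of the one they were first read from.
    """
    if read_fn is None:
        # Imported lazily so the helper itself does not require spatialdata
        # unless the default reader is actually used.
        import spatialdata as sd

        read_fn = sd.read_zarr

    new_path = Path(new_path)
    sdata.write(new_path)      # same call as in the original example
    return read_fn(new_path)   # fresh object, read from the new location
```

With the paths from the example above, usage would look like `sdata = write_then_reload(sdata, "/Volumes/One Touch/MICS/data_HE2CellType/CT_DS/test_2_blobs.zarr")`. Calling `sdata.is_self_contained()` (the method named in the info message) or printing the returned object should then show whether any element still points at the old store.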

Please let me know if you have additional questions on this!

@Felicie-Giraud-Sauveur
Author

Hi, that's very clear, thank you very much for your reply!
