Groundtruth conversion #58

Merged on Sep 15, 2023 (12 commits)
1 change: 1 addition & 0 deletions docs/api/cli/tiff2zarr.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::: fibsem_tools.cli.tiff2zarr
1 change: 1 addition & 0 deletions docs/api/cli/zarr2json.md
@@ -0,0 +1 @@
::: fibsem_tools.cli.zarr2json
1 change: 1 addition & 0 deletions docs/api/cli/zarr_scan.md
@@ -0,0 +1 @@
::: fibsem_tools.cli.zarr_scan
1 change: 1 addition & 0 deletions docs/api/io/core.md
@@ -0,0 +1 @@
::: fibsem_tools.io.core
1 change: 1 addition & 0 deletions docs/api/io/dask.md
@@ -0,0 +1 @@
::: fibsem_tools.io.dask
1 change: 1 addition & 0 deletions docs/api/io/dat.md
@@ -0,0 +1 @@
::: fibsem_tools.io.dat
1 change: 1 addition & 0 deletions docs/api/io/h5.md
@@ -0,0 +1 @@
::: fibsem_tools.io.h5
1 change: 1 addition & 0 deletions docs/api/io/mrc.md
@@ -0,0 +1 @@
::: fibsem_tools.io.mrc
1 change: 1 addition & 0 deletions docs/api/io/multiscale.md
@@ -0,0 +1 @@
::: fibsem_tools.io.multiscale
1 change: 1 addition & 0 deletions docs/api/io/neuroglancer.md
@@ -0,0 +1 @@
::: fibsem_tools.io.neuroglancer
1 change: 1 addition & 0 deletions docs/api/io/server.md
@@ -0,0 +1 @@
::: fibsem_tools.io.server
1 change: 1 addition & 0 deletions docs/api/io/tif.md
@@ -0,0 +1 @@
::: fibsem_tools.io.tif
1 change: 1 addition & 0 deletions docs/api/io/util.md
@@ -0,0 +1 @@
::: fibsem_tools.io.util
1 change: 1 addition & 0 deletions docs/api/io/xr.md
@@ -0,0 +1 @@
::: fibsem_tools.io.xr
1 change: 1 addition & 0 deletions docs/api/io/zarr.md
@@ -0,0 +1 @@
::: fibsem_tools.io.zarr
1 change: 1 addition & 0 deletions docs/api/metadata/groundtruth.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.groundtruth
1 change: 1 addition & 0 deletions docs/api/metadata/neuroglancer.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.neuroglancer
1 change: 1 addition & 0 deletions docs/api/metadata/transform.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.transform
3 changes: 3 additions & 0 deletions docs/index.md
@@ -0,0 +1,3 @@
# FIB-SEM tools

This is a python library used by the [cellmap](https://www.janelia.org/project-team/cellmap) project team to store and manipulate large FIB-SEM images.
78 changes: 78 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,78 @@
site_name: "fibsem-tools"
site_url: https://janelia-cellmap.github.io/fibsem-tools/
site_author: Davis Bennett
site_description: >-
  Documentation for fibsem-tools

# Repository
repo_name: janelia-cellmap/fibsem-tools
repo_url: https://github.com/janelia-cellmap/fibsem-tools

# Copyright
copyright: Copyright © 2016 - 2023 HHMI / Janelia

theme:
  name: material
  palette:
  # Palette toggle for light mode
  - scheme: default
    toggle:
      icon: material/brightness-7
      name: Switch to dark mode

  # Palette toggle for dark mode
  - scheme: slate
    toggle:
      icon: material/brightness-4
      name: Switch to light mode

nav:
  - About: index.md
  - Usage: usage.md
  - API:
    - io:
      - core: api/io/core.md
      - dask: api/io/dask.md
      - dat: api/io/dat.md
      - hdf5: api/io/h5.md
      - mrc: api/io/mrc.md
      - multiscale: api/io/multiscale.md
      - neuroglancer: api/io/neuroglancer.md
      - tif: api/io/tif.md
      - util: api/io/util.md
      - xarray: api/io/xr.md
      - zarr: api/io/zarr.md
    - metadata:
      - groundtruth: api/metadata/groundtruth.md
      - transform: api/metadata/transform.md
      - neuroglancer: api/metadata/neuroglancer.md
    - cli:
      - tiff2zarr: api/cli/tiff2zarr.md
      - zarr_scan: api/cli/zarr_scan.md
      - zarr2json: api/cli/zarr2json.md

plugins:
- mkdocstrings:
    handlers:
      python:
        options:
          docstring_style: numpy
          members_order: source
          separate_signature: true
          filters: ["!^_"]
          docstring_options:
            ignore_init_summary: true
          merge_init_into_class: true

markdown_extensions:
- pymdownx.highlight:
    anchor_linenums: true
    line_spans: __span
    pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.superfences
- toc:
    baselevel: 2
    toc_depth: 4
    permalink: "#"
697 changes: 502 additions & 195 deletions poetry.lock

Large diffs are not rendered by default.

15 changes: 12 additions & 3 deletions pyproject.toml
@@ -17,11 +17,11 @@ xarray = ">=2022.03.0"
pydantic = "^1.8.2"
backoff = "^1.10.0"
s3fs = ">=2022.2.0"
xarray-ome-ngff = "^1.2.0"
xarray-ome-ngff = "^1.2.1"
pint = "^0.20.1"
xarray-multiscale = "^2.0.0"
tifffile = "^2023.2.28"
pydantic-ome-ngff = "^0.2.0"
pydantic-ome-ngff = "^0.3.0"
click = "^8.1.3"
dask = "^2023.3.2"
textual = "^0.16.0"
@@ -31,14 +31,23 @@ pydantic-zarr = "^0.5.0"


[tool.poetry.group.dev.dependencies]
pytest = "^6.1.2"
pytest = "^7.4.0"
pytest-cov = "^3.0.0"
pre-commit = "2.21.0"
mypy = "^1.1.1"
requests = "^2.28.2"


[tool.poetry.group.docs.dependencies]
mkdocs = "^1.4.3"
mkdocs-material = "^9.1.18"
mkdocstrings = {extras = ["python"], version = "^0.22.0"}
pytest-examples = "^0.0.9"

[tool.poetry.scripts]
tiff2zarr = 'fibsem_tools.cli.tiff2zarr:run'
zarr-scan = 'fibsem_tools.cli.zarr_scan:cli'
zarr2json = 'fibsem_tools.cli.zarr2json:cli'

[build-system]
requires = ["poetry-core>=1.0.0"]
Empty file.
5 changes: 4 additions & 1 deletion src/fibsem_tools/cli/tiff2zarr.py
@@ -48,7 +48,10 @@ class ArrayMoveStrict(BaseModel):
     chunks: List[int]


-def represents_int(s):
+def represents_int(s: Any) -> bool:
+    """
+    Returns `True` if the value can be parsed as an int, and `False` otherwise.
+    """
     try:
         int(s)
     except ValueError:
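The hunk above cuts off after `except ValueError:`, so the rest of the function is not visible here. The following is a minimal, self-contained sketch of how such a helper might behave, assuming the hidden remainder returns `False` in the `except` branch and `True` otherwise; that remainder is an assumption, not taken from the diff.

```python
from typing import Any


def represents_int(s: Any) -> bool:
    """
    Returns `True` if the value can be parsed as an int, and `False` otherwise.
    """
    try:
        int(s)
    except ValueError:
        # Assumed branch: the diff does not show what happens here.
        return False
    # Assumed fall-through: parsing succeeded.
    return True


print(represents_int("42"))   # a string of digits parses cleanly
print(represents_int("4.2"))  # int("4.2") raises ValueError
```

In this sketch only `ValueError` is caught, so a value such as `None`, for which `int()` raises `TypeError`, would propagate the exception rather than return `False`.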
38 changes: 38 additions & 0 deletions src/fibsem_tools/cli/zarr2json.py
@@ -0,0 +1,38 @@
from typing import Union
import click
from pydantic_zarr import GroupSpec, ArraySpec
import zarr
from fibsem_tools import read
from fibsem_tools.io.util import split_by_suffix
from rich import print
import os


def parse_zarr_path(path: str) -> Union[ArraySpec, GroupSpec]:
    """
    Resolve a path to a zarr group or zarr array, and parse that array or group to an
    instance of ArraySpec or GroupSpec, respectively.
    """
    pre, post, suffix = split_by_suffix(path, (".n5", ".zarr"))
    obj = read(os.path.join(pre, post, suffix))
    if isinstance(obj, zarr.Array):
        result = ArraySpec.from_zarr(obj)
    elif isinstance(obj, zarr.Group):
        result = GroupSpec.from_zarr(obj)
    else:
        raise ValueError(f"Got an unparseable object: {type(obj)}")
    return result


@click.command()
@click.argument("path", type=click.STRING)
def cli(path: str):
    """
    Generate a JSON representation of the structure of a zarr array or group.
    """
    result = parse_zarr_path(path)
    print(result.json(indent=2))


if __name__ == "__main__":
    cli()
65 changes: 64 additions & 1 deletion src/fibsem_tools/cli/zarr_scan.py
@@ -13,17 +13,29 @@

 @dataclass
 class Missing:
+    """
+    This class represents a chunk that was missing.
+    """
+
     variant = "missing"


 @dataclass
 class Invalid:
+    """
+    This class represents a chunk that raised an exception upon loading / decompression.
+    """
+
     variant = "invalid"
     exception: BaseException


 @dataclass
 class Valid:
+    """
+    This class represents a chunk that was valid.
+    """
+
     variant = "valid"

@@ -32,6 +44,26 @@ class ChunkSetResults(dict[ChunkState, dict[str, Union[Missing, Valid, Invalid]]


 def check_zarray(array: zarr.Array) -> dict[str, Union[Missing, Invalid, Valid]]:
+    """
+    Check the state of each chunk of a zarr array. This function iterates over the
+    chunks of an array, attempts to access each chunk, and records whether that chunk
+    is valid (the chunk was fetched + decompressed without incident), invalid (
+    an exception was raised when loading + decompressing the chunk) or missing (the
+    chunk was not found in storage).
+
+    Parameters
+    ----------
+
+    array: zarr.Array
+        The zarr array to check.
+
+    Returns
+    -------
+
+    A dict with string keys, where each key is the location of a chunk in the key
+    space of the store object associated with the array, and each value is either a
+    Valid, Missing, or Invalid object.
+    """
     ckeys = tuple(get_chunk_keys(array))
     results = {}
     for ckey in track(ckeys, description="Checking chunks..."):
@@ -76,7 +108,38 @@ def check_zarray(array: zarr.Array) -> dict[str, Union[Missing, Invalid, Valid]]
     default=False,
     help="delete invalid chunks",
 )
-def cli(array_path, valid, missing, invalid, delete_invalid):
+def cli(
+    array_path: str, valid: bool, missing: bool, invalid: bool, delete_invalid: bool
+):
+    """
+    Checks the chunks of a zarr array, prints the results as JSON, and optionally
+    deletes invalid chunks.
+
+    Parameters
+    ----------
+
+    array_path: string
+        The path to the array.
+
+    valid: bool
+        Whether to report valid chunks. Default is False, which results in no output if
+        a chunk is valid.
+
+    missing: bool
+        Whether to report missing chunks. Default is False, which results in no output
+        if a chunk is missing.
+
+    invalid: bool
+        Whether to report invalid chunks. Default is True. An invalid chunk is defined
+        as one which raises an OSError upon loading + decompression. This definition may
+        change to include more exception types, but the basic idea is that a chunk is
+        invalid if it has been corrupted or cannot be read with the compressor as
+        defined in the array metadata.
+
+    delete_invalid: bool
+        Whether to delete invalid chunks. Default is False.
+
+    """
     start = time.time()
     array = access(array_path, mode="r")
     all_results = check_zarray(array)
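The docstrings in this file describe a three-way classification of chunks: valid, invalid, or missing. The sketch below illustrates that idea using only the standard library. It is not the `check_zarray` implementation: the store is a plain dict of bytes, `zlib` stands in for the array's compressor, and the `check_chunks` helper and `ChunkResult` alias are hypothetical. It also treats `zlib.error` as the marker of an invalid chunk, whereas the docstring above names `OSError`.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Union


@dataclass
class Missing:
    variant: str = "missing"


@dataclass
class Invalid:
    exception: BaseException
    variant: str = "invalid"


@dataclass
class Valid:
    variant: str = "valid"


ChunkResult = Union[Missing, Invalid, Valid]


def check_chunks(
    store: Mapping[str, bytes], keys: Iterable[str]
) -> Dict[str, ChunkResult]:
    """Classify each expected chunk key as valid, invalid, or missing."""
    results: Dict[str, ChunkResult] = {}
    for key in keys:
        try:
            # Fetch and decompress; either step can fail.
            zlib.decompress(store[key])
        except KeyError:
            # The key is absent from storage.
            results[key] = Missing()
        except zlib.error as e:
            # The bytes exist but cannot be decompressed.
            results[key] = Invalid(exception=e)
        else:
            results[key] = Valid()
    return results


# A toy store: one good chunk, one corrupted chunk, one chunk never written.
store = {"0.0": zlib.compress(b"\x00" * 16), "0.1": b"not zlib data"}
report = check_chunks(store, ["0.0", "0.1", "1.0"])
for key, state in report.items():
    print(key, state.variant)
```

Keeping the exception on the `Invalid` result, as the classes in the diff do, lets a caller report why a chunk failed before deciding whether to delete it.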
13 changes: 13 additions & 0 deletions src/fibsem_tools/io/multiscale.py
@@ -56,6 +56,19 @@ def multiscale_group(
     """
     Generate multiscale metadata of the desired flavor from a list of DataArrays

+    Arguments
+    ---------
+
+    arrays : Sequence[DataArray]
+        The arrays to store.
+    metadata_types : List[str]
+        The metadata flavor(s) to use.
+    array_paths : Sequence[str]
+        The path for each array in storage, relative to the parent group.
+    name : Optional[str]
+        The name for the multiscale group. Only relevant for metadata flavors that
+        support this field, e.g. ome-ngff
+
     Returns
     -------

3 changes: 2 additions & 1 deletion src/fibsem_tools/io/util.py
@@ -127,7 +127,8 @@ def split_by_suffix(uri: PathLike, suffixes: Sequence[str]) -> Tuple[str, str, s
     suffixed = [Path(part).suffix in suffixes for part in parts]

     if not any(suffixed):
-        raise ValueError(f"No path elements found with the suffix(es) {suffixes}")
+        msg = f"No path elements found with the suffix(es) {suffixes} in {uri}"
+        raise ValueError(msg)

     index = [idx for idx, val in enumerate(suffixed) if val][-1]
     if index == (len(parts) - 1):
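Only part of `split_by_suffix` is visible in this hunk. The sketch below reconstructs the general technique under stated assumptions: the function name `split_at_suffix` is hypothetical, `PurePosixPath` is used so the result does not depend on the operating system, and the meaning of the returned triple (path up to the suffixed element, remainder, suffix) is inferred from the signature and the call in `zarr2json.py`, not confirmed by the diff. It does show the effect of the changed error message, which now includes the offending path.

```python
from pathlib import PurePosixPath
from typing import Sequence, Tuple


def split_at_suffix(uri: str, suffixes: Sequence[str]) -> Tuple[str, str, str]:
    """Split a path at the last element whose suffix is in `suffixes`."""
    parts = PurePosixPath(uri).parts
    suffixed = [PurePosixPath(part).suffix in suffixes for part in parts]

    if not any(suffixed):
        msg = f"No path elements found with the suffix(es) {suffixes} in {uri}"
        raise ValueError(msg)

    # Index of the last path element carrying one of the suffixes.
    index = [idx for idx, val in enumerate(suffixed) if val][-1]
    pre = str(PurePosixPath(*parts[: index + 1]))
    if index == (len(parts) - 1):
        # The suffixed element is the final one, so nothing follows it.
        post = ""
    else:
        post = str(PurePosixPath(*parts[index + 1 :]))
    return pre, post, PurePosixPath(pre).suffix


print(split_at_suffix("/data/sample.zarr/group/array", (".n5", ".zarr")))

try:
    split_at_suffix("/data/sample.tif", (".n5", ".zarr"))
except ValueError as e:
    print(e)  # the message names both the suffixes and the path
```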
16 changes: 9 additions & 7 deletions src/fibsem_tools/io/zarr.py
@@ -11,7 +11,7 @@
 import numpy as np
 import xarray
 import zarr
-from zarr.storage import FSStore
+from zarr.storage import FSStore, contains_array, contains_group, BaseStore
 from dask import bag, delayed
 from distributed import Client, Lock
 from toolz import concat
@@ -214,15 +214,17 @@ def get_url(node: Union[zarr.Group, zarr.Array]):
         cannot be resolved to a url"""
         )

+def get_store(path: PathLike) -> zarr.storage.BaseStore:
+    if isinstance(path, Path):
+        path = str(path)
+
+    return DEFAULT_ZARR_STORE(path)

 def access_zarr(
-    store: PathLike, path: PathLike, **kwargs: Any
+    store: Union[PathLike, BaseStore], path: PathLike, **kwargs: Any
 ) -> zarr.Array | zarr.Group:
-    if isinstance(store, Path):
-        store = str(store)
-
-    if isinstance(store, str):
-        store = DEFAULT_ZARR_STORE(store)
+    if isinstance(store, (Path, str)):
+        store = get_store(store)

     # set default dimension separator to /
     if "shape" in kwargs and "dimension_separator" not in kwargs:
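The change to `access_zarr` moves path-to-store conversion into a helper and widens the `store` parameter so that callers can pass either a path or an already constructed store. The sketch below illustrates that pattern with the standard library only. Everything in it is a stand-in: the `Store` alias is a plain dict instead of a zarr store, and `access` returns a string instead of opening an array or group.

```python
from pathlib import Path
from typing import Any, Dict, Union

# Hypothetical stand-in for a zarr store: a dict that remembers its root path.
Store = Dict[str, Any]


def get_store(path: Union[str, Path]) -> Store:
    """Build a store from a path, normalizing Path objects to strings first."""
    if isinstance(path, Path):
        path = str(path)
    return {"root": path}


def access(store: Union[str, Path, Store], path: str) -> str:
    """Accept a path or a ready-made store; convert only when a path is given."""
    if isinstance(store, (Path, str)):
        store = get_store(store)
    return f"{store['root']}::{path}"


# All three call styles resolve to the same location.
print(access("sample.zarr", "group/array"))
print(access(Path("sample.zarr"), "group/array"))
print(access({"root": "sample.zarr"}, "group/array"))
```

Keeping the conversion in one helper means the string and `Path` cases cannot drift apart, and a caller who already holds a store can pass it through unchanged.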