Merge pull request #58 from janelia-cosem/groundtruth_conversion
Groundtruth conversion
d-v-b authored Sep 15, 2023
2 parents 34f6d67 + eab6446 commit a5e1b0c
Showing 33 changed files with 1,095 additions and 254 deletions.
1 change: 1 addition & 0 deletions docs/api/cli/tiff2zarr.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::: fibsem_tools.cli.tiff2zarr
1 change: 1 addition & 0 deletions docs/api/cli/zarr2json.md
@@ -0,0 +1 @@
::: fibsem_tools.cli.zarr2json
1 change: 1 addition & 0 deletions docs/api/cli/zarr_scan.md
@@ -0,0 +1 @@
::: fibsem_tools.cli.zarr_scan
1 change: 1 addition & 0 deletions docs/api/io/core.md
@@ -0,0 +1 @@
::: fibsem_tools.io.core
1 change: 1 addition & 0 deletions docs/api/io/dask.md
@@ -0,0 +1 @@
::: fibsem_tools.io.dask
1 change: 1 addition & 0 deletions docs/api/io/dat.md
@@ -0,0 +1 @@
::: fibsem_tools.io.dat
1 change: 1 addition & 0 deletions docs/api/io/h5.md
@@ -0,0 +1 @@
::: fibsem_tools.io.h5
1 change: 1 addition & 0 deletions docs/api/io/mrc.md
@@ -0,0 +1 @@
::: fibsem_tools.io.mrc
1 change: 1 addition & 0 deletions docs/api/io/multiscale.md
@@ -0,0 +1 @@
::: fibsem_tools.io.multiscale
1 change: 1 addition & 0 deletions docs/api/io/neuroglancer.md
@@ -0,0 +1 @@
::: fibsem_tools.io.neuroglancer
1 change: 1 addition & 0 deletions docs/api/io/server.md
@@ -0,0 +1 @@
::: fibsem_tools.io.server
1 change: 1 addition & 0 deletions docs/api/io/tif.md
@@ -0,0 +1 @@
::: fibsem_tools.io.tif
1 change: 1 addition & 0 deletions docs/api/io/util.md
@@ -0,0 +1 @@
::: fibsem_tools.io.util
1 change: 1 addition & 0 deletions docs/api/io/xr.md
@@ -0,0 +1 @@
::: fibsem_tools.io.xr
1 change: 1 addition & 0 deletions docs/api/io/zarr.md
@@ -0,0 +1 @@
::: fibsem_tools.io.zarr
1 change: 1 addition & 0 deletions docs/api/metadata/groundtruth.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.groundtruth
1 change: 1 addition & 0 deletions docs/api/metadata/neuroglancer.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.neuroglancer
1 change: 1 addition & 0 deletions docs/api/metadata/transform.md
@@ -0,0 +1 @@
::: fibsem_tools.metadata.transform
3 changes: 3 additions & 0 deletions docs/index.md
@@ -0,0 +1,3 @@
# FIB-SEM tools

This is a python library used by the [cellmap](https://www.janelia.org/project-team/cellmap) project team to store and manipulate large FIB-SEM images.
78 changes: 78 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,78 @@
site_name: "fibsem-tools"
site_url: https://janelia-cellmap.github.io/fibsem-tools/
site_author: Davis Bennett
site_description: >-
Documentation for fibsem-tools
# Repository
repo_name: janelia-cellmap/pydantic-zarr
repo_url: https://github.com/janelia-cellmap/pydantic-zarr

# Copyright
copyright: Copyright © 2016 - 2023 HHMI / Janelia

theme:
name: material
palette:
# Palette toggle for light mode
- scheme: default
toggle:
icon: material/brightness-7
name: Switch to dark mode

# Palette toggle for dark mode
- scheme: slate
toggle:
icon: material/brightness-4
name: Switch to light mode

nav:
- About: index.md
- Usage: usage.md
- API:
- io:
- core: api/io/core.md
- dask: api/io/dask.md
- dat: api/io/dat.md
- hdf5: api/io/h5.md
- mrc: api/io/mrc.md
- multiscale: api/io/multiscale.md
- neuroglancer: api/io/neuroglancer.md
- tif: api/io/tif.md
- util: api/io/util.md
- xarray: api/io/xr.md
- zarr: api/io/zarr.md
- metadata:
- groundtruth: api/metadata/groundtruth.md
- transform: api/metadata/transform.md
- neuroglancer: api/metadata/neuroglancer.md
- cli:
- tiff2zarr: api/cli/tiff2zarr.md
- zarr_scan: api/cli/zarr_scan.md
- zarr2json: api/cli/zarr2json.md

plugins:
- mkdocstrings:
handlers:
python:
options:
docstring_style: numpy
members_order: source
separate_signature: true
filters: ["!^_"]
docstring_options:
ignore_init_summary: true
merge_init_into_class: true

markdown_extensions:
- pymdownx.highlight:
anchor_linenums: true
line_spans: __span
pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.superfences
- toc:
baselevel: 2
toc_depth: 4
permalink: "#"
697 changes: 502 additions & 195 deletions poetry.lock

Large diffs are not rendered by default.

15 changes: 12 additions & 3 deletions pyproject.toml
Expand Up @@ -17,11 +17,11 @@ xarray = ">=2022.03.0"
pydantic = "^1.8.2"
backoff = "^1.10.0"
s3fs = ">=2022.2.0"
xarray-ome-ngff = "^1.2.0"
xarray-ome-ngff = "^1.2.1"
pint = "^0.20.1"
xarray-multiscale = "^2.0.0"
tifffile = "^2023.2.28"
pydantic-ome-ngff = "^0.2.0"
pydantic-ome-ngff = "^0.3.0"
click = "^8.1.3"
dask = "^2023.3.2"
textual = "^0.16.0"
Expand All @@ -31,14 +31,23 @@ pydantic-zarr = "^0.5.0"


[tool.poetry.group.dev.dependencies]
pytest = "^6.1.2"
pytest = "^7.4.0"
pytest-cov = "^3.0.0"
pre-commit = "2.21.0"
mypy = "^1.1.1"
requests = "^2.28.2"


[tool.poetry.group.docs.dependencies]
mkdocs = "^1.4.3"
mkdocs-material = "^9.1.18"
mkdocstrings = {extras = ["python"], version = "^0.22.0"}
pytest-examples = "^0.0.9"

[tool.poetry.scripts]
tiff2zarr = 'fibsem_tools.cli.tiff2zarr:run'
zarr-scan = 'fibsem_tools.cli.zarr_scan:cli'
zarr2json = 'fibsem_tools.cli.zarr2json:cli'

[build-system]
requires = ["poetry-core>=1.0.0"]
Expand Down
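The dependency bumps above use Poetry's caret constraints. As a rough illustration, the sketch below computes the range that a caret constraint allows for three-component versions like the ones in this file: the leftmost non-zero component may not change. The helper name `caret_bounds` is hypothetical and is not part of Poetry.

```python
from typing import Tuple


def caret_bounds(version: str) -> Tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for ^version.

    Hypothetical helper for illustration only; it handles plain dotted
    numeric versions with at least one non-zero component.
    """
    parts = [int(p) for p in version.split(".")]
    for i, p in enumerate(parts):
        if p != 0:
            # Bump the leftmost non-zero component and zero out the rest.
            upper = parts[:i] + [p + 1] + [0] * (len(parts) - i - 1)
            return version, ".".join(str(u) for u in upper)
    raise ValueError(f"All-zero versions are not handled by this sketch: {version}")


# Constraints taken from the diff above.
assert caret_bounds("7.4.0") == ("7.4.0", "8.0.0")    # pytest = "^7.4.0"
assert caret_bounds("0.3.0") == ("0.3.0", "0.4.0")    # pydantic-ome-ngff = "^0.3.0"
assert caret_bounds("0.0.9") == ("0.0.9", "0.0.10")   # pytest-examples = "^0.0.9"
```

Under this reading, `pydantic-ome-ngff = "^0.3.0"` no longer accepts any 0.2.x release, while `pytest = "^7.4.0"` accepts any 7.x release at or above 7.4.0.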
Empty file.
5 changes: 4 additions & 1 deletion src/fibsem_tools/cli/tiff2zarr.py
Expand Up @@ -48,7 +48,10 @@ class ArrayMoveStrict(BaseModel):
chunks: List[int]


def represents_int(s):
def represents_int(s: Any) -> bool:
"""
Returns `True` if the value can be parsed as an int, and `False` otherwise.
"""
try:
int(s)
except ValueError:
Expand Down
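The rest of `represents_int` is elided in this hunk. The sketch below assumes it returns `False` when `int()` raises `ValueError` and `True` otherwise, which is what the new docstring describes; the actual body may differ.

```python
from typing import Any


def represents_int(s: Any) -> bool:
    """
    Returns `True` if the value can be parsed as an int, and `False` otherwise.
    """
    try:
        int(s)
    except ValueError:
        # Assumed: the elided branch returns False here.
        return False
    return True


assert represents_int("12")
assert represents_int(" 7 ")        # int() tolerates surrounding whitespace
assert not represents_int("1.5")    # int("1.5") raises ValueError
assert not represents_int("s0")
```

Because only `ValueError` is caught, a value such as `None` would raise `TypeError` rather than return `False`. That is worth keeping in mind given the `Any` annotation.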
38 changes: 38 additions & 0 deletions src/fibsem_tools/cli/zarr2json.py
@@ -0,0 +1,38 @@
from typing import Union
import click
from pydantic_zarr import GroupSpec, ArraySpec
import zarr
from fibsem_tools import read
from fibsem_tools.io.util import split_by_suffix
from rich import print
import os


def parse_zarr_path(path: str) -> Union[ArraySpec, GroupSpec]:
"""
Resolve a path to a zarr group or zarr array, and parse that array or group to an
instance of ArraySpec or GroupSpec, respectively.
"""
pre, post, suffix = split_by_suffix(path, (".n5", ".zarr"))
obj = read(os.path.join(pre, post, suffix))
if isinstance(obj, zarr.Array):
result = ArraySpec.from_zarr(obj)
elif isinstance(obj, zarr.Group):
result = GroupSpec.from_zarr(obj)
else:
raise ValueError(f"Got an unparseable object: {type(obj)}")
return result


@click.command()
@click.argument("path", type=click.STRING)
def cli(path: str):
"""
Generate a JSON representation of the structure of a zarr array or group.
"""
result = parse_zarr_path(path)
print(result.json(indent=2))


if __name__ == "__main__":
cli()
65 changes: 64 additions & 1 deletion src/fibsem_tools/cli/zarr_scan.py
Expand Up @@ -13,17 +13,29 @@

@dataclass
class Missing:
"""
This class represents a chunk that was missing.
"""

variant = "missing"


@dataclass
class Invalid:
"""
This class represents a chunk that raised an exception upon loading / decompression.
"""

variant = "invalid"
exception: BaseException


@dataclass
class Valid:
"""
This class represents a chunk that was valid.
"""

variant = "valid"


Expand All @@ -32,6 +44,26 @@ class ChunkSetResults(dict[ChunkState, dict[str, Union[Missing, Valid, Invalid]]


def check_zarray(array: zarr.Array) -> dict[str, Union[Missing, Invalid, Valid]]:
"""
Check the state of each chunk of a zarr array. This function iterates over the
chunks of an array, attempts to access each chunk, and records whether that chunk
is valid (the chunk was fetched + decompressed without incident), invalid (
an exception was raised when loading + decompressing the chunk) or missing (the
chunk was not found in storage).
Parameters
----------
array: Zarr.Array
The zarr array to check.
Returns
-------
A dict with string keys, where each key is the location of a chunk in the key
space of the store object associated with the array, and each value is either a
Valid, Missing, or Invalid object.
"""
ckeys = tuple(get_chunk_keys(array))
results = {}
for ckey in track(ckeys, description="Checking chunks..."):
Expand Down Expand Up @@ -76,7 +108,38 @@ def check_zarray(array: zarr.Array) -> dict[str, Union[Missing, Invalid, Valid]]
default=False,
help="delete invalid chunks",
)
def cli(array_path, valid, missing, invalid, delete_invalid):
def cli(
array_path: str, valid: bool, missing: bool, invalid: bool, delete_invalid: bool
):
"""
Checks the chunks of a zarr array, prints the results as JSON, and optionally
deletes invalid chunks.
Parameters
----------
array_path: string
The path to the array.
valid: bool
Whether to report valid chunks. Default is False, which results in no output if
a chunk is valid.
missing: bool
Whether to report missing chunks. Default is False, which results in no output
if a chunk is missing.
invalid: bool
Whether to report invalid chunks. Default is True. An invalid chunk is defined
as one which raises an OSError upon loading + decompression. This definition may
change to include more exception types, but the basic idea is that a chunk is
invalid if it has been corrupted or cannot be read with the compressor as
defined in the array metadata.
delete_invalid: bool
Whether to delete invalid chunks. Default is False.
"""
start = time.time()
array = access(array_path, mode="r")
all_results = check_zarray(array)
Expand Down
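The body of `check_zarray` is mostly elided here. The three-way classification its docstring describes can be sketched with a plain dict standing in for the store and `zlib` standing in for the array's compressor. Both stand-ins, and the name `check_chunks`, are assumptions for illustration and are not the library's implementation.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass
class Missing:
    variant = "missing"


@dataclass
class Invalid:
    variant = "invalid"
    exception: BaseException


@dataclass
class Valid:
    variant = "valid"


def check_chunks(
    store: Dict[str, bytes], keys: Iterable[str]
) -> Dict[str, Union[Missing, Invalid, Valid]]:
    """Classify each chunk key as valid, invalid, or missing (hypothetical helper)."""
    results: Dict[str, Union[Missing, Invalid, Valid]] = {}
    for key in keys:
        try:
            zlib.decompress(store[key])
            results[key] = Valid()
        except KeyError:
            # The chunk was not found in storage.
            results[key] = Missing()
        except zlib.error as e:
            # Stand-in for the decompression failure the CLI treats as "invalid".
            results[key] = Invalid(exception=e)
    return results


store = {"0.0": zlib.compress(b"abc"), "0.1": b"not zlib"}
results = check_chunks(store, ["0.0", "0.1", "1.0"])
assert results["0.0"].variant == "valid"
assert results["0.1"].variant == "invalid"
assert results["1.0"].variant == "missing"
```

The `cli` docstring defines an invalid chunk as one that raises `OSError`. The sketch catches `zlib.error` only because that is what the stand-in compressor raises.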
13 changes: 13 additions & 0 deletions src/fibsem_tools/io/multiscale.py
Expand Up @@ -56,6 +56,19 @@ def multiscale_group(
"""
Generate multiscale metadata of the desired flavor from a list of DataArrays
Arguments
---------
arrays : Sequence[DataArray]
The arrays to store.
metadata_types : List[str]
The metadata flavor(s) to use.
array_paths : Sequence[str]
The path for each array in storage, relative to the parent group.
name : Optional[str]
The name for the multiscale group. Only relevant for metadata flavors that
support this field, e.g. ome-ngff
Returns
-------
Expand Down
3 changes: 2 additions & 1 deletion src/fibsem_tools/io/util.py
Expand Up @@ -127,7 +127,8 @@ def split_by_suffix(uri: PathLike, suffixes: Sequence[str]) -> Tuple[str, str, s
suffixed = [Path(part).suffix in suffixes for part in parts]

if not any(suffixed):
raise ValueError(f"No path elements found with the suffix(es) {suffixes}")
msg = f"No path elements found with the suffix(es) {suffixes} in {uri}"
raise ValueError(msg)

index = [idx for idx, val in enumerate(suffixed) if val][-1]
if index == (len(parts) - 1):
Expand Down
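Only part of `split_by_suffix` is visible in this hunk. The simplified sketch below is built from the visible lines and assumes the function returns the path up to and including the last suffixed element, the remainder, and the suffix itself. It ignores URL protocols and uses POSIX paths, and the name `split_by_suffix_sketch` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Sequence, Tuple


def split_by_suffix_sketch(uri: str, suffixes: Sequence[str]) -> Tuple[str, str, str]:
    parts = PurePosixPath(uri).parts
    suffixed = [PurePosixPath(part).suffix in suffixes for part in parts]

    if not any(suffixed):
        # The error message now includes the offending path, as in the diff.
        msg = f"No path elements found with the suffix(es) {suffixes} in {uri}"
        raise ValueError(msg)

    # Split at the last path element that carries one of the suffixes.
    index = [idx for idx, val in enumerate(suffixed) if val][-1]
    pre = str(PurePosixPath(*parts[: index + 1]))
    if index == (len(parts) - 1):
        post = ""
    else:
        post = str(PurePosixPath(*parts[index + 1 :]))
    return pre, post, PurePosixPath(pre).suffix


assert split_by_suffix_sketch("/data/sample.zarr/em/s0", (".n5", ".zarr")) == (
    "/data/sample.zarr",
    "em/s0",
    ".zarr",
)
```

With the path now included in the message, a failing call from a CLI such as `zarr2json` shows which argument lacked a `.n5` or `.zarr` element.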
16 changes: 9 additions & 7 deletions src/fibsem_tools/io/zarr.py
Expand Up @@ -11,7 +11,7 @@
import numpy as np
import xarray
import zarr
from zarr.storage import FSStore
from zarr.storage import FSStore, contains_array, contains_group, BaseStore
from dask import bag, delayed
from distributed import Client, Lock
from toolz import concat
Expand Down Expand Up @@ -214,15 +214,17 @@ def get_url(node: Union[zarr.Group, zarr.Array]):
cannot be resolved to a url"""
)

def get_store(path: PathLike) -> zarr.storage.BaseStore:
if isinstance(path, Path):
path = str(path)

return DEFAULT_ZARR_STORE(path)

def access_zarr(
store: PathLike, path: PathLike, **kwargs: Any
store: Union[PathLike, BaseStore], path: PathLike, **kwargs: Any
) -> zarr.Array | zarr.Group:
if isinstance(store, Path):
store = str(store)

if isinstance(store, str):
store = DEFAULT_ZARR_STORE(store)
if isinstance(store, (Path, str)):
store = get_store(store)

# set default dimension separator to /
if "shape" in kwargs and "dimension_separator" not in kwargs:
Expand Down
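The refactor above moves path-to-store conversion into `get_store` and lets `access_zarr` accept an existing store object. The pattern can be sketched without zarr installed by using a stand-in class. `FakeStore` and `resolve_store` are hypothetical names; the real code uses `DEFAULT_ZARR_STORE` and `BaseStore`.

```python
from pathlib import Path
from typing import Union


class FakeStore:
    """Hypothetical stand-in for a zarr store; not part of zarr or fibsem_tools."""

    def __init__(self, url: str) -> None:
        self.url = url


def get_store(path: Union[str, Path]) -> FakeStore:
    # Normalize Path objects to strings before constructing the store.
    if isinstance(path, Path):
        path = str(path)
    return FakeStore(path)


def resolve_store(store: Union[str, Path, FakeStore]) -> FakeStore:
    # Paths are converted to a store; store instances pass through unchanged.
    if isinstance(store, (Path, str)):
        store = get_store(store)
    return store


existing = FakeStore("memory")
assert resolve_store("data.zarr").url == "data.zarr"
assert resolve_store(Path("data.zarr")).url == "data.zarr"
assert resolve_store(existing) is existing
```

Passing store instances through unchanged lets callers reuse a store they have already configured instead of having it rebuilt from a path.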