update README, CHANGELOG
ryan-williams committed Jan 27, 2025
1 parent 1c83926 commit 887347d
Showing 2 changed files with 60 additions and 28 deletions.
57 changes: 41 additions & 16 deletions CHANGELOG.md
@@ -3,30 +3,55 @@

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).
The format is based on [Keep a Changelog] and this project adheres to [Semantic Versioning].

## [Unreleased] - yyyy-mm-dd

Initial release of a PyTorch Dataset for SOMA Experiments. This is a port and
enhancement of code contributed by the Chan Zuckerberg Initiative Foundation
[CELLxGENE](https://cellxgene.cziscience.com/) project.
Initial release of a [PyTorch Dataset] for [SOMA] Experiments. This is a port and enhancement of code contributed by the Chan Zuckerberg Initiative Foundation [CELLxGENE] project.

This is not a one-for-one migration of the contributed code. Substantial changes have
been made to the package utility (e.g., multi-GPU support), improved API UX, performance
improvements, and more.
This is not a one-for-one migration of the contributed code. Substantial changes have been made to the package utility (e.g., multi-GPU support), improved API UX, performance improvements, and more.

### Added

- Initial project organization and other scaffolding [PR #4](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/4)
- Simple, non-shuffling Dataset/DataPipe implementation [PR #6](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/6)
- Add CI workflows [PR #7](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/7)
- Add a DataLoader creation wrapper function [PR #8](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/8)
- Add shuffling support [PR #9](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/9)
- Add first draft of tutorial notebooks [PR #10](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/10)
- Archive script used to populate the repo commit history [PR #11](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/11)
- I/O buffer performance optimization [PR #13](https://github.com/single-cell-data/TileDB-SOMA-ML/pull/13)
#### 2025-02
- [#26]: rm `IterDataPipe`, consolidate into `ExperimentDataset`, add `docs`/`www` builds

#### 2024-12
- [#23]: Split `{test_,}pytorch.py` into a few files
- [#20]: Add `vfs.s3.no_sign_request` config
- [#19]: Support `somacore>=1.0.24` / `tiledbsoma>=1.15`

#### 2024-10
- [#13]: I/O buffer performance optimization
- [#11]: Archive script used to populate the repo commit history
- [#10]: Add first draft of tutorial notebooks
- [#9]: Add shuffling support
- [#8]: Add a DataLoader creation wrapper function
- [#7]: Add CI workflows
- [#6]: Simple, non-shuffling Dataset/DataPipe implementation
- [#4]: Initial project organization and other scaffolding

### Changed

### Fixed


[Keep a Changelog]: http://keepachangelog.com/
[Semantic Versioning]: http://semver.org/

[PyTorch Dataset]: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
[SOMA]: https://github.com/single-cell-data/SOMA
[CELLxGENE]: https://cellxgene.cziscience.com/

[#26]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/26
[#23]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/23
[#20]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/20
[#19]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/19
[#13]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/13
[#11]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/11
[#10]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/10
[#9]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/9
[#8]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/8
[#7]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/7
[#6]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/6
[#4]: https://github.com/single-cell-data/TileDB-SOMA-ML/pull/4
31 changes: 19 additions & 12 deletions README.md
@@ -1,21 +1,19 @@
# TileDB-SOMA-ML

# tiledbsoma_ml

A Python package containing ML tools for use with `tiledbsoma`.
A Python package containing ML tools for use with [TileDB-SOMA].

**NOTE:** this is a _pre-release_ package, and may be subject to breaking API changes prior to first release.

## Description

The package contains a prototype PyTorch `IterableDataset` for use with the
[`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)
The package contains a prototype PyTorch [`IterableDataset`] for use with the
[`torch.utils.data.DataLoader`]
API. For a general introduction to PyTorch data loading,
[see this tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html).
[see this tutorial][torch data tutorial].
Additional information on the DataLoader/Dataset pattern
[can be found here](https://pytorch.org/docs/stable/data.html).
[can be found here][`torch.data`].
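
For readers unfamiliar with the DataLoader/Dataset pattern mentioned above, here is a minimal sketch in plain PyTorch. It does not use `tiledbsoma_ml` and does not reflect that package's API: `RangeBatchDataset` is a hypothetical stand-in, and the choice to have the dataset yield pre-batched tensors (with `batch_size=None` on the `DataLoader`, which disables automatic batching) is an assumption made for illustration.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class RangeBatchDataset(IterableDataset):
    """Hypothetical dataset (not part of tiledbsoma_ml) that yields pre-batched tensors."""

    def __init__(self, n_rows: int, n_cols: int, batch_size: int):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.batch_size = batch_size

    def __iter__(self):
        # Walk the row range in fixed-size chunks; the last chunk may be shorter.
        for start in range(0, self.n_rows, self.batch_size):
            stop = min(start + self.batch_size, self.n_rows)
            # Fill each row with its own row index so batches are easy to inspect.
            rows = torch.arange(start, stop, dtype=torch.float32)
            yield rows.unsqueeze(1).expand(-1, self.n_cols).clone()


dataset = RangeBatchDataset(n_rows=10, n_cols=3, batch_size=4)

# batch_size=None disables automatic batching: the loader passes through
# whatever the dataset yields, which here is already a 2-D batch.
loader = DataLoader(dataset, batch_size=None)

batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)
```

An iterable-style dataset suits sources that are read sequentially in chunks, since the dataset itself decides where batch boundaries fall rather than the loader indexing individual rows.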

Defects and feature requests should be filed as a GitHub issue in this repo. Please include a reproducible
test case in all bug reports.
Defects and feature requests should be filed as a GitHub issue in this repo. Please include a reproducible test case in all bug reports.

## Getting Started

@@ -35,8 +33,8 @@ pip install -e .

### Documentation

Documentation is pending. Preliminary documentation can be found in API docstrings, and in
the [notebooks](notebooks) directory.
Documentation is pending. Preliminary documentation can be found at [single-cell-data.github.io/TileDB-SOMA-ML], and in
the [notebooks] directory.

## Builds

@@ -48,7 +46,7 @@ python -m build .

## Version History

See the [CHANGELOG.md](CHANGELOG.md) file.
See the [CHANGELOG.md] file.

## License

@@ -58,3 +56,12 @@ This project is licensed under the MIT License.

The SOMA team is grateful to the Chan Zuckerberg Initiative Foundation [CELLxGENE Census](https://cellxgene.cziscience.com)
team for their initial contribution.

[TileDB-SOMA]: https://github.com/single-cell-data/TileDB-SOMA
[`IterableDataset`]: https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset
[`torch.utils.data.DataLoader`]: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
[torch data tutorial]: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
[`torch.data`]: https://pytorch.org/docs/stable/data.html
[single-cell-data.github.io/TileDB-SOMA-ML]: https://single-cell-data.github.io/TileDB-SOMA-ML/
[notebooks]: notebooks
[CHANGELOG.md]: CHANGELOG.md
