This workspace includes the crates that I developed to learn Rust and to support my research. It is named after the salts of citric acid. The design and development process, along with my experience, will be documented on my personal blog at lemonfold.io.
A crate for models that can be trained by Expectation Maximization, with a focus on mixture models. Its generic, flexible design makes extensive use of composition and is agnostic of external dependencies (i.e., the numerical library chosen); users only need to implement a trait with a few methods.
The crate is still very much a work in progress; currently, only clustering with Gaussian mixture models is in a rudimentary working state. Future development may include classification and regression with mixture models, support for more densities (Poisson, negative binomial, von Mises, ...), SOM, $k$-means, HMM, or Kalman filters.
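To give a feel for the trait-based design described above, here is a minimal sketch using only the standard library. The names `EmModel`, `fit`, `Gmm1d` and `GmmStats` are hypothetical and chosen for illustration; they are not the crate's actual API. The generic driver depends only on the trait, so a numerical backend could be swapped in behind it.

```rust
use std::f64::consts::PI;

/// Hypothetical trait (not the crate's actual API): a model that EM can train.
trait EmModel {
    /// Sufficient statistics produced by the E-step.
    type Stats;
    /// E-step: expected sufficient statistics under the current parameters.
    fn expect(&self, data: &[f64]) -> Self::Stats;
    /// M-step: re-estimate the parameters from those statistics.
    fn maximize(&mut self, stats: &Self::Stats);
}

/// Generic driver: alternates E- and M-steps for a fixed number of iterations.
fn fit<M: EmModel>(model: &mut M, data: &[f64], iterations: usize) {
    for _ in 0..iterations {
        let stats = model.expect(data);
        model.maximize(&stats);
    }
}

/// Toy one-dimensional Gaussian mixture (illustrative only).
struct Gmm1d {
    weights: Vec<f64>,
    means: Vec<f64>,
    variances: Vec<f64>,
}

/// Per-component sums of r, r*x and r*x^2, where r is the responsibility.
struct GmmStats {
    resp: Vec<f64>,
    resp_x: Vec<f64>,
    resp_xx: Vec<f64>,
    n: f64,
}

fn normal_pdf(x: f64, mean: f64, var: f64) -> f64 {
    (-(x - mean).powi(2) / (2.0 * var)).exp() / (2.0 * PI * var).sqrt()
}

impl EmModel for Gmm1d {
    type Stats = GmmStats;

    fn expect(&self, data: &[f64]) -> GmmStats {
        let k = self.weights.len();
        let mut s = GmmStats {
            resp: vec![0.0; k],
            resp_x: vec![0.0; k],
            resp_xx: vec![0.0; k],
            n: data.len() as f64,
        };
        for &x in data {
            // Joint density of sample and component, then normalise.
            let joint: Vec<f64> = (0..k)
                .map(|j| self.weights[j] * normal_pdf(x, self.means[j], self.variances[j]))
                .collect();
            let total: f64 = joint.iter().sum();
            for j in 0..k {
                let r = joint[j] / total;
                s.resp[j] += r;
                s.resp_x[j] += r * x;
                s.resp_xx[j] += r * x * x;
            }
        }
        s
    }

    fn maximize(&mut self, s: &GmmStats) {
        for j in 0..self.weights.len() {
            self.weights[j] = s.resp[j] / s.n;
            self.means[j] = s.resp_x[j] / s.resp[j];
            // Floor the variance so a component cannot collapse onto a point.
            self.variances[j] = (s.resp_xx[j] / s.resp[j] - self.means[j].powi(2)).max(1e-6);
        }
    }
}

fn main() {
    let data = [-2.1, -1.9, -2.0, -1.8, -2.2, 2.9, 3.1, 3.0, 3.2, 2.8];
    let mut model = Gmm1d {
        weights: vec![0.5, 0.5],
        means: vec![-1.0, 1.0],
        variances: vec![1.0, 1.0],
    };
    fit(&mut model, &data, 50);
    println!("weights = {:?}", model.weights);
    println!("means = {:?}", model.means);
}
```

The sketch omits convergence checks and works in plain probabilities rather than log space, so it is only suitable for small, well-separated toy data like the one in `main`.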
To run the algorithm itself, you currently need to call a unit test:
RUST_BACKTRACE=full cargo test -F ndarray --package potpourri --lib -- mixture::tests::em_step --exact --nocapture
Currently, models are only implemented for the ndarray backend, but support for clusters and GPGPU is planned.
A crate for self-organizing neural networks written in Rust. The crate is still a work in progress and not yet suited for use in other projects.
This is an experiment in writing a Python extension for numerical computation / machine learning in Rust. The aim of this project is to create an implementation of the self-organizing maps algorithm with support for parallelization and, eventually, GPGPU. The algorithm has been chosen for its efficiency and simplicity (both in terms of implementation and comprehensibility). Plus, it's my favorite and I'm doing active research with it.
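As a rough illustration of why the algorithm is simple to implement, the following sketch shows one online update of a self-organizing map on a one-dimensional grid, using only the standard library. The `Som` struct, its methods and the Gaussian neighbourhood function are assumptions made for this example and do not reflect the crate's actual implementation.

```rust
/// Toy self-organizing map: units sit on a line, each with a weight vector.
/// (Hypothetical type for illustration, not the crate's API.)
struct Som {
    weights: Vec<Vec<f64>>,
}

/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

impl Som {
    /// Index of the best-matching unit, i.e. the unit closest to the sample.
    fn bmu(&self, sample: &[f64]) -> usize {
        let mut best = 0;
        for i in 1..self.weights.len() {
            if sq_dist(&self.weights[i], sample) < sq_dist(&self.weights[best], sample) {
                best = i;
            }
        }
        best
    }

    /// One online update: pull every unit towards the sample, scaled by a
    /// Gaussian neighbourhood around the best-matching unit on the grid.
    fn update(&mut self, sample: &[f64], learning_rate: f64, radius: f64) {
        let bmu = self.bmu(sample);
        for (i, w) in self.weights.iter_mut().enumerate() {
            let grid_dist = i as f64 - bmu as f64;
            let h = (-(grid_dist * grid_dist) / (2.0 * radius * radius)).exp();
            for (wj, xj) in w.iter_mut().zip(sample) {
                *wj += learning_rate * h * (xj - *wj);
            }
        }
    }
}

fn main() {
    let mut som = Som {
        weights: vec![vec![0.0, 0.0], vec![0.5, 0.5], vec![1.0, 1.0]],
    };
    let samples = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]];
    // Shrink the learning rate over the epochs, as is common for SOM training.
    for epoch in 0..10 {
        let learning_rate = 0.5 / (1.0 + epoch as f64);
        for sample in &samples {
            som.update(sample, learning_rate, 1.0);
        }
    }
    println!("{:?}", som.weights);
}
```

Because the update of each unit is independent of the others once the best-matching unit is known, the inner loop is a natural candidate for the parallelization mentioned above.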
- Numerical Python extensions written in Rust
- Seamless integration via PyO3 and rust-numpy
- High performance via ndarray/rayon, tch-rs (PyTorch bindings)
- Dependency management / publishing with Poetry and Maturin
- Monorepo with Cargo workspaces
- Technical documentation / GitHub page with Sphinx and MyST
- Eventually, distributed computation (e.g., actor model or timely dataflow)
- Hopefully, GUI application using Tauri, Angular and ThreeJS
Feel free to use this as a basis for your own projects (MIT licensed).
For the monorepo, there is a top-level Python project defined using Poetry. This project is
mainly for dependency locking, integration testing and the overall project's documentation.
The Python module built with Maturin is in a nested folder, and its project file is also created with Poetry.
This is necessary as the top-level project can only add local packages that are either
created with Poetry or contain a setup.py file.
While this setup supports a monorepo (and should support integration
testing on the imported local packages), there is another caveat. Building the Python extension with
pip creates a temporary directory that does not contain the local Rust dependencies. Newer versions of pip
build within the tree, so this limitation can be avoided easily. Use the following commands to
set up the environment.
⚠️ We want to avoid creating a virtual environment for the nested packages. Work in a top-level shell instead.
cd self-organization
pyenv shell 3.10.2
poetry env use 3.10.2
poetry run pip install -U pip # only required until pip>=22.0 becomes the default
poetry install
# to debug / develop the extension
poetry shell
cd pysom
maturin develop
To install the virtual environment as a kernel for Jupyter:
python -m ipykernel install --user --name py310_selforganization --display-name "Python3.10 (self-organization)"