MuopDB is a vector database for machine learning. Currently, it supports:
- Index types: HNSW, IVF, SPANN, and multi-user SPANN, all on disk with mmap.
- Quantization: product quantization
MuopDB supports multiple users by default: each user has their own vector index within the same collection. The use case is building memory for LLMs. Think of it as:
- Each user will have its own memory
- Each user can still search a shared knowledge base.
All users' indices will be stored in a few files, reducing operational complexity.
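The per-user layout described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `ToyCollection` class, its method names, and the brute-force search are hypothetical illustrations, not MuopDB's API or its multi-user SPANN implementation.

```python
import math
from collections import defaultdict


class ToyCollection:
    """Hypothetical in-memory stand-in for a multi-user collection."""

    def __init__(self, num_features):
        self.num_features = num_features
        # One independent "index" (here just a list) per user ID,
        # all owned by the same collection object.
        self._per_user = defaultdict(list)

    def insert(self, user_id, doc_id, vector):
        if len(vector) != self.num_features:
            raise ValueError("vector length must equal num_features")
        self._per_user[user_id].append((doc_id, list(vector)))

    def search(self, user_id, query, top_k=1):
        # Only the requesting user's vectors are scanned.
        candidates = self._per_user.get(user_id, [])
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], query))
        return [doc_id for doc_id, _ in ranked[:top_k]]


collection = ToyCollection(num_features=2)
collection.insert(user_id=0, doc_id=4, vector=[1.0, 2.0])
collection.insert(user_id=1, doc_id=7, vector=[1.0, 2.0])

print(collection.search(user_id=0, query=[1.0, 2.0]))  # [4]
print(collection.search(user_id=2, query=[1.0, 2.0]))  # []
```

Both users store an identical vector, yet each search returns only the document that belongs to the user who asked.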
- Build MuopDB. Refer to the build instructions (prerequisites and build from source) later in this document.
- Prepare the necessary `data` and `indices` directories. On Mac, you might want to change these directories because the root directory is read-only, e.g. `~/mnt/muopdb/`.
mkdir -p /mnt/muopdb/indices
mkdir -p /mnt/muopdb/data
- Start the MuopDB `index_server` with the directories we just prepared, using one of these methods:
# Start server locally. This is recommended for Mac.
cd target/release
RUST_LOG=info ./index_server --node-id 0 --index-config-path /mnt/muopdb/indices --index-data-path /mnt/muopdb/data --port 9002
# Start server with Docker. Only use this option on Linux.
docker-compose up --build
- Now you have an up and running MuopDB `index_server`.
- You can send gRPC requests to this server (possibly with Postman).
- You can use Server Reflection in Postman; it will automatically detect the RPCs for MuopDB.
- Create collection
{
"collection_name": "test-collection-2",
"num_features": 10
}
- Insert some data
{
"collection_name": "test-collection-2",
"high_ids": [
0
],
"low_ids": [
4
],
"high_user_ids": [
0
],
"low_user_ids": [
0
],
"vectors": [
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0
]
}
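The paired `high_ids`/`low_ids` and `high_user_ids`/`low_user_ids` fields look like 128-bit identifiers split into two 64-bit halves. That reading is an assumption drawn from the field names, not something this README states. Under that assumption, a client could derive the halves as sketched below; the helper names are hypothetical.

```python
MASK_64 = (1 << 64) - 1  # lower 64 bits set


def split_u128(value):
    """Split an unsigned 128-bit integer into (high, low) 64-bit halves."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value must fit in 128 unsigned bits")
    return value >> 64, value & MASK_64


def join_u128(high, low):
    """Inverse of split_u128: rebuild the 128-bit integer."""
    return (high << 64) | low


# Assumed mapping for the request above: ID 4 -> high 0, low 4.
print(split_u128(4))              # (0, 4)
print(split_u128((1 << 64) + 5))  # (1, 5)
```

With this reading, the example request inserts document ID 4 for user ID 0.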
- Flush
{
"collection_name": "test-collection-2"
}
- Query
{
"collection_name": "test-collection-2",
"ef_construction": 100,
"record_metrics": false,
"top_k": 1,
"high_user_ids": [0],
"low_user_ids": [0],
"vector": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0, 9.0, 9.0]
}
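To see how close this query is to the vector inserted earlier, the distance can be computed by hand. The sketch below uses plain Euclidean (L2) distance, which appears in the feature list further down. Whether the server reports this exact value, its square, or another score is not stated here, so treat the number as illustrative.

```python
import math

# Vector from the insert request and vector from the query request.
inserted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0, 9.0, 9.0]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = l2_distance(inserted, query)
print(round(distance, 4))  # 1.4142
```

The two vectors differ by 1 in the eighth and tenth components only, so the squared distance is 2. Since only one vector was inserted for this user, a `top_k` of 1 should return it.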
- Query path
- Vector similarity search
- Hierarchical Navigable Small Worlds (HNSW)
- Product Quantization (PQ)
- Indexing path
- Support periodic offline indexing
- Database Management
- Doc-sharding & query fan-out with aggregator-leaf architecture
- In-memory & disk-based storage with mmap
- Query & Indexing
- Inverted File (IVF)
- Improve locality for HNSW
- SPANN
- Query
- Multiple index segments
- L2 distance
- Index
- Optimizing index build time
- Elias-Fano encoding for IVF
- Multi-user SPANN index
- Features
- Delete vector from collection
- Database Management
- Segment optimizer framework
- Write-ahead-log
- Segments merger
- Segments vacuum
- Install prerequisites:
- Rust: https://www.rust-lang.org/tools/install
- Make sure you're on nightly:
rustup toolchain install nightly
- Libraries
# macOS (using Homebrew)
brew install hdf5 protobuf openblas
# Linux (Arch-based, including derivatives such as EndeavourOS and CachyOS)
sudo pacman -Syu hdf5 protobuf openblas
# Linux (Debian-based)
sudo apt-get install libhdf5-dev libprotobuf-dev libopenblas-dev
- Build from Source:
git clone https://github.com/hicder/muopdb.git
cd muopdb
# Build
cargo build --release
# Run tests
cargo test --release
This project is developed with TechCare Coaching. I mentor the mentees who contribute to it.