Store anything anywhere. `anystore` provides a high-level storage and retrieval interface for various supported store backends, such as `redis`, `sql`, `file`, `http`, cloud storages and anything else supported by `fsspec`.
Think of it as a `key -> value` store, with `anystore` acting as a cache backend. When keys become filenames and values become byte blobs, `anystore` becomes a file-like storage backend, but always with the same, interchangeable interface.
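A toy sketch in plain Python can make the "same interface, interchangeable backends" idea concrete. It is not anystore's implementation, and every name in it (`ToyStore`, `MemoryStore`, `DirectoryStore`, `roundtrip`) is hypothetical. A dict-backed store behaves like a cache, while a directory-backed store maps keys to filenames and values to byte blobs. The same calls work against both.

```python
from __future__ import annotations

import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class ToyStore(ABC):
    """Hypothetical key -> value interface (not anystore's actual classes)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def iterate_keys(self, prefix: str = "") -> Iterator[str]: ...


class MemoryStore(ToyStore):
    """Values live in a dict, like a cache backend."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def iterate_keys(self, prefix: str = "") -> Iterator[str]:
        return iter(sorted(k for k in self._data if k.startswith(prefix)))


class DirectoryStore(ToyStore):
    """Keys become relative file paths, values become file contents."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, value: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def iterate_keys(self, prefix: str = "") -> Iterator[str]:
        keys = (
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )
        return iter(sorted(k for k in keys if k.startswith(prefix)))


def roundtrip(store: ToyStore) -> list[str]:
    # Identical calls, regardless of which backend is passed in.
    store.put("docs/a.txt", b"hello")
    store.put("docs/b.txt", b"world")
    store.put("other.txt", b"!")
    assert store.get("docs/a.txt") == b"hello"
    return list(store.iterate_keys("docs/"))


if __name__ == "__main__":
    print(roundtrip(MemoryStore()))
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(DirectoryStore(tmp)))
```

Both backends return the same keys for the same prefix. Swapping storage only means constructing a different object, which is the property `anystore` offers across its real backends.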
In several of our data engineering projects we kept writing boilerplate code that covered the feature set of `anystore`, but never in a reusable way. This library aims to be a stable foundation for Python projects related to data wrangling.
```shell
anystore -i ./local/foo.txt -o s3://mybucket/other.txt
echo "hello" | anystore -o sftp://user:password@host:/tmp/world.txt
anystore -i https://investigativedata.io > index.html
anystore --store sqlite:///db keys <prefix>
anystore --store redis://localhost put foo "bar"
anystore --store redis://localhost get foo # -> "bar"
```
```python
from anystore import smart_read, smart_write

data = smart_read("s3://mybucket/data.txt")
smart_write(".local/data", data)
```
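The examples above choose the backend purely from the URI (`s3://`, `sftp://`, `redis://`, `sqlite://`, or a plain path), so calling code does not change when the storage location does. The following is a rough, standard-library-only sketch of that kind of scheme-based dispatch. `backend_for` and its return values are hypothetical and do not reflect how anystore resolves URIs internally.

```python
from urllib.parse import urlparse


def backend_for(uri: str) -> str:
    """Hypothetical helper: choose a backend name from a URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme in ("", "file"):
        return "file"  # plain paths like ".local/data" are local files
    if scheme == "redis":
        return "redis"
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. "postgresql+psycopg"
    if scheme.split("+")[0] in ("sqlite", "postgresql", "mysql"):
        return "sql"
    # anything else (s3, sftp, https, ...) is left to an fsspec-like layer
    return f"fsspec:{scheme}"


if __name__ == "__main__":
    for uri in ("s3://mybucket/data.txt", ".local/data", "redis://localhost", "sqlite:///db"):
        print(uri, "->", backend_for(uri))
```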
Use case: `@anycache` is used as an API view cache in ftmq-api:
```python
from anystore import get_store, anycache

cache = get_store("redis://localhost")

# `Query` and `Response` are ftmq-api types (their imports are omitted here)
@anycache(store=cache, key_func=lambda q: f"api/list/{q.make_key()}", ttl=60)
def get_list_view(q: Query) -> Response:
    result = ...  # the result of this expensive computation is cached
    return result
```
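The caching pattern behind `@anycache` can be approximated in a few lines: derive a key from the call arguments via `key_func`, return the stored value while it is younger than `ttl` seconds, and otherwise recompute and store it. The sketch below is hypothetical (`toy_anycache` uses a plain dict instead of a real store and only mirrors the parameter names shown above); it is not anystore's implementation.

```python
from __future__ import annotations

import functools
import time
from typing import Any, Callable, Optional


def toy_anycache(
    store: dict[str, tuple[float, Any]],
    key_func: Callable[..., str],
    ttl: Optional[float] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical TTL cache decorator; mirrors the shape of `@anycache` only."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = key_func(*args, **kwargs)
            now = time.monotonic()
            hit = store.get(key)
            if hit is not None:
                stored_at, value = hit
                if ttl is None or now - stored_at < ttl:
                    return value  # fresh cache hit: skip the computation
            value = func(*args, **kwargs)
            store[key] = (now, value)  # remember when the value was computed
            return value

        return wrapper

    return decorator


if __name__ == "__main__":
    cache: dict = {}

    @toy_anycache(store=cache, key_func=lambda n: f"api/square/{n}", ttl=60)
    def slow_square(n: int) -> int:
        time.sleep(0.1)  # stands in for expensive work
        return n * n

    print(slow_square(4))  # computed
    print(slow_square(4))  # served from the dict within 60 seconds
```

A `ttl` of `None` keeps entries forever. A real store would also need serialization and expiry on the backend side, which this sketch leaves out.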
```python
from anystore import get_store

source = get_store("https://example.org/documents/archive1")  # directory listing
target = get_store("s3://mybucket/files", backend_config={"client_kwargs": {
    "aws_access_key_id": "my-key",
    "aws_secret_access_key": "***",
    "endpoint_url": "https://s3.local"
}})  # can be configured via ENV as well

for path in source.iterate_keys():
    # copy each file from source to target:
    with source.open(path) as i:
        with target.open(path, "wb") as o:
            o.write(i.read())
```
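Reading a whole file with a single `read()` call holds its entire content in memory before writing. For large files, copying in fixed-size chunks keeps memory bounded. The sketch assumes that the handles returned by `source.open(...)` and `target.open(..., "wb")` behave like standard binary file objects with `read(n)` and `write(b)`. `stream_copy` is a hypothetical helper and is demonstrated here with in-memory `io.BytesIO` buffers.

```python
import io


def stream_copy(src, dst, chunk_size: int = 64 * 1024) -> int:
    """Copy between two binary file-like objects in fixed-size chunks.

    Never holds more than `chunk_size` bytes of the payload at once.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 200_000)
    dst = io.BytesIO()
    print(stream_copy(src, dst))  # 200000
```

The standard library's `shutil.copyfileobj(src, dst, length)` performs the same chunked loop if the byte count is not needed.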
Find the docs at docs.investigraph.dev/lib/anystore
- ftmq, a query interface layer for followthemoney data
- investigraph, a framework to manage collections of structured followthemoney data
- ftmq-api, a simple API on top of ftmq, built with FastAPI
- ftm-geocode, batch parse and geocode addresses from followthemoney entities
- leakrfc, a library to crawl, sync and move around document collections (in progress)
This package uses poetry for packaging and dependency management, so install it first.

Clone this repository to a local destination. Within the repo directory, run:

```shell
poetry install --with dev
```
This installs a few development dependencies, including pre-commit, which needs to be registered:

```shell
poetry run pre-commit install
```

Before each commit, this checks for correct code formatting (isort, black) and some other useful things (see `.pre-commit-config.yaml`).
`anystore` uses pytest as the testing framework:

```shell
make test
```