Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

Closed
wants to merge 4 commits into from

Conversation

c-blake
Copy link
Contributor

@c-blake c-blake commented Sep 15, 2022

With current default chunk sizes, a d:release build & big upload, SQLite uses 20% more space & 30% more time than files on a Linux tmpfs/Intel. So, SQLite may not be the hoped for performance win (and there may be other workloads where the perf delta is much worse).

More generally, different host devices/FSes will always have different ideal back-ends. So, some kind of choice makes baseline sense. (Indeed, any user choice raises the idea of a translation engine to re-format a repo (FS <-> SQLite) when re-hosting to a new device, but that is unlikely to be a near term concern.)

If the idea of a DataStore is switching backends behind its abstraction, this also lets us exercise said switchability. It also lets us more easily test any "above DataStore" quota abstraction -- e.g., some generic pre-space-allocating solution with multiple backends.

(I realize this needs some hook into our testing framework which I am still figuring out, but I wanted to start the PR for any design feedback.)

EDIT: Related #179 and #160 (review)

store backend.

Right now there is no detection of the format of an already made repo.
We should also add that, lest a change in a config "hide" data.
@michaelsbradleyjr
Copy link
Contributor

Nice work! I'll try it out with a local build in a little while.

@Bulat-Ziganshin
Copy link
Contributor

can you please add 3rd variant - "ram" that just assigns "localStore = cache"? it will useful for benchmarking the network part, without the slowdown caused by the disk storage.

please note that later you will have to add commit for README since it documents all switches

@c-blake
Copy link
Contributor Author

c-blake commented Sep 15, 2022

I think this is a great idea (I did a RAM fs on Linux myself), but should probably be a new DataStore fork maybe with just a Nim Table and then be "routed" to that backend via a new CLI enum value..maybe named "volatile" or "non-persistent" or something.

@Bulat-Ziganshin
Copy link
Contributor

For me, SQLite backend is still the best choice since it doesn't use too many files and the SQLite engine ensures and can check the integrity of DB. I feel it is a bit more reliable. Also, FS backend has a small problem on Windows (#237).

For me, 20-30% performance loss (with large SQLite pagesize) is manageable for the initial production version. In the long term, we may explore other DB engines, but first we need to establish requirements for the blockstore (i.e. implement other scenarios of its use, in particular usage quotas #159).

@Bulat-Ziganshin
Copy link
Contributor

@c-blake no, just "localStore = cache" will do it. The CacheStore by itself implements the BlockStore API.

@Bulat-Ziganshin
Copy link
Contributor

you just need to add 3rd case:

    of RAM:
      notice "Using RAM data store"
      cache

@c-blake
Copy link
Contributor Author

c-blake commented Sep 15, 2022

This of RAM: cache branch approach does work, but it is really fragile since the user must provide a big enough --cache size parameter.

Linux has tmpfs (or just /dev/shm) and Mac OSX has ramfs. This suggests it is not so hard on Windows. Maybe there is an easier way? This RAM disk way works with any back end and also allows one to kill & restart the server and so on. It seems a bit nicer overall. Others should maybe weigh in here in light of the fragility.

It sounds like we will still would want this CLI parameter (as well as a --quota) but that maybe Dmitriy's new Repo abstraction will take the StoreKind as a parameter to dispatch to various backends instead of my let..case. I agree that higher layer is the natural next step to enforce a quota (as mentioned in the top of this PR).

@Bulat-Ziganshin
Copy link
Contributor

  1. placing data on RAM disk still keeps an extra layer of code that may affect performance
  2. it also requires extra setup

@c-blake
Copy link
Contributor Author

c-blake commented Sep 15, 2022

Trade-offs. More set-up (|not if the expert user has it ready to go - on Linux /dev/shm is always already there), maybe a tiny amount of overhead (but likely small compared to all else we do), but extra fragility/size anticipation & bad failure modes (hanging on dload) if wrong.

I pushed commit, but I feel at least 1 other person should vote before I update the README with the new option and its valid values. Or -s=cached could also stay a "secret" value (i.e. only document FS & SQLite).

@dryajov
Copy link
Contributor

dryajov commented Sep 29, 2022

Thanks @c-blake, I'll abstain from merging this right now, but will integrate it with my work on repo limits. The reason is that there might be changes around the current stores that might affect this. Overall, great work tho!

@c-blake c-blake closed this by deleting the head repository Feb 15, 2023
@dryajov
Copy link
Contributor

dryajov commented Feb 15, 2023

I think this flag is still useful for experimentation purposes and we could expose either the FS datastore or the SQLite datastore, this would helping this backends.

@c-blake
Copy link
Contributor Author

c-blake commented Feb 15, 2023

Well, mostly I did not create a new branch for those commits out of an excess of optimism, but rather did them on my main fork. I was just cleaning up that old fork and making a new one to add backports & units stuff and other new client things.

I can / will revive the patch and PR here under a more clean git setup later and link back here which may help as we do more work on data store things. It probably needs re-basing care anyway. (I actually was not sure how all that was going to show up in github until I actually did it.. I was aware there are ways to move commits to a branch, but wasn't confident that would all have worked out..)

c-blake added a commit to c-blake/nim-codex that referenced this pull request Mar 10, 2023
    A) renaming to --repo-kind which makes more sense in the current code
and B) without a working cacheStore (yet)

Tagging related codex-storage#357
dryajov pushed a commit that referenced this pull request Mar 14, 2023
A) renaming to --repo-kind which makes more sense in the current code
and B) without a working cacheStore (yet)

Tagging related #357
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants