Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

c-blake · 2022-09-15T12:30:30Z

With current default chunk sizes, a d:release build & big upload, SQLite uses 20% more space & 30% more time than files on a Linux tmpfs/Intel. So, SQLite may not be the hoped for performance win (and there may be other workloads where the perf delta is much worse).

More generally, different host devices/FSes will always have different ideal back-ends. So, some kind of choice makes baseline sense. (Indeed, any user choice raises the idea of a translation engine to re-format a repo (FS <-> SQLite) when re-hosting to a new device, but that is unlikely to be a near term concern.)

If the idea of a DataStore is switching backends behind its abstraction, this also lets us exercise said switchability. It also lets us more easily test any "above DataStore" quota abstraction -- e.g., some generic pre-space-allocating solution with multiple backends.

(I realize this needs some hook into our testing framework which I am still figuring out, but I wanted to start the PR for any design feedback.)

EDIT: Related #179 and #160 (review)

store backend. Right now there is no detection of the format of an already made repo. We should also add that, lest a change in a config "hide" data.

michaelsbradleyjr · 2022-09-15T12:36:24Z

Nice work! I'll try it out with a local build in a little while.

Bulat-Ziganshin · 2022-09-15T12:58:06Z

can you please add 3rd variant - "ram" that just assigns "localStore = cache"? it will useful for benchmarking the network part, without the slowdown caused by the disk storage.

please note that later you will have to add commit for README since it documents all switches

c-blake · 2022-09-15T13:09:56Z

I think this is a great idea (I did a RAM fs on Linux myself), but should probably be a new DataStore fork maybe with just a Nim Table and then be "routed" to that backend via a new CLI enum value..maybe named "volatile" or "non-persistent" or something.

Bulat-Ziganshin · 2022-09-15T13:18:27Z

For me, SQLite backend is still the best choice since it doesn't use too many files and the SQLite engine ensures and can check the integrity of DB. I feel it is a bit more reliable. Also, FS backend has a small problem on Windows (#237).

For me, 20-30% performance loss (with large SQLite pagesize) is manageable for the initial production version. In the long term, we may explore other DB engines, but first we need to establish requirements for the blockstore (i.e. implement other scenarios of its use, in particular usage quotas #159).

Bulat-Ziganshin · 2022-09-15T13:21:09Z

@c-blake no, just "localStore = cache" will do it. The CacheStore by itself implements the BlockStore API.

Bulat-Ziganshin · 2022-09-15T13:21:59Z

you just need to add 3rd case:

    of RAM:
      notice "Using RAM data store"
      cache

c-blake · 2022-09-15T15:10:26Z

This of RAM: cache branch approach does work, but it is really fragile since the user must provide a big enough --cache size parameter.

Linux has tmpfs (or just /dev/shm) and Mac OSX has ramfs. This suggests it is not so hard on Windows. Maybe there is an easier way? This RAM disk way works with any back end and also allows one to kill & restart the server and so on. It seems a bit nicer overall. Others should maybe weigh in here in light of the fragility.

It sounds like we will still would want this CLI parameter (as well as a --quota) but that maybe Dmitriy's new Repo abstraction will take the StoreKind as a parameter to dispatch to various backends instead of my let..case. I agree that higher layer is the natural next step to enforce a quota (as mentioned in the top of this PR).

Bulat-Ziganshin · 2022-09-15T15:21:29Z

placing data on RAM disk still keeps an extra layer of code that may affect performance
it also requires extra setup

c-blake · 2022-09-15T15:48:52Z

Trade-offs. More set-up (|not if the expert user has it ready to go - on Linux /dev/shm is always already there), maybe a tiny amount of overhead (but likely small compared to all else we do), but extra fragility/size anticipation & bad failure modes (hanging on dload) if wrong.

I pushed commit, but I feel at least 1 other person should vote before I update the README with the new option and its valid values. Or -s=cached could also stay a "secret" value (i.e. only document FS & SQLite).

…ls).

dryajov · 2022-09-29T17:23:51Z

Thanks @c-blake, I'll abstain from merging this right now, but will integrate it with my work on repo limits. The reason is that there might be changes around the current stores that might affect this. Overall, great work tho!

dryajov · 2023-02-15T20:03:40Z

I think this flag is still useful for experimentation purposes and we could expose either the FS datastore or the SQLite datastore, this would helping this backends.

c-blake · 2023-02-15T20:08:41Z

Well, mostly I did not create a new branch for those commits out of an excess of optimism, but rather did them on my main fork. I was just cleaning up that old fork and making a new one to add backports & units stuff and other new client things.

I can / will revive the patch and PR here under a more clean git setup later and link back here which may help as we do more work on data store things. It probably needs re-basing care anyway. (I actually was not sure how all that was going to show up in github until I actually did it.. I was aware there are ways to move commits to a branch, but wasn't confident that would all have worked out..)

A) renaming to --repo-kind which makes more sense in the current code and B) without a working cacheStore (yet) Tagging related codex-storage#357

A) renaming to --repo-kind which makes more sense in the current code and B) without a working cacheStore (yet) Tagging related #357

Add a run-time start-up switch -s/--store-backend to choose the data

8d1dd0e

store backend. Right now there is no detection of the format of an already made repo. We should also add that, lest a change in a config "hide" data.

Bulat-Ziganshin assigned c-blake Sep 15, 2022

Address #251 (comment)

e98e8ed

c-blake added 2 commits September 17, 2022 11:34

Add help row for -s/--store-backend to docs.

cf64617

Update to be more user-friendly (listed user-format names for enum va…

c61d8c2

…ls).

c-blake closed this by deleting the head repository Feb 15, 2023

c-blake added a commit to c-blake/nim-codex that referenced this pull request Mar 10, 2023

Resuscitate codex-storage#251 except for

48429ee

A) renaming to --repo-kind which makes more sense in the current code and B) without a working cacheStore (yet) Tagging related codex-storage#357

dryajov pushed a commit that referenced this pull request Mar 14, 2023

Resuscitate #251 except for (#358)

7cdd4f7

A) renaming to --repo-kind which makes more sense in the current code and B) without a working cacheStore (yet) Tagging related #357

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

c-blake commented Sep 15, 2022 •

edited

Loading

michaelsbradleyjr commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

dryajov commented Sep 29, 2022

dryajov commented Feb 15, 2023

c-blake commented Feb 15, 2023

Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

Add a run-time start-up switch -s/--store-backend to choose the data store backend #251

Conversation

c-blake commented Sep 15, 2022 • edited Loading

michaelsbradleyjr commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

Bulat-Ziganshin commented Sep 15, 2022

c-blake commented Sep 15, 2022

dryajov commented Sep 29, 2022

dryajov commented Feb 15, 2023

c-blake commented Feb 15, 2023

c-blake commented Sep 15, 2022 •

edited

Loading