Raftstore v2 #389

Merged
merged 18 commits into from
Oct 8, 2024

@v01dstar v01dstar commented Oct 2, 2024

@ti-chi-bot added the do-not-merge/work-in-progress, dco-signoff: yes, and size/XXL labels Oct 2, 2024
@v01dstar v01dstar force-pushed the raftstore-v2 branch 2 times, most recently from 4c52cc0 to d9a0ac2 Compare October 2, 2024 07:21
5kbpers and others added 16 commits October 2, 2024 15:24
* return sequence number of writes

Signed-off-by: 5kbpers <[email protected]>

* fix compile error

Signed-off-by: 5kbpers <[email protected]>
Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
* expose seqno for multi-batch-write

Signed-off-by: 5kbpers <[email protected]>

* format

Signed-off-by: 5kbpers <[email protected]>

Signed-off-by: 5kbpers <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
A callback that is invoked after a write succeeds and its changes have been applied to the memtable.
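
As a rough illustration of the callback contract described above, here is a minimal sketch. It is not the RocksDB API: `ToyDB`, `WriteBatch`, `PostWriteCallback`, and the `Write` signature are hypothetical stand-ins invented for this example. The sketch also assumes that an empty batch still invokes the callback with the current sequence number, which the commit message does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the RocksDB types.
using SequenceNumber = std::uint64_t;
using WriteBatch = std::vector<std::pair<std::string, std::string>>;
using PostWriteCallback = std::function<void(SequenceNumber)>;

class ToyDB {
 public:
  // Applies every entry of the batch to the in-memory table first, and only
  // then invokes the callback with the sequence number of the last applied
  // entry. The same sequence number is returned to the caller.
  SequenceNumber Write(const WriteBatch& batch,
                       PostWriteCallback callback = nullptr) {
    for (const auto& kv : batch) {
      ++last_seq_;
      memtable_[kv.first] = std::make_pair(kv.second, last_seq_);
    }
    // Assumption of this sketch: a missing callback is skipped, and an empty
    // batch reports the current sequence number without advancing it.
    if (callback) callback(last_seq_);
    return last_seq_;
  }

  // Returns the stored value, or an empty string if the key is absent.
  std::string Get(const std::string& key) const {
    auto it = memtable_.find(key);
    return it == memtable_.end() ? std::string() : it->second.first;
  }

 private:
  std::map<std::string, std::pair<std::string, SequenceNumber>> memtable_;
  SequenceNumber last_seq_ = 0;
};
```

Because the callback runs after the loop, anything it reads through `Get` already reflects the batch it belongs to.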

Titan change: tikv/titan#270

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
* fix bug of using post write callback with empty batch

Signed-off-by: tabokie <[email protected]>

* fix nullptr

Signed-off-by: tabokie <[email protected]>

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Add support for merging multiple DBs that have no overlapping data (tombstones included).

Memtables are frozen and then referenced by the target DB. Table files are hard linked
with new file numbers into the target DB. After the merge, the sequence numbers of
memtables and L0 files will appear out of order compared to a single DB. For any given
user key, however, the ordering still holds: exactly one source DB contains the key, and
the target DB inherits that source DB's ordering.

If the source and target instances share the same block cache, the target instance can
reuse the cached blocks. This is done by cloning the table readers of the source instances
into the target instance. Because the cache key is stored in the table reader, reads after
the merge can still retrieve the source instances' blocks via the old cache keys.

In a release build, it takes 8 ms to merge a 25 GB DB (500 files) into another.
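
The per-key ordering argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `Entry`, `ToyStore`, `Lookup`, and `MergeDisjoint` are hypothetical names, a store is a flat list of entries rather than memtables and table files, and a tombstone would simply be another `Entry` for its key.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Toy model: one versioned entry per write. Not a RocksDB type.
struct Entry {
  std::string key;
  std::uint64_t seq;
  std::string value;
};
using ToyStore = std::vector<Entry>;

// Finds the newest version of `key`, i.e. the entry with the largest
// sequence number among entries for that key. Returns false if absent.
bool Lookup(const ToyStore& store, const std::string& key, std::string* value) {
  const Entry* best = nullptr;
  for (const Entry& e : store) {
    if (e.key == key && (best == nullptr || e.seq > best->seq)) best = &e;
  }
  if (best == nullptr) return false;
  *value = best->value;
  return true;
}

// Appends all source entries to the target, keeping their original sequence
// numbers. Refuses (and leaves the target untouched) if any key appears in
// more than one store, since the ordering argument relies on disjoint keys.
bool MergeDisjoint(ToyStore* target, const std::vector<ToyStore>& sources) {
  std::set<std::string> seen;
  for (const Entry& e : *target) seen.insert(e.key);
  for (const ToyStore& src : sources) {
    std::set<std::string> keys;
    for (const Entry& e : src) keys.insert(e.key);
    for (const std::string& k : keys) {
      if (!seen.insert(k).second) return false;  // overlapping key
    }
  }
  for (const ToyStore& src : sources) {
    target->insert(target->end(), src.begin(), src.end());
  }
  return true;
}
```

After a successful merge the sequence numbers in the combined list are no longer globally sorted, yet `Lookup` returns the same answer for each key as it did in that key's source store, because only versions from that one source compete with each other.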

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
…nFlushBegin event (tikv#300)

* add largest seqno of memtable

Signed-off-by: 5kbpers <[email protected]>

* add test

Signed-off-by: 5kbpers <[email protected]>

* address comment

Signed-off-by: 5kbpers <[email protected]>

* address comment

Signed-off-by: 5kbpers <[email protected]>

* format

Signed-off-by: 5kbpers <[email protected]>

* memtable info

Signed-off-by: 5kbpers <[email protected]>

Signed-off-by: 5kbpers <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Summary:

Modify the existing write buffer manager to support multiple instances.

Previously, a flush was triggered before user writes if `ShouldFlush()` returned
true. In a multi-instance context, this causes flushes in every DB that is
undergoing writes.

In this patch, column families are registered in a shared linked list inside
the write buffer manager. When the flush condition is triggered, the column
family with the highest score in this list is chosen and flushed. The score can
be based on either size or age.

The flush condition calculation is also changed to exclude immutable memtables.
This is because RocksDB schedules a flush every time an immutable memtable is
generated, so they will eventually be evicted from memory provided that flush
bandwidth is not a bottleneck.
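
The selection logic described above might look roughly like the following sketch. Everything here is a hypothetical simplification rather than the patch's code: `ToyWriteBufferManager`, `CfState`, `FlushScore`, and their members are invented names, and the strict `>` comparison against the threshold is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>

enum class FlushScore { kSize, kAge };

// Per-column-family bookkeeping for the toy manager.
struct CfState {
  std::string name;
  std::size_t mutable_bytes;       // active memtable memory
  std::size_t immutable_bytes;     // already scheduled for flush
  std::uint64_t oldest_write_time; // smaller means older
};

class ToyWriteBufferManager {
 public:
  ToyWriteBufferManager(std::size_t flush_threshold, FlushScore policy)
      : flush_threshold_(flush_threshold), policy_(policy) {}

  // Column families from any DB instance share one list.
  void Register(CfState* cf) { cfs_.push_back(cf); }
  void Unregister(CfState* cf) { cfs_.remove(cf); }
  void SetFlushThreshold(std::size_t t) { flush_threshold_ = t; }

  // Only mutable memtables count; immutable ones are deliberately excluded.
  bool ShouldFlush() const {
    std::size_t total = 0;
    for (const CfState* cf : cfs_) total += cf->mutable_bytes;
    return total > flush_threshold_;
  }

  // Returns the highest-scoring column family, or nullptr if no flush is due.
  CfState* PickFlushCandidate() const {
    if (!ShouldFlush()) return nullptr;
    CfState* best = nullptr;
    for (CfState* cf : cfs_) {
      if (best == nullptr || HigherScore(*cf, *best)) best = cf;
    }
    return best;
  }

 private:
  // Size policy prefers the largest memtable; age policy prefers the oldest.
  bool HigherScore(const CfState& a, const CfState& b) const {
    return policy_ == FlushScore::kSize
               ? a.mutable_bytes > b.mutable_bytes
               : a.oldest_write_time < b.oldest_write_time;
  }

  std::size_t flush_threshold_;
  FlushScore policy_;
  std::list<CfState*> cfs_;
};
```

With this shape, a write to one DB can select a column family in a different DB for flushing, and unregistering a column family or raising the threshold can clear the flush condition.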

Test plan:

- Unit test cases
  - Trigger flush of largest/oldest memtable in another DB
  - Resolve flush condition by destroying CF/DB
  - Dynamically change flush threshold
- Manual tests with insert, update, and read-write workloads, [script](https://gist.github.com/tabokie/d38d27dc3843946c7813ab7bafd0f753).

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
* add toggle

Signed-off-by: tabokie <[email protected]>

* protect underflow

Signed-off-by: tabokie <[email protected]>

* fix build

Signed-off-by: tabokie <[email protected]>

* remove deadline and add penalty for l0 files

Signed-off-by: tabokie <[email protected]>

* fix build

Signed-off-by: tabokie <[email protected]>

* consider compaction trigger

Signed-off-by: tabokie <[email protected]>

---------

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
* hook delete dir in encrypted env

Signed-off-by: tabokie <[email protected]>

* add a comment

Signed-off-by: tabokie <[email protected]>

---------

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Also added a new option to detect whether manual compaction is disabled. In practice, we use this to avoid blocking on flushing a tablet that will be destroyed shortly afterwards.

---------

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
…heckpoint (tikv#338)

* fix renaming encrypted directory

Signed-off-by: tabokie <[email protected]>

* fix build

Signed-off-by: tabokie <[email protected]>

* patch test manager

Signed-off-by: tabokie <[email protected]>

* fix build

Signed-off-by: tabokie <[email protected]>

* check compaction paused during checkpoint

Signed-off-by: tabokie <[email protected]>

* add comment

Signed-off-by: tabokie <[email protected]>

---------

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
And delay the buffer initialization of a writable file until the first actual write.
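
The idea of deferring buffer allocation can be sketched as follows. This is a generic illustration, not the patched RocksDB class: `LazyBufferedFile` and its members are hypothetical, and a `std::string` stands in for the underlying file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Toy buffered writer that allocates its buffer on the first non-empty
// Append instead of in the constructor.
class LazyBufferedFile {
 public:
  explicit LazyBufferedFile(std::size_t capacity)
      : capacity_(capacity == 0 ? 1 : capacity) {}

  void Append(const std::string& data) {
    if (data.empty()) return;  // nothing to write, so no buffer is needed yet
    if (!buffer_) buffer_.reset(new char[capacity_]);  // first actual write
    std::size_t off = 0;
    while (off < data.size()) {
      if (used_ == capacity_) Flush();
      std::size_t n = std::min(capacity_ - used_, data.size() - off);
      std::memcpy(buffer_.get() + used_, data.data() + off, n);
      used_ += n;
      off += n;
    }
  }

  // Moves buffered bytes to the sink; a no-op if nothing is buffered.
  void Flush() {
    if (used_ > 0) {
      sink_.append(buffer_.get(), used_);
      used_ = 0;
    }
  }

  bool buffer_allocated() const { return static_cast<bool>(buffer_); }
  const std::string& sink() const { return sink_; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::unique_ptr<char[]> buffer_;  // stays null until the first write
  std::string sink_;                // stand-in for the underlying file
};
```

A file that is opened but never written to therefore never pays for its buffer, which matters when many files are open at once.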

---------

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
@v01dstar v01dstar force-pushed the raftstore-v2 branch 3 times, most recently from 383bdfe to 72f0962 Compare October 4, 2024 06:39

v01dstar commented Oct 4, 2024

/run-all-tests


v01dstar commented Oct 5, 2024

/run-all-tests

Signed-off-by: Yang Zhang <[email protected]>
@v01dstar v01dstar marked this pull request as ready for review October 5, 2024 06:07
@ti-chi-bot removed the do-not-merge/work-in-progress label Oct 5, 2024
@ti-chi-bot added the needs-1-more-lgtm and approved labels Oct 8, 2024

ti-chi-bot bot commented Oct 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Connor1996, LykxSassinator

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Connor1996,LykxSassinator]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm label Oct 8, 2024

ti-chi-bot bot commented Oct 8, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-10-08 02:09:31.944648013 +0000 UTC m=+925527.364861025: ☑️ agreed by Connor1996.
  • 2024-10-08 02:18:16.381265941 +0000 UTC m=+926051.801478953: ☑️ agreed by LykxSassinator.

@ti-chi-bot ti-chi-bot bot merged commit 405de0e into tikv:8.10.tikv Oct 8, 2024
5 checks passed
@v01dstar v01dstar deleted the raftstore-v2 branch October 8, 2024 02:22
Labels: approved, dco-signoff: yes, lgtm, size/XXL
7 participants