Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/store/{nbs,types}: GC: Move the reference walk from types to nbs. #8752

Merged
merged 1 commit into from
Jan 16, 2025

Conversation

reltuk
Copy link
Contributor

@reltuk reltuk commented Jan 15, 2025

Make the ChunkStore itself responsible for the reference walk, being given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. The ChunkStore can better distinguish whether the read is part of the GC process itself or whether it came from the application layer. It also allows better management of cache impact and the potential for better memory usage.

This transformation gets rid of parallel reference walking and some manual batching which was present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format LD_1, but it's actually not necessary in DOLT. For some use cases it's a slight win, but the simplification involved in getting rid of it is worth it for now.

Make the ChunkStore itself responsible for the reference walk, being given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. The ChunkStore can better distinguish whether the read is part of the GC process itself or whether it came from the application layer. It also allows better management of cache impact and the potential for better memory usage.

This transformation gets rid of parallel reference walking and some manual batching which was present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format __LD_1__, but it's actually not necessary in __DOLT__. For some use cases it's a slight win, but the simplification involved in getting rid of it is worth it for now.
@coffeegoddd
Copy link
Contributor

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000
version result total
f5a69f5 ok 5937457
version total_tests
f5a69f5 5937457
correctness_percentage
100.0

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the simplification is nice just a few related questions I noticed while getting up to speed

return nil, fmt.Errorf("NBS does not support copying garbage collection")
}

gcc, err := newGarbageCollectionCopier()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat unrelated, but if this embedded tfp their relationship might be clearer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I'll take a pass and potentially send out a separate PR :)

src NBSCompressedChunkStore
dest *NomsBlockStore
getAddrs chunks.GetAddrsCurry
filter chunks.HasManyFunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm hazy on what filter does, when would we discard hashes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. It's used for generational GC. So, when we collect newgen -> oldgen, we're walking refs and we want to stop the walk anytime we walk into the old gen. Then, after those chunks are in the old gen, when we collect newgen -> newgen, we want to stop the walk once again anytime we walk into the old gen.

@@ -334,14 +334,18 @@ func TestNBSCopyGC(t *testing.T) {
require.NoError(t, err)
require.True(t, ok)

keepChan := make(chan []hash.Hash, numChunks)
require.NoError(t, st.BeginGC(nil))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is our GC testing this sparse? or do we have tests at other interface levels somewhere else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some tests further up at doltdb, and then we have bats tests and go-sql-server-driver tests. The coverage isn't fantastic currently though.

@reltuk reltuk merged commit 47d9ff7 into main Jan 16, 2025
34 of 35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants