go/store/{nbs,types}: GC: Move the reference walk from types to nbs. #8752

reltuk · 2025-01-15T18:28:47Z

Make the ChunkStore itself responsible for the reference walk: it is given handles for walking references and excluding chunks as part of the GC process. This is an incremental step towards adding dependencies on read chunks during the GC process. With the walk inside the ChunkStore, it can better distinguish whether a read is part of the GC process itself or came from the application layer. It also allows better management of cache impact and has the potential for better memory usage.

This transformation gets rid of parallel reference walking and some manual batching that were present in the ValueStore implementation of reference walking. The parallel reference walking was necessary for reasonable performance in format __LD_1__, but it is not actually necessary in __DOLT__. For some use cases parallel walking is a slight win, but the simplification from removing it is worth it for now.


coffeegoddd · 2025-01-15T19:03:15Z

@reltuk DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`f5a69f5`	ok	5937457

version	total_tests
`f5a69f5`	5937457

correctness_percentage
100.0

max-hoffman

LGTM, the simplification is nice. Just a few related questions I noticed while getting up to speed.

max-hoffman · 2025-01-15T19:01:43Z

go/store/nbs/store.go

+		return nil, fmt.Errorf("NBS does not support copying garbage collection")
+	}
+
+	gcc, err := newGarbageCollectionCopier()


somewhat unrelated, but if this embedded tfp their relationship might be clearer

reltuk · 2025-01-16

Good suggestion! I'll take a pass and potentially send out a separate PR :)

max-hoffman · 2025-01-15T19:03:06Z

go/store/nbs/store.go

+	src      NBSCompressedChunkStore
+	dest     *NomsBlockStore
+	getAddrs chunks.GetAddrsCurry
+	filter   chunks.HasManyFunc


I'm hazy on what `filter` does. When would we discard hashes?

reltuk · 2025-01-16

Great question. It's used for generational GC. So, when we collect newgen -> oldgen, we're walking refs and we want to stop the walk anytime we walk into the old gen. Then, after those chunks are in the old gen, when we collect newgen -> newgen, we want to stop the walk once again anytime we walk into the old gen.

max-hoffman · 2025-01-15T19:08:27Z

go/store/nbs/store_test.go

@@ -334,14 +334,18 @@ func TestNBSCopyGC(t *testing.T) {
 	require.NoError(t, err)
 	require.True(t, ok)

-	keepChan := make(chan []hash.Hash, numChunks)
+	require.NoError(t, st.BeginGC(nil))


is our GC testing this sparse? or do we have tests at other interface levels somewhere else

reltuk · 2025-01-16

We have some tests further up at doltdb, and then we have bats tests and go-sql-server-driver tests. The coverage isn't fantastic currently though.

coffeegoddd added the correctness_approved label Jan 15, 2025

max-hoffman approved these changes Jan 15, 2025


reltuk merged commit 47d9ff7 into main Jan 16, 2025
34 of 35 checks passed

BrewTestBot mentioned this pull request Jan 18, 2025

dolt 1.47.1 Homebrew/homebrew-core#204689

Merged
