
Stats auto refresh #7424

Merged — 30 commits merged into main on Feb 8, 2024
Conversation

@max-hoffman (Contributor) commented Jan 27, 2024

Three new system variables configure stats auto refresh: one turns it on, one sets a timer for how often to check whether stats are fresh, and a third sets the freshness threshold that triggers writing new stats. The system variables are global and apply to all databases in a server context. Adding or dropping databases, tables, and indexes is reflected by the statistics provider.

Statistic updates now funnel through a process that preserves buckets for pre-existing chunks and creates new buckets only for new chunk ordinals. The auto-refresh threshold compares the number of new and deleted buckets to the existing bucket count: inserts, updates, and deletes all create or delete chunks relative to the previously saved statistics, and the fraction (new + deleted) / previous count is compared against the threshold.
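The threshold check described above can be sketched in Go. This is a minimal illustration; `shouldRefresh` and its parameter names are hypothetical, not the PR's actual identifiers:

```go
package main

import "fmt"

// shouldRefresh reports whether statistics are stale enough to rebuild.
// newCnt and delCnt are the chunks added/removed since the last stats
// write; prevCnt is the bucket count of the saved statistics; threshold
// is the configured freshness fraction.
func shouldRefresh(newCnt, delCnt, prevCnt int, threshold float64) bool {
	if prevCnt == 0 {
		return true // no saved stats yet; always refresh
	}
	frac := float64(newCnt+delCnt) / float64(prevCnt)
	return frac > threshold
}

func main() {
	// 2 new + 1 deleted against 10 existing buckets = 0.3, above a 0.2 threshold
	fmt.Println(shouldRefresh(2, 1, 10, 0.2))
}
```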

TODO:

  • how much logging is appropriate
  • go server tests
  • is stats dataset replication blocked by default?

other:

  • ANALYZE table and auto-refresh concurrency: last write wins? Acceptable as long as the auto-refresh thread doesn't die
  • drop database/ANALYZE concurrency: error when writing stats to a deleted folder

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.998653 to 99.998653
version 340e43c: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.998653 to 99.998653
version f180d20: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.998653 to 99.998653
version 97d8a04: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.998653 to 99.998653
version 81dba2b: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@coffeegoddd (Contributor) — @coffeegoddd DOLT
comparing_percentages: 99.998653 to 99.998653
version 5edf046: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@max-hoffman max-hoffman requested a review from nicktobey January 30, 2024 23:34
@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.998787 to 99.998653
version 38392dd: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@nicktobey (Contributor) left a comment:
Should we have bats tests for setting different values of threshold and interval?

Expected: []sql.Row{{uint64(6), uint64(6), uint64(0)}, {uint64(6), uint64(3), uint64(0)}},
},
},
},
{
Name: "stats updates",
nicktobey (Contributor):

What's being tested here? That the second analyze changes the stats? That the stats auto-update? The name should be more clear about what's being tested.

max-hoffman (Author):

This tests incrementally rewriting stats depending on which chunks were edited, rather than doing a full scan. Changed the test name.


run dolt sql -r csv -q "select count(*) from dolt_statistics"
[ "$status" -eq 0 ]
[ "${lines[1]}" = "8" ]
nicktobey (Contributor):

Why do we expect this to be 8?


run dolt sql -r csv -q "select count(*) from dolt_statistics"
[ "$status" -eq 0 ]
[ "${lines[1]}" = "8" ]
nicktobey (Contributor):

I feel like this should be testing to confirm that something changed from the last select.

}
curCnt := float64(len(curStat.active))
updateCnt := float64(len(idxMeta.updateChunks))
deleteCnt := float64(len(curStat.active) - len(idxMeta.preexisting))
nicktobey (Contributor):

Should this be reversed? The number deleted is the pre-existing minus the current count?

max-hoffman (Author):

The naming here is somewhat rough and ready, but curStat is the existing statistics on disk, and idxMeta is the update metadata. idxMeta.preexisting is the set of chunks in the update metadata that overlap with curStat's chunks. Will rename idxMeta to indicate it's a candidate update.

nicktobey (Contributor):

Yeah this is confusingly named. Pre-existing sounds like the old stats, and curStat sounds like the current (new) stats.

Maybe rename curStats to prevStats?
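For illustration, the counting being debated here can be written out with hypothetical names (prevCnt is the saved bucket count, keptCnt is how many saved chunks survive in the new tree, newCnt is how many chunks need fresh buckets; the PR's actual fields are curStat.active, idxMeta.preexisting, and idxMeta.updateChunks):

```go
package main

import "fmt"

// bucketDelta computes how many histogram buckets were added and deleted
// between the saved statistics and a candidate update. The deleted count
// is the saved buckets whose backing chunks no longer appear, i.e.
// prevCnt - keptCnt, matching the reviewer's reading.
func bucketDelta(prevCnt, keptCnt, newCnt int) (added, deleted int) {
	return newCnt, prevCnt - keptCnt
}

func main() {
	// 10 saved buckets, 7 chunks survived, 4 chunks are new
	added, deleted := bucketDelta(10, 7, 4)
	fmt.Println(added, deleted)
}
```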

var missingOffsets [][]uint64
var offset uint64
for _, n := range levelNodes {
// check if hash is in current list
nicktobey (Contributor):

Can you add a comment explaining what's going on in this block?

max-hoffman (Author):

Added a comment. Basically we partition the target-level histogram chunk hashes for the current tree into 1) preserved or 2) missing, and track scanning metadata for the missing chunks.
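A minimal sketch of that partitioning step, with plain strings standing in for hash.Hash and explicit per-chunk sizes standing in for tree node metadata (all names here are illustrative):

```go
package main

import "fmt"

// partitionChunks splits the chunk hashes at the target tree level into
// those whose saved buckets can be reused (preserved) and those that must
// be rescanned (missing), recording the [start, stop) ordinal range of
// each missing chunk so a scan can target just those rows.
func partitionChunks(levelHashes []string, saved map[string]bool, sizes []int) (preserved []string, missing [][2]int) {
	var offset int
	for i, h := range levelHashes {
		end := offset + sizes[i]
		if saved[h] {
			preserved = append(preserved, h)
		} else {
			missing = append(missing, [2]int{offset, end})
		}
		offset = end
	}
	return preserved, missing
}

func main() {
	saved := map[string]bool{"a": true, "c": true} // chunks with saved buckets
	pres, miss := partitionChunks([]string{"a", "b", "c"}, saved, []int{5, 3, 4})
	fmt.Println(pres, miss)
}
```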

nicktobey (Contributor):

Okay I follow. Is there a particular reason we do it this way by storing all the hashes at some level of the tree, instead of just storing the root hash?

max-hoffman (Author) commented Feb 2, 2024:

Maybe I'm misunderstanding the question, but each histogram bucket is the contents of a chunk at the target level, one-to-one equivalence. We do track the table hash to short-circuit the target level comparison if nothing has changed, but otherwise we use structural sharing to 1) get an estimate of how much has changed, and 2) avoid unnecessarily recomputing buckets that haven't changed.

nicktobey (Contributor):

Ah, okay. I missed that the chunks being tracked here mapped 1-1 with histogram buckets.

This makes sense.

max-hoffman (Author):

Will update docstrings to clarify.

)

var ErrFailedToLoad = errors.New("failed to load statistics")

type DoltStats struct {
level int
chunks []hash.Hash
active map[hash.Hash]int
nicktobey (Contributor):

Can you add docstrings explaining the purpose of these fields? active in particular is unclear to me.

}

var _ sql.StatsProvider = (*Provider)(nil)

// Load scans the statistics tables, populating the |stats| attribute.
// Statistics are not available for reading until we've finished loading.
func (p *Provider) Load(ctx *sql.Context, dbs []dsess.SqlDatabase) error {
p.mu.Lock()
nicktobey (Contributor):

Is it okay that this doesn't have a lock anymore?

max-hoffman (Author):

Rewrote how the concurrency control works. It's fairly straightforward, but I might need to add some Go tests for race detection.
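One straightforward scheme for this kind of load path, sketched with illustrative fields rather than Dolt's actual Provider: build the new stats map outside the lock, then swap it in atomically so readers never observe a half-populated map.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider holds loaded statistics guarded by a mutex shared with the
// background refresh thread. Fields are hypothetical stand-ins for the
// real provider's state.
type Provider struct {
	mu    sync.Mutex
	stats map[string]int
}

// Load scans the given databases and replaces the stats map in one
// critical section; the expensive build happens before the lock is taken.
func (p *Provider) Load(dbs []string) {
	next := make(map[string]int)
	for _, db := range dbs {
		next[db] = 0 // placeholder for scanning that database's stats ref
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	p.stats = next // atomic swap of the fully built map
}

func main() {
	p := &Provider{}
	p.Load([]string{"mydb", "otherdb"})
	fmt.Println(len(p.stats))
}
```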

keyBuilder.PutString(0, qual.Database)
keyBuilder.PutString(1, qual.Table())
keyBuilder.PutString(2, qual.Index())
keyBuilder.PutInt64(3, 10000)
nicktobey (Contributor):

Add a comment for which column is being written here, and why it gets a hardcoded 10k.

max-hoffman (Author):

Added a maxBucketFanout constant to make this clearer.

if i < len(oldHist) && oldHist[i].Chunk == chunkAddr {
mergeHist = append(mergeHist, oldHist[i])
i++
} else if j < len(newHist) && newHist[j].Chunk == chunkAddr {
nicktobey (Contributor):

Possible bug: if an identical chunk appears in both the old and new histograms, then j will never be incremented and all the new chunks that come after it will be skipped.

max-hoffman (Author):

Yes, this was a bug.
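The pitfall the reviewer flagged is the classic merge-walk bug: when the same chunk address appears in both lists, only one cursor advances and later new buckets get skipped. A simplified sketch of a corrected merge, with string addresses and a toy bucket type standing in for the real histogram types:

```go
package main

import "fmt"

type bucket struct {
	Chunk string // chunk address backing this histogram bucket
	Rows  int
}

// mergeHistograms walks chunkAddrs (the chunk order of the new tree),
// reusing an old bucket when its chunk survived and taking the newly
// computed bucket otherwise. When a chunk appears in both histograms,
// both cursors advance so subsequent new buckets are not skipped.
func mergeHistograms(chunkAddrs []string, oldHist, newHist []bucket) []bucket {
	var merged []bucket
	i, j := 0, 0
	for _, addr := range chunkAddrs {
		oldMatch := i < len(oldHist) && oldHist[i].Chunk == addr
		newMatch := j < len(newHist) && newHist[j].Chunk == addr
		switch {
		case oldMatch:
			merged = append(merged, oldHist[i])
			i++
			if newMatch {
				j++ // the fix: advance both cursors on a shared chunk
			}
		case newMatch:
			merged = append(merged, newHist[j])
			j++
		}
	}
	return merged
}

func main() {
	old := []bucket{{"a", 10}, {"b", 20}}
	fresh := []bucket{{"b", 21}, {"c", 30}}
	fmt.Println(mergeHistograms([]string{"a", "b", "c"}, old, fresh))
}
```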

func firstRowForIndex(ctx *sql.Context, prollyMap prolly.Map, keyBuilder *val.TupleBuilder, prefixLen int) (sql.Row, error) {
buffPool := prollyMap.NodeStore().Pool()

firstIter, err := prollyMap.IterOrdinalRange(ctx, 0, 1)
nicktobey (Contributor):

Why are these values of start and stop used?

max-hoffman (Author):

The first row is ordinal 0, so IterOrdinalRange(ctx, 0, 1) yields exactly that row.

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.999074 to 99.998653
version b5c93ad: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@coffeegoddd (Contributor) — @coffeegoddd DOLT
comparing_percentages: 99.999074 to 99.998653
version 5288c1b: not ok 80, ok 5937377, total_tests 5937457
correctness_percentage: 99.998653

@max-hoffman (Author):

I'm going to try to get some sort of tests that are race sensitive, which might be a better format for adding finer-grained parameter testing.

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.999562 to 99.999562
version 7198173: not ok 26, ok 5937431, total_tests 5937457
correctness_percentage: 99.999562

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.999562 to 99.999562
version bbfbc36: not ok 26, ok 5937431, total_tests 5937457
correctness_percentage: 99.999562

@coffeegoddd (Contributor) — @coffeegoddd DOLT
comparing_percentages: 99.999562 to 99.999562
version 18fd075: not ok 26, ok 5937431, total_tests 5937457
correctness_percentage: 99.999562

@max-hoffman (Author):

@nicktobey I added some functions for controlling auto refresh, plus minimal tests confirming they stop, restart, and drop stats. The intention is to give a bit more control when things go wrong. I had to refactor some of the stats provider lifecycle so that background threads and sessions have consistent root value views, and I added concurrency testing for some common cases. There's probably a lot more testing we could add, but I think this is a comfortable MVP: things mostly work, and when they don't, we mostly won't brick the database. It might be helpful to check TestStatsAutoRefreshConcurrency, enginetest.StatProcTests, and dprocedures/stat_funcs.go.

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.999562 to 99.999562
version be41ce9: not ok 26, ok 5937431, total_tests 5937457
correctness_percentage: 99.999562

"github.com/dolthub/dolt/go/libraries/doltcore/sqle/dsess"
)

func statsFunc(name string) func(ctx *sql.Context, args ...string) (sql.RowIter, error) {
nicktobey (Contributor):

Switching on the string here feels unnecessary. Can't DoltProcedures just do something like:

{Name: "dolt_stats_drop", Schema: doltMergeSchema, Function: statsFunc(statsDrop)},

max-hoffman (Author):

Yep, good idea.
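The suggested registration style can be sketched with placeholder types (Dolt's real procedure table uses richer signatures such as sql.ExternalStoredProcedureDetails; the types and row values here are illustrative):

```go
package main

import "fmt"

type rowIter []string

// Procedure mirrors the shape of a stored-procedure registration entry:
// each entry binds a name directly to its implementation, so no string
// switch inside a single statsFunc dispatcher is needed.
type Procedure struct {
	Name     string
	Function func(args ...string) (rowIter, error)
}

func statsDrop(args ...string) (rowIter, error)    { return rowIter{"dropped stats"}, nil }
func statsRestart(args ...string) (rowIter, error) { return rowIter{"restarted stats collection"}, nil }

// procedures registers each stats function directly, as the reviewer
// suggested with {Name: "dolt_stats_drop", Function: statsFunc(statsDrop)}.
var procedures = []Procedure{
	{Name: "dolt_stats_drop", Function: statsDrop},
	{Name: "dolt_stats_restart", Function: statsRestart},
}

func main() {
	for _, p := range procedures {
		rows, _ := p.Function()
		fmt.Println(p.Name, rows[0])
	}
}
```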

}
return fmt.Sprintf("restarted stats collection: %s", ref.StatsRef{}.String()), nil
}
return nil, nil
nicktobey (Contributor):

Suggested change:
- return nil, nil
+ return nil, fmt.Errorf("provider does not implement AutoRefreshStatsProvider")


sqlDb, _ := harness.provider.BaseDatabase(harness.NewContext(), "mydb")

// auto-refresh with a 0-interval, constant loop
nicktobey (Contributor):

Make this comment more explicit, e.g.: setting an interval of 0 and a threshold of 0 results in the stats being updated after every operation.

_, iter, err := engine.Query(ctx, q)
require.NoError(t, err)
_, err = sql.RowIterToRows(ctx, iter)
//fmt.Printf("%s %d\n", tag, id)
nicktobey (Contributor):

Remove this commented out line.

if strings.EqualFold(db, qual.Database) && strings.EqualFold(table, qual.Table()) {
if s.AvgSize > avgSize {
avgSize = s.AvgSize
}
nicktobey (Contributor):

This should probably be returning avgSize, not 0.

for qual, s := range dbStat.stats {
if strings.EqualFold(db, qual.Database) && strings.EqualFold(table, qual.Table()) {
if s.AvgSize > avgSize {
avgSize = s.AvgSize
nicktobey (Contributor):

Why do we take the highest value for avgSize across all stats?

max-hoffman (Author):

Changed this to match the primary index.
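For illustration, here is the loop with the reviewer's return-value correction applied, using simplified stand-in types (the PR ultimately reads the primary index's AvgSize directly rather than taking a max across indexes):

```go
package main

import (
	"fmt"
	"strings"
)

// idxStat is a stripped-down stand-in for the per-index statistic the
// real loop iterates; only the fields the snippet touches are kept.
type idxStat struct {
	Database, Table string
	AvgSize         uint64
}

// tableAvgSize returns the recorded average row size for |db|.|table|,
// matching case-insensitively and returning the accumulated avgSize
// rather than 0, as the review pointed out.
func tableAvgSize(stats []idxStat, db, table string) uint64 {
	var avgSize uint64
	for _, s := range stats {
		if strings.EqualFold(db, s.Database) && strings.EqualFold(table, s.Table) {
			if s.AvgSize > avgSize {
				avgSize = s.AvgSize
			}
		}
	}
	return avgSize
}

func main() {
	stats := []idxStat{{"mydb", "t", 24}, {"mydb", "t", 32}, {"mydb", "u", 99}}
	fmt.Println(tableAvgSize(stats, "MYDB", "T")) // case-insensitive match
}
```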

"github.com/dolthub/dolt/go/store/prolly"
"github.com/dolthub/dolt/go/store/prolly/tree"
stypes "github.com/dolthub/dolt/go/store/types"
"github.com/dolthub/dolt/go/store/val"
)

const maxBucketFanout = 200 * 200
nicktobey (Contributor):

Add a comment explaining why this value was chosen.

@coffeegoddd (Contributor) — @max-hoffman DOLT
comparing_percentages: 99.999562 to 99.999562
version f09626b: not ok 26, ok 5937431, total_tests 5937457
correctness_percentage: 99.999562

@max-hoffman max-hoffman merged commit d019cce into main Feb 8, 2024
21 checks passed
@max-hoffman max-hoffman deleted the max/stats-refresh branch February 8, 2024 02:20
3 participants