Stats auto refresh #7424
Conversation
Should we have bats tests for setting different values of threshold and interval?
Expected: []sql.Row{{uint64(6), uint64(6), uint64(0)}, {uint64(6), uint64(3), uint64(0)}},
},
},
},
{
Name: "stats updates",
What's being tested here? That the second analyze changes the stats? That the stats auto-update? The name should be more clear about what's being tested.
Tested incrementally rewriting stats depending on which chunks were edited, rather than doing a full scan. Changed the test name.
run dolt sql -r csv -q "select count(*) from dolt_statistics"
[ "$status" -eq 0 ]
[ "${lines[1]}" = "8" ]
Why do we expect this to be 8?
run dolt sql -r csv -q "select count(*) from dolt_statistics"
[ "$status" -eq 0 ]
[ "${lines[1]}" = "8" ]
I feel like this should be testing to confirm that something changed from the last select.
}
curCnt := float64(len(curStat.active))
updateCnt := float64(len(idxMeta.updateChunks))
deleteCnt := float64(len(curStat.active) - len(idxMeta.preexisting))
Should this be reversed? The number deleted is the pre-existing minus the current count?
The naming here is somewhat rough and ready, but curStat is the existing statistics on disk and idxMeta is the update metadata. idxMeta.preexisting is the fraction of chunks in the update metadata overlapping with curStat's chunks. Will change idxMeta to indicate it's a candidate update.
Yeah, this is confusingly named. Pre-existing sounds like the old stats, and curStat sounds like the current (new) stats. Maybe rename curStats to prevStats?
var missingOffsets [][]uint64
var offset uint64
for _, n := range levelNodes {
	// check if hash is in current list
Can you add a comment explaining what's going on in this block?
Added a comment. Basically we partition the target-level histogram hashes for the current tree into 1) preserved or 2) missing, and track scanning metadata for the missing chunks.
Okay, I follow. Is there a particular reason we do it this way, storing all the hashes at some level of the tree instead of just storing the root hash?
Maybe I'm misunderstanding the question, but each histogram bucket is the contents of a chunk at the target level, one-to-one equivalence. We do track the table hash to short-circuit the target level comparison if nothing has changed, but otherwise we use structural sharing to 1) get an estimate of how much has changed, and 2) avoid unnecessarily recomputing buckets that haven't changed.
Ah, okay. I missed that the chunks being tracked here mapped 1-1 with histogram buckets.
This makes sense.
Will update docstrings to clarify.
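To make the structural-sharing discussion above concrete, here is a minimal Go sketch of the partitioning step. The Bucket type and partitionLevel name are hypothetical stand-ins for illustration; only hash.Hash is the real Dolt type.

package statspartition

import "github.com/dolthub/dolt/go/store/hash"

// Bucket stands in for a histogram bucket keyed by its chunk address;
// hypothetical, for illustration only.
type Bucket struct {
	Chunk    hash.Hash
	RowCount uint64
}

// partitionLevel splits the target-level chunk hashes of the current tree into
// buckets that can be reused as-is and chunks that still need to be scanned.
func partitionLevel(levelHashes []hash.Hash, prevBuckets map[hash.Hash]Bucket) (preserved []Bucket, missing []hash.Hash) {
	for _, h := range levelHashes {
		if b, ok := prevBuckets[h]; ok {
			// chunk unchanged since the last refresh: reuse its bucket
			preserved = append(preserved, b)
		} else {
			// new or edited chunk: its bucket must be recomputed from a scan
			missing = append(missing, h)
		}
	}
	return preserved, missing
}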
)

var ErrFailedToLoad = errors.New("failed to load statistics")

type DoltStats struct {
	level  int
	chunks []hash.Hash
	active map[hash.Hash]int
Can you add docstrings explaining the purpose of these fields? active in particular is unclear to me.
}

var _ sql.StatsProvider = (*Provider)(nil)

// Init scans the statistics tables, populating the |stats| attribute.
// Statistics are not available for reading until we've finished loading.
func (p *Provider) Load(ctx *sql.Context, dbs []dsess.SqlDatabase) error {
	p.mu.Lock()
Is it okay that this doesn't have a lock anymore?
Rewrote how the concurrency control works. It's fairly straightforward, but I might need to add some Go tests for race detection.
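A minimal sketch of the kind of race-detection test mentioned here, using a hypothetical Provider stand-in rather than the real stats provider; run it with go test -race to have the race detector flag unsynchronized access.

package statsrace

import (
	"sync"
	"testing"
)

// Provider is a hypothetical stand-in for the stats provider.
type Provider struct {
	mu    sync.Mutex
	stats map[string]int
}

func (p *Provider) Refresh(db string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.stats[db]++
}

func (p *Provider) Drop(db string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.stats, db)
}

// TestConcurrentRefreshAndDrop exercises concurrent writers; without the
// mutex above, `go test -race` reports a data race on the shared map.
func TestConcurrentRefreshAndDrop(t *testing.T) {
	p := &Provider{stats: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); p.Refresh("mydb") }()
		go func() { defer wg.Done(); p.Drop("mydb") }()
	}
	wg.Wait()
}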
keyBuilder.PutString(0, qual.Database)
keyBuilder.PutString(1, qual.Table())
keyBuilder.PutString(2, qual.Index())
keyBuilder.PutInt64(3, 10000)
Add a comment for which column is being written here, and why it gets a hardcoded 10k.
Added a maxBucketFanout constant to make this clearer.
if i < len(oldHist) && oldHist[i].Chunk == chunkAddr {
	mergeHist = append(mergeHist, oldHist[i])
	i++
} else if j < len(newHist) && newHist[j].Chunk == chunkAddr {
Possible bug: if an identical chunk appears in both the old and new histograms, then j will never be incremented and all the new chunks that come after it will be skipped.
Yes, this was a bug.
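A minimal sketch of a corrected merge loop, with hypothetical mergeHistograms and DoltBucket names; the key point is that when the same chunk address matches on both sides, both cursors advance, so the new chunks that follow are not skipped.

package statsmerge

import "github.com/dolthub/dolt/go/store/hash"

// DoltBucket is a hypothetical stand-in for a histogram bucket keyed by its
// chunk address.
type DoltBucket struct {
	Chunk hash.Hash
}

// mergeHistograms walks the target-level chunk addresses in order, reusing the
// old bucket when one exists and taking the freshly computed bucket otherwise.
func mergeHistograms(chunkAddrs []hash.Hash, oldHist, newHist []DoltBucket) []DoltBucket {
	var mergeHist []DoltBucket
	i, j := 0, 0
	for _, chunkAddr := range chunkAddrs {
		oldMatch := i < len(oldHist) && oldHist[i].Chunk == chunkAddr
		newMatch := j < len(newHist) && newHist[j].Chunk == chunkAddr
		switch {
		case oldMatch:
			mergeHist = append(mergeHist, oldHist[i])
			i++
			if newMatch {
				// the chunk appears in both histograms: advance the new
				// cursor too, otherwise later new chunks never line up
				j++
			}
		case newMatch:
			mergeHist = append(mergeHist, newHist[j])
			j++
		}
	}
	return mergeHist
}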
func firstRowForIndex(ctx *sql.Context, prollyMap prolly.Map, keyBuilder *val.TupleBuilder, prefixLen int) (sql.Row, error) {
	buffPool := prollyMap.NodeStore().Pool()

	firstIter, err := prollyMap.IterOrdinalRange(ctx, 0, 1)
Why are these values of start and stop used?
The first row is ordinal 0.
I'm going to try to get some sort of tests that are race-sensitive, which might be a better format for adding finer-grained parameter testing.
@nicktobey I added some functions for controlling auto refresh and some minimalist testing to make sure they stop, restart, and drop stats. The intention is to give a bit more control when things go wrong. I had to refactor some of the stats provider lifecycle so that background threads and sessions have consistent root value views. I also added concurrency testing for some common cases. There's probably a lot more testing we could add, but I think this is a comfortable MVP: things mostly work, and when they don't we mostly won't brick the database. Might be helpful to check …
"github.com/dolthub/dolt/go/libraries/doltcore/sqle/dsess" | ||
) | ||
|
||
func statsFunc(name string) func(ctx *sql.Context, args ...string) (sql.RowIter, error) { |
Switching on the string here feels unnecessary. Can't DoltProcedures just do something like:
{Name: "dolt_stats_drop", Schema: doltMergeSchema, Function: statsFunc(statsDrop)},
Yep, good idea.
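A minimal sketch of the suggested registration pattern, with hypothetical statsDrop and statsRestart bodies; the procedure table references the functions directly instead of switching on a name string.

package statsprocs

import (
	"fmt"

	"github.com/dolthub/go-mysql-server/sql"
)

// statsCmd is the shape shared by the stats procedure bodies below;
// names and implementations are hypothetical.
type statsCmd func(ctx *sql.Context, args ...string) (string, error)

func statsDrop(ctx *sql.Context, args ...string) (string, error) {
	// hypothetical body: drop the stored statistics
	return "dropped stats", nil
}

func statsRestart(ctx *sql.Context, args ...string) (string, error) {
	// hypothetical body: restart the auto-refresh thread
	return "restarted stats collection", nil
}

// statsFunc adapts a concrete command to the procedure signature, so the
// registry can pass a function value rather than a string to switch on.
func statsFunc(cmd statsCmd) func(ctx *sql.Context, args ...string) (sql.RowIter, error) {
	return func(ctx *sql.Context, args ...string) (sql.RowIter, error) {
		msg, err := cmd(ctx, args...)
		if err != nil {
			return nil, fmt.Errorf("stats procedure failed: %w", err)
		}
		return sql.RowsToRowIter(sql.Row{msg}), nil
	}
}

// registration would then look roughly like:
//   {Name: "dolt_stats_drop", Function: statsFunc(statsDrop)},
//   {Name: "dolt_stats_restart", Function: statsFunc(statsRestart)},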
}
return fmt.Sprintf("restarted stats collection: %s", ref.StatsRef{}.String()), nil
}
return nil, nil
Suggested change: replace
return nil, nil
with
return nil, fmt.Errorf("provider does not implement AutoRefreshStatsProvider")
sqlDb, _ := harness.provider.BaseDatabase(harness.NewContext(), "mydb")

// auto-refresh with a 0-interval, constant loop
Make this more clear, e.g.: "Setting an interval of 0 and a threshold of 0 will result in the stats being updated after every operation", etc.
_, iter, err := engine.Query(ctx, q)
require.NoError(t, err)
_, err = sql.RowIterToRows(ctx, iter)
//fmt.Printf("%s %d\n", tag, id)
Remove this commented out line.
if strings.EqualFold(db, qual.Database) && strings.EqualFold(table, qual.Table()) {
	if s.AvgSize > avgSize {
		avgSize = s.AvgSize
	}
This should probably be returning avgSize, not 0.
for qual, s := range dbStat.stats {
	if strings.EqualFold(db, qual.Database) && strings.EqualFold(table, qual.Table()) {
		if s.AvgSize > avgSize {
			avgSize = s.AvgSize
Why do we take the highest value for avgSize across all stats?
Changed this to match the primary index.
"github.com/dolthub/dolt/go/store/prolly" | ||
"github.com/dolthub/dolt/go/store/prolly/tree" | ||
stypes "github.com/dolthub/dolt/go/store/types" | ||
"github.com/dolthub/dolt/go/store/val" | ||
) | ||
|
||
const maxBucketFanout = 200 * 200 |
Add a comment explaining why this value was chosen.
Three new system variables configure stats auto refresh: one for turning it on, one for a timer controlling how often to check whether stats are fresh, and a third for a freshness threshold that triggers writing new stats. The system variables are global and apply to all databases in a server context. Adding or dropping databases, tables, and indexes is reflected by the statistics provider.
Statistics updates now funnel through a process where we preserve buckets for pre-existing chunks and create new buckets specifically for new chunk ordinals. The auto-refresh threshold compares the number of new and deleted buckets against the existing bucket count. So inserts, updates, and deletes all create new chunks or delete chunks relative to the previously saved statistics, and the fraction (new + deleted) / previous count is compared against the threshold.
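A minimal sketch of that freshness check, using hypothetical names; the fraction of changed buckets relative to the previously stored bucket count is compared against the configured threshold.

package statsrefresh

// needsRefresh reports whether statistics should be rewritten: the number of
// new buckets plus deleted buckets, as a fraction of the previously stored
// bucket count, must meet the configured threshold.
func needsRefresh(newCnt, deletedCnt, prevCnt int, threshold float64) bool {
	if prevCnt == 0 {
		// no prior stats: any new data warrants a first write
		return newCnt > 0
	}
	delta := float64(newCnt+deletedCnt) / float64(prevCnt)
	return delta >= threshold
}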
TODO:
other: