-
-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statistics for multiple branches #7558
Conversation
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
keyBuilder.PutInt64(3, maxBucketFanout+1) | ||
maxKey := keyBuilder.Build(pool) | ||
|
||
// there is a limit on the number of buckets for a given index, iter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the implications of this comment? Is this relevant for this function?
} | ||
|
||
dSess := dsess.DSessFromSess(loadCtx.Session) | ||
var branches []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the order of precedence here is:
- Global variable
- Current session branch
- Provider default branch
Is this documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added helper and docstring
func (p *Provider) Load(ctx *sql.Context, fs filesys.Filesys, db dsess.SqlDatabase, branches []string) error { | ||
// |statPath| is either file://./stat or mem://stat | ||
statsDb, err := p.sf.Init(ctx, db, p.pro, fs, env.GetCurrentUserHomeDir) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining why we don't propagate the error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, maybe i should kill the error return. Was thinking that a bad branch name probably shouldn't prevent the other branches from being loaded
"github.com/dolthub/dolt/go/store/hash" | ||
) | ||
|
||
type DoltStats struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring that explains why this exists in addition to sql.Statistic
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to reimplement this for TPC-C perf, I'm not quite sure what the right interface is yet. Real reason was just ease of implementing the histogram manipulation. The interface will probably have to have a bunch of interfaces for in-place updates so we don't have to copy this to sql.Statistic
return err | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same logic I commented on in Configure
? Let's move it to a common helper function.
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
@max-hoffman DOLT
|
@max-hoffman DOLT
|
@coffeegoddd DOLT
|
@coffeegoddd DOLT
|
Fork interface for how to store database statistics. Can either store in the original source database, in a separate database in
.dolt/stats
, or an alternative implementation like a lsm that will have easier append only semantics. The implementation for the noms chunkstore isn't particularly efficient, we will not deduplicate common chunks between branches.How the new code is organized:
statspro
has generic interfaces for how a Dolts-specific stats implementation works.statsnoms
is an implementation that uses a second database at.dolt/stats
to store statistic refs. The stats refs are the same, just now they are named by the branch they reference (ex:refs/statistics/main
). So storage is the concern of the concrete implementation. The common interface forces the implementation to handle branches. The branch switching instatsnoms
are just named refs.A high level of what's happening during certain operations: There are still two operations, load and update. Load now either initializes the stats database at
.dolt/stats
or loads existing stats. Update is the same, ANALYZE or auto refresh.Most of the changes are just forcing the logic through a generic interface. There were import cycle issues (
dtables
) and deadlock issues for creating a database (dolt provider takes a lock that prevents doing certain operation on the session in the stats provider) that motivated packaging decisions.