-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[nereids] consider numNulls in filter estimation #29184
Conversation
run buildall |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
double numNulls = stats.numNulls; | ||
double ndv = stats.ndv; | ||
if (numNulls > rowCount - ndv) { | ||
numNulls = rowCount - ndv > 0 ? rowCount - ndv : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"numNulls = rowCount-ndv" this is not true.
consider col values (1, 1, 1, null)
rowcount=4, ndv=1
then we get numNulls = 4-1=3, but it is 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current impl may lead to the inconsistent status between rowCount and ndv, so it will go into this handling logic unexpectly. I will comment it first and refine it in future.
5e004d0
to
0cad7be
Compare
run buildall |
1 similar comment
run buildall |
run buildall |
(From new machine)TeamCity pipeline, clickbench performance test result: |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
752dc13
to
8f80f8c
Compare
run buildall |
(From new machine)TeamCity pipeline, clickbench performance test result: |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
|
@@ -117,6 +118,11 @@ public Statistics visitCompoundPredicate(CompoundPredicate predicate, Estimation | |||
colBuilder.setMinValue(union.getLow()).setMinExpr(union.getLowExpr()) | |||
.setMaxValue(union.getHigh()).setMaxExpr(union.getHighExpr()) | |||
.setNdv(union.getDistinctValues()); | |||
if (!(leftExpr instanceof IsNull || rightExpr instanceof IsNull)) { | |||
colBuilder.setNumNulls(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(A is null and B>0) or (A is not null and B>0)
=>A.numNulls = 0
this is not true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the max numNulls instead.
825932c
to
393db64
Compare
run buildall |
} else if (statsForRight.isUnKnown) { | ||
selectivity = 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right is a literal, why it could be unknow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will remove it.
if (not.child().getInputSlots().size() == 1 && !(child instanceof IsNull)) { | ||
// only consider the single column numNull, otherwise, ignore | ||
rowCount = Math.max(rowCount - originColStats.numNulls, 1); | ||
statisticsBuilder.setRowCount(rowCount); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could cause heavy errors when the child is unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the precondition has been checked before.
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools
|
TPC-DS test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpcds-tools
|
@@ -568,10 +581,33 @@ public Statistics visitLike(Like like, EstimationContext context) { | |||
"col stats not found. slot=%s in %s", | |||
like.left().toSql(), like.toSql()); | |||
ColumnStatisticBuilder colBuilder = new ColumnStatisticBuilder(origin); | |||
colBuilder.setNdv(origin.ndv * DEFAULT_LIKE_COMPARISON_SELECTIVITY).setNumNulls(0); | |||
double selectivity = origin.ndv * DEFAULT_LIKE_COMPARISON_SELECTIVITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is 'ndv' not selectivity
4cb01f3
to
0f692eb
Compare
run buildall |
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools
|
TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools
|
TPC-DS test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpcds-tools
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
//if (numNulls > rowCount - ndv) { | ||
// numNulls = rowCount - ndv > 0 ? rowCount - ndv : 0; | ||
//} | ||
double notNullSel = rowCount <= 1.0 ? 1.0 : 1 - getValidSelectivity(numNulls / rowCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if rowCount=0, notNullSel is NaN. And this NaN pollute following derivation.
PR approved by at least one committer and no changes requested. |
1 similar comment
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
1 similar comment
PR approved by anyone and no changes requested. |
consider numNulls in filter estimation
consider numNulls in filter estimation
Proposed changes
consider numNulls in filter estimation
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...