Configure a max limit for lgK values of HLL sketches #15516
Let's start adding release notes to this running doc created by this PR: #15333
|Property|Description|Default|
|--------|-----------|-------|
|`druid.sketch.config.hllMaxLgK`|The maximum lgK value that HLL sketches can be created with. Use it to cap lgK and avoid the significant resource usage of sketches at higher lgK values. An exception is thrown if a query configures an lgK value higher than this. This property needs to be set on the broker and middle-manager/indexer.|20|
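As a rough illustration of the property in the table above, a runtime properties entry might look like the following. The value `16` is an arbitrary example rather than a recommendation, and the property exists only as proposed in this PR.

```properties
# Set on the broker and middle-manager/indexer, per the description above.
# 16 is an illustrative value; the proposed default is 20.
druid.sketch.config.hllMaxLgK=16
```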
This should be an ingestion time property only.
The lgK can be specified at query time as well. Wouldn't it need to be limited there too, to avoid running into similar issues?
I think it's more of an ingestion-time check only, since what we want to avoid is huge rows in the generated segments, which increase the segment sizes and in turn decrease query performance.
...ain/java/org/apache/druid/query/aggregation/datasketches/hll/HllSketchAggregatorFactory.java
...ions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQDataSketchesTest.java
Left some comments.
throw DruidException.forPersona(DruidException.Persona.USER)
                    .ofCategory(DruidException.Category.INVALID_INPUT)
                    .build(
                        "LgK value [%s] for HLL sketch cannot be greater than [%s]. Reduce the lgK value"
Please mention the name as well.
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
When using HLL sketches, the configured lgK directly determines both the accuracy and the resources used. Large values of lgK use significant amounts of memory for small gains in accuracy. This PR adds a configurable limit as a runtime property.
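To make the trade-off concrete, here is a minimal sketch that compares register counts with an approximate error for a few lgK values. It assumes the classic HyperLogLog estimate of relative standard error, 1.04 / sqrt(K) with K = 2^lgK, which is only an approximation and not the exact sizing or error bound of the DataSketches implementation that Druid uses. The class and method names are hypothetical.

```java
// Illustrative only: classic HyperLogLog approximations,
// not the exact memory layout or error bounds of DataSketches.
final class HllLgKTradeoff {

    /** Number of registers K for a given lgK, where K = 2^lgK. */
    static long registers(int lgK) {
        return 1L << lgK;
    }

    /** Approximate relative standard error of classic HLL: 1.04 / sqrt(K). */
    static double approxRelativeError(int lgK) {
        return 1.04 / Math.sqrt((double) registers(lgK));
    }

    public static void main(String[] args) {
        // Print the register count and approximate error for a few lgK values.
        for (int lgK = 12; lgK <= 20; lgK += 4) {
            System.out.printf(
                "lgK=%d registers=%d approxError=%.4f%%%n",
                lgK, registers(lgK), 100 * approxRelativeError(lgK));
        }
    }
}
```

Under this approximation, going from lgK 16 to lgK 20 multiplies the register count by 16 while shrinking the error by only a factor of 4.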
Any lgK value greater than the configured limit is rejected.
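The check described here could look roughly like the sketch below. It is a standalone illustration, not the PR's code: `HllLgKLimit`, `validateLgK`, and `DEFAULT_MAX_LG_K` are hypothetical names, and `IllegalArgumentException` stands in for the `DruidException` that the PR builds.

```java
// Standalone illustration of an lgK upper-bound check.
// Names are hypothetical; IllegalArgumentException stands in for DruidException.
final class HllLgKLimit {

    // Stand-in for the value of druid.sketch.config.hllMaxLgK (proposed default: 20).
    static final int DEFAULT_MAX_LG_K = 20;

    /** Returns lgK unchanged if it is within the limit, otherwise throws. */
    static int validateLgK(int lgK, int maxLgK) {
        if (lgK > maxLgK) {
            throw new IllegalArgumentException(String.format(
                "LgK value [%d] for HLL sketch cannot be greater than [%d]. Reduce the lgK value.",
                lgK, maxLgK));
        }
        return lgK;
    }

    public static void main(String[] args) {
        // A value within the limit passes through unchanged.
        System.out.println(validateLgK(12, DEFAULT_MAX_LG_K));
        try {
            // A value above the limit is rejected with a descriptive message.
            validateLgK(21, DEFAULT_MAX_LG_K);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A value equal to the limit is accepted; only values strictly greater than it are rejected, matching the "cannot be greater than" wording of the error message.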
This PR has: