Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure a max limit for lgK values of HLL sketches #15516

Closed
wants to merge 6 commits into from

Conversation

adarshsanjeev
Copy link
Contributor

When using HllSketches, the configured lgK directly relates to the accuracy and resources used. Large values of lgK uses significant amounts of memory for small increases in performance. This PR adds a configurable limit as a runtime property.
This would prevent values of lgK greater than this limit from being allowed.

  • Adds a runtime parameter to configure the maximum allowed value of lgK.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Documentation Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Dec 8, 2023
@cryptoe
Copy link
Contributor

cryptoe commented Dec 11, 2023

Lets start adding release notes to this running doc created by this PR : #15333


|Property| Description| Default |
|--------|------------|------|
|`druid.sketch.config.hllMaxLgK`| The maximum possible value of lgK that HLL sketches can be created with. Useful to limit the maximum lgK in sketches, to avoid the significant usage of resources used by sketches at higher values of lgK. An exception will be thrown if a query configures a lgK value higher than this. This property needs to be set on the broker and middle-manager/indexer. | 20 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an ingestion time property only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lgK can be specified during query time as well, wouldn't it need to be limited to avoid running into similar issues?

Copy link
Contributor

@cryptoe cryptoe Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its more of a ingestion time check only since what we want to avoid is huge rows in the generated segments which increasing the segment sizes which in turn decrease query performance.

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

throw DruidException.forPersona(DruidException.Persona.USER)
.ofCategory(DruidException.Category.INVALID_INPUT)
.build(
"LgK value [%s] for HLL sketch cannot be greater than [%s]. Reduce the lgK value"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention the name as well.

Copy link

github-actions bot commented Apr 2, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the [email protected] list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 2, 2024
Copy link

github-actions bot commented May 1, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Dependencies Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants