Remove cpuTimeAcc from DataSource#createSegmentMapFunction's signature #17623

Open: wants to merge 5 commits into base: master

Conversation

kgyrtkirk
Member

This patch removes cpuTimeAcc from the signature of DataSource#createSegmentMapFunction.

The method being executed doesn't need to know that it is being measured. Passing the accumulator in also complicates the code flow a bit, because the implementation has to avoid double counting.
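
To illustrate the idea, here is a minimal sketch of moving the measurement from the callee to the caller. It is not the actual Druid code: the class name, the `measure` helper, and the `Function<String, String>` stand-in for the segment map function are all hypothetical, and `System.nanoTime()` (wall-clock time) is used as a simplified stand-in for per-thread CPU time.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are taken from the Druid codebase.
class CpuTimeSketch {

    // "Before" shape: the callee receives the accumulator and must do the
    // bookkeeping itself, which is where double counting can creep in.
    static Function<String, String> createSegmentMapFunctionOld(AtomicLong cpuTimeAcc) {
        long start = System.nanoTime();
        Function<String, String> fn = segment -> segment + ":mapped";
        cpuTimeAcc.addAndGet(System.nanoTime() - start);
        return fn;
    }

    // "After" shape: the callee only builds the function and knows nothing
    // about measurement.
    static Function<String, String> createSegmentMapFunction() {
        return segment -> segment + ":mapped";
    }

    // Caller-side helper: times whatever work it is given and adds the
    // elapsed nanoseconds to the accumulator exactly once.
    static <T> T measure(AtomicLong acc, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            acc.addAndGet(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        AtomicLong acc = new AtomicLong();
        Function<String, String> fn = measure(acc, CpuTimeSketch::createSegmentMapFunction);
        System.out.println(fn.apply("segment-1"));
        System.out.println("accumulated nanos: " + acc.get());
    }
}
```

With this shape, only the call site decides whether and how a call is measured, so nested calls no longer need to coordinate to avoid counting the same time twice.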

@github-actions github-actions bot added the labels Area - Batch Ingestion, Area - Querying, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) on Jan 14, 2025
@clintropolis
Member

Added the Design Review label, since this modifies the signature of an extension point.

@kgyrtkirk
Member Author

kgyrtkirk commented Jan 28, 2025

I don't think DataSource is API; it's not marked with the annotations.

I haven't seen any implementations of it outside the core modules (except one in MSQ):
git grep DataSource|fgrep -f <(echo extends;echo implements)|grep -v 'DataSourceMetadata'|grep -v /test/|less

In fact, the way things currently work, anyone adding a new datasource would have a pretty rough time running it on MSQ or getting it to work correctly in joins.

The issues I've seen can't be fixed without altering the DataSource API, unless I cheat with proxies, etc.

@clintropolis
Copy link
Member

> I don't think DataSource is API; it's not marked with the annotations.
>
> I haven't seen any implementations of it outside the core modules (except one in MSQ).
> In fact, the way things currently work, anyone adding a new datasource would have a pretty rough time running it on MSQ or getting it to work correctly in joins.
>
> The issues I've seen can't be fixed without altering the DataSource API, unless I cheat with proxies, etc.

Yeah, I did this out of an abundance of caution about changing core APIs. You're right that DataSource isn't directly marked, but Query is, and queries have a DataSource, so any custom query engines would likely need to be rebuilt to account for the signature change.

That said, I think this change makes sense, so I will approve after I finish reviewing. We just need to be sure to call it out in the dev-oriented release notes.

Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Design Review