wip: support for deleting segments using sql #13678
base: master
This PR is a work in progress. Feedback requested. cc: @npawar @Jackie-Jiang
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #13678 +/- ##
============================================
+ Coverage 61.75% 61.92% +0.17%
+ Complexity 207 198 -9
============================================
Files 2436 2560 +124
Lines 133233 140669 +7436
Branches 20636 21881 +1245
============================================
+ Hits 82274 87115 +4841
- Misses 44911 46910 +1999
- Partials 6048 6644 +596
Overall I'm good with the parser part. Regarding the implementation of deletion, I feel it's OK to let the controller handle it directly, since this is not a heavy or time-consuming operation. Calling the controller's segment deletion API (an HTTP endpoint) is good enough.
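To make the suggestion above concrete, here is a minimal sketch of building such a request with the JDK's java.net.http API. The class and method names are hypothetical, the controller base URL is a placeholder, and the path layout /segments/{tableName}/{segmentName} is an assumption about the controller REST API that should be checked against the Pinot version in use. The sketch only builds the request; it does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pinot: builds the HTTP request that asks
// the controller to delete one segment of a table.
final class SegmentDeleteRequests {
  private SegmentDeleteRequests() {
  }

  // Assumption: the controller exposes DELETE /segments/{tableName}/{segmentName}.
  static HttpRequest buildDeleteRequest(String controllerBaseUrl, String table, String segment) {
    String path = "/segments/" + encode(table) + "/" + encode(segment);
    return HttpRequest.newBuilder(URI.create(controllerBaseUrl + path)).DELETE().build();
  }

  // URLEncoder targets form encoding, so turn '+' back into '%20' for use in a path.
  private static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
  }
}
```

A caller would pass the result to HttpClient.send(...) and treat a successful response as "deletion accepted", since removal of the segment from the servers may still complete later.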
@Override
public ExecutionType getExecutionType() {
  return ExecutionType.MINION;
I suggest using ExecutionType.HTTP so that the segment deletion requests are synchronous, even though the underlying deletion of segments from the servers is still asynchronous.
private final String _table;
private final Map<String, String> _queryOptions;

public DeleteSegmentStatement(String table, Map<String, String> queryOptions) {
Say that later on I want to extend the usage to:
- Leverage segment metadata or other criteria for segment selection, e.g. delete all segments within a time range [d1, d2], or all segments with partition.id = 10.
- Find the matching segments with a Pinot query if the predicate is a real query, e.g. DELETE FROM myTable WHERE uuid='xxx'. In that case you can first send the query SELECT distinct $segmentName FROM myTable WHERE uuid='xxx' to Pinot to get the segment list, then drop those segments accordingly.
I feel we could reduce this statement to just a table and a list of segments to delete. Before that step, we can have different implementations of a SegmentSelector interface, e.g. NameBasedSegmentSelector, MetadataBasedSegmentSelector, QueryBasedSegmentSelector.
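A rough sketch of how the proposed SegmentSelector split might look. Only the four type names come from the comment above; the method signature, the map-based metadata model, and the injected query function are assumptions made for illustration and are not existing Pinot APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical interface: resolves selection criteria to concrete segment names.
interface SegmentSelector {
  List<String> selectSegments(String tableName);
}

// Returns the segment names that were listed explicitly.
final class NameBasedSegmentSelector implements SegmentSelector {
  private final List<String> _segmentNames;

  NameBasedSegmentSelector(List<String> segmentNames) {
    _segmentNames = List.copyOf(segmentNames);
  }

  @Override
  public List<String> selectSegments(String tableName) {
    return _segmentNames;
  }
}

// Returns segments whose metadata passes a filter. Metadata is modeled as a
// plain segmentName -> (key -> value) map, which is an assumption.
final class MetadataBasedSegmentSelector implements SegmentSelector {
  private final Map<String, Map<String, String>> _metadataBySegment;
  private final Predicate<Map<String, String>> _filter;

  MetadataBasedSegmentSelector(Map<String, Map<String, String>> metadataBySegment,
      Predicate<Map<String, String>> filter) {
    _metadataBySegment = metadataBySegment;
    _filter = filter;
  }

  @Override
  public List<String> selectSegments(String tableName) {
    List<String> matched = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> entry : _metadataBySegment.entrySet()) {
      if (_filter.test(entry.getValue())) {
        matched.add(entry.getKey());
      }
    }
    Collections.sort(matched); // deterministic order regardless of map type
    return matched;
  }
}

// Asks Pinot which segments match a WHERE clause. The query runner is injected
// as a function (SQL in, segment names out), so no client API is assumed here.
final class QueryBasedSegmentSelector implements SegmentSelector {
  private final String _whereClause;
  private final Function<String, List<String>> _queryRunner;

  QueryBasedSegmentSelector(String whereClause, Function<String, List<String>> queryRunner) {
    _whereClause = whereClause;
    _queryRunner = queryRunner;
  }

  @Override
  public List<String> selectSegments(String tableName) {
    String sql = "SELECT DISTINCT $segmentName FROM " + tableName + " WHERE " + _whereClause;
    return _queryRunner.apply(sql);
  }
}
```

With this split, DeleteSegmentStatement would only need to carry the table name and the resolved segment list, and new selection strategies could be added without touching the deletion path.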
This PR is a work in progress.
Motivation
This PR was motivated by a GitHub issue:
#13476
Modifications
DeleteSegmentStatement class
DeleteSegmentStatement implements Pinot's DataManipulationStatement interface.
This class uses ExecutionType.MINION.
Feedback requested:
Is ExecutionType.MINION the correct choice for DeleteSegmentStatement? The only other choice is ExecutionType.HTTP. I reviewed the source tree and could not find any Pinot statements that use ExecutionType.HTTP.
What is the correct way to delete segments? I have not worked with segments before and could use some guidance.
Based on my current understanding, I need to add behavior to the "executeTask" method in SegmentDeletionTaskExecutor.java. I left a "TODO" comment in this file to indicate that the implementation is incomplete.
Testing
Supporting document
This Google Doc has additional information about Pinot SQL endpoints and segment deletion:
https://docs.google.com/document/d/1bsg3QKeZiXiFh2__tDkor-v0eDw0QBKuPZiK3DWg_04/edit?usp=sharing