Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: support for deleting segments using sql #13678

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sullis
Copy link
Contributor

@sullis sullis commented Jul 22, 2024

This PR is a work in progress.

Motivation

This PR was motivated by a GitHub issue:
#13476

Modifications

  • add class: DeleteSegmentStatement
  • new logic in CalciteSqlParser
  • new logic DataManipulationStatementParser

DeleteSegmentStatement class

DeleteSegmentStatement implements Pinot's DataManipulationStatement interface.

This class uses ExecutionType.MINION

Feedback requested:

  • is ExecutionType.MINION the correct choice for DeleteSegmentStatement? The only other choice is ExecutionType.HTTP. I reviewed the source tree and I could not find any Pinot statements that use ExecutionType.HTTP

  • What is the correct way to delete segments? I have not worked with segments before and I could use some guidance.

  • Based on my current understanding, it looks like I need to add behavior to the "executeTask" method in SegmentDeletionTaskExecutor.java. I left a "TODO" comment in this file to indicate that the implementation is incomplete

Testing

  • added DeleteSegmentStatementTest.java
  • added DataManipulationStatementParserTest.java
  • added SegmentDeletionMinionClusterIntegrationTest.java
  • updated CalciteSqlCompilerTest.java

Supporting document

This Google Doc has additional information about Pinot SQL endpoints and segment deletion:

https://docs.google.com/document/d/1bsg3QKeZiXiFh2__tDkor-v0eDw0QBKuPZiK3DWg_04/edit?usp=sharing

@sullis
Copy link
Contributor Author

sullis commented Jul 22, 2024

This PR is a work in progress.

Feedback requested cc: @npawar @Jackie-Jiang

@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 30.76923% with 72 lines in your changes missing coverage. Please review.

Project coverage is 61.92%. Comparing base (59551e4) to head (54de3a7).
Report is 854 commits behind head on master.

Files Patch % Lines
...s/segmentdeletion/SegmentDeletionTaskExecutor.java 0.00% 23 Missing ⚠️
...n/tasks/segmentdeletion/SegmentDeletionResult.java 0.00% 21 Missing ⚠️
.../segmentdeletion/SegmentDeletionTaskGenerator.java 0.00% 9 Missing ⚠️
.../pinot/sql/parsers/dml/DeleteSegmentStatement.java 80.00% 3 Missing and 4 partials ⚠️
...ntdeletion/SegmentDeletionTaskExecutorFactory.java 0.00% 6 Missing ⚠️
...org/apache/pinot/sql/parsers/CalciteSqlParser.java 60.00% 1 Missing and 1 partial ⚠️
...on/SegmentDeletionTaskProgressObserverFactory.java 0.00% 2 Missing ⚠️
...l/parsers/dml/DataManipulationStatementParser.java 50.00% 0 Missing and 1 partial ⚠️
.../org/apache/pinot/core/common/MinionConstants.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13678      +/-   ##
============================================
+ Coverage     61.75%   61.92%   +0.17%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2560     +124     
  Lines        133233   140669    +7436     
  Branches      20636    21881    +1245     
============================================
+ Hits          82274    87115    +4841     
- Misses        44911    46910    +1999     
- Partials       6048     6644     +596     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 ?
java-11 61.87% <30.76%> (+0.16%) ⬆️
java-21 61.82% <30.76%> (+0.19%) ⬆️
skip-bytebuffers-false 61.91% <30.76%> (+0.16%) ⬆️
skip-bytebuffers-true 61.77% <30.76%> (+34.04%) ⬆️
temurin 61.92% <30.76%> (+0.17%) ⬆️
unittests 61.92% <30.76%> (+0.17%) ⬆️
unittests1 46.45% <74.41%> (-0.44%) ⬇️
unittests2 27.70% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xiangfu0
Copy link
Contributor

xiangfu0 commented Jul 31, 2024

Overall I'm good with the parser part.

Regarding the implementation for deletion, I feel it's ok to let controller handle it directly, since this is not a heavy or time consuming operation.

Calling controller segment deletion API(HTTP endpoint) is good enough.


@Override
public ExecutionType getExecutionType() {
return ExecutionType.MINION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to use HTTP to make the segment deletion requests synchronized.
Though underlying deleting segments from servers is still async

private final String _table;
private final Map<String, String> _queryOptions;

public DeleteSegmentStatement(String table, Map<String, String> queryOptions) {
Copy link
Contributor

@xiangfu0 xiangfu0 Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say if later on I want to extend the usage to:

  1. leverage segment metadata or other criteria for segment selection, e.g. I may want to delete all the segments with a time range [d1, d2] or segments with partition.id = 10

  2. query matched segments using pinot query is the predicate is a real query, e.g.
    DELETE FROM myTable WHERE uuid='xxx' , which means you can send a query: SELECT distinct $segmentName FROM myTable WHERE uuid='xxx' to pinot first to get all the segment list then drop them accordingly.

I feel if we can separate this to just table and a list of segments to delete, and before this, we can have different implementations for a SegmentSelector interface, e.g. NameBasedSegmentSelector, MetadataBasedSegmentSelector, QueryBasedSegmentSelector.

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants