WIP: Arrow file writer implementation for Mapper #13565

Draft. KKcorps wants to merge 16 commits into master from arrow_file_write_impl.

KKcorps (Contributor) commented Jul 9, 2024

Pending in this implementation:

  • Sorting each chunk
  • Putting non-sort-column data and sort-column data in separate directories (they are already written to separate files)
  • Serialising the file metadata map to disk
  • Adding unit tests
  • Supporting the Map and BigDecimal data types
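
Since sort columns and non-sort columns live in separate files, sorting a chunk presumably means computing a row order from the sort column and applying that same order to every other column. The following is only a minimal sketch of that idea on plain Java arrays; the class and method names (ChunkSortSketch, sortedRowOrder, reorder) are hypothetical and are not taken from this PR's ArrowSortUtils or from the Arrow API.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only: plain arrays stand in for Arrow vectors.
class ChunkSortSketch {

  // Returns the row indices of a chunk ordered by ascending sort-column value.
  // Arrays.sort on objects is stable, so rows with equal keys keep their order.
  static int[] sortedRowOrder(int[] sortColumn) {
    Integer[] boxed = new Integer[sortColumn.length];
    for (int i = 0; i < boxed.length; i++) {
      boxed[i] = i;
    }
    Arrays.sort(boxed, Comparator.comparingInt((Integer i) -> sortColumn[i]));
    int[] order = new int[boxed.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = boxed[i];
    }
    return order;
  }

  // Applies a previously computed row order to a non-sort column.
  static String[] reorder(String[] column, int[] order) {
    String[] out = new String[column.length];
    for (int i = 0; i < order.length; i++) {
      out[i] = column[order[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] order = sortedRowOrder(new int[] {30, 10, 20});
    System.out.println(Arrays.toString(reorder(new String[] {"c", "a", "b"}, order)));
  }
}
```

Computing the order once and reusing it keeps the two files of a chunk row-aligned, which is what lets them be read back together later.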

The current data output looks like the following:
[Screenshot, 2024-07-11 1:23:16 PM]

The chunk metadata is in the following format:

{
    "arrow_dir/non_sort_columns/1.arrow": {
        "rowCount": 0,
        "byteCount": 0
    },
    "arrow_dir/sort_columns/1.arrow": {
        "rowCount": 0,
        "byteCount": 0
    },
    "arrow_dir/sort_columns/0.arrow": {
        "rowCount": 5,
        "byteCount": 130
    },
    "arrow_dir/non_sort_columns/0.arrow": {
        "rowCount": 5,
        "byteCount": 531
    }
}
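
As a rough illustration of the metadata shape above, the sketch below keeps a map from chunk file path to its row and byte counts and renders it as JSON. It is an assumption-laden example, not the PR's code: ChunkMetadataSketch, ChunkStats, and toJson are hypothetical names, the output is compact rather than pretty-printed, and paths are assumed to contain no characters that need JSON escaping (a real implementation would more likely use a JSON library).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of the per-chunk metadata map shown above.
class ChunkMetadataSketch {

  // Row and byte counts for a single chunk file.
  static final class ChunkStats {
    final long rowCount;
    final long byteCount;

    ChunkStats(long rowCount, long byteCount) {
      this.rowCount = rowCount;
      this.byteCount = byteCount;
    }
  }

  // Renders the map as compact JSON, preserving the map's iteration order.
  // Assumes file paths need no JSON escaping (no quotes or backslashes).
  static String toJson(Map<String, ChunkStats> metadata) {
    StringJoiner entries = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, ChunkStats> e : metadata.entrySet()) {
      ChunkStats s = e.getValue();
      entries.add("\"" + e.getKey() + "\":{\"rowCount\":" + s.rowCount
          + ",\"byteCount\":" + s.byteCount + "}");
    }
    return entries.toString();
  }

  public static void main(String[] args) {
    Map<String, ChunkStats> metadata = new LinkedHashMap<>();
    metadata.put("arrow_dir/sort_columns/0.arrow", new ChunkStats(5, 130));
    metadata.put("arrow_dir/non_sort_columns/0.arrow", new ChunkStats(5, 531));
    System.out.println(toJson(metadata));
  }
}
```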

codecov-commenter commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 392 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (59551e4) to head (43eefef).
Report is 799 commits behind head on master.

Files                                                   Patch %   Lines
...ocessing/genericrow/GenericRowArrowFileWriter.java   0.00%     297 Missing ⚠️
.../segment/processing/genericrow/ArrowSortUtils.java   0.00%     95 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (43eefef).

HEAD has 20 fewer uploads than BASE.

Flag                     BASE (59551e4)   HEAD (43eefef)
temurin                  12               9
java-21                  7                6
skip-bytebuffers-true    3                2
skip-bytebuffers-false   7                4
unittests                5                0
unittests1               2                0
java-11                  5                3
unittests2               3                0

Additional details and impacted files
@@              Coverage Diff              @@
##             master   #13565       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.75%     
+ Complexity      207        6      -201     
=============================================
  Files          2436     2481       +45     
  Lines        133233   137270     +4037     
  Branches      20636    21379      +743     
=============================================
- Hits          82274        6    -82268     
- Misses        44911   137264    +92353     
+ Partials       6048        0     -6048     

Flag                     Coverage Δ
custom-integration1      <0.01% <0.00%> (-0.01%) ⬇️
integration              <0.01% <0.00%> (-0.01%) ⬇️
integration1             <0.01% <0.00%> (-0.01%) ⬇️
integration2             0.00% <0.00%> (ø)
java-11                  <0.01% <0.00%> (-61.71%) ⬇️
java-21                  <0.01% <0.00%> (-61.63%) ⬇️
skip-bytebuffers-false   <0.01% <0.00%> (-61.75%) ⬇️
skip-bytebuffers-true    <0.01% <0.00%> (-27.73%) ⬇️
temurin                  <0.01% <0.00%> (-61.75%) ⬇️
unittests                ?
unittests1               ?
unittests2               ?

Flags with carried forward coverage won't be shown.


KKcorps force-pushed the arrow_file_write_impl branch from b107ed8 to 94e2ec3 (July 9, 2024 22:33)
KKcorps force-pushed the arrow_file_write_impl branch from 6ca5f9a to c69e0c2 (July 10, 2024 06:24)
KKcorps force-pushed the arrow_file_write_impl branch from 6c0c40d to 8dcb464 (July 11, 2024 06:09)
KKcorps force-pushed the arrow_file_write_impl branch from e162a9e to 43eefef (July 24, 2024 19:48)