feat: Decompress in CSV / NDJSON scan #17841
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

Coverage Diff
              main      #17841    +/-
Coverage      80.50%    80.49%    -0.02%
Files         1504      1505      +1
Lines         197152    197202    +50
Branches      2805      2806      +1
Hits          158722    158737    +15
Misses        37909     37944     +35
Partials      521       521
Can't we decompress into our file cache?
It's possible, but I will need to think a bit about how we can do it.
Yes, on second thought, let's give it some thought. Eventually we'd like to stream decompression. In bioinformatics, people routinely compress hundreds of GB of CSV.
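To illustrate what streaming decompression means here, the following is a minimal Python sketch, not the implementation in this PR. It assumes gzip input and uses only the standard library; `iter_rows_streaming` is a hypothetical name. Rows are yielded as the decompressor produces them, so the fully decompressed file never has to sit in memory at once.

```python
import csv
import gzip
import io


def iter_rows_streaming(compressed: bytes):
    """Yield CSV rows from gzip-compressed bytes, decompressing incrementally.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as raw:
        # TextIOWrapper pulls decompressed bytes from `raw` in small chunks,
        # so only a buffer's worth of data is decompressed at a time.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        yield from csv.reader(text)


# Small in-memory payload standing in for a large compressed CSV file.
payload = gzip.compress(b"a,b\n1,2\n3,4\n")
rows = list(iter_rows_streaming(payload))
```

In a real reader the input would come from a file or object store rather than an in-memory `bytes` value, but the shape is the same: the parser consumes a decompressing stream instead of a fully materialized buffer.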
We can support this without too much trouble if we accept that we end up fully decompressing twice (once during IR conversion / file_info, and then again during the actual read). I can open an issue for that afterwards.
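The trade-off described above (one full decompression to gather file information, a second one for the actual read) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in this PR: it handles only gzip, detected by its two-byte magic number, and `maybe_decompress`, `file_info`, `read_rows`, and `decompress_calls` are hypothetical names chosen to mirror the comment.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

decompress_calls = 0  # counts full decompressions, for illustration only


def maybe_decompress(data: bytes) -> bytes:
    """Return decompressed bytes if `data` looks like gzip, else `data` as-is."""
    global decompress_calls
    if data[:2] == GZIP_MAGIC:
        decompress_calls += 1
        return gzip.decompress(data)
    return data


def file_info(data: bytes) -> list[str]:
    """First pass: decompress everything just to read the header row."""
    text = maybe_decompress(data).decode("utf-8")
    return next(csv.reader(io.StringIO(text)))


def read_rows(data: bytes) -> list[list[str]]:
    """Second pass: decompress everything again and read the data rows."""
    text = maybe_decompress(data).decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return list(reader)


payload = gzip.compress(b"a,b\n1,2\n3,4\n")
columns = file_info(payload)
rows = read_rows(payload)
```

After both calls, `decompress_calls` is 2, which is the duplicated work the comment proposes to accept for now. Caching the decompressed bytes between the two passes, or streaming the decompression, would remove it.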
Closes #7287