lib-parquet: how to convert numeric dates #1093
Comments
Hey! DateTimes should be automatically converted by the library to DateTimeImmutable objects; integers are just how parquet stores them.
and paste me the output here?
This should give me a better view of your file structure so I might be able to identify the issue.
I apologize, I should have specified this earlier. The parquet file comes from Amazon Redshift through the UNLOAD command. Please find the whole parquet file attached. It's very small. I use PHP 8.2.16 (cli) with snappy on Linux Ubuntu Server 22.0.4.
Hey, so it seems that Amazon Redshift is saving datetimes just as INT32 with a converted type. Converted Type is deprecated according to the parquet format, and afaik when I was working on this implementation I wasn't sure whether support for it would be needed or not. Give me a day or two and I will bring in support for the converted type date so that those values are properly recognized as date time objects. Thanks for bringing this up and providing everything I needed to quickly identify the issue, it's a great contribution to this library!
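For anyone who needs a manual workaround in the meantime: in the Parquet format a date is stored as an INT32 holding the number of days since the Unix epoch (1970-01-01), which is why the 16227 from the original question corresponds to 2014-06-06. Below is a minimal sketch of that conversion. `parquetDateToDateTime` is a hypothetical helper name used only for illustration, it is not part of Flow\Parquet, and UTC is assumed.

```php
<?php

// Hypothetical helper, NOT part of Flow\Parquet: converts a Parquet date
// (INT32 days since 1970-01-01) into a DateTimeImmutable in UTC.
function parquetDateToDateTime(int $daysSinceEpoch): \DateTimeImmutable
{
    $epoch = new \DateTimeImmutable('1970-01-01 00:00:00', new \DateTimeZone('UTC'));

    // One day is 86400 seconds in UTC (no DST), so the day count maps
    // directly onto a Unix timestamp.
    return $epoch->setTimestamp($daysSinceEpoch * 86400);
}

echo parquetDateToDateTime(16227)->format('Y-m-d'), PHP_EOL; // 2014-06-06
```

Negative values (dates before 1970) go through the same arithmetic.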
It was easier than I expected. Once this is merged, it should properly return DateTimeImmutable objects.
It seems to be working properly now:

```php
<?php

use Flow\Parquet\Reader;

require __DIR__ . '/../../vendor/autoload.php';

$reader = new Reader();

foreach ($reader->read(__DIR__ . '/0000_part_00.parquet')->values() as $row) {
    // Dump only the first row as JSON, then stop.
    var_dump(\json_encode($row));
    die;
}
```

Output (in JSON):

```json
{"year":2014,"yearreportdate":{"date":"2014-06-06 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"prioryearreportdate":{"date":"2013-06-08 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"yeardayoffset":4294967294,"ltmstartdate":{"date":"2013-06-09 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"ltmenddate":{"date":"2014-06-06 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"currentfiscalyear":2024,"currentfiscalquarter":2,"currentfiscalquarterlabel":"24 Q2","currentfiscalquarterstartdate":{"date":"2024-03-31 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"currentfiscalquarterenddate":{"date":"2024-06-29 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"currentfiscalmonth":6,"currentfiscalquarterday":69,"currentfiscalquarterweek":10,"currentfiscalweek":23,"currentreportdate":{"date":"2024-06-07 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"currentdayofyear":159,"yesterdaydate":{"date":"2024-06-07 00:00:00.000000","timezone_type":3,"timezone":"UTC"},"yesterdaydayofyear":159,"priorfiscalyear":2023,"priorfiscalyearquarterlabel":"23 Q2","priorfiscalquarterlabel":"24 Q1","nextfiscalquarterlabel":"24 Q3","cmrogflag":"N","currentforecastscenario":""}
```

I'm closing this issue. In case of any problems, don't hesitate to reach out again!
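A side note on the JSON shape above: the `date` / `timezone_type` / `timezone` structure is simply how PHP's `json_encode` serializes a DateTimeImmutable object. If plain Y-m-d strings are preferred, something along these lines may help. This is only a sketch: `formatDates` is a hypothetical helper that is not part of Flow\Parquet, and the sample row is made up for illustration rather than read from the attached file.

```php
<?php

// Hypothetical helper, NOT part of Flow\Parquet: replaces every date object
// in a row with a formatted string and leaves all other values untouched.
function formatDates(array $row, string $format = 'Y-m-d'): array
{
    return array_map(
        static fn ($value) => $value instanceof \DateTimeInterface
            ? $value->format($format)
            : $value,
        $row
    );
}

// Illustrative row only; real rows would come from the reader loop.
$row = [
    'year' => 2014,
    'yearreportdate' => new \DateTimeImmutable('2014-06-06', new \DateTimeZone('UTC')),
];

echo json_encode(formatDates($row)), PHP_EOL;
// {"year":2014,"yearreportdate":"2014-06-06"}
```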
It works! You're a fine gentleman, thank you! I'll keep testing Amazon Parquet files and come back in case of other questions.
Not really an issue, more a simple request for clarification.
I have a parquet file that contains several date columns (in Y-m-d format). When I loop through the rows, I see these dates coming up as integer values (for example, 16227 instead of 2014-06-06).
Is there a way to convert them automatically? If not, how can I correctly display the dates when I read the file? I tried the following, but it does not look efficient at all (especially for files with millions of records). Thank you in advance!