Support for Arrow #369

rafapereirabr · 2024-02-08T19:15:02Z

Opening this issue with the suggestion that we include support for Arrow in r5r.

As documented on its website, Arrow specifies a standardized, language-independent columnar memory format for flat and hierarchical data. This would bring two obvious advantages: (1) passing data from Java to R (from R5 to r5r) would become seamless, and (2) outputs could be saved in .parquet format. Both would probably make r5r substantially faster, with larger efficiency gains for large-scale analyses.

There are robust implementations of Arrow in Java, R and also in Python (in case we want to implement this in r5py).

I'm not sure whether this could be done entirely within the Java side of r5r or whether it would require some changes to R5 upstream. In any case, this might be something that @conveyal would be interested in, since it would speed up the process of passing R5 results to interactive visualization in Conveyal Analysis.

botanize · 2024-03-08T12:03:57Z

I've been working with the csv output of travel_time_matrix for a large region. Many of the csv files contain only a few lines, and performance when reading 30k files is poor. It would be a bit more work, but writing the matrices to parquet files, with multiple from_ids aggregated per file, would be hugely helpful.
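
In the meantime, a possible workaround is to consolidate the small csv files after the fact. The sketch below is only an illustration, not part of r5r: it assumes every file shares the same header, and the helper names (`combine_csv_columns`, `write_parquet_if_available`, `demo`) and the sample column names are hypothetical. It merges the files into one column-oriented table using the Python standard library and writes a single Parquet file only if `pyarrow` happens to be installed.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_columns(csv_dir):
    """Read every *.csv file in csv_dir into one column-oriented dict.

    Assumes all files share the same header; values are kept as strings.
    """
    columns = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                for name, value in row.items():
                    columns.setdefault(name, []).append(value)
    return columns


def write_parquet_if_available(columns, out_path):
    """Write the columns to one Parquet file if pyarrow is installed.

    Returns True when the file was written, False when pyarrow is missing.
    """
    try:
        import pyarrow as pa
        import pyarrow.parquet as pq
    except ImportError:
        return False
    pq.write_table(pa.table(columns), out_path)
    return True


def demo():
    """Create two tiny per-origin csv files (made-up data) and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "a.csv": "from_id,to_id,travel_time_p50\nA,B,12\nA,C,30\n",
            "b.csv": "from_id,to_id,travel_time_p50\nB,A,14\n",
        }
        for name, text in samples.items():
            Path(tmp, name).write_text(text)
        columns = combine_csv_columns(tmp)
        write_parquet_if_available(columns, str(Path(tmp, "matrix.parquet")))
        return columns
```

Since everything is read as strings, numeric columns would need to be cast before writing if typed Parquet columns are wanted. Writing Parquet directly from r5r, as suggested above, would avoid this extra pass entirely.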

rafapereirabr · 2024-03-08T12:25:31Z

Yes, this could be a great improvement to the package. @botanize, if you're familiar with Java and would like to have a look at this, we would appreciate a PR from collaborators.
