Support for Arrow #369

Open
rafapereirabr opened this issue Feb 8, 2024 · 2 comments

Comments

@rafapereirabr
Member

Opening this issue with the suggestion that we include support for Arrow in r5r.

As documented on the Arrow website, Arrow specifies a standardized, language-independent columnar memory format for flat and hierarchical data. Supporting it would bring two obvious advantages: (1) passing data from Java to R (from R5 to r5r) would become seamless, and (2) outputs could be saved in .parquet format. Both would probably make r5r substantially faster, with larger efficiency gains for large-scale analyses.

There are robust implementations of Arrow in Java, R and also in Python (in case we want to implement this in r5py).

I'm not sure whether this could be done entirely within the Java side of r5r or whether it would require some changes to R5 upstream. In any case, this might be something that @conveyal would be interested in, since it would speed up the process of passing R5 results to interactive visualization in Conveyal Analysis.

@botanize

botanize commented Mar 8, 2024

I've been working with the csv output of travel_time_matrix for a large region. Many of the csv files contain only a few lines, and performance when reading 30k files is poor. It would be a bit more work, but writing the matrices to parquet files, with multiple from_ids aggregated into each file, would be hugely helpful.
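
Until something like this exists in r5r, the many-small-files problem can be worked around by consolidating the per-origin csv files once, after the run. The following is only a minimal sketch using the Python standard library, not anything r5r provides: `merge_csv_files` is a hypothetical helper, and the column names in the usage example (`from_id`, `to_id`, `travel_time_p50`) are an assumption about the csv layout.

```python
import csv
from pathlib import Path


def merge_csv_files(csv_dir, out_path):
    """Concatenate every *.csv in csv_dir into one csv file with a single header.

    Hypothetical helper for illustration. Returns the number of data rows written.
    """
    out_path = Path(out_path)
    # List the inputs up front and skip the output file if it lives in csv_dir.
    paths = sorted(
        p for p in Path(csv_dir).glob("*.csv") if p.resolve() != out_path.resolve()
    )
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # completely empty file, nothing to copy
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"unexpected columns in {path.name}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "per_origin"
        src.mkdir()
        # Assumed column layout, for illustration only.
        (src / "a.csv").write_text("from_id,to_id,travel_time_p50\na,b,10\na,c,20\n")
        (src / "b.csv").write_text("from_id,to_id,travel_time_p50\nb,a,12\n")
        n = merge_csv_files(src, Path(tmp) / "merged.csv")
        print(f"merged {n} rows")
```

If pyarrow is installed, the merged file could then be converted in one further step with `pyarrow.csv.read_csv` and `pyarrow.parquet.write_table`. That step is left out of the sketch so that it runs without third-party packages.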

@rafapereirabr
Member Author

Yes, this could be a great improvement to the package. @botanize, if you're familiar with Java and would like to have a look at this, we would appreciate a PR from collaborators.
