-
Notifications
You must be signed in to change notification settings - Fork 1k
Experiment: Use fast serialization library for serializing graph
This is based on issue #1506
I made tests on master with http://ruedigermoeller.github.io/fast-serialization/
According to current master (3. oct. 2014) Fast serialization doesn't improve build/read times and serialization size at least for New York.
All tests are made on Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz 64 bit with 8 GB of memory of which 50% is free (one FF and Netbeans needs 4 GB of RAM)
Graph build time master Slovenia
real 1m6.558s
user 1m29.557s
sys 0m4.947s
File size: 173006442 (165MB)
With OSM of Slovenia and GTFS of Maribor and without Elevation data
OSM Size 741854134 (708MB) .osm
GTFS Size 886354 (866K)
Graph build time master Maribor
real 0m13.609s
user 0m23.867s
sys 0m0.507s
File size: 19919430 (19MB)
With OSM Maribor and GTFS of Maribor and without Elevation data
OSM Size 62386091 (60MB) .osm
GTFS Size 886354 (866K)
Graph build time master fast serialization Slovenia
real 1m5.240s
user 1m31.330s
sys 0m3.573s
File size: 151043817 (145MB)
With OSM of Slovenia and GTFS of Maribor and without Elevation data
OSM Size 741854134 (708MB) .osm
GTFS Size 886354 (866K)
Graph build time master fast serialization Maribor
real 0m11.599s
user 0m22.997s
sys 0m0.330s
File size: 17311379 (17MB)
With OSM Maribor and GTFS of Maribor and without Elevation data
OSM Size 62386091 (60MB) .osm
GTFS Size 886354 (866K)
Graph build is negligible faster with fast serialization. But is 13% smaller in Slovenia and Maribor case. Fast serialization needs Java 1.7 but can be also made to work with 1.6 http://ruedigermoeller.github.io/fast-serialization/
This is made with drop in replacement Fast serialization, file size and speed can probably be improved because now reflection is used if there is no special FST serializers for classes and then Java serialization is used according to Fast serialization wiki
I tested largest 20 PlainStreets from Slovenia in microbenchmark serialization-benchmark
I added custom serializers for Vertex, IntersectionVertex, Coordinate and PlainStreetEdge
But even with this serializers results aren't much improved:
lib read (ns) write (ns) total (ns) size (bytes)
FST Vertex S Coor S 31138816 27979583 59118399 13702695
FST Vertex S 32457766 28876080 61333846 13702695
FSTSpeed 32287995 29663330 61951325 13578853
FST Coor S 31924599 30170905 62095504 13578853
FST 32561549 29855780 62417329 13578853
Java built in 37825031 48773887 86598918 14187397
We can see that just adding Fast_serializer which is couple of lines change improves writing+reading speed for 24ms on these plainStreets. Which is 27% improvement. Size wise improvements aren't so big just 4% improvement.
Adding custom serializer for Coordinates and/or Vertex improves read+writing speed a litle. (5.3%) from fast serializer Adding Vertex raises size a little. (1% worse) from normal fast_serializer. Adding PlainSerializer doesn't improve speed and size.
Adding coordinate and Vertex custom serializers could bring 32% improvement speed wise and (3.4%) improvement size wise.
Adding coordinate in full graph would be easy. Adding serializers for all different Vertexes would be hard. One reason is because each needs a Graph on creation. And in serializer class we don't have Graph object. So we would need to go over all vertexes after serialization and add Graph object to them.
I also tested registering classes. Where you just write which classes will be used and name is used only first time otherwise only number is used.
This lowers speed a little 2% and improves size (less then 1%). But this test case has only 20 Plainstreets and more classes there are larger are savings.
Interestingly largest PlainStreet Derived classes are AreaEdges. Largest PlainStreet in Slovenia is 16 264 bytes. (there are only two objects this size), but largest areaEdge has 74 877 bytes and there is at least 20 of them.
I tested FST serialization with current master (61d7f21d66d84853edc2c77) with New York pbf from geofabrik. (New geometry optimizations have reduced graph so much that I can build it on my PC).
Plain master: graph build time 9 seconds. and 262M (274557152)
Plain FST: graph build time 10 seconds. and 265M (277130075)
FST with custom serializers for Vertex and coordinate: graph build time 7 seconds and 271M (283903975)
FST with custom serializers for StreetEdge, Vertex and coordinate: graph build time 12 seconds and 420M (439935905)
Build time is time from Writing graph LOG output to Graph Written. Full build times are around 2m30s
According to these results FST serialization doesn't improve building speed or size with new geometry optimizations
I was planning to test protostuff but I don't think it would improve.
unless you are intentionally working with legacy versions of OpenTripPlanner. Please consult the current documentation at readthedocs