
Turning this into a general-purpose library #1

Open
dabreegster opened this issue Jul 1, 2021 · 31 comments

Comments

@dabreegster

Hi @Stunkymonkey, I'm currently using fast_paths in my A/B Street project. I stumbled across your code and, out of curiosity, ran a quick benchmark for preparing a contraction hierarchy. I need to do more thorough tests, but initial results were promising -- a graph with 29,000 nodes takes 38 seconds to prepare with fast_paths, but only 18s with this code. I haven't dug into differences in implementation yet, so I have no idea why there's such a big difference -- maybe different heuristics for node ordering. I haven't looked at query speeds yet.

  1. I'd like to rearrange some code into a stand-alone crate that just has an API to prepare and query a CH. No OSM integration or grid lookups -- that could be layered on top of the base library. Does that sound like an architecture you'd be happy with?

  2. What's the license on this code? As long as it's Apache / BSD / something open source, that's fine by me.

Thanks for creating and publishing this work!

@dabreegster
Author

I just had a look at your other repo, https://github.com/Stunkymonkey/prp and the paper there. Using personalized route planning to tune edge weights for agents that prefer cycling on quieter and flatter streets would be really nice to integrate. If you're open to it, I might experiment with using more of your code in A/B Street.

@Stunkymonkey
Owner

I am happy to hear that you tried my code. I would like to hear about further progress.

This project was programmed for a course at the University of Stuttgart, so I decided to make it a standalone application. I never thought about making a separate crate, but this should not be super hard. Everything (except the grid) after this line does the contraction.

About the license: I will ask my supervisor whether there are any existing decisions about licensing.

Finding out why there is a speed difference is very hard; there are many things that are different.

I do not really understand the heuristic fast_paths is using, but I can tell you that in my case the contraction is used for calculating the heuristic and calculated again when the node actually gets contracted. (Recomputing is faster than storing the resulting shortcuts.)

Another difference might be related to the way I store my graph during contraction. fast_paths stores edges with a separate array per node, which makes the implementation much simpler.
In my implementation I store all edges in a single array with a separate offset array.
This way I only ever access two arrays, while fast_paths accesses one big array plus a different array for every node, which causes a lot of cache misses.
Downside: after every contracted node, the offsets have to be recalculated. To reduce these updates, I greedily calculate an independent set, which yields a batch of nodes that can all be contracted at the same time in parallel.
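
A minimal sketch of this offset-array (CSR-style) layout, with invented names (osm_ch's actual structs differ):

```rust
/// CSR-style adjacency: all edges live in one contiguous array, and
/// `offsets[n]..offsets[n + 1]` is the slice of edges leaving node `n`.
struct OffsetGraph {
    // edges[i] = (target node, weight)
    edges: Vec<(usize, u32)>,
    // offsets.len() == num_nodes + 1; offsets[num_nodes] == edges.len()
    offsets: Vec<usize>,
}

impl OffsetGraph {
    /// Iterating the out-edges of a node touches only these two arrays,
    /// which is the cache-friendliness argument made above.
    fn out_edges(&self, node: usize) -> &[(usize, u32)] {
        &self.edges[self.offsets[node]..self.offsets[node + 1]]
    }
}

fn main() {
    // Tiny graph: 0 -> 1 (w=2), 0 -> 2 (w=5), 1 -> 2 (w=1); node 2 has no out-edges.
    let g = OffsetGraph {
        edges: vec![(1, 2), (2, 5), (2, 1)],
        offsets: vec![0, 2, 3, 3],
    };
    for node in 0..3 {
        println!("node {node}: {:?}", g.out_edges(node));
    }
}
```

The catch, as described above, is that inserting or removing edges means rebuilding `offsets`, which is exactly why batching contractions per independent set pays off.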

Another difference while contracting:

  • fast_paths calculates "is there a way without the contracting node"
  • osm_ch calculates "is there a shorter path than the existing one when contracting" (produces slightly more shortcuts at the edges of the graph)

I also know one query-time speedup that is not implemented in my code yet: rank-based node ordering.
Place nodes with higher rank at the end of the array, because walking forward through an array is faster than walking backwards.

The query here is implemented in a very ugly way and has a lot of redundancy. Better to have a look at the prp query-dijkstra-pch one.

The prp repo is my master's thesis and is still in development. Try it, but the contraction takes much longer.

Please also have a look at query times. I do not expect it to be faster, but let's see.

@Stunkymonkey
Owner

Maybe also reuse some code and move it into fast_paths instead of creating another library.

@Stunkymonkey
Owner

I just confirmed that I can choose the license for this, and decided on MIT.

@dabreegster
Author

I just confirmed that I can choose the license for this, and decided on MIT.

Awesome, thank you! It'll be a while before I have time to work on turning the code into a standalone library, but I'll keep you posted when I start.

In my implementation I store all edges in a single array with a separate offset array.
osm_ch calculates "is there a shorter path than the existing one when contracting" (produces slightly more shortcuts at the edges of the graph)

@easbar, any thoughts on whether either of these would be beneficial for fast_paths?

@easbar

easbar commented Jul 7, 2021

@easbar, any thoughts on whether either of these would be beneficial for fast_paths?

I'll have to look at the implementation here, but of course if there is anything we can improve in fast_paths we can do that.

Another difference while contracting:
fast_paths calculates "is there a way without the contracting node"
osm_ch calculates "is there a shorter path than the existing one when contracting" (produces slightly more shortcuts at the edges of the graph)

Didn't we try this here: easbar/fast_paths#16 ?

Please also have a look at query times. I do not expect it to be faster, but let's see.

I did not spend too much time tuning the heuristics etc., but possibly I was optimizing for faster queries rather than faster preparation in a few places. A direct comparison (speed and correctness) of this repo with fast_paths would certainly be interesting, and if there are any tricks that can speed up the preparation by a factor of two I would be very interested.

a graph with 29,000 nodes takes 38 seconds to prepare with fast_paths, but only 18s with this code.

Honestly, just judging from the number of nodes this is very slow, and I still don't know where this comes from. Say you created a graph from OSM that simply connects points where OSM ways intersect: the contraction would be much faster (maybe around 2s for 30,000 nodes), c.f. this little table: https://github.com/easbar/fast_paths/#benchmarks
So something about the abstreet graph must be special and maybe the normal heuristics aren't working very well for the abstreet graph (and maybe the ones used here work better for some reason). I don't know, but this as well I would be very interested to find out. Knowing that preparation speed is crucial for abstreet and currently too slow would be a good motivation to find out :) So far I wasn't aware it was that critical, to be honest.

@easbar

easbar commented Jul 7, 2021

but I can tell you that in my case the contraction is used for calculating the heuristic and calculated again when the node actually gets contracted. (Recomputing is faster than storing the resulting shortcuts.)

I do this, too. To determine the contraction order, the contraction is first simulated, e.g. to count the shortcuts that would be created if a certain node was contracted. For example, one simple heuristic is that nodes for which many shortcuts would be created are contracted last (otherwise the number of shortcuts might explode).
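
As a rough illustration of this "simulate first, contract later" idea (a sketch, not the actual fast_paths code; `simulate_contraction` is a stub standing in for the real witness searches):

```rust
// Sketch only: contracting `node` is first simulated to count the shortcuts
// it would create; that count drives the node ordering, and the real
// contraction later recomputes the shortcuts instead of storing them.

#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Shortcut {
    from: usize,
    to: usize,
    weight: u32,
}

/// Hypothetical stand-in for the real witness searches: return the shortcuts
/// that contracting `node` would force us to add. Stubbed out here.
fn simulate_contraction(_node: usize) -> Vec<Shortcut> {
    Vec::new()
}

/// Edge-difference-style priority: a node that would create many shortcuts
/// gets a high value and is therefore contracted late, so the number of
/// shortcuts doesn't explode.
fn contraction_priority(node: usize, current_degree: usize) -> i64 {
    let simulated_shortcuts = simulate_contraction(node).len() as i64;
    simulated_shortcuts - current_degree as i64
}

fn main() {
    println!("priority of node 0: {}", contraction_priority(0, 3));
}
```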

Another difference might be related to the way I store my graph during contraction. fast_paths stores edges with a separate array per node, which makes the implementation much simpler.
In my implementation I store all edges in a single array with a separate offset array.
This way I only ever access two arrays, while fast_paths accesses one big array plus a different array for every node, which causes a lot of cache misses.

This is interesting. One big advantage of the approach I am using is that more and more graph edges are removed during the contraction, so they do not have to be iterated in later witness searches (this is quite important for preparation speed). But maybe your implementation does this as well? Storing the graph in a single array and managing the offsets could be faster, yes. I never tried this. Updating the indices seems slow, you have to update all offsets for all nodes appearing after the node you just contracted, no?

Downside: after every contracted node, the offsets have to be recalculated. To reduce these updates, I greedily calculate an independent set, which yields a batch of nodes that can all be contracted at the same time in parallel.

Note that nothing runs in parallel in fast_paths, to me it sounds like this could be the main difference. @Stunkymonkey do you just update the priorities for these nodes in parallel or do you run the actual contraction of some nodes in parallel? And for the latter, how do you make sure the nodes can be contracted independently? @dabreegster would it be an option for abstreet to run the preparation in parallel or do you have, for example, other processes running at the same time anyway and need the computational power for these already?

I also know one query-time speedup that is not implemented in my code yet: rank-based node ordering.
Place nodes with higher rank at the end of the array, because walking forward through an array is faster than walking backwards.

In fast_paths the nodes are ordered by rank.

@easbar

easbar commented Jul 7, 2021

One more thing: the easiest way that I know of to trade preparation speed vs query speed is cancelling witness searches once a certain number of nodes have been explored. This way more shortcuts may be introduced (slower queries), but the preparation will be faster, because some potentially long-running witness searches get cancelled. So @dabreegster, if you think this would be useful, we could implement a parameter that controls this; currently fast_paths does not have such a parameter and only limits witness searches by maximum weight: https://github.com/easbar/fast_paths/blob/e40ab8383d56d5304a932013cd837513771bfcde/src/node_contractor.rs#L62
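
A sketch of what such a parameter could look like, assuming a hypothetical witness-search loop (the linked fast_paths code only applies the weight limit):

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Sketch only: a witness search that, besides the max-weight limit fast_paths
// already has, also cancels after settling `max_settled_nodes` nodes.
// Cancelling early may miss a witness path (=> one more shortcut, slightly
// slower queries) but bounds the preparation work. All names are hypothetical.
fn witness_exists(
    out_edges: &dyn Fn(usize) -> Vec<(usize, u32)>,
    start: usize,
    target: usize,
    max_weight: u32,
    max_settled_nodes: usize,
) -> bool {
    let mut best: HashMap<usize, u32> = HashMap::from([(start, 0)]);
    let mut heap = BinaryHeap::from([Reverse((0u32, start))]);
    let mut settled = 0;
    while let Some(Reverse((dist, node))) = heap.pop() {
        if dist > best[&node] {
            continue; // stale heap entry
        }
        if node == target {
            return true; // found a path avoiding the contracted node
        }
        if dist > max_weight || settled >= max_settled_nodes {
            return false; // give up: weight limit or the proposed node limit hit
        }
        settled += 1;
        for (next, w) in out_edges(node) {
            let nd = dist + w;
            if nd < *best.get(&next).unwrap_or(&u32::MAX) {
                best.insert(next, nd);
                heap.push(Reverse((nd, next)));
            }
        }
    }
    false
}

fn main() {
    // 0 -> 1 -> 2; the node being contracted is assumed excluded from the edges.
    let edges = |n: usize| -> Vec<(usize, u32)> {
        match n {
            0 => vec![(1, 3)],
            1 => vec![(2, 4)],
            _ => vec![],
        }
    };
    println!("{}", witness_exists(&edges, 0, 2, 10, 100)); // true
    println!("{}", witness_exists(&edges, 0, 2, 10, 1)); // false: cancelled early
}
```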

@Stunkymonkey do you do this as well?

@dabreegster
Author

So something about the abstreet graph must be special and maybe the normal heuristics aren't working very well for the abstreet graph

It's quite possible! An overview of the graph for vehicles:

  • There's a node for each directed road segment. A road segment goes between exactly two intersections. Even for one-way roads, there's a node inserted for both directions, to support editing the road and adding a reverse lane later.
  • There are also a few nodes for "uber-turns", which are sequences of turns through a complex set of intersections. This is necessary for obeying multi-step turn restrictions in OSM, like https://www.openstreetmap.org/relation/4661067.
  • If a turn between two road segments is possible, there'll be an edge between the nodes.
  • The edge cost is a little complicated. For motor vehicles, it's distance / speed limit, with the distance capturing both the road segment and the turn. There are some penalties in there for unprotected left turns, all expressed in units of time (see the sketch after this list).
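
A rough sketch of that cost function, with invented names and an invented penalty value (not A/B Street's actual code):

```rust
/// Illustrative only: an edge cost in seconds, shaped like the description
/// above. The names and the penalty constant are made up for this sketch.
fn edge_cost_seconds(
    segment_plus_turn_meters: f64,
    speed_limit_mps: f64,
    unprotected_left_turn: bool,
) -> f64 {
    let base = segment_plus_turn_meters / speed_limit_mps;
    // Hypothetical penalty, also expressed in time, per the list above.
    let penalty = if unprotected_left_turn { 30.0 } else { 0.0 };
    base + penalty
}

fn main() {
    // 200 m of road + turn at 11.1 m/s (~40 km/h), with an unprotected left.
    println!("{:.1} s", edge_cost_seconds(200.0, 11.1, true));
}
```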

Looking around a bit, the approach of making each road segment be a node (instead of intersections be nodes) is apparently called the "edge expanded model", with https://blog.mapbox.com/smart-directions-powered-by-osrms-enhanced-graph-model-3ae226974b2 and https://github.com/Project-OSRM/osrm-backend/wiki/Graph-representation explaining it.

One possible problem could be that the speed limit-based costs don't distinguish bigger roads enough. This is the area with 29,000 nodes (which are road segments and uber turns, as explained) with slow preparation:
[Screenshot: the 29,000-node map area, colored by road type]
Similar to OSM's color scheme, white roads are residential, and yellow/pink are arterials and highways. But Seattle has actually been setting low speed limits on arterials, so there's only about an 8 km/h difference between local roads and arterials. (Of course, people don't actually follow these in practice...) But if CHs and similar techniques work by finding the "important" roads, maybe part of the issue is that the important roads are hard to distinguish. Although that wouldn't explain why osm_ch had faster prep here.

Another issue might be the trick I'm playing to re-use node ordering later. When people edit a road, they might close it off to vehicles entirely, or maybe convert a one-way into a two-way. Since recalculating the CH from scratch is slow, I'm reusing the node ordering, and it's much faster. That's the reason why I'm inserting a node for every road segment in both directions, even if it's currently just a one-way.

Knowing that preparation speed is crucial for abstreet and currently too slow would be a good motivation to find out

It wasn't originally, because there weren't too many maps total to import, and I was prioritizing the user's experience (so querying matters). But since then, I've started importing hundreds of maps regularly, spreading out pathfinding queries over the course of the simulation (instead of doing all ~100k-1 million upfront), and working more with map edits (and so recalculating the CH with the node ordering). So preparation speed has become more of a focus.

would it be an option for abstreet to run the preparation in parallel or do you have, for example, other processes running at the same time anyway and need the computational power for these already?

Parallelism would be a great option to try. If we add it, I'd advocate for making it configurable, maybe with cargo features to not force all users to bring in the dependency of rayon or whatever else we use.

There's lots on my end I could experiment with:

  • finishing hooking up to osm_ch and testing query speeds
  • adjusting the fast_paths params
  • adjusting edge costs to penalize local roads more and accentuate arterials/highways

But no guarantee when I'll be able to get to them...

@dabreegster
Author

I wound up refactoring abstreet's pathfinding code to cleanly separate the underlying graph implementation (petgraph, fast_paths, osm_ch) from the rest of the complexity. Everything works off of the fast_paths InputGraph, which can be easily transformed to other formats.
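
As an illustration of such a transformation, here is a sketch that converts a plain (from, to, weight) edge list, like the one an InputGraph holds, into the offset-array form discussed earlier (`to_offset_form` is a hypothetical helper, not part of either crate):

```rust
/// Build the offset-array (CSR) form from a plain edge list.
/// `edges` are (from, to, weight) triples.
fn to_offset_form(
    num_nodes: usize,
    mut edges: Vec<(usize, usize, u32)>,
) -> (Vec<(usize, u32)>, Vec<usize>) {
    edges.sort_by_key(|&(from, _, _)| from);
    let mut offsets = vec![0usize; num_nodes + 1];
    for &(from, _, _) in &edges {
        offsets[from + 1] += 1;
    }
    for i in 0..num_nodes {
        offsets[i + 1] += offsets[i]; // prefix sums give the slice boundaries
    }
    let flat = edges.into_iter().map(|(_, to, w)| (to, w)).collect();
    (flat, offsets)
}

fn main() {
    let (flat, offsets) = to_offset_form(3, vec![(1, 2, 1), (0, 1, 2), (0, 2, 5)]);
    assert_eq!(offsets, vec![0, 2, 3, 3]);
    assert_eq!(&flat[offsets[0]..offsets[1]], &[(1, 2), (2, 5)]);
    println!("ok");
}
```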

And the results for preparing the CH are crazy -- about a 2x speedup with osm_ch in a larger map: a-b-street/abstreet@3048075

The osm_ch fork I hacked together to try this: https://github.com/dabreegster/osm_ch/tree/prototype_lib

I've yet to dive into query performance or look at updating the CH and reusing node ordering. The only observation so far about why osm_ch might be faster is that the parallelism is indeed being used: my test machine has 16 cores, and all of them briefly lit up.

@Stunkymonkey
Owner

Stunkymonkey commented Jul 11, 2021

Didn't we try this here: easbar/fast_paths#16 ?

Yes.

Note that nothing runs in parallel in fast_paths, to me it sounds like this could be the main difference. @Stunkymonkey do you just update the priorities for these nodes in parallel or do you run the actual contraction of some nodes in parallel? And for the latter, how do you make sure the nodes can be contracted independently? @dabreegster would it be an option for abstreet to run the preparation in parallel or do you have, for example, other processes running at the same time anyway and need the computational power for these already?

I calculate the priorities and the contraction all in parallel. But keep in mind that you can only contract nodes which form an independent set; no two neighboring nodes can be contracted in one go/run. To get the nodes that can be contracted: see here.
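
A hedged sketch of that greedy selection (osm_ch's real get_independent_set differs in details): take nodes in priority order and block the neighbors of every node taken, so no two selected nodes are adjacent and the whole batch can be contracted in parallel.

```rust
/// Greedy independent set: scan nodes in priority order (best contraction
/// candidates first) and take a node only if none of its neighbors was taken
/// already. No two selected nodes are adjacent, so the whole batch can be
/// contracted in the same parallel step. Sketch only.
fn greedy_independent_set(
    nodes_by_priority: &[usize],
    neighbors: &dyn Fn(usize) -> Vec<usize>,
    num_nodes: usize,
) -> Vec<usize> {
    let mut blocked = vec![false; num_nodes];
    let mut set = Vec::new();
    for &node in nodes_by_priority {
        if blocked[node] {
            continue;
        }
        set.push(node);
        for n in neighbors(node) {
            blocked[n] = true; // neighbors may not join this batch
        }
    }
    set
}

fn main() {
    // Path graph 0 - 1 - 2 - 3; priorities happen to match the node order.
    let neighbors = |n: usize| -> Vec<usize> {
        match n {
            0 => vec![1],
            1 => vec![0, 2],
            2 => vec![1, 3],
            _ => vec![2],
        }
    };
    // Picks 0 (blocks 1), then 2 (blocks 1 and 3) => [0, 2].
    println!("{:?}", greedy_independent_set(&[0, 1, 2, 3], &neighbors, 4));
}
```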

In fast_paths the nodes are ordered by rank.

That's nice to know.

@Stunkymonkey do you do this as well?

I support aborting the contraction at a given percentage in the prp repo, which can easily be adapted. link

Similar to OSM's color scheme, white roads are residential, and yellow/pink are arterials and highways. But Seattle has actually been setting low speed limits on arterials, so there's only about an 8 km/h difference between local roads and arterials. (Of course, people don't actually follow these in practice...) But if CHs and similar techniques work by finding the "important" roads, maybe part of the issue is that the important roads are hard to distinguish. Although that wouldn't explain why osm_ch had faster prep here.

Not sure if this is correct, but maybe trying different heuristics could help.

Another issue might be the trick I'm playing to re-use node ordering later. When people edit a road, they might close it off to vehicles entirely, or maybe convert a one-way into a two-way. Since recalculating the CH from scratch is slow, I'm reusing the node ordering, and it's much faster. That's the reason why I'm inserting a node for every road segment in both directions, even if it's currently just a one-way.

Not contracting the whole graph would be advantageous here, because only a fraction would have to be contracted again.

Parallelism would be a great option to try. If we add it, I'd advocate for making it configurable, maybe with cargo features to not force all users to bring in the dependency of rayon or whatever else we use.

Also keep in mind that parallelism is not always faster (while testing on a 128-core machine, using only 4 cores was faster).

@easbar, have a look at https://github.com/Stunkymonkey/prp/blob/master/query/src/dijkstra/pch.rs. This query code is much simpler and easier to debug. Maybe you want to adapt it.

@Stunkymonkey
Owner

This is interesting. One big advantage of the approach I am using is that more and more graph edges are removed during the contraction, so they do not have to be iterated in later witness searches (this is quite important for preparation speed). But maybe your implementation does this as well? Storing the graph in a single array and managing the offsets could be faster, yes. I never tried this. Updating the indices seems slow, you have to update all offsets for all nodes appearing after the node you just contracted, no?

While I am contracting, there are two graphs in memory: one holding the current graph (with many shortcuts), and another one containing all the edges needed to resolve the shortcuts from the first. So I guess we are doing the same. After the contraction of nodes, all the connected edges are moved to the second graph, and the shortcuts are inserted into the first one.

I do this, too. To determine the contraction order, the contraction is first simulated, e.g. to count the shortcuts that would be created if a certain node was contracted. For example, one simple heuristic is that nodes for which many shortcuts would be created are contracted last (otherwise the number of shortcuts might explode).

I do the same, but also add the number of contracted_neighbors. This way the graph is contracted in a more balanced way.

And the results for preparing the CH are crazy -- about a 2x speedup with osm_ch in a larger map: a-b-street/abstreet@3048075

Thanks for testing. How big is your graph? I only tested my code with more than 1 million nodes.

The osm_ch fork I hacked together to try this: https://github.com/dabreegster/osm_ch/tree/prototype_lib

Very nice for testing. Maybe we can figure out the differences and adapt these changes into fast_paths instead of making another library that does the same thing. @easbar is open to improvements.

@dabreegster
Author

How big is your graph?

Tiny in comparison -- 30,000 nodes and 50,000 edges.

Maybe we can figure out the differences and adapt these changes into fast_paths instead of making another library that does the same thing

Agreed! fast_paths already has a nice standalone API and supports queries with multiple start and end nodes, which abstreet also needs. I'd just love to get the performance boost too.

@dabreegster
Author

Another motivating result: on my largest map (112,000 nodes), fast_paths preparation takes 245s, and osm_ch 91s.

@easbar, are you interested in porting over some of the osm_ch techniques? Would it help if I serialize and send some of these larger InputGraphs that I've been using to test?

@easbar

easbar commented Jul 12, 2021

I created a little binary to compare fast_paths vs osm_ch here: easbar/fast_paths@7532c50

I am using @dabreegster's osm_ch_pre crate for this (I used this commit for the tests here: b622444). I ran the binary on my laptop for the test maps I included in the fast_paths repository. I also ran it on this NYC map. I ran the binary like this: cargo run --release main meta/test_maps/graph_ballard.gr.

First of all, the good news is that apparently fast_paths and osm_ch produce the same results for all these maps 👍!

| map | prep fast_paths (ms) | query fast_paths (μs) | prep osm_ch (ms) | query osm_ch (μs) | prep osm_ch single-core (ms) | query osm_ch single-core (μs) |
| --- | --- | --- | --- | --- | --- | --- |
| ballard | 5654 | 44 | 4953 | ~~148~~ 119 | 15249 | ~~146~~ 123 |
| 23rd | 698 | 19 | 513 | ~~54~~ 46 | 1538 | ~~56~~ 45 |
| bremen_dist | 385 | 14 | 554 | ~~140~~ 26 | 1049 | ~~141~~ 26 |
| bremen_time | 5621 | 10 | 5462 | ~~144~~ 23 | 14781 | ~~137~~ 25 |
| NYC | 11601 | 52 | 8570 | ~~1201~~ 117 | 20455 | ~~1219~~ 118 |

osm_ch's preparation is faster on four out of the five maps (it is actually slower for bremen_dist, it seems). The routing queries are executed much faster by fast_paths, though: for NYC the fast_paths queries were more than 20x faster. Update: The osm_ch query times are probably slowed down because the Dijkstra instance is re-created for every request! Update 2: Yes, they were, and I fixed this here: easbar/fast_paths@e16c8ab and updated the table (the struck-through values are the original measurements). osm_ch queries are still slower than fast_paths, but especially on the larger maps they are much faster than what I measured first, which makes sense because in this case the memory allocation is most critical.

@dabreegster, @Stunkymonkey can you repeat the same experiment so we can agree on these findings? Also I'd like to repeat the experiment and control the number of threads osm_ch uses, i.e. I'd like to run osm_ch on a single thread to see how much of a difference the parallelization makes. How should we do this?

The single-core columns show the osm_ch results when using a single thread. I used the RAYON_NUM_THREADS environment variable to control this. On a single thread fast_paths preparation is faster, but this comparison is also not 'fair', because osm_ch calculates the 'independent set' without taking advantage of it in this case. @Stunkymonkey, can we disable the independent_set search and run this again to make a better single-core comparison?
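
As an aside, besides the RAYON_NUM_THREADS environment variable, rayon's thread count can also be pinned in code via its standard ThreadPoolBuilder API:

```rust
use rayon::prelude::*;

fn main() {
    // Pin the global rayon pool to a single thread before any parallel work runs.
    rayon::ThreadPoolBuilder::new()
        .num_threads(1)
        .build_global()
        .unwrap();

    // Any par_iter after this point runs on that single thread.
    let sum: u64 = (0..1000u64).into_par_iter().sum();
    println!("{sum}");
}
```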

@dabreegster can you share the abstreet maps you've been testing with recently? Maybe your biggest one? Ideally put the map into a text file using the same format we've been using so far (I think you did this for 23rd and ballard already back then): https://github.com/easbar/fast_paths/blob/e40ab8383d56d5304a932013cd837513771bfcde/src/input_graph.rs#L184-L209

I calculate the priorities and the contraction all in parallel. But keep in mind that you can only contract nodes which form an independent set; no two neighboring nodes can be contracted in one go/run. To get the nodes that can be contracted: see here.

This is very interesting and I will definitely have a look at this.

I support aborting the contraction by percentage in the prp repo, which can easily be adapted. link

You mean you allow contracting only a certain fraction of all nodes? How does this affect query time? In my experience not contracting all nodes can speed up preparation but will slow down queries. Anyway this is also an interesting parameter that can be added easily. Actually, I rather meant cancelling witness searches once a certain number of nodes have been explored: #1 (comment)

Not contracting the whole graph would be advantageous here, because only a fraction would have to be contracted again.

What do you mean here? We have to re-run the contraction after editing the weights unless we know in advance which edges won't be edited and e.g. do not contract the corresponding nodes in the first place, right?

@easbar, have a look at https://github.com/Stunkymonkey/prp/blob/master/query/src/dijkstra/pch.rs. This query code is much simpler and easier to debug. Maybe you want to adapt it.

What do you think is simpler about this code? You mean simpler than the fast_paths query code? Anyway, thanks for the pointer, I'll take a look. I remember I duplicated some of the code in fast_paths for the forward/backward searches, mostly because I did not know how to do this better in Rust.

While I am contracting, there are two graphs in memory: one holding the current graph (with many shortcuts), and another one containing all the edges needed to resolve the shortcuts from the first. So I guess we are doing the same. After the contraction of nodes, all the connected edges are moved to the second graph, and the shortcuts are inserted into the first one.

Yes, I do the same.

I do the same, but also add the number of contracted_neighbors. This way the graph is contracted in a more balanced way.

Ok, I think I tried this before releasing the first version of fast_paths, but sure we could try this again.

@easbar, are you interested in porting over some of the osm_ch techniques? Would it help if I serialize and send some of these larger InputGraphs that I've been using to test?

Yes, totally! See above regarding the input graphs and I will try to include some of the possible differences/improvements we already identified. And obviously if you guys can figure out something I'm more than happy to hear about this as well.

An overview of the graph for vehicles:

Thanks for this summary explaining your graph model 👍 I will also try to find out why abstreet maps yield such slow preparation; it's still a mystery to me.

But since then, I've started importing hundreds of maps regularly, spreading out pathfinding queries over the course of the simulation.

Does this mean you are preparing multiple maps simultaneously? If that's the case would it still be helpful to parallelize the preparation of a single map?

I hope this won't make this discussion harder to follow, but I created a separate issue in fast_paths to keep track of possible improvements we could/should try: easbar/fast_paths#33 And please leave a comment in case I forgot something.

@easbar

easbar commented Jul 12, 2021

I updated my above results as I repeated the experiment using a single thread for osm_ch (see above).

@Stunkymonkey
Owner

Contracting the set of nodes in parallel has the downside of not following the optimal heuristic order. But I thought that since this is a heuristic anyway, it does not matter much.

When looking at the query times, I think something is wrong with my code. It would be interesting if your query could be used with the contraction osm_ch is producing. This way we could easily compare only the contraction, and in the end more easily decide on one method.

I believe that for such small graphs parallelization is not the best idea; it simply produces a lot of overhead.

You mean you allow contracting only a certain fraction of all nodes? How does this affect query time? In my experience not contracting all nodes can speed up preparation but will slow down queries. Anyway this is also an interesting parameter that can be added easily. Actually, I rather meant cancelling witness searches once a certain number of nodes have been explored: #1 (comment)

Yes, only a certain fraction. I do not know the effect on query time compared to your maximum-weight limit. It is simply a trade-off between memory and query time: when not contracting all nodes, the query times get worse, but on the other hand the resulting graph is much smaller.

Also keep in mind that when you have a very big graph and want to contract all nodes, the query time can suffer if there are nodes with a couple of thousand edges. Maybe not contracting nodes whose number of edges (in and out) exceeds a certain threshold could also be a parameter?

@easbar

easbar commented Jul 12, 2021

Contracting the set of nodes in parallel has the downside of not following the optimal heuristic order. But I thought that since this is a heuristic anyway, it does not matter much.

Ah right, the parallelization also changes the order in which nodes get contracted. So I should have included the query times when I set the number of threads to one. I'll update my table again. Update: I included the query times I measured when using a single core, and they do not seem to be affected by the possibly different contraction order when using multiple threads.

It would be interesting if your query could be used with the contraction osm_ch is producing. This way we could easily compare only the contraction.

Yes, using the same query code on the different contractions would be useful indeed. Maybe we can simply export the node ordering that osm_ch finds and use this with fast_paths. fast_paths already has an API to create a contraction for a fixed node ordering.
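
Based on the fast_paths README, exporting and re-using an ordering looks roughly like this (treat it as a sketch; method names are as documented there at the time):

```rust
use fast_paths::InputGraph;

fn main() {
    let mut input_graph = InputGraph::new();
    input_graph.add_edge(0, 1, 3);
    input_graph.add_edge(1, 2, 4);
    input_graph.freeze();

    // Prepare once and keep the node ordering...
    let fast_graph = fast_paths::prepare(&input_graph);
    let node_ordering = fast_graph.get_node_ordering();

    // ...then, after edge weights change, re-contract with the fixed ordering
    // (this is what abstreet does after map edits). The idea here would be to
    // export osm_ch's ordering and feed it in the same way.
    let fast_graph2 = fast_paths::prepare_with_order(&input_graph, &node_ordering).unwrap();
    println!("{}", fast_paths::calc_path(&fast_graph2, 0, 2).is_some());
}
```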

I believe for such small graphs having parallelization is not the best idea. It simply produces a big overhead.

Yes, we should try with a bigger map as well.

When not contracting all nodes, the query times get worse, but on the other hand the resulting graph is much smaller.

Yes.

Also keep in mind that when you have a very big graph and want to contract all nodes, the query time can suffer if there are nodes with a couple of thousand edges. Maybe not contracting nodes whose number of edges (in and out) exceeds a certain threshold could also be a parameter?

Maybe, yes.

@easbar

easbar commented Jul 12, 2021

When looking at the query times, I think something is wrong with my code.

Could it be that osm_ch allocates memory for every query? fast_paths provides this PathCalculator API that allows re-using the same data structures for multiple sequential queries: https://github.com/easbar/fast_paths/#batch-wise-shortest-path-calculation
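
Per that README section, batch-wise calculation reuses one PathCalculator so its internal data structures are allocated only once -- roughly:

```rust
use fast_paths::InputGraph;

fn main() {
    let mut input_graph = InputGraph::new();
    input_graph.add_edge(0, 1, 3);
    input_graph.add_edge(1, 2, 4);
    input_graph.freeze();
    let fast_graph = fast_paths::prepare(&input_graph);

    // One calculator, many queries: the distance arrays etc. are allocated
    // once and reused, which is exactly what the osm_ch query path is missing.
    let mut calculator = fast_paths::create_calculator(&fast_graph);
    for target in 1..3 {
        if let Some(path) = calculator.calc_path(&fast_graph, 0, target) {
            println!("0 -> {target}: weight {}", path.get_weight());
        }
    }
}
```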

Update: Yes, I think this is the problem: https://github.com/dabreegster/osm_ch/blob/ce6d83de83dc9c8250f7391084b0de35637da834/pre/src/lib.rs#L67-L70

@Stunkymonkey
Owner

Could it be that osm_ch allocates memory for every query?

I thought I had prevented this. Maybe there is something wrong.

Update: Yes, I think this is the problem: https://github.com/dabreegster/osm_ch/blob/ce6d83de83dc9c8250f7391084b0de35637da834/pre/src/lib.rs#L67-L70

This is @dabreegster's code. And yes, reallocating the distance array over and over again is very bad for query performance.
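
The usual remedy, sketched generically here (this is not the actual patch): keep the distance array alive across queries and invalidate old entries lazily, e.g. with a version stamp, so a "reset" costs O(1):

```rust
/// Reusable Dijkstra state: instead of allocating a fresh distance array per
/// query, keep one alive and invalidate entries lazily via a version counter.
struct QueryState {
    dist: Vec<u32>,
    version: Vec<u32>,
    current: u32,
}

impl QueryState {
    fn new(num_nodes: usize) -> Self {
        QueryState {
            dist: vec![u32::MAX; num_nodes],
            version: vec![0; num_nodes],
            current: 0,
        }
    }

    /// O(1) "reset" between queries: just bump the version.
    fn next_query(&mut self) {
        self.current += 1;
    }

    fn get(&self, node: usize) -> u32 {
        if self.version[node] == self.current { self.dist[node] } else { u32::MAX }
    }

    fn set(&mut self, node: usize, d: u32) {
        self.dist[node] = d;
        self.version[node] = self.current;
    }
}

fn main() {
    let mut state = QueryState::new(4);
    state.next_query();
    state.set(2, 7);
    assert_eq!(state.get(2), 7);
    state.next_query(); // new query: everything reads as "infinity" again
    assert_eq!(state.get(2), u32::MAX);
    println!("ok");
}
```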

@easbar

easbar commented Jul 12, 2021

This is @dabreegster's code. And yes, reallocating the distance array over and over again is very bad for query performance.

Ok, yes. This way the query time comparison is probably useless. @dabreegster, can you fix this and maybe also make the fields in Output public, as it would be interesting to log some details about the created preparation.

@Stunkymonkey
Owner

I will test the new query design, without the duplicated code, from my prp repo.

@easbar

easbar commented Jul 12, 2021

I fixed the unnecessary memory allocation and now the osm_ch queries are a lot faster, but still slower than fast_paths. To do this I forked @dabreegster's osm_ch_pre module and modified it here: https://github.com/easbar/osm_ch/tree/prototype_lib

I also updated the above table accordingly.

@dabreegster
Author

can you repeat the same experiment so we can agree on these findings?

I can run the other graphs later today, but for ballard:

  • fp preparation: 5087 ms

  • osm_ch preparation with 16 cores: 1974 ms

  • osm_ch preparation with 1 core: 11402 ms

  • fp total query time: 3805 ms

  • osm_ch total query time: 9424 ms

I believe that matches the ordering of your results.

can you share the abstreet maps you've been testing with recently? Maybe your biggest one?

I put some of the larger maps at https://www.dropbox.com/sh/h8gtszpcq46l31l/AAB1WBqJi6dcCP6V8XcGDfPTa?dl=0. We can check these into the fp repo if you want, but some of the files are getting bigger.

Does this mean you are preparing multiple maps simultaneously? If that's the case would it still be helpful to parallelize the preparation of a single map?

I'm still preparing the maps sequentially, because I've had trouble parallelizing some of the async code involved in importing. And regardless, adding parallelism inside CH preparation would help when updating the graphs after a user edits the map.

Could it be that osm_ch allocates memory for every query?

Yes -- I didn't report any query time findings because I hadn't implemented that part yet. Thanks for fixing it!

@easbar

easbar commented Jul 14, 2021

I put some of the larger maps at https://www.dropbox.com/sh/h8gtszpcq46l31l/AAB1WBqJi6dcCP6V8XcGDfPTa?dl=0. We can check these into the fp repo if you want, but some of the files are getting bigger.

Thanks, I will try.

I'm still preparing the maps sequentially, because I've had trouble parallelizing some of the async code involved in importing. And regardless, adding parallelism inside CH preparation would help when updating the graphs after a user edits the map.

Ok, I see. But when you are updating the graphs do you re-use the node-ordering? I'm not sure if the parallelization will still yield a speedup when the node ordering is already fixed.

Assuming you would accept a slower query time when in turn the preparation was faster, how much slower would be acceptable?

@dabreegster
Author

But when you are updating the graphs do you re-use the node-ordering?

Yes, because it's faster than starting from scratch. I understand that any parallelization work would only help initial map import, not changing the map.

how much slower would be acceptable?

I don't have a hard cutoff here -- let's say 2-4 times slower queries might be OK. The rationale is that queries are spread out (as different trips start in the simulation), but editing the map happens at one time; the simulation can't resume until the CHs are updated and ready to serve queries again. So it might be worth the slowdown.

@easbar

easbar commented Jul 14, 2021

Yes, because it's faster than starting from scratch. I understand that any parallelization work would only help initial map import, not changing the map.

I don't know. I just think it is less likely to yield a speedup than for the initial map, because there is already less work to be done per node -- but hard to say without trying it.

@dabreegster
Author

hard to say without trying it.

I'll try later today to reuse node ordering with osm_ch as it exists currently to see if the parallelization inside matters much or not, and report back

@easbar

easbar commented Jul 14, 2021

I'll try later today to reuse node ordering with osm_ch as it exists currently to see if the parallelization inside matters much or not, and report back

Also maybe try running osm_ch on a single core but disable the 'independent set' search, as this adds overhead and is not necessary when there is only one thread.

`pub fn get_independent_set(`

I also opened a fast_paths issue where I'd like to find out why abstreet map preparation is especially slow: easbar/fast_paths#34. I added the South Seattle map to the repository for this.

@Stunkymonkey
Owner

The independent_set search does not yield a speedup by itself. But keep in mind that the graph is not updated until the whole set is contracted; thereby the graph has to be updated only once per independent set. And this batched graph updating could yield an improvement.

@Stunkymonkey
Owner

Sorry for not testing everything you guys talk about, but my master's thesis needs all the time I currently have.

What to test:

  • Having the same fixed ordering: which approach creates more shortcuts (without preparation-time measurements) and has better query times?
    • fast_paths calculates "is there a way without the contracting node"
    • osm_ch calculates "is there a shorter path than the existing one when contracting"
  • Using the independent set: how much worse are the preparation/query times, using the same Dijkstra method found above? The difficult thing here is to decide how much is good at which part. It would be best to use the same query code and the same export orderings of nodes & edges.
    I think we can all agree that the offset structure (which osm_ch uses) only makes sense if we use the independent set; otherwise recalculating the offsets of the whole graph for a single node will be slower.
    • fast_paths: contract a single node, then update the graph
    • osm_ch: contract multiple nodes, then update the graph offsets
      • Additionally, this hard-coded number / 4 could be changed to e.g. / 10 -> the contraction would take longer, because the independent set gets built by only looking at the lowest tenth of the heuristic values, and would thereby be closer to contracting single nodes.
      • Also adjust this_constant, which sets when the whole heuristic should be considered (when the contraction is close to finished).
  • Then the last thing: compare the query algorithms, with the same export orderings of nodes & edges.
    • fast_paths: uses stall_on_demand
    • osm_ch: uses stall_on_demand, but with very redundant code
    • prp: nice deduplicated code, but stall_on_demand is not implemented, because for personalized route planning it is slower. Adding it would not be very hard.
