Turning this into a general-purpose library #1
I just had a look at your other repo, https://github.com/Stunkymonkey/prp, and the paper there. Using personalized route planning to tune edge weights for agents that prefer cycling on quieter and flatter streets would be really nice to integrate. If you're open to it, I might experiment with using more of your code in A/B Street.
I am happy to hear that you tried my code, and I would like to hear about your progress. This project was programmed for a course at the University of Stuttgart, so I decided on a standalone application; I never thought about making a separate crate. But this should not be super hard: everything (except the grid) after this line does the contraction. About the license, I will ask my supervisor whether there are any decisions about licenses. Finding out why there is a speed difference is very hard; there are many things that are different. I do not really understand the heuristic he is using, but I can tell you that in my case the contraction is simulated to calculate the heuristic and calculated again when the node actually gets contracted. (This is faster than storing the resulting shortcuts.) Another difference might be related to the way I store my graph during contraction. Another difference while contracting:
I also know one query-time speedup that is not implemented yet in my implementation: rank-node-ordering. The query here is implemented in a very ugly way and has a lot of redundancy. Please also have a look at query times; I do not expect it to be faster, but let's see.
Maybe we could also reuse some code and move it to
I just confirmed that I can choose the license for this, and decided for
Awesome, thank you! It'll be a while before I have time to work on turning the code into a standalone library, but I'll keep you posted when I start.
@easbar, any thoughts on whether either of these would be beneficial to use in
I'll have to look at the implementation here, but of course if there is anything we can improve in fast_paths we can do that.
Didn't we try this here: easbar/fast_paths#16 ?
I did not spend too much time tuning the heuristics etc. but possibly I was compromising for faster queries rather than faster preparation in a few places. A direct comparison (speed and correctness) of this repo with fast_paths would certainly be interesting and if there are any tricks that can speed up the preparation by a factor of two I would be very interested.
Honestly, just judging from the number of nodes this is very slow, and I still don't know where this comes from. Say if you created a graph from OSM that simply connects points where OSM ways intersect, the contraction would be much faster (maybe like 2s for 30,000 nodes). c.f. this little table: https://github.com/easbar/fast_paths/#benchmarks
I do this, too. To determine the contraction order, the contraction is first simulated, e.g. to count the shortcuts that would be created if a certain node was contracted. For example, one simple heuristic is to contract nodes for which many shortcuts would be created last (otherwise the number of shortcuts might explode).
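To make the simulated-contraction idea above concrete, here is a rough, self-contained sketch of an edge-difference priority: count the shortcuts contracting a node would add, minus the edges it removes. All names are illustrative (not from fast_paths or osm_ch), and the shortcut count is a crude over-approximation that skips the witness search a real implementation would run.

```rust
use std::collections::HashMap;

// Toy undirected weighted graph: node -> (neighbor, weight).
type Graph = HashMap<usize, Vec<(usize, u32)>>;

// Over-approximate the shortcuts contracting `node` would create:
// assume every neighbor pair needs one. A real implementation runs a
// local witness search to rule pairs out.
fn simulated_shortcuts(g: &Graph, node: usize) -> usize {
    let n = g.get(&node).map_or(0, |adj| adj.len());
    if n < 2 { 0 } else { n * (n - 1) / 2 }
}

// Edge difference: shortcuts added minus edges removed. Nodes with a
// high value get contracted late, so the shortcut count stays bounded.
fn edge_difference(g: &Graph, node: usize) -> isize {
    let removed = g.get(&node).map_or(0, |adj| adj.len());
    simulated_shortcuts(g, node) as isize - removed as isize
}

fn main() {
    let mut g: Graph = HashMap::new();
    // A star: hub 0 connected to 1, 2, 3.
    for v in [1, 2, 3] {
        g.entry(0).or_default().push((v, 1));
        g.entry(v).or_default().push((0, 1));
    }
    // Contracting the hub would add 3 shortcuts and remove 3 edges.
    assert_eq!(simulated_shortcuts(&g, 0), 3);
    assert_eq!(edge_difference(&g, 0), 0);
    // A leaf adds no shortcuts, so it is contracted early.
    assert_eq!(edge_difference(&g, 1), -1);
    println!("edge_difference(hub) = {}", edge_difference(&g, 0));
}
```

The key point from the comment above: this function is cheap enough to re-evaluate lazily, which is why recomputing can beat caching the simulated shortcuts.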
This is interesting. One big advantage of the approach I am using is that more and more graph edges are removed during the contraction, so they do not have to be iterated in later witness searches (this is quite important for preparation speed). But maybe your implementation does this as well? Storing the graph in a single array and managing the offsets could be faster, yes. I never tried this. Updating the indices seems slow, you have to update all offsets for all nodes appearing after the node you just contracted, no?
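For readers unfamiliar with the "graph in a single array with offsets" layout mentioned above, here is a minimal compressed-sparse-row (CSR) sketch. The names are made up, not osm_ch's actual layout; it also illustrates easbar's point, since deleting a node's edges would require shifting every later offset.

```rust
// CSR adjacency: all edges in one array, plus an offsets array so the
// edges of node v live in edges[offsets[v]..offsets[v + 1]].
struct CsrGraph {
    offsets: Vec<usize>,      // length = num_nodes + 1
    edges: Vec<(usize, u32)>, // (target, weight), grouped by source
}

impl CsrGraph {
    // Build from an edge list (source, target, weight); sources < num_nodes.
    fn from_edges(num_nodes: usize, mut list: Vec<(usize, usize, u32)>) -> Self {
        list.sort_by_key(|e| e.0);
        let mut offsets = vec![0; num_nodes + 1];
        for &(src, _, _) in &list {
            offsets[src + 1] += 1; // count edges per source
        }
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i]; // prefix sums -> offsets
        }
        let edges = list.into_iter().map(|(_, t, w)| (t, w)).collect();
        CsrGraph { offsets, edges }
    }

    fn neighbors(&self, v: usize) -> &[(usize, u32)] {
        &self.edges[self.offsets[v]..self.offsets[v + 1]]
    }
}

fn main() {
    let g = CsrGraph::from_edges(3, vec![(0, 1, 5), (0, 2, 7), (1, 2, 1)]);
    assert_eq!(g.neighbors(0), &[(1, 5), (2, 7)]);
    assert!(g.neighbors(2).is_empty());
    println!("node 0 has {} outgoing edges", g.neighbors(0).len());
}
```

The contiguous edge array is what makes iteration cache-friendly; the trade-off easbar raises is exactly that mutating this structure during contraction means rebuilding or shifting the offsets.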
Note that nothing runs in parallel in fast_paths, to me it sounds like this could be the main difference. @Stunkymonkey do you just update the priorities for these nodes in parallel or do you run the actual contraction of some nodes in parallel? And for the latter, how do you make sure the nodes can be contracted independently? @dabreegster would it be an option for abstreet to run the preparation in parallel or do you have, for example, other processes running at the same time anyway and need the computational power for these already?
In fast_paths the nodes are ordered by rank.
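The rank-node-ordering idea mentioned earlier in the thread can be sketched as a simple relabeling step: after contraction, renumber nodes by contraction rank so every edge in the upward search graph points from a smaller id to a larger one, which makes the bidirectional query scan forward through memory. This is a hypothetical illustration, not code from either crate.

```rust
// Relabel edges by contraction rank. rank[v] = position of node v in the
// contraction order (higher = contracted later = more "important").
fn relabel_by_rank(rank: &[usize], edges: &[(usize, usize, u32)]) -> Vec<(usize, usize, u32)> {
    edges.iter().map(|&(a, b, w)| (rank[a], rank[b], w)).collect()
}

fn main() {
    // Node 1 was contracted first, node 2 second, node 0 last.
    let rank = vec![2, 0, 1];
    // Upward edges from lower-rank to higher-rank nodes.
    let edges = vec![(1, 0, 4), (2, 0, 1)];
    let relabeled = relabel_by_rank(&rank, &edges);
    // After relabeling, every edge goes from a smaller id to a larger id.
    assert_eq!(relabeled, vec![(0, 2, 4), (1, 2, 1)]);
    println!("{:?}", relabeled);
}
```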
One more thing: the easiest way that I know of to trade preparation speed vs query speed is cancelling witness searches when a certain number of nodes have been explored. This way it can happen that more shortcuts will be introduced (slower queries), but the preparation will be faster, because some potentially long-running witness searches will be cancelled. So @dabreegster, if this is something you think would be useful, we could implement a parameter that controls this; currently fast_paths does not have this parameter and only limits witness searches by maximum weight: https://github.com/easbar/fast_paths/blob/e40ab8383d56d5304a932013cd837513771bfcde/src/node_contractor.rs#L62 @Stunkymonkey do you do this as well?
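A rough sketch of the bounded witness search described above, with both limits: a maximum weight (what fast_paths does, per the link) and a settled-node cap (the proposed parameter). All identifiers are illustrative; an aborted search conservatively reports "no witness", so the contractor adds a possibly redundant shortcut rather than risking a wrong result.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Local Dijkstra: is there a path start -> target with weight <= max_weight
// that avoids `skip` (the node being contracted)? Gives up after settling
// `max_settled` nodes.
fn witness_exists(
    adj: &HashMap<usize, Vec<(usize, u32)>>,
    start: usize,
    target: usize,
    skip: usize,
    max_weight: u32,
    max_settled: usize,
) -> bool {
    let mut heap = BinaryHeap::new();
    let mut dist: HashMap<usize, u32> = HashMap::new();
    heap.push(Reverse((0u32, start)));
    dist.insert(start, 0);
    let mut settled = 0;
    while let Some(Reverse((d, v))) = heap.pop() {
        if d > *dist.get(&v).unwrap_or(&u32::MAX) {
            continue; // stale heap entry
        }
        if v == target {
            return d <= max_weight;
        }
        settled += 1;
        if settled >= max_settled || d > max_weight {
            return false; // cancelled: assume no witness, add the shortcut
        }
        for &(to, w) in adj.get(&v).into_iter().flatten() {
            if to == skip {
                continue;
            }
            let nd = d + w;
            if nd < *dist.get(&to).unwrap_or(&u32::MAX) {
                dist.insert(to, nd);
                heap.push(Reverse((nd, to)));
            }
        }
    }
    false
}

fn main() {
    // Triangle 0-1-2: contracting node 1, check witnesses 0 -> 2 around it.
    let mut adj: HashMap<usize, Vec<(usize, u32)>> = HashMap::new();
    adj.insert(0, vec![(1, 1), (2, 3)]);
    adj.insert(1, vec![(0, 1), (2, 1)]);
    adj.insert(2, vec![(0, 3), (1, 1)]);
    // Shortcut 0->1->2 would weigh 2; the direct edge weighs 3, so no
    // witness of weight <= 2 exists and the shortcut is needed.
    assert!(!witness_exists(&adj, 0, 2, 1, 2, 100));
    // With max_weight 3 the direct edge 0->2 is a witness.
    assert!(witness_exists(&adj, 0, 2, 1, 3, 100));
    println!("ok");
}
```

Lowering `max_settled` makes preparation faster at the cost of extra shortcuts, which is exactly the trade-off being discussed.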
It's quite possible! An overview of the graph for vehicles:
Looking around a bit, the approach of making each road segment be a node (instead of intersections be nodes) is apparently called the "edge expanded model", with https://blog.mapbox.com/smart-directions-powered-by-osrms-enhanced-graph-model-3ae226974b2 and https://github.com/Project-OSRM/osrm-backend/wiki/Graph-representation explaining it.

One possible problem could be that the speed limit-based costs don't distinguish bigger roads enough. This is the area with 29,000 nodes (which are road segments and uber turns, as explained) with slow preparation:

Another issue might be the trick I'm playing to re-use node ordering later. When people edit a road, they might close it off to vehicles entirely, or maybe convert a one-way into a two-way. Since recalculating the CH from scratch is slow, I'm reusing the node ordering, and it's much faster. That's the reason why I'm inserting a node for every road segment in both directions, even if it's currently just a one-way.
It wasn't originally, because there weren't too many maps total to import, and I was prioritizing the user's experience (so querying matters). But since then, I've started importing hundreds of maps regularly, spreading out pathfinding queries over the course of the simulation (instead of doing all ~100k-1 million upfront), and working more with map edits (and so recalculating the CH with the node ordering). So preparation speed has become more of a focus.
Parallelism would be a great option to try. If we add it, I'd advocate for making it configurable, maybe with cargo features, to not force all users to bring in the extra dependency. There's lots on my end I could experiment with:
But no guarantee when I'll be able to get to them...
I wound up refactoring abstreet's pathfinding code to cleanly separate the underlying graph implementation (petgraph, fast_paths, osm_ch) from the rest of the complexity. Everything works off of the fast_paths interface. And the results for preparing the CH are crazy -- about a 2x speedup with osm_ch on a larger map: a-b-street/abstreet@3048075. The osm_ch fork I hacked together to try this: https://github.com/dabreegster/osm_ch/tree/prototype_lib

I've yet to dive into query performance or look at updating the CH and reusing node ordering. My only observation so far about why osm_ch might be faster is that the parallelism is indeed being used; my test machine has 16 cores, and all of them briefly lit up.
Yes.
I calculate the priorities and the contraction all in parallel. But keep in mind that you can only contract nodes which form an independent set. This means no neighboring nodes can be contracted in one go/run. To get the nodes that can be contracted: see here
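The independent-set requirement above can be sketched with a standard local-minimum rule: a node may be contracted in this round only if its priority (with node id as tie-breaker) is strictly smaller than all of its neighbors', so no two adjacent nodes are ever picked together. This is an illustrative sketch, not the osm_ch code linked above.

```rust
use std::collections::HashMap;

// Pick contraction candidates forming an independent set: a node joins
// when its (priority, id) pair is a strict local minimum among its
// neighbors. No two neighbors can both win, so the chosen nodes can be
// contracted in parallel without touching each other's edges.
fn independent_set(
    adj: &HashMap<usize, Vec<usize>>,
    priority: &HashMap<usize, i64>,
) -> Vec<usize> {
    let mut chosen = Vec::new();
    for (&v, neighbors) in adj {
        let pv = (priority[&v], v);
        if neighbors.iter().all(|&n| pv < (priority[&n], n)) {
            chosen.push(v);
        }
    }
    chosen.sort();
    chosen
}

fn main() {
    // Path graph 0 - 1 - 2 - 3 with priorities 1, 0, 2, 0.
    let adj: HashMap<usize, Vec<usize>> = HashMap::from([
        (0, vec![1]),
        (1, vec![0, 2]),
        (2, vec![1, 3]),
        (3, vec![2]),
    ]);
    let priority = HashMap::from([(0, 1), (1, 0), (2, 2), (3, 0)]);
    // Nodes 1 and 3 are local minima and are not adjacent to each other.
    assert_eq!(independent_set(&adj, &priority), vec![1, 3]);
    println!("independent set: {:?}", independent_set(&adj, &priority));
}
```

As the later comments note, restricting each round to an independent set means the contraction order deviates from the single-threaded heuristic, which can change the number of shortcuts and the query times.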
That's nice to know.
I support aborting the contraction by percentage in the
Not sure if this is correct, but maybe trying different heuristics could help.
Not contracting the whole graph would be advantageous here, because only a fraction would have to be contracted again.
Also keep in mind that parallelism is not always faster (while testing on a 128-core machine, using only 4 cores was faster). @easbar, have a look at: https://github.com/Stunkymonkey/prp/blob/master/query/src/dijkstra/pch.rs This query code is much simpler and easier to debug. Maybe you want to adapt it.
While I am contracting, there are two graphs in memory: one holds the current graph (with many shortcuts), and the other contains all the edges needed to resolve the shortcuts from the first one. So I guess we are doing the same. After a node is contracted, all its connected edges are moved to the second graph, and the shortcuts are inserted into the first one.
I do the same, but also add the number of contracted neighbors. This way the graph is contracted in a more balanced way.
Thanks for testing. How big is your graph? I only tested my code with more than 1 million nodes.
Very nice for testing. Maybe we can figure out the differences and adapt these changes to
Tiny in comparison -- 30,000 nodes, with 50,000 edges
Agreed!
Another motivating result: on my largest map (112,000 nodes), fast_paths preparation takes 245s, and osm_ch 91s. @easbar, are you interested in porting over some of the osm_ch techniques? Would it help if I serialize and send some of these larger graphs?
I created a little binary to compare fast_paths vs osm_ch here: easbar/fast_paths@7532c50. I am using @dabreegster's osm_ch_pre crate for this (I used commit b622444 for the tests here). I ran the binary on my laptop for the test maps I included in the fast_paths repository, and also on this NYC map. I ran the binary like this:

First of all, the good news is that apparently fast_paths and osm_ch produce the same results for all these maps 👍!
osm_ch's preparation is faster on four out of the five maps (it actually seems to be slower for bremen_dist). The routing queries are executed much faster by fast_paths, though; for NYC the fast_paths queries were more than 20x faster.

Update: The osm_ch query times are probably slowed down because the Dijkstra instance is re-created for every request! Update 2: Yes, they were, and I fixed this here: easbar/fast_paths@e16c8ab and updated the table. osm_ch queries are still slower than fast_paths, but especially on the larger maps they are much faster than what I measured first, which makes sense because in this case the memory allocation is most critical.

@dabreegster, @Stunkymonkey, can you repeat the same experiment so we can agree on these findings? The last column shows the osm_ch preparation time when using a single thread. I used the

@dabreegster, can you share the abstreet maps you've been testing with recently? Maybe your biggest one? Ideally put the map into a text file using the same format we've been using so far (I think you already did this for 23rd and ballard back then): https://github.com/easbar/fast_paths/blob/e40ab8383d56d5304a932013cd837513771bfcde/src/input_graph.rs#L184-L209
This is very interesting and I will definitely have a look at this.
You mean you allow contracting only a certain fraction of all nodes? How does this affect query time? In my experience not contracting all nodes can speed up preparation but will slow down queries. Anyway this is also an interesting parameter that can be added easily. Actually, I rather meant cancelling witness searches once a certain number of nodes have been explored: #1 (comment)
What do you mean here? We have to re-run the contraction after editing the weights unless we know in advance which edges won't be edited and e.g. do not contract the corresponding nodes in the first place, right?
What do you think is simpler about this code? You mean simpler than the fast_paths query code? Anyway, thanks for the pointer, I'll take a look. I remember I duplicated some of the code in fast_paths for the forward/backward search, mostly because I did not know how to do this better in Rust.
Yes, I do the same.
Ok, I think I tried this before releasing the first version of fast_paths, but sure we could try this again.
Yes, totally! See above regarding the input graphs and I will try to include some of the possible differences/improvements we already identified. And obviously if you guys can figure out something I'm more than happy to hear about this as well.
Thanks for this summary explaining your graph model 👍 I will also try to find out why abstreet maps yield such a slow preparation, it's still a mystery to me.
Does this mean you are preparing multiple maps simultaneously? If that's the case, would it still be helpful to parallelize the preparation of a single map? I hope this won't make this discussion harder to follow, but I created a separate issue in fast_paths to keep track of possible improvements we could/should try: easbar/fast_paths#33. Please leave a comment in case I forgot something.
I updated my above results as I repeated the experiment using a single thread for osm_ch (see above).
Contracting the set of nodes in parallel has the downside of not following the optimal heuristic. But I thought that since this is a heuristic anyway, it does not matter much. Looking at the query times, I think something is wrong with my code; it would be interesting if your query could be used with my contraction. I believe that for such small graphs parallelization is not the best idea. It simply produces a big overhead.
Yes, only a certain fraction. I do not know the effect on query time versus your maximum weight; it is simply a trade-off between memory usage and query time. When not contracting all nodes, the query times get worse, but on the other side the resulting graph is much smaller. Also keep in mind that when you have a very big graph and want to contract all nodes, the query time can suffer if there are nodes with a couple of thousand edges. Maybe there could also be a parameter to not contract a node if its number of edges (in and out) exceeds a certain threshold?
Ah right the parallelization also changes the order in which nodes get contracted. So I should have included the query times when I set the number of threads to one.
Yes, using the same query code on the different contractions would be useful indeed. Maybe we can simply export the node ordering that osm_ch finds and use it with fast_paths. fast_paths already has an API to create a contraction for a fixed node ordering.
Yes, we should try with a bigger map as well.
Yes.
Maybe, yes.
Could it be that osm_ch allocates memory for every query? fast_paths provides this.

Update: Yes, I think this is the problem: https://github.com/dabreegster/osm_ch/blob/ce6d83de83dc9c8250f7391084b0de35637da834/pre/src/lib.rs#L67-L70
I thought I was preventing this. Maybe something is wrong.
This is @dabreegster's code. And yes, reallocating the distance array over and over again is very bad for query performance.
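The standard fix for the reallocation problem being discussed is to keep the distance array alive between queries and invalidate it lazily with a generation counter, so a "reset" is O(1) instead of re-zeroing (or reallocating) the whole array. A minimal sketch with hypothetical names, not the fast_paths or osm_ch code:

```rust
// Reusable per-query state: dist[v] is only valid when generation[v]
// matches the current query's generation number.
struct QueryState {
    dist: Vec<u32>,
    generation: Vec<u32>,
    current: u32,
}

impl QueryState {
    fn new(num_nodes: usize) -> Self {
        QueryState {
            dist: vec![u32::MAX; num_nodes],
            generation: vec![0; num_nodes],
            current: 0,
        }
    }

    // O(1) "reset": entries from older generations read as unvisited.
    fn reset(&mut self) {
        self.current += 1;
    }

    fn get(&self, v: usize) -> u32 {
        if self.generation[v] == self.current { self.dist[v] } else { u32::MAX }
    }

    fn set(&mut self, v: usize, d: u32) {
        self.dist[v] = d;
        self.generation[v] = self.current;
    }
}

fn main() {
    let mut s = QueryState::new(4);
    s.reset();
    s.set(2, 7);
    assert_eq!(s.get(2), 7);
    assert_eq!(s.get(3), u32::MAX);
    s.reset(); // next query: no reallocation, old values ignored
    assert_eq!(s.get(2), u32::MAX);
    println!("ok");
}
```

The same structure is reused across thousands of queries, which is why the fix below made the osm_ch query times drop so sharply on the larger maps.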
Ok, yes. This way the query time comparison is probably useless. @dabreegster, can you fix this and maybe also make the fields in
I will test the new query design by not having duplicate code from my
I fixed the unnecessary memory allocation, and now the osm_ch queries are a lot faster, but still slower than fast_paths. To do this I forked @dabreegster's osm_ch_pre module and modified it here: https://github.com/easbar/osm_ch/tree/prototype_lib. I also updated the above table accordingly.
I can run the other graphs later today, but for ballard:
I believe that matches the ordering of your results.
I put some of the larger maps at https://www.dropbox.com/sh/h8gtszpcq46l31l/AAB1WBqJi6dcCP6V8XcGDfPTa?dl=0. We can check these into the fp repo if you want, but some of the files are getting bigger.
I'm still preparing the maps sequentially, because I've had trouble parallelizing some of the async code involved in importing. And regardless, adding parallelism inside CH preparation would help when updating the graphs after a user edits the map.
Yes, I didn't report any query time findings because I hadn't implemented that part yet. Thanks for fixing it!
Thanks, I will try.
Ok, I see. But when you update the graphs, do you re-use the node ordering? I'm not sure the parallelization will still yield a speedup when the node ordering is already fixed. Assuming you would accept a slower query time if in turn the preparation was faster, how much slower would be acceptable?
Yes, because it's faster than starting from scratch. I understand that any parallelization work would only help initial map import, not changing the map.
I don't have a hard cutoff here -- let's say 2-4 times slower queries might be OK. The rationale is that queries are spread out (as different trips start in the simulation), but editing the map happens at one time; the simulation can't resume until the CHs are updated and ready to serve queries again. So it might be worth the slowdown.
I don't know. I just think it is less likely to yield a speedup than for the initial map, because there is already less work to be done per node, but it's hard to say without trying it.
I'll try later today to reuse node ordering with osm_ch as it exists currently to see if the parallelization inside matters much or not, and report back |
Also, maybe try running osm_ch on a single core but disable the 'independent set' search, as this adds overhead and is not necessary when there is only one thread. Line 149 in 9b5cc85
I also opened a fast_paths issue where I'd like to find out why abstreet map preparation is especially slow: easbar/fast_paths#34. I added the South Seattle map to the repository for this. |
Sorry for not testing everything you guys talk about, but my master's thesis needs all the time I currently have. What to test:
|
Hi @Stunkymonkey, I'm currently using fast_paths in my A/B Street project. I stumbled across your code and out of curiosity, ran a quick benchmark for preparing a contraction hierarchy. I need to do more thorough tests, but initial results were promising -- a graph with 29,000 nodes takes 38 seconds with fast_paths to prepare, but only 18s with this code. I haven't dug into differences in implementation yet, so I have no idea why there's such a big difference -- maybe different heuristics for node ordering. I haven't looked at query speeds yet.
I'd like to rearrange some code into a stand-alone crate that just has an API to prepare and query a CH. No OSM integration or grid lookups -- that could be layered on top of the base library. Does that sound like an architecture you'd be happy with?
What's the license on this code? As long as it's Apache / BSD / something open source, that's fine by me.
Thanks for creating and publishing this work!