Routing in areas with little data is very slow #443

jbruechert · 2024-02-17T21:36:24Z

Queries in some areas are very slow (as in, multiple minutes).
As far as I can tell, this happens when there is a lot less data at the destination than at the start.

Reproducible examples on europe.motis-project.org include

Warsaw, Metro Marymont -> Riga Bus Station (They are on the same bus route)
München Hbf -> Verona Porta Nuova (Again, there is a nightjet covering the whole route)

Is there possibly some threshold when motis stops searching for other connections, that is too high in such cases?

felixguendling · 2024-02-18T12:32:25Z

TL;DR: it's not trivial.

To understand the problem (and fix it), it's good to know how MOTIS executes PreTrip queries (also called range query in the literature).

A range query asks for all optimal connections within a departure interval ¹. In contrast to the literature, we additionally require results to be not dominated by all journeys that start outside the interval (e.g. to not display a journey 10:00-17:00 if there's also 11:00-17:00 starting outside the interval). This is achieved by having an additional OnTrip (aka earliest arrival) search at "end of the interval" + 1. The results of this additional OnTrip request are only used to dominate non-optimal results from inside the interval but otherwise ignored. So the range query how we define it in MOTIS is "Give me all globally Pareto-optimal journeys that depart within interval [t1, t2]".

Now, there's trick how to make range-queries fast. From the initial RAPTOR paper ² Section 4.2 Range Queries: rRAPTOR:

[...] we order Ψ from latest to earliest, and then run RAPTOR for every τ ∈ Ψ in order. However, we keep the labels τk(p) between rounds instead of reinitializing them. To see why this is correct, note that this value of τk(p) corresponds to an intermediate journey departing from ps no earlier than journeys computed in the current run (recall that Ψ is ordered). Thus, if τk(p) is smaller, we also know how to reach p earlier. Hence, we can safely prune the current journey.

This means, we can reuse all labels from later departure times if we iterate departures in the interval from late to early, starting with the first earliest arrival query at t2+1 as discussed before to eliminate journeys that are only locally Pareto-optimal but are dominated by journeys that depart outside the interval.

As you might have noticed, the actual PreTrip request in MOTIS has even more parameters. Namely extend_interval_earlier, extend_interval_later (booleans) and min_connection_count (unsigned integer). Those parameters can be set to require a minimum number of connections to be returned if possible. Therefore, the search interval needs to be extended if the initial search interval did not contain more than min_connection_count connections. Interval extension is steered by the flags extend_interval_earlier and extend_interval_later to enable frontends to support earlier/later search without searching for connections that are already there. The search result returned by MOTIS contains the searched interval ³. This way, we can make sure that connection sets from different search requests cannot overlap.

Now, we come closer to the actual problem: in your case the search goes from an area where you have a Pareto-optimal connection maybe every 8 hours (on average). The initial search interval is 2h and it gets extended by 1h in each (possible) direction. The default setting for min_connection_count in the UI is 5. So to find 5 connections with a density of one very 8h you need to search ~40h of timetable. As you are searching from a very dense area where you have maybe one departure every second minute on average, you need to search 40h*30/h = 1200 earliest arrival searches.

Currently, the interval gets extended symmetrically, one hour in each direction. This has the big disadvantage that we need to reset all arrival labels - which is the main reason the search becomes so slow. The "trick" from the paper works as long as we search for earlier connections. In practice, only extending the interval to earlier (e.g. by setting extend_interval_later = false and extend_interval_earlier = true) is not a good option because most users expect to find only results that depart later than their given departure time.

Solutions:

To prevent the problem of extending the interval in the "wrong" direction (to "later"), we could make a guess how long the actual interval would need to be. This could be done by doing a simple regression with input parameters distance, number of departures at the source and output parameter "search interval size". Having a "interval guessing machine" to make sure we almost never need to extend the interval in the "wrong" direction would drastically speedup search in this case because we can always use the "trick" from the paper to keep the arrival labels.
Another option would be to swap sides in this case. If for example the next 24h in the destination area has less less arrivals than the initial interval has departures in the source areas, this could be used as a heuristic to search from destination to start instead of start to destination.
Another option that would probably speedup the range query in MOTIS in general would be to use something I would call "adhoc transfer patterns". Based on the transfer patterns paper ⁴ we can store transfer patterns (a directed acyclic graph (DAG) containing only transfer stations) while iterating the search interval. So for each departure time, we first try to find a connection based on the transfer-pattern DAG. The resulting arrival times (one for each number of transfers) can then be used to dominate when doing the "real" search. So we can cut the search once results are worse than what we were able to find using the adhoc transfer pattern. Since public transport has often repeating patterns, this technique has potential for a significant speedup.
Another option would be to just not use the PreTrip query type at all. The OnTrip queries do not suffer from this problem and should therefore be very fast.

Note that the opposite direction with an arrival interval works the same but basically everything is inverted since you search for the latest departure at the journey origin for each transfer, instead of the earliest arrival at the journey target. ↩
[PDF] Delling, Daniel, Thomas Pajor, and Renato F. Werneck. "Round-based public transit routing." Transportation Science 49.3 (2015): 591-604. ↩
The returned search interval is always the same as the queried search if extend_interval_earlier == false && extend_interval_later == false or min_connection_count=0 or if the initial interval already contained >= min_connection_count globally Pareto-optimal journeys. ↩
[PDF] Bast, Hannah, et al. "Fast routing in very large public transportation networks using transfer patterns." Algorithms–ESA 2010: 18th Annual European Symposium, Liverpool, UK, September 6-8, 2010. Proceedings, Part I 18. Springer Berlin Heidelberg, 2010. ↩

jbruechert · 2024-02-18T13:45:51Z

Thank you for the detailed writeup!

I'm not sure if I will be able to fix this myself, but from the few tests I made, setting min_connection_count: 1 gives fairly good results.

I will try to make motis abort the query if the connection gets closed (by a reverse proxy timeout) and / or add a timeout to motis itself for now, so it's not possible to overload a public instance with such requests.

felixguendling · 2024-02-18T13:53:03Z

I will try to make motis abort the query if the connection gets closed

I guess this will not be easy as a lot of code would be involved. You would need to pass something like a "stop token" to each operation. This token then needs to be checked regularly by the operation itself. So at least the net, ctx and nigiri dependencies would be involved as well as code in base/module.

jbruechert mentioned this issue Feb 18, 2024

Allow to set a timeout for searches #445

Merged

felixguendling mentioned this issue Feb 24, 2024

MOTIS Potential #452

Open

26 tasks

mority self-assigned this Mar 5, 2024

jbruechert mentioned this issue Mar 8, 2024

Collecting situations in which routing is slow #471

Open

mority removed their assignment May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Routing in areas with little data is very slow #443

Routing in areas with little data is very slow #443

jbruechert commented Feb 17, 2024

felixguendling commented Feb 18, 2024 •

edited

Loading

jbruechert commented Feb 18, 2024 •

edited

Loading

felixguendling commented Feb 18, 2024

Routing in areas with little data is very slow #443

Routing in areas with little data is very slow #443

Comments

jbruechert commented Feb 17, 2024

felixguendling commented Feb 18, 2024 • edited Loading

Footnotes

jbruechert commented Feb 18, 2024 • edited Loading

felixguendling commented Feb 18, 2024

felixguendling commented Feb 18, 2024 •

edited

Loading

jbruechert commented Feb 18, 2024 •

edited

Loading