suboptimal (?) transition probability metric #86
Not really, because the transition metric also uses the linear distance between both GPS positions (see also the discussion here and below). So the transition metric favors straight connections between both GPS coordinates. This makes sense because people usually don't make detours during a trip.
We are already computing fastest routes using fastest weighting instead of shortest routes. So in the map matched route, highways are actually preferred over side roads.
Only the transition metric is scaled by the square of the time difference, and this only affects the transition probability, not the emission probability. The intention was to compensate for the fact that the non-normalized transition metric is usually larger for larger time intervals between GPS coordinates, but in practice this is usually not needed. Hence, I would recommend using the non-normalized metric instead.
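For reference, a minimal sketch of the two options described above. The variable and method names are mine, not the project's API, and the exact form of the time scaling (dividing by the squared time difference) is my reading of "compensate for larger metrics at larger intervals", not confirmed code:

```java
// Illustrative sketch only; names and the exact time scaling are assumptions,
// not the project's actual code.
class TransitionMetricSketch {

    // Non-normalized metric: how much the routed path length deviates from the
    // straight-line distance between the two GPS positions.
    static double rawMetric(double linearDistance, double routeLength) {
        return Math.abs(linearDistance - routeLength);
    }

    // Time-scaled variant: dividing by the squared time difference between the
    // GPS fixes (assumed form) compensates for larger raw metrics at larger
    // sampling intervals.
    static double timeNormalizedMetric(double linearDistance, double routeLength,
                                       double timeDiffSeconds) {
        return rawMetric(linearDistance, routeLength)
                / (timeDiffSeconds * timeDiffSeconds);
    }

    // An exponential distribution with scale beta converts a metric into a
    // transition probability density.
    static double transitionDensity(double metric, double beta) {
        return Math.exp(-metric / beta) / beta;
    }
}
```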
Doesn't that illustrate my point? I.e. it favours straight connections, not the shortest/fastest connection. I'd assume that (unless they're particularly concerned about their tyres) individuals take the fastest/shortest route without regard for how straight it is.
Correct - but we compute the transition using distance metrics not time. So, while for a given candidate pair we may (depending on user configuration) pick the fastest route (i.e. highway vs side road), when comparing different candidate pairs (i.e. the transition probability in viterbi), we only use distance (so e.g. a side road may be favoured over the faster highway route).
OK - but let's continue the discussion above/below first. As a proposal, what about this? The transition probability is: …
Hmm, but that's an assumption about the behavior of people (a behavioral prior), whereas the current transition probability only makes an assumption about how movement constrained by a road network looks piecewise between observations (!) compared to straight lines, i.e. they are similar in length, and more so the less time has passed. That's a rather big theoretical difference. I'm not saying it's a bad idea, I'm just saying it opens up new issues. The argument is the same as with U-turns: people do take unlikely routes, and there's no immediate reason why map-matching should be biased towards 'likely' routes in a behavioral sense. I think the current transition probability is designed to minimize this bias. Maybe my vehicle is a snow plough and passes through all roads with uniform prior probability.
Maybe what we are doing is wrong and we should completely admit all kinds of U-turns (as we agree, if it's only about distance, entering a link and going right back is continuous with not doing it, it's just epsilon away), and rather do some "behavioral smoothing" as a post-processing step? Something like: if a piece of the final account of what happened (the route) contains very little information (accounts for very little of the matching error) compared to its complexity (turning into a new street and coming right back), we drop it.
(On second thought, maybe the "penalize turns, in general" idea which someone already had would do much the same but easier.)
I just wouldn't think of it as "turns are unlikely", but as "turn instructions in the result must pay rent in increased accuracy."
Maybe a general solution would already be to let the user specify the costs (time?) for U-turns at measurements. Avoiding/allowing U-turns inside a route should be considered a separate problem.
What do you mean here?
But why? What's the real-world reason why a U-turn at a measurement should be more or less likely than anywhere else? Having that parameter would look like an artifact of the algorithm/implementation. :-(
U-turns for simple A-to-B routes are often meaningless or should be forbidden, because e.g. turn restrictions could be tricked and it is not easy to decide whether a U-turn makes sense or is allowed on a road. (Or to put it differently: we just don't have the necessary data from OSM to judge this well enough, IMO.) For measurements, or 'via points' in general, it is a lot more likely to have U-turns at those points, because the route might not be a 'continuing' one - e.g. GPS traces of taxis picking up passengers and heading in the opposite direction for the next trip. If we had good rules (and all the necessary good street data!) for when to introduce U-turns in general routing, then we could probably avoid this separate handling and could even derive the exact place where the U-turn happened instead of assuming it from the measurements (hopefully frequent enough).
I agree that we make an assumption about people (and, as above, we could allow people to choose their behaviour, e.g. fastest/shortest/straightest). However, as above, I think 'people travel the fastest route' is a much more reasonable assumption than 'people travel in straight lines'. My guess is that the original intention was to favour shorter routes, but as above, this can fail (as it prefers straighter routes over shorter ones). In addition, as above, the 'the less time has passed' part is clearly wrong - there is no direct link between the straightness of a path (as determined this way) and the time taken to travel it.
I believe our algorithm should give the result closest to reality in as many cases as possible (optionally with some configuration to specify the case, e.g. "I'm in a car", or "I prefer shortest distance", or "I'm a hyper-miler", etc.). To say that something unlikely can happen with the same likelihood as something more likely is (obviously) not going to help. Otherwise, we might as well give up - if we try to allow unlikely cases (e.g. your example of people driving back and forth just so they can see it on a map), then we'll never get anywhere. Put another way, in this context the most unbiased solution would be to return the entire road network as a solution (i.e. 'you could have gone anywhere at any time'), or even the original GPX track (i.e. 'you travelled somewhere that gave you these GPX points'). That's clearly not useful to users. As to bias - we currently have a systematic bias toward unlikely events occurring more often than they do in reality. That's, well, biased = )
We are already doing some 'behavioural smoothing' - we assume that people travel 'sensibly' on roads. It's not explicitly called that in the code, but that's the intention of most of it. That said, our smoothing is only between candidates. We can try to add some behavioural smoothing across candidates (good work @stefanholder on U-turns), though we're limited in that we're currently a first-order HMM solved by standard Viterbi (i.e. we only know about the previous candidate). If e.g. we were second order (i.e. knew the two previous candidates) then we could more easily penalize (in a configurable way) U-turns and the like.
@karussell's already explained this, but to continue in my terminology - we already have a systematic bias toward allowing U-turns at measurements as opposed to within a route, i.e. they are already an 'artifact of the algorithm/implementation'.
Favoring straight connections is equivalent to favoring shortest routes because the linear distance between A and B is a lower bound for the length of any path between A and B.
When comparing different candidate pairs, we are actually comparing the distances of the fastest routes between the candidate pairs. But I agree that this is somewhat inconsistent. If we have low-frequency GPS data and timestamps are given for the GPS measurements (which we should not assume in general), we could compute the absolute difference between the travel time of the route and the time difference of the GPS measurements.
Another important point is if/how to normalize the transition metric. In this online map matching paper, the transition metric is normalized with the path length. Another alternative would be to normalize with the linear distance of the candidate pair. Either way, normalizing by a distance instead of normalizing by a time has the advantage that we don't even need timestamps. Before implementing a new transition metric, we should verify its probability distribution empirically, as done by Newson & Krumm (Figure 5).
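As a rough sketch of the two normalization alternatives (the method names and the division-based forms are my illustration, not actual project code):

```java
// Sketch of the normalization alternatives mentioned above; names and the
// division-based forms are assumptions, not the project's actual code.
class TransitionNormalizationSketch {

    // Non-normalized metric, as discussed earlier in the thread.
    static double rawMetric(double linearDistance, double routeLength) {
        return Math.abs(linearDistance - routeLength);
    }

    // Normalize by the routed path length (as in the cited online map matching
    // paper); no timestamps needed.
    static double normalizedByPathLength(double linearDistance, double routeLength) {
        return rawMetric(linearDistance, routeLength) / routeLength;
    }

    // Alternative: normalize by the linear distance between the two candidates;
    // also needs no timestamps.
    static double normalizedByCandidateDistance(double linearDistance, double routeLength,
                                                double candidateLinearDistance) {
        return rawMetric(linearDistance, routeLength) / candidateLinearDistance;
    }
}
```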
I agree.
I think that the (first-order) HMM is a powerful tool, which allows us to try out all kinds of emission/transition metrics and probability distributions. So far I don't see why we should use any additional processing steps.
IMO a first-order HMM is sufficient to penalize U-turns or other turns, but I would be happy to be proven wrong (ideally with a sample GPX trace). Moreover, using directed candidates with a first-order HMM was rather easy to do; for me the complicated part was figuring out how to use the GraphHopper internals. Having said this, I would be happy about any contributions to https://github.com/bmwcarit/hmm-lib.
BTW: Thanks for doing the necessary work! And it's always appreciated to improve stuff or make it clearer :) !
I understand that, but it only applies when you're comparing distances between fixed end points A and B. In our case we have a fixed start point but multiple (candidate) end points. See my first bullet point above for an example.
Except then it may become unrealistic (without traffic data). E.g. if you're on a highway and there's a traffic jam (and hence the time difference is large), it might prefer a longer side road. Again, I still feel the 'safest' option is to 'favour the fastest candidate pair'. This does fail (as sometimes people take slower routes), but I suspect less often than the alternatives. Of course, #89 would help. As an aside - are we trying to cover the case of no timestamp info? If so, it seems to me that would need a different algorithm - adding timestamp info adds more information, so we should be able to do better with it than without it (and hence we need different approaches).
I'm still not fully understanding why we need a normalization - is this so that transition probabilities are always comparable to emission probabilities, regardless of path length (e.g. if the transition metric was just …)? Is it possible to normalize such that all transition probabilities (for a given to/from step pair) sum to one? If we also do the same with emission probabilities, then it might be OK.
Agreed.
I still can't figure it out, so I'll have a look at your code and see if that makes it clearer. Finally - does it make more sense to put this aside and work on #89 first? Otherwise all we have are our opinions (biased by our use cases etc.). In addition, other work is being done (e.g. #88) which has changed the transition metric anyway.
The non-normalized transition metric is Math.abs(linearDistance - routeLength), where linearDistance is the straight-line distance between the two GPS measurements and routeLength is the length of the route between the two candidates.
Assuming that linearDistance < routeLength, which is usually the case, then for any two candidate pairs the one with shorter route length will have a higher transition probability. The same also holds for the normalized transition metric. This means that in your example the candidate that is only 10m away would be favored.
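A toy example with made-up numbers (not from this thread) illustrating that claim, using the exponential conversion of the metric into a probability density:

```java
// Toy numbers (assumed, not from this thread): same linear distance, two
// different route lengths; the shorter route gives the smaller metric and
// therefore the higher exponential probability density.
class TransitionComparisonExample {
    public static void main(String[] args) {
        double linearDistance = 100.0;                          // metres between GPS fixes
        double metricShort = Math.abs(linearDistance - 110.0);  // route length 110 m -> 10
        double metricLong  = Math.abs(linearDistance - 180.0);  // route length 180 m -> 80
        double beta = 25.0;                                     // arbitrary scale parameter

        double pShort = Math.exp(-metricShort / beta) / beta;   // ~0.027
        double pLong  = Math.exp(-metricLong / beta) / beta;    // ~0.0016
        System.out.println(pShort > pLong);                     // true: shorter route favoured
    }
}
```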
Not sure if I get your point, but just using the time of the fastest route or the length of the shortest route might also work.
I think that the user should be able to specify whether timestamps should be used. In some cases there are just no (correct) timestamps available. The non-normalized metric (see above) actually does not require any timestamps. I don't think that we need a totally different approach if timestamps are available; we can then just perform some additional validity checks. If there are ideas for totally different approaches with timestamps, we can try them out with #89.
Yes but I would say regardless of the time interval between GPX entries.
The problem I see with this is that transition probabilities would be higher for time steps with few transitions and lower for time steps with many transitions. This is bad because transition probabilities of different time steps also need to be balanced.
Agreed, but it was also good to do some brainstorming on possible transition metrics.
Ah, that's the key point I missed - for my use case I've had to use snapped coordinates (for other reasons), so it was my fault all along! I now (finally) see that it's effectively favouring the spatially shortest route between events. Thanks for taking the time to get it through to me. To your point above about time - yes, we could use that, but we could also use a spacetime distance. So we can give the user options, and use #89 as you say.
My point is that, as a general rule, if you add data to a model, it gets better. So while we can have one model that works without timestamp data, I suspect a model specifically built to include timestamp data may be better.
Are you saying it would give different results if e.g. the transition probabilities for A -> B were scaled by a factor of 100 (arbitrarily) vs. those for B -> C? I.e. does absolute size across different time steps matter? I had assumed this wasn't the case.
As an aside, I'm having reasonable success with just using a scaled route distance (e.g. …). To be fair, my use case is quite special - I've refactored to route using only real nodes (as mentioned here), I generally have long separations between events (e.g. kilometres), and my linear distance is between candidates (not GPX tracks, as above). So it definitely won't apply in general. However, it's much less 'noisy' - I was seeing quite a lot of cases where e.g. a long route was preferred instead of the 'obvious' short one, or some silly back-and-forth ones, and they're largely gone.
I was actually wrong. Transition and emission probabilities for individual time steps can be scaled arbitrarily without changing the Viterbi results. However, this also means that scaling emission/transition probabilities cannot be used to normalize the transition metric. So scaling/normalizing only makes sense for the transition metric itself but not for the transition probability. Note that subtracting the linear distance from the route length is also a form of normalization and maybe this suffices.
You wrote that you just used …
Sorry, I'm not following why we can't just normalize the transition probabilities (for each step), regardless of the metric? Actually, should transition/emission probabilities be normalized (i.e. sum to 1) by definition? I'm not an expert, but e.g. the Wikipedia examples use this property. Anyway, the additional benefit of this is that we could (depending how we do it) remove the sigma/beta parameters completely, e.g. the transition probability for A -> B as …
And we can easily change …
There isn't - it was more that if I use the existing metric (and linear distance) it doesn't work as well as it does for other matching tasks (where linear distance is that between GPX points).
We could do this but this wouldn't change anything. The Viterbi algorithm finds the candidate sequence with highest probability and the probability of a candidate sequence is defined as the product of all emission and transition probabilities on this sequence. If we multiply all transition probabilities for a given to/from time step pair with a normalizing constant then the probability of each candidate sequence is multiplied with this normalizing constant and the order of all candidate sequences from highest to lowest probability stays the same. So the effect of this is similar to multiplying the numerator and denominator of a fraction with the same number.
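In symbols (my notation, not from the thread), the argument is:

```latex
% e_t(c_t) is the emission probability at step t,
% q_t(c_{t-1}, c_t) the transition probability from step t-1 to step t.
P(c_1,\dots,c_T) \;=\; \prod_{t=1}^{T} e_t(c_t) \prod_{t=2}^{T} q_t(c_{t-1}, c_t)
% Multiplying all transition probabilities of one fixed step pair (t_0-1, t_0)
% by a constant k > 0 multiplies every candidate sequence's probability by k:
P'(c_1,\dots,c_T) \;=\; k \cdot P(c_1,\dots,c_T)
% so the ranking of candidate sequences, and hence the Viterbi result, is unchanged.
```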
We use the Gaussian/exponential distribution to convert the emission/transition metrics to probabilities. Since both these probability distributions have continuous random variables (the metrics), individual emission and transition probabilities (or more precisely probability densities) may even be greater than 1. Another way to see this is that we can have arbitrarily many candidates at each time step by making the search radius bigger and bigger.
Not sure what you mean exactly but the Viterbi algorithm usually just uses maximum and product to compute the probability of the most likely sequence. In my Viterbi implementation I add log probabilities instead of multiplying probabilities for better numerical stability but this is just an implementation detail.
The purpose of the Gaussian and exponential distribution we use is not just to convert a metric into a normalized number but to model the real distribution of the emission and transition metric. In Newson & Krumm, Figure 5, you can see that the exponential distribution is indeed a good model for our transition metric.
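On the log-probability point above, here is a minimal sketch of one Viterbi step in log space (names and structure are mine, not the hmm-lib implementation); adding log probabilities is equivalent to multiplying probabilities:

```java
// Minimal sketch of one Viterbi step in log space; backpointers for
// reconstructing the most likely sequence are omitted.
class LogSpaceViterbiStep {

    // prevLogScores[i]: best log probability of any candidate sequence ending
    // in previous candidate i.
    // logTransition[i][j]: log transition probability from previous candidate i
    // to current candidate j.
    // logEmission[j]: log emission probability of current candidate j.
    static double[] step(double[] prevLogScores, double[][] logTransition,
                         double[] logEmission) {
        double[] scores = new double[logEmission.length];
        for (int j = 0; j < logEmission.length; j++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < prevLogScores.length; i++) {
                // adding log probabilities == multiplying probabilities
                best = Math.max(best, prevLogScores[i] + logTransition[i][j]);
            }
            scores[j] = best + logEmission[j];
        }
        return scores;
    }
}
```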
What you describe is what I was hoping to hear = ) Doesn't it show that there's no need to normalize? That said, I think there is a need to normalize such that emission and transition probabilities both contribute materially to the result. (E.g. if emission were constantly near 1, and transition near 0.001, then the match could end up similar to standard waypoint graphhoppering.) Hence, if we do normalize, why not use something simple like making the sum equal one? Then we can set a single parameter for deciding how much to favour emission versus transition (which currently isn't really possible).
I don't put too much weight on that. They've plotted a contrived variable against another contrived variable (which isn't normalized by time?), and have achieved an approximate alignment. That doesn't say much to me, especially as I'm guessing you can get very similar distributions in many different ways. In addition, that only applies to their very specific data set, and one would really need to vary the data sets (and sampling rates) to decide that this was best. Finally, we don't actually care about what the best model for transition/emission probabilities is - we care about what gives the best map-matching results. So, we're back to #89 - though discussing this sort of thing beforehand is good.
Depends on which sum you want to be equal to one. As I wrote before, making all transition probabilities of each to/from time step sum to one doesn't make sense, as it doesn't change the final scoring of candidate sequences.
Currently, only beta should be used for this. Sigma models the GPS noise and should be adjusted based on the GPS accuracy alone. Hence, we currently have only one parameter for balancing emission vs. transition probabilities. I still see your point that you want to have only one parameter left, and you're welcome to show with #89 if/how this can be achieved.
That's what I mean = ) But we're repeating ourselves, so yes, let's wait until #89. @karussell is planning to merge map-matching into graphhopper soon, so I'll probably wait for that. |
Currently, our transition probability is higher for transitions which are close to a straight line. I see a few problems with this: …
That all said, I suspect most of these will only have noticeable differences when routing over larger distances (e.g. hundreds of meters).
Thoughts @stefanholder ?