-
Notifications
You must be signed in to change notification settings - Fork 273
WIP: handling viterbi breaks as multiple sequences #87
Conversation
@@ -166,6 +166,8 @@ public MatchResult doWork(List<GPXEntry> gpxList) { | |||
// Compute all candidates first. | |||
// TODO: Generate candidates on-the-fly within computeViterbiSequence() if this does not | |||
// degrade performance. | |||
// @kodonnell: it's not currently possible as that'd mean calling queryGraph.lookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karussell / @stefanholder - happy for me to remove the above TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanholder I probably do not understand this yet deep enough, but why do you think this would mean calling queryGraph.lookup multiple times?
Furthermore: creating a QueryGraph is cheap so we could call it then multiple times but the resulting QueryResults are independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you meant me - but I think it requires calling queryGraph.lookup multiple times because we're wanting to get the candidates at each step. E.g. first step: get candidates, lookup, do viterbi next step; second step: get candidates, lookup, do viterbi next step, ...
If we create multiple queryGraphs, then we'll get duplicate virtual edge IDs, etc., which might be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that looking up all candidates at once might be a problem for map matching very long GPS traces, e.g. from Munich to Hamburg. But maybe it's better to create an issue for this than having this TODO in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @stefanholder - hopefully #90 will suffice.
// recent step does not get added i.e. 'computeMostLikelySequence' returns the most | ||
// likely sequence without this (breaking) step added. Hence we can use it to start | ||
// the next one: | ||
// TODO: check the above is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanholder - I think this is true, but can you confirm? I guess I could check for it to some degree ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your comment is true.
final Path path = algo.calcPath(from.getQueryResult().getClosestNode(), | ||
to.getQueryResult().getClosestNode()); | ||
if (path.isFound()) { | ||
timeStep.addRoadPath(from, to, path); | ||
final double transitionLogProbability = probabilities | ||
.transitionLogProbability(path.getDistance(), linearDistance, timeDiff); | ||
timeStep.addTransitionLogProbability(from, to, transitionLogProbability); | ||
} else { | ||
// TODO: can we remove maxVisitedNodes completely and just set to infinity? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karussell - what is the purpose of having maxVisitedNodes in map-matching? In theory we can more easily prevent large graph traversals by limiting linear distance ... and then the user doesn't need to worry about configuring this option (and we have a 'physical' reason for throwing an error or finding no candidate - there were no transitions found within a given radius).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing an unlimited search is not good, especially not if you want roughly predicted maximum processing times. Linear distance is easier but graphs are often disconnected due to rivers etc where a distance limitation still can result in very costly traversals. Maybe we should use some good defaults and split the result instead of throwing an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. However, if I'm a user, what do I do if I get the above exception? If I want a map match, there's nothing I can do except increase maxVisitedNodes. So we might as well set it large by default anyway. If the user wants to handle the exceptions themselves (I can't really see why) then they can override the maxVisitedNodes to something smaller.
As an aside - the default GH behaviour is presumably to have no limit on maxVisitedNodes? Same argument applies there, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I want a map match, there's nothing I can do except increase maxVisitedNodes
If you have full control of the library&server then just use a maximum value, but defaults should be safe so that not a single request could occupy one CPU for dozens of seconds.
As an aside - the default GH behaviour is presumably to have no limit on maxVisitedNodes?
No. We have a max nodes limits too for non-CH routes. But the limits are much bigger because the number of points are typically a lot less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll leave it throwing an error for now. In future, we could maybe think of adding e.g. matchResult.isFound()
similar to path.isFound()
.
} | ||
// TODO: can we somewhere record that this route failed? Currently all viterbi | ||
// knows is that there's no transition possible, but not why. This is useful | ||
// for e.g. explaining why a sequence broke. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we split into multiple MatchResult (one per each sequence) then maybe we could add a 'sequenceStartReason' property? For the first sequence it will be 'first sequence' and for the rest it might be 'no candidates found' or 'no transitions found' etc. @karussell / @stefanholder ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, but I'm not sure if this is helpful for the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither. My thinking was that breaking into sequences is better than throwing an exception. However, I then thought that the user may want to know why the sequence broke. And it might be useful for diagnostics etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Having separated sequences is the way to go IMO. And if user requests specific reasons we can also fine tune this later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - will leave as a TODO for now.
private MatchResult computeMatchedEdges( | ||
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences, | ||
Map<String, EdgeIteratorState> virtualEdgesMap) { | ||
// TODO: remove gpx extensions and just add time at start/end of edge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karussell - can I do this? I'm not sure why else we save all the gpxExtensions to the given edge ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you show what you want to do :) ? Currently not sure what you mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - I want to remove all GPX extensions from edge matches (and hence match result), and add only the to/from time to the edge (which will be inferred from the GPX points and route times, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that in the future it should (correctly) associate all original GPXEntry's to one edge. See e.g. https://discuss.graphhopper.com/t/map-matchresult-to-gpxentry/977
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that should (hopefully) be easy - I need to find the matching GPX entry to infer the start/end time anyway. Separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate PR?
Would be welcome :) ! But you would need to use some additional data and so we keep the GPXExtensions in the EdgeMatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure - will discuss in the (hopefully soon) PR.
Yes, this makes sense.
I think that |
Ah, that's a good idea. So, to clarify, we basically want one MapMatch result which includes the final route, as well as all the original GPX entries. Each entry can be inspected to know:
I'd suggest we also add a more general status flag to the MapMatch result e.g. 'contains sequence breaks', so users can easily check for this. And maybe we can warn the user too. (Otherwise if it's used blindly, we may end up with lots of issues submitted asking why there is a 'break' in the matched route!) Anyway, with this, we should be able to describe all situations. However, as you say, most users will probably only be interested in the final route. @stefanholder - does that sound good to you? |
Yes, sounds good. I would put it like this: Each match entry contains:
|
Great, sounds good. I'll look over your other PR first, and then have a play with all of the comments above. |
You can ignore the code for now (it's still unfinished), but I'd like some feedback on progress thus far. Specifically:
The main changes remaining are
|
I would prefer this alternative because it's simpler and I think the user should have the option to disable the filtering.
I think it's better to have a separate class (
If there are no candidates for a GPX entry then |
Thanks @stefanholder - I will follow your first two preferences.
More if e.g. we have a sequence A->C then an HMM break so start a new sequence D->F. In this case, there's no (explicit) information about what happened between C and D - we could leave it to the user to interpret, or I could add an 'unknown'/'empty' sequence C->D which really only contains the information about the start/end points (including time) but e.g. the route between them will be 'unknown'. Does that make sense? |
To keep PR from getting even bigger, I created #92 for allowing customisable filtering. Otherwise, this PR is stabilising somewhat. I've renamed a lot of the entities - the names were misleading, and were confusing me (let alone new developers). E.g. GPXExtension -> Candidate, TimeStep -> ViterbiMatchEntry. I've updated the list above with some more tasks remaining too.
For now I'll leave this out - it's non-essential, and isn't too hard to add later if it's desired. |
If an HMM break happens because there is no transition between C and D then we can set the HMM break status for the |
No, I didn't look at it closely yet either. |
public final long toTime; | ||
|
||
public MatchEdge(EdgeIteratorState edge, long fromTime, long toTime) { | ||
assert edge != null : "edge should not be null"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a precondition of a public method is violated, an exception should rather be thrown (NullPointerException in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - may be a few more of those. Will look now.
* Time (inclusive, in milliseconds) when first began on this sequence. -1 if not set. | ||
* TODO: is in inclusive? | ||
*/ | ||
private long fromTime = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer using Long and setting this to null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it final instead of long - it should never be null or -1 now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is even better.
* Time (inclusive, in milliseconds) when last was on this sequence. -1 if not set. | ||
* TODO: is in inclusive? | ||
*/ | ||
private long toTime = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
locked = true; | ||
} | ||
|
||
protected void saveMatchingState(int sequenceIdx, int sequenceMatchEdgeIdx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is called externally (e.g. here) and I don't want it to be available to users - I could be missing something, but isn't this the purpose of 'protected'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case you should use package-private (no modifier) instead of protected. Protected should only be used if you plan to access the method in derived classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that, and thanks for pointing it out.
this.gpxEntry = gpxEntry; | ||
} | ||
|
||
protected void markAsNotUsedForMatching() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this protected, do you plan to inherit from this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
@@ -67,7 +67,7 @@ public int compare(QueryResult o1, QueryResult o2) { | |||
} | |||
}; | |||
|
|||
public TimeStep(MatchEntry matchEntry) { | |||
public ViterbiMatchEntry(MatchEntry matchEntry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ViterbiMatchEntry
sounds like this class represents the output of the Viterbi matching but it is actually the input of the Viterbi algorithm. How about HmmTimeStep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember now why I made that change = ) Anyway, I agree with you.
// PathMerger pathMerger = new PathMerger(). | ||
// setDouglasPeucker(peucker). | ||
// setSimplifyResponse(wayPointMaxDistance > 0); | ||
// pathMerger.doWork(matchGHRsp, Collections.singletonList(path), tr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this commented out? If the code is really not used anymore it should be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I didn't know how to get it working - I'll update that when I get the web stuff working.
Assert.assertEquals("wpts[1].x should exists", -38.9999, link.getJSONArray("wpts").getJSONObject(1).get("x")); | ||
|
||
} | ||
// protected List<MatchEdge> getEdgeMatch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason you gave up on this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
* | ||
* A MatchEdge is an edge on a MatchSequence - i.e. one of the edges making up the final | ||
* map-matched route. This includes the time at which the edge was travelled. | ||
* | ||
* @author kodonnell | ||
*/ | ||
public class MatchEdge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about MatchedEdge instead of MatchEdge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was trying to be consistent with (the original) MatchResult
as opposed to MatchedResult
- happy to change all if you wish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that consistency is needed here, so I would be happy with MatchedEdge
(the matched edge) and MatchResult
(the result is not matched, so I think this is fine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
README.md
Outdated
@@ -1,125 +0,0 @@ | |||
## Map Matching based on GraphHopper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that README.md was accidentally removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what I did there ...
* | ||
* @author Peter Karich | ||
*/ | ||
public class LocationIndexMatch extends LocationIndexTree { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have this logic in a separate class since this is a feature that could also be used for other purposes besides map matching. I think this class should actually be moved to the Graphhopper core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense for the method findCandidateLocations
to be part of HmmTimeStep
as it's currently laid out - it felt cleaner to me (mainly because the code is only used for this one purpose). However, if Graphhopper had this index logic implemented, then yes, it'd be nicer, and the findCandidateLocations
method would be much simpler. I'm not too worried though - if you want me to change it, I'm happy to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karussell, do you think that LocationIndexMatch should be moved to the Graphhopper core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
(probably either with different name or separate class method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it
/** | ||
* The matched sequence, as returned from viterbi.computeMostLikelySequence(). | ||
*/ | ||
public final List<SequenceState<Candidate, MatchEntry, Path>> matchedSequence; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the map matching result should not contain SequenceState objects since these include internal data that is not relevant for the user. Hence, the relevant information should be extracted from SequenceState into separate result objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just make it private? The user information (edges, original points in sequence, etc.) can be obtained by the user without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to be package-private as it's used in MapMatching.java for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not review all changes in depth because there were just so many changes by refactorings.
Because I was reviewing and annotating this PR commit by commit, some of my recent comments are shown as outdated. I think they are still valid.
What about the remaining non-checked items?
- sequence-specific tests.
- refactoring the web GUI to handle sequences.
- update all doco
They're yet to be completed. Are you happy with the sequence-specific stuff of this PR (logic and API)? If so, I'll work on those additional items. |
I'm happy with it. But since the API has changed a lot, I think it's important that @karussell also has a look. |
Great. I'll review your other comments, and work on the remaining items. When that's done - yes, I think it'd be good to have @karussell review. |
Wow, getting the GUI to handle multiple paths was much more painful than I thought. Anyway: That's using this gpx. I also had to tweak locationIndexTree.findNClosest to make it not always return a result (within tiles but outside radius) - otherwise I couldn't break it into sequences. (See here.) Yes!! As mentioned, the GUI thus far is hacky (and not fully-functioning) - I almost think it'd be better to rewrite from scratch than to continue hacking around it. However, then I'd risk making breaking changes (i.e. features I don't understand) ... so I'm not sure. The unit tests are probably more important - I discovered some bugs even just testing this. We're starting to build our house, so this is becoming less of a priority for me. I might still do it, but my I shouldn't make promises I can't keep ... |
Cool, thanks a lot :) !
Sure :) ! |
Did these changes make into the latest code base? We're finding very few matches when inspecting GPXExensions of match result. We have hundreds of input GPX entries but GPXExtension contains in many cases only a single entry. |
@INRIX-Trang-Nguyen not that I know of. I'm closing this as I have no intention of actively working on it. |
Objective
If the viterbi sequence breaks (e.g. no candidates found for a given GPX entry, or no transitions between candidates) then instead of throwing an exception, split into a new sequence. As far as I'm aware there's currently no easy way for the user to do this manually, let alone easily.
Questions
Cc @karussell @stefanholder