WIP: handling viterbi breaks as multiple sequences #87
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -166,6 +166,8 @@ public MatchResult doWork(List<GPXEntry> gpxList) { | |
// Compute all candidates first. | ||
// TODO: Generate candidates on-the-fly within computeViterbiSequence() if this does not | ||
// degrade performance. | ||
// @kodonnell: it's not currently possible as that'd mean calling queryGraph.lookup | ||
// multiple times, which isn't supported. So, remove TODO? | ||
final List<QueryResult> allCandidates = new ArrayList<>(); | ||
List<TimeStep<GPXExtension, GPXEntry, Path>> timeSteps = createTimeSteps(gpxList, | ||
edgeFilter, allCandidates); | ||
|
@@ -182,11 +184,11 @@ public MatchResult doWork(List<GPXEntry> gpxList) { | |
final QueryGraph queryGraph = new QueryGraph(routingGraph).setUseEdgeExplorerCache(true); | ||
queryGraph.lookup(allCandidates); | ||
|
||
List<SequenceState<GPXExtension, GPXEntry, Path>> seq = computeViterbiSequence(timeSteps, | ||
gpxList, queryGraph); | ||
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences = | ||
computeViterbiSequence(timeSteps, gpxList, queryGraph); | ||
|
||
final EdgeExplorer explorer = queryGraph.createEdgeExplorer(edgeFilter); | ||
MatchResult matchResult = computeMatchResult(seq, gpxList, allCandidates, explorer); | ||
MatchResult matchResult = computeMatchResult(sequences, gpxList, allCandidates, explorer); | ||
|
||
return matchResult; | ||
} | ||
|
@@ -229,52 +231,77 @@ private List<TimeStep<GPXExtension, GPXEntry, Path>> createTimeSteps(List<GPXEnt | |
return timeSteps; | ||
} | ||
|
||
private List<SequenceState<GPXExtension, GPXEntry, Path>> computeViterbiSequence( | ||
/* | ||
* Run the viterbi algorithm on our HMM model. Note that viterbi breaks can occur (e.g. if no | ||
* candidates are found for a given timestep), and we handle these by returning a list of complete | ||
* sequences (each of which is unbroken). It is possible that a sequence contains only a single | ||
* timestep. | ||
* | ||
* Note: we only break sequences with 'physical' reasons (e.g. no candidates nearby) and not | ||
* algorithmic ones (e.g. maxVisitedNodes exceeded) - the latter should throw errors. | ||
*/ | ||
private List<List<SequenceState<GPXExtension, GPXEntry, Path>>> computeViterbiSequence( | ||
List<TimeStep<GPXExtension, GPXEntry, Path>> timeSteps, List<GPXEntry> gpxList, | ||
final QueryGraph queryGraph) { | ||
final HmmProbabilities probabilities | ||
= new HmmProbabilities(measurementErrorSigma, transitionProbabilityBeta); | ||
final ViterbiAlgorithm<GPXExtension, GPXEntry, Path> viterbi = new ViterbiAlgorithm<>(); | ||
|
||
int timeStepCounter = 0; | ||
TimeStep<GPXExtension, GPXEntry, Path> prevTimeStep = null; | ||
final HmmProbabilities probabilities = new HmmProbabilities(measurementErrorSigma, | ||
transitionProbabilityBeta); | ||
ViterbiAlgorithm<GPXExtension, GPXEntry, Path> viterbi = new ViterbiAlgorithm<>(); | ||
final List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences = | ||
new ArrayList<List<SequenceState<GPXExtension, GPXEntry, Path>>>(); | ||
TimeStep<GPXExtension, GPXEntry, Path> seqPrevTimeStep = null; | ||
for (TimeStep<GPXExtension, GPXEntry, Path> timeStep : timeSteps) { | ||
|
||
// if sequence is broken, then close it off and create a new viterbi: | ||
if (viterbi.isBroken()) { | ||
sequences.add(viterbi.computeMostLikelySequence()); | ||
seqPrevTimeStep = null; | ||
viterbi = new ViterbiAlgorithm<>(); | ||
} | ||
|
||
// always calculate emission probabilities regardless of place in sequence: | ||
computeEmissionProbabilities(timeStep, probabilities); | ||
|
||
if (prevTimeStep == null) { | ||
if (seqPrevTimeStep == null) { | ||
// first step of a sequence, so initialise viterbi: | ||
viterbi.startWithInitialObservation(timeStep.observation, timeStep.candidates, | ||
timeStep.emissionLogProbabilities); | ||
// it is possible viterbi is immediately broken here (e.g. no candidates) - this | ||
// will be caught by the first test in this loop. | ||
} else { | ||
computeTransitionProbabilities(prevTimeStep, timeStep, probabilities, queryGraph); | ||
// add this step to current sequence: | ||
computeTransitionProbabilities(seqPrevTimeStep, timeStep, probabilities, queryGraph); | ||
viterbi.nextStep(timeStep.observation, timeStep.candidates, | ||
timeStep.emissionLogProbabilities, timeStep.transitionLogProbabilities, | ||
timeStep.roadPaths); | ||
} | ||
if (viterbi.isBroken()) { | ||
String likelyReasonStr = ""; | ||
if (prevTimeStep != null) { | ||
GPXEntry prevGPXE = prevTimeStep.observation; | ||
GPXEntry gpxe = timeStep.observation; | ||
double dist = distanceCalc.calcDist(prevGPXE.lat, prevGPXE.lon, | ||
gpxe.lat, gpxe.lon); | ||
if (dist > 2000) { | ||
likelyReasonStr = "Too long distance to previous measurement? " | ||
+ Math.round(dist) + "m, "; | ||
} | ||
// if broken, then close off this sequence and create a new one starting with this | ||
// timestep. Note that we rely on the fact that if the viterbi breaks the most | ||
// recent step does not get added i.e. 'computeMostLikelySequence' returns the most | ||
// likely sequence without this (breaking) step added. Hence we can use it to start | ||
// the next one: | ||
// TODO: check the above is true | ||
@stefanholder - I think this is true, but can you confirm? I guess I could check for it to some degree ...

Yes, your comment is true. |
||
if (viterbi.isBroken()) { | ||
sequences.add(viterbi.computeMostLikelySequence()); | ||
viterbi = new ViterbiAlgorithm<>(); | ||
viterbi.startWithInitialObservation(timeStep.observation, timeStep.candidates, | ||
timeStep.emissionLogProbabilities); | ||
// as above, it is possible viterbi is immediately broken here (e.g. no | ||
// candidates) - this will be caught by the first test in this loop. | ||
} | ||
|
||
throw new RuntimeException("Sequence is broken for submitted track at time step " | ||
+ timeStepCounter + " (" + gpxList.size() + " points). " + likelyReasonStr | ||
+ "observation:" + timeStep.observation + ", " | ||
+ timeStep.candidates.size() + " candidates: " + getSnappedCandidates(timeStep.candidates) | ||
+ ". If a match is expected consider increasing max_visited_nodes."); | ||
} | ||
|
||
timeStepCounter++; | ||
prevTimeStep = timeStep; | ||
seqPrevTimeStep = timeStep; | ||
} | ||
|
||
return viterbi.computeMostLikelySequence(); | ||
// add the final sequence: | ||
sequences.add(viterbi.computeMostLikelySequence()); | ||
|
||
// check sequence lengths: | ||
int sequenceSizeSum = 0; | ||
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) { | ||
sequenceSizeSum += sequence.size(); | ||
} | ||
assert sequenceSizeSum == timeSteps.size(); | ||
|
||
return sequences; | ||
} | ||
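The splitting loop above can be sketched in isolation. This is a toy model under stated assumptions, not the PR's code: integers stand in for time steps, and a "break" is simply a jump larger than `maxGap` between consecutive values. The class `SequenceSplitSketch`, the method `split` and the `maxGap` rule are all hypothetical. What it keeps from the real loop is the key behaviour: the breaking step is not added to the sequence being closed, so it starts the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for the sequence-splitting loop. Integers play the role of
// time steps; a "break" is any jump larger than maxGap between consecutive
// values (a hypothetical rule chosen only for illustration).
class SequenceSplitSketch {

    static List<List<Integer>> split(List<Integer> steps, int maxGap) {
        List<List<Integer>> sequences = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int step : steps) {
            boolean broken = !current.isEmpty()
                    && Math.abs(step - current.get(current.size() - 1)) > maxGap;
            if (broken) {
                // Close off the unbroken sequence. The breaking step was not
                // added to it, so it becomes the first step of the next one.
                sequences.add(current);
                current = new ArrayList<>();
            }
            current.add(step);
        }
        // Add the final sequence, if any steps were seen at all.
        if (!current.isEmpty()) {
            sequences.add(current);
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<List<Integer>> sequences = split(Arrays.asList(1, 2, 3, 50, 51, 200), 10);
        int total = 0;
        for (List<Integer> sequence : sequences) {
            total += sequence.size();
        }
        // Same invariant as the assert in the diff: no step is lost or duplicated.
        System.out.println(sequences + " total=" + total);
    }
}
```

Note that a sequence may contain a single step (here the trailing `200`), which is the case the doc comment in the diff calls out.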
|
||
private void computeEmissionProbabilities(TimeStep<GPXExtension, GPXEntry, Path> timeStep, | ||
|
@@ -301,70 +328,92 @@ private void computeTransitionProbabilities(TimeStep<GPXExtension, GPXEntry, Pat | |
for (GPXExtension from : prevTimeStep.candidates) { | ||
for (GPXExtension to : timeStep.candidates) { | ||
RoutingAlgorithm algo = algoFactory.createAlgo(queryGraph, algoOptions); | ||
// System.out.println("algo " + algo.getName()); | ||
final Path path = algo.calcPath(from.getQueryResult().getClosestNode(), | ||
to.getQueryResult().getClosestNode()); | ||
if (path.isFound()) { | ||
timeStep.addRoadPath(from, to, path); | ||
final double transitionLogProbability = probabilities | ||
.transitionLogProbability(path.getDistance(), linearDistance, timeDiff); | ||
timeStep.addTransitionLogProbability(from, to, transitionLogProbability); | ||
} else { | ||
// TODO: can we remove maxVisitedNodes completely and just set to infinity? | ||
@karussell - what is the purpose of having maxVisitedNodes in map-matching? In theory we can more easily prevent large graph traversals by limiting linear distance ... and then the user doesn't need to worry about configuring this option (and we have a 'physical' reason for throwing an error or finding no candidate - there were no transitions found within a given radius).

Doing an unlimited search is not good, especially not if you want roughly predictable maximum processing times. Linear distance is easier, but graphs are often disconnected due to rivers etc., where a distance limitation can still result in very costly traversals. Maybe we should use some good defaults and split the result instead of throwing an exception.

Fair point. However, if I'm a user, what do I do if I get the above exception? If I want a map match, there's nothing I can do except increase maxVisitedNodes. So we might as well set it large by default anyway. If the user wants to handle the exceptions themselves (I can't really see why) then they can override the maxVisitedNodes to something smaller. As an aside - the default GH behaviour is presumably to have no limit on maxVisitedNodes? Same argument applies there, I think.

If you have full control of the library and server then just use a maximum value, but defaults should be safe so that no single request can occupy one CPU for dozens of seconds.
No. We have a max nodes limit too for non-CH routes. But the limits are much bigger because the number of points is typically a lot smaller.

OK. I'll leave it throwing an error for now. In future, we could maybe think of adding e.g. |
||
if (algo.getVisitedNodes() > algoOptions.getMaxVisitedNodes()) { | ||
throw new RuntimeException( | ||
"couldn't compute transition probabilities as routing failed due to too small maxVisitedNodes (" | ||
+ algoOptions.getMaxVisitedNodes() + ")"); | ||
} | ||
// TODO: can we somewhere record that this route failed? Currently all viterbi | ||
// knows is that there's no transition possible, but not why. This is useful | ||
// for e.g. explaining why a sequence broke. | ||
If we split into multiple MatchResult (one per each sequence) then maybe we could add a 'sequenceStartReason' property? For the first sequence it will be 'first sequence' and for the rest it might be 'no candidates found' or 'no transitions found' etc. @karussell / @stefanholder ?

Sounds good to me, but I'm not sure if this is helpful for the user?

Me neither. My thinking was that breaking into sequences is better than throwing an exception. However, I then thought that the user may want to know why the sequence broke. And it might be useful for diagnostics etc.

Yes, I agree. Having separated sequences is the way to go IMO. And if user requests specific reasons we can also fine tune this later on.

Good point - will leave as a TODO for now. |
||
} | ||
} | ||
} | ||
} | ||
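The function above separates two failure modes: a route that does not exist (a 'physical' reason, which leaves no transition) and a search that gave up because it exceeded maxVisitedNodes (an 'algorithmic' reason, which throws). A minimal sketch of that distinction follows. It assumes a plain breadth-first search over an adjacency map rather than GraphHopper's routing algorithms; `BoundedSearchSketch`, `Outcome`, `search` and `chain` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Breadth-first search with a budget on visited nodes. This is NOT how
// GraphHopper routes; it only illustrates reporting "limit exceeded"
// separately from "not reachable".
class BoundedSearchSketch {

    enum Outcome { FOUND, NOT_REACHABLE, LIMIT_EXCEEDED }

    static Outcome search(Map<Integer, List<Integer>> graph, int from, int to,
                          int maxVisitedNodes) {
        Deque<Integer> queue = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        queue.add(from);
        seen.add(from);
        int visited = 0;
        while (!queue.isEmpty()) {
            int node = queue.poll();
            visited++;
            if (visited > maxVisitedNodes) {
                // Algorithmic reason: the budget ran out before an answer was known.
                return Outcome.LIMIT_EXCEEDED;
            }
            if (node == to) {
                return Outcome.FOUND;
            }
            for (int next : graph.getOrDefault(node, Collections.<Integer>emptyList())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        // Physical reason: the whole reachable component was explored.
        return Outcome.NOT_REACHABLE;
    }

    // Builds an undirected chain 0 - 1 - ... - (n-1) for the demo.
    static Map<Integer, List<Integer>> chain(int n) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        for (int i = 0; i < n; i++) {
            List<Integer> neighbours = new ArrayList<>();
            if (i > 0) neighbours.add(i - 1);
            if (i < n - 1) neighbours.add(i + 1);
            graph.put(i, neighbours);
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = chain(5);
        System.out.println(search(graph, 0, 4, 100)); // enough budget
        System.out.println(search(graph, 0, 4, 3));   // budget too small
        System.out.println(search(graph, 0, 9, 100)); // node 9 is not in the graph
    }
}
```

Only the `LIMIT_EXCEEDED` case can be fixed by raising the limit, which is why the diff treats it as an error rather than as a reason to split the sequence.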
|
||
private MatchResult computeMatchResult(List<SequenceState<GPXExtension, GPXEntry, Path>> seq, | ||
List<GPXEntry> gpxList, List<QueryResult> allCandidates, | ||
EdgeExplorer explorer) { | ||
private MatchResult computeMatchResult( | ||
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences, | ||
List<GPXEntry> gpxList, List<QueryResult> allCandidates, EdgeExplorer explorer) { | ||
// every virtual edge maps to its real edge where the orientation is already correct! | ||
// TODO use traversal key instead of string! | ||
final Map<String, EdgeIteratorState> virtualEdgesMap = new HashMap<>(); | ||
for (QueryResult candidate : allCandidates) { | ||
fillVirtualEdges(virtualEdgesMap, explorer, candidate); | ||
} | ||
|
||
MatchResult matchResult = computeMatchedEdges(seq, virtualEdgesMap); | ||
// TODO: sequences may be only single timesteps, or disconnected. So we need to support: | ||
// - fake edges e.g. 'got from X to Y' but we don't know how ... | ||
// - single points e.g. 'was at X from t0 to t1' | ||
MatchResult matchResult = computeMatchedEdges(sequences, virtualEdgesMap); | ||
computeGpxStats(gpxList, matchResult); | ||
|
||
return matchResult; | ||
} | ||
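The review discussion floats a 'sequenceStartReason' property so that callers can see why each sequence began. One possible shape is sketched below, purely as an illustration: the PR leaves this as a TODO, and `SequenceReasonSketch`, `SequenceStartReason`, `LabelledSequence` and `label` are hypothetical names that do not exist in GraphHopper. The reason values are taken from the wording of the review comments.

```java
import java.util.ArrayList;
import java.util.List;

class SequenceReasonSketch {

    // Hypothetical reasons, named after the examples in the review thread.
    enum SequenceStartReason { FIRST_SEQUENCE, NO_CANDIDATES_FOUND, NO_TRANSITIONS_FOUND }

    // Hypothetical container: one unbroken sequence plus why it started.
    static final class LabelledSequence<T> {
        final SequenceStartReason startReason;
        final List<T> steps;

        LabelledSequence(SequenceStartReason startReason, List<T> steps) {
            this.startReason = startReason;
            this.steps = steps;
        }
    }

    // breakReasons.get(i) explains the break between sequence i and sequence i + 1.
    static <T> List<LabelledSequence<T>> label(List<List<T>> sequences,
                                               List<SequenceStartReason> breakReasons) {
        if (!sequences.isEmpty() && breakReasons.size() != sequences.size() - 1) {
            throw new IllegalArgumentException("expected one reason per break");
        }
        List<LabelledSequence<T>> result = new ArrayList<>();
        for (int i = 0; i < sequences.size(); i++) {
            SequenceStartReason reason = (i == 0)
                    ? SequenceStartReason.FIRST_SEQUENCE
                    : breakReasons.get(i - 1);
            result.add(new LabelledSequence<>(reason, sequences.get(i)));
        }
        return result;
    }
}
```

Recording the reason at the point where the break is detected would keep the diagnostic information that the Viterbi algorithm itself does not have, which is the gap the TODO in computeTransitionProbabilities describes.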
|
||
private MatchResult computeMatchedEdges(List<SequenceState<GPXExtension, GPXEntry, Path>> seq, | ||
Map<String, EdgeIteratorState> virtualEdgesMap) { | ||
List<EdgeMatch> edgeMatches = new ArrayList<>(); | ||
private MatchResult computeMatchedEdges( | ||
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences, | ||
Map<String, EdgeIteratorState> virtualEdgesMap) { | ||
// TODO: remove gpx extensions and just add time at start/end of edge. | ||
@karussell - can I do this? I'm not sure why else we save all the gpxExtensions to the given edge ..

Could you show what you want to do :) ? Currently not sure what you mean

Sorry - I want to remove all GPX extensions from edge matches (and hence match result), and add only the to/from time to the edge (which will be inferred from the GPX points and route times, etc.).

The idea is that in the future it should (correctly) associate all original GPXEntry's to one edge. See e.g. https://discuss.graphhopper.com/t/map-matchresult-to-gpxentry/977

Oh, that should (hopefully) be easy - I need to find the matching GPX entry to infer the start/end time anyway. Separate PR?

Would be welcome :) ! But you would need to use some additional data and so we keep the GPXExtensions in the EdgeMatch?

Not sure - will discuss in the (hopefully soon) PR. |
||
double distance = 0.0; | ||
long time = 0; | ||
EdgeIteratorState currentEdge = null; | ||
List<EdgeMatch> edgeMatches = new ArrayList<>(); | ||
List<GPXExtension> gpxExtensions = new ArrayList<>(); | ||
GPXExtension queryResult = seq.get(0).state; | ||
gpxExtensions.add(queryResult); | ||
for (int j = 1; j < seq.size(); j++) { | ||
queryResult = seq.get(j).state; | ||
Path path = seq.get(j).transitionDescriptor; | ||
distance += path.getDistance(); | ||
time += path.getTime(); | ||
for (EdgeIteratorState edgeIteratorState : path.calcEdges()) { | ||
EdgeIteratorState directedRealEdge = resolveToRealEdge(virtualEdgesMap, | ||
edgeIteratorState); | ||
if (directedRealEdge == null) { | ||
throw new RuntimeException("Did not find real edge for " | ||
+ edgeIteratorState.getEdge()); | ||
} | ||
if (currentEdge == null || !equalEdges(directedRealEdge, currentEdge)) { | ||
if (currentEdge != null) { | ||
EdgeMatch edgeMatch = new EdgeMatch(currentEdge, gpxExtensions); | ||
edgeMatches.add(edgeMatch); | ||
gpxExtensions = new ArrayList<>(); | ||
EdgeIteratorState currentEdge = null; | ||
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) { | ||
GPXExtension queryResult = sequence.get(0).state; | ||
gpxExtensions.add(queryResult); | ||
for (int j = 1; j < sequence.size(); j++) { | ||
queryResult = sequence.get(j).state; | ||
Path path = sequence.get(j).transitionDescriptor; | ||
distance += path.getDistance(); | ||
time += path.getTime(); | ||
for (EdgeIteratorState edgeIteratorState : path.calcEdges()) { | ||
EdgeIteratorState directedRealEdge = | ||
resolveToRealEdge(virtualEdgesMap, edgeIteratorState); | ||
if (directedRealEdge == null) { | ||
throw new RuntimeException( | ||
"Did not find real edge for " + edgeIteratorState.getEdge()); | ||
} | ||
if (currentEdge == null || !equalEdges(directedRealEdge, currentEdge)) { | ||
if (currentEdge != null) { | ||
EdgeMatch edgeMatch = new EdgeMatch(currentEdge, gpxExtensions); | ||
edgeMatches.add(edgeMatch); | ||
gpxExtensions = new ArrayList<>(); | ||
} | ||
currentEdge = directedRealEdge; | ||
} | ||
currentEdge = directedRealEdge; | ||
} | ||
gpxExtensions.add(queryResult); | ||
} | ||
gpxExtensions.add(queryResult); | ||
} | ||
|
||
// we should have some edge matches: | ||
if (edgeMatches.isEmpty()) { | ||
String sequenceSizes = ""; | ||
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) { | ||
sequenceSizes += sequence.size() + ","; | ||
} | ||
throw new IllegalStateException( | ||
"No edge matches found for path. Too short? Sequence size " + seq.size()); | ||
"No edge matches found for path. Only single-size sequences? Sequence sizes: " | ||
+ sequenceSizes.substring(0, sequenceSizes.length() - 1)); | ||
} | ||
EdgeMatch lastEdgeMatch = edgeMatches.get(edgeMatches.size() - 1); | ||
if (!gpxExtensions.isEmpty() && !equalEdges(currentEdge, lastEdgeMatch.getEdgeState())) { | ||
|
@karussell / @stefanholder - happy for me to remove the above TODO?

@stefanholder I probably do not understand this deeply enough yet, but why do you think this would mean calling queryGraph.lookup multiple times?
Furthermore: creating a QueryGraph is cheap, so we could then call it multiple times, but the resulting QueryResults are independent.

Not sure if you meant me - but I think it requires calling queryGraph.lookup multiple times because we want to get the candidates at each step. E.g. first step: get candidates, lookup, do viterbi next step; second step: get candidates, lookup, do viterbi next step, ...
If we create multiple queryGraphs, then we'll get duplicate virtual edge IDs, etc., which might be a problem.

I think that looking up all candidates at once might be a problem for map matching very long GPS traces, e.g. from Munich to Hamburg. But maybe it's better to create an issue for this than to have this TODO in the code.

Good point @stefanholder - hopefully #90 will suffice.