Skip to content
This repository has been archived by the owner on Mar 1, 2021. It is now read-only.

WIP: handling viterbi breaks as multiple sequences #87

Closed
wants to merge 24 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
2c1a010
initial work for handling viterbi breaks as multiple sequences
kodonnell Dec 11, 2016
d979ffd
handle case where viterbi breaks immediately after initialization
kodonnell Dec 12, 2016
6b856f1
merge U-turn work
kodonnell Dec 30, 2016
ade4037
refactoring sequences
kodonnell Jan 13, 2017
4b24cbc
use MatchEntry internally instead of GPXEntry
kodonnell Jan 14, 2017
5832623
debug and fix tests
kodonnell Jan 15, 2017
b6c3af4
rename timestep -> viterbimatchentry
kodonnell Jan 15, 2017
c0e5574
fix other tests
kodonnell Jan 15, 2017
2d184a2
tidying calcpath and gpxfile/main
kodonnell Jan 15, 2017
791e53c
web stuff ...
kodonnell Jan 15, 2017
eed78bf
giving up on that test ...
kodonnell Jan 15, 2017
ac105e7
woops, don't need that anymore ...
kodonnell Jan 15, 2017
d6bf213
refactor + tidy + all tests passing
kodonnell Jan 31, 2017
4e217d0
contiguous sequences
kodonnell Jan 31, 2017
f629883
undo test change to fix test change
kodonnell Feb 1, 2017
9e6cc60
add logging in again as per @stefanholder's request
kodonnell Feb 6, 2017
9d6f84b
Merge branch 'master' into sequences
kodonnell Feb 26, 2017
7f45557
note funny bug ...
kodonnell Feb 26, 2017
4a7420a
some changes as per @stefanholder
kodonnell Mar 20, 2017
13707e1
bringing back the missing readme
kodonnell Mar 20, 2017
efb7b57
a few more tidyups
kodonnell Mar 20, 2017
956e7d0
more tidy-ups
kodonnell Mar 20, 2017
1dd88e8
utilise LocationIndexTree.findWithinRadius
kodonnell Mar 20, 2017
56df0ab
ugly hacky gui ...
kodonnell Mar 28, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
181 changes: 115 additions & 66 deletions matching-core/src/main/java/com/graphhopper/matching/MapMatching.java
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,8 @@ public MatchResult doWork(List<GPXEntry> gpxList) {
// Compute all candidates first.
// TODO: Generate candidates on-the-fly within computeViterbiSequence() if this does not
// degrade performance.
// @kodonnell: it's not currently possible as that'd mean calling queryGraph.lookup
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karussell / @stefanholder - happy for me to remove the above TODO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanholder I probably do not understand this yet deep enough, but why do you think this would mean calling queryGraph.lookup multiple times?

Furthermore: creating a QueryGraph is cheap so we could call it then multiple times but the resulting QueryResults are independent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you meant me - but I think it requires calling queryGraph.lookup multiple times because we're wanting to get the candidates at each step. E.g. first step: get candidates, lookup, do viterbi next step; second step: get candidates, lookup, do viterbi next step, ...

If we create multiple queryGraphs, then we'll get duplicate virtual edge IDs, etc., which might be a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that looking up all candidates at once might be a problem for map matching very long GPS traces, e.g. from Munich to Hamburg. But maybe it's better to create an issue for this than having this TODO in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @stefanholder - hopefully #90 will suffice.

// multiple times, which isn't supported. So, remove TODO?
final List<QueryResult> allCandidates = new ArrayList<>();
List<TimeStep<GPXExtension, GPXEntry, Path>> timeSteps = createTimeSteps(gpxList,
edgeFilter, allCandidates);
Expand All @@ -182,11 +184,11 @@ public MatchResult doWork(List<GPXEntry> gpxList) {
final QueryGraph queryGraph = new QueryGraph(routingGraph).setUseEdgeExplorerCache(true);
queryGraph.lookup(allCandidates);

List<SequenceState<GPXExtension, GPXEntry, Path>> seq = computeViterbiSequence(timeSteps,
gpxList, queryGraph);
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences =
computeViterbiSequence(timeSteps, gpxList, queryGraph);

final EdgeExplorer explorer = queryGraph.createEdgeExplorer(edgeFilter);
MatchResult matchResult = computeMatchResult(seq, gpxList, allCandidates, explorer);
MatchResult matchResult = computeMatchResult(sequences, gpxList, allCandidates, explorer);

return matchResult;
}
Expand Down Expand Up @@ -229,52 +231,77 @@ private List<TimeStep<GPXExtension, GPXEntry, Path>> createTimeSteps(List<GPXEnt
return timeSteps;
}

private List<SequenceState<GPXExtension, GPXEntry, Path>> computeViterbiSequence(
/*
* Run the viterbi algorithm on our HMM model. Note that viterbi breaks can occur (e.g. if no
* candidates are found for a given timestep), and we handle these by return a list of complete
* sequences (each of which is unbroken). It is possible that a sequence contains only a single
* timestep.
*
* Note: we only break sequences with 'physical' reasons (e.g. no candidates nearby) and not
* algorithmic ones (e.g. maxVisitedNodes exceeded) - the latter should throw errors.
*/
private List<List<SequenceState<GPXExtension, GPXEntry, Path>>> computeViterbiSequence(
List<TimeStep<GPXExtension, GPXEntry, Path>> timeSteps, List<GPXEntry> gpxList,
final QueryGraph queryGraph) {
final HmmProbabilities probabilities
= new HmmProbabilities(measurementErrorSigma, transitionProbabilityBeta);
final ViterbiAlgorithm<GPXExtension, GPXEntry, Path> viterbi = new ViterbiAlgorithm<>();

int timeStepCounter = 0;
TimeStep<GPXExtension, GPXEntry, Path> prevTimeStep = null;
final HmmProbabilities probabilities = new HmmProbabilities(measurementErrorSigma,
transitionProbabilityBeta);
ViterbiAlgorithm<GPXExtension, GPXEntry, Path> viterbi = new ViterbiAlgorithm<>();
final List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences =
new ArrayList<List<SequenceState<GPXExtension, GPXEntry, Path>>>();
TimeStep<GPXExtension, GPXEntry, Path> seqPrevTimeStep = null;
for (TimeStep<GPXExtension, GPXEntry, Path> timeStep : timeSteps) {

// if sequence is broken, then close it off and create a new viterbi:
if (viterbi.isBroken()) {
sequences.add(viterbi.computeMostLikelySequence());
seqPrevTimeStep = null;
viterbi = new ViterbiAlgorithm<>();
}

// always calculate emission probabilities regardless of place in sequence:
computeEmissionProbabilities(timeStep, probabilities);

if (prevTimeStep == null) {
if (seqPrevTimeStep == null) {
// first step of a sequence, so initialise viterbi:
viterbi.startWithInitialObservation(timeStep.observation, timeStep.candidates,
timeStep.emissionLogProbabilities);
// it is possible viterbi is immediately broken here (e.g. no candidates) - this
// will be caught by the first test in this loop.
} else {
computeTransitionProbabilities(prevTimeStep, timeStep, probabilities, queryGraph);
// add this step to current sequence:
computeTransitionProbabilities(seqPrevTimeStep, timeStep, probabilities, queryGraph);
viterbi.nextStep(timeStep.observation, timeStep.candidates,
timeStep.emissionLogProbabilities, timeStep.transitionLogProbabilities,
timeStep.roadPaths);
}
if (viterbi.isBroken()) {
String likelyReasonStr = "";
if (prevTimeStep != null) {
GPXEntry prevGPXE = prevTimeStep.observation;
GPXEntry gpxe = timeStep.observation;
double dist = distanceCalc.calcDist(prevGPXE.lat, prevGPXE.lon,
gpxe.lat, gpxe.lon);
if (dist > 2000) {
likelyReasonStr = "Too long distance to previous measurement? "
+ Math.round(dist) + "m, ";
}
// if broken, then close off this sequence and create a new one starting with this
// timestep. Note that we rely on the fact that if the viterbi breaks the most
// recent step does not get added i.e. 'computeMostLikelySequence' returns the most
// likely sequence without this (breaking) step added. Hence we can use it to start
// the next one:
// TODO: check the above is true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanholder - I think this is true, but can you confirm? I guess I could check for it to some degree ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your comment is true.

if (viterbi.isBroken()) {
sequences.add(viterbi.computeMostLikelySequence());
viterbi = new ViterbiAlgorithm<>();
viterbi.startWithInitialObservation(timeStep.observation, timeStep.candidates,
timeStep.emissionLogProbabilities);
// as above, it is possible viterbi is immediately broken here (e.g. no
// candidates) - this will be caught by the first test in this loop.
}

throw new RuntimeException("Sequence is broken for submitted track at time step "
+ timeStepCounter + " (" + gpxList.size() + " points). " + likelyReasonStr
+ "observation:" + timeStep.observation + ", "
+ timeStep.candidates.size() + " candidates: " + getSnappedCandidates(timeStep.candidates)
+ ". If a match is expected consider increasing max_visited_nodes.");
}

timeStepCounter++;
prevTimeStep = timeStep;
seqPrevTimeStep = timeStep;
}

return viterbi.computeMostLikelySequence();
// add the final sequence:
sequences.add(viterbi.computeMostLikelySequence());

// check sequence lengths:
int sequenceSizeSum = 0;
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) {
sequenceSizeSum += sequence.size();
}
assert sequenceSizeSum == timeSteps.size();

return sequences;
}

private void computeEmissionProbabilities(TimeStep<GPXExtension, GPXEntry, Path> timeStep,
Expand All @@ -301,70 +328,92 @@ private void computeTransitionProbabilities(TimeStep<GPXExtension, GPXEntry, Pat
for (GPXExtension from : prevTimeStep.candidates) {
for (GPXExtension to : timeStep.candidates) {
RoutingAlgorithm algo = algoFactory.createAlgo(queryGraph, algoOptions);
// System.out.println("algo " + algo.getName());
final Path path = algo.calcPath(from.getQueryResult().getClosestNode(),
to.getQueryResult().getClosestNode());
if (path.isFound()) {
timeStep.addRoadPath(from, to, path);
final double transitionLogProbability = probabilities
.transitionLogProbability(path.getDistance(), linearDistance, timeDiff);
timeStep.addTransitionLogProbability(from, to, transitionLogProbability);
} else {
// TODO: can we remove maxVisitedNodes completely and just set to infinity?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karussell - what is the purpose of having maxVisitedNodes in map-matching? In theory we can more easily prevent large graph traversals by limiting linear distance ... and then the user doesn't need to worry about configuring this option (and we have a 'physical' reason for throwing an error or finding no candidate - there were no transitions found within a given radius).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing an unlimited search is not good, especially not if you want roughly predicted maximum processing times. Linear distance is easier but graphs are often disconnected due to rivers etc where a distance limitation still can result in very costly traversals. Maybe we should use some good defaults and split the result instead of throwing an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. However, if I'm a user, what do I do if I get the above exception? If I want a map match, there's nothing I can do except increase maxVisitedNodes. So we might as well set it large by default anyway. If the user wants to handle the exceptions themselves (I can't really see why) then they can override the maxVisitedNodes to something smaller.

As an aside - the default GH behaviour is presumably to have no limit on maxVisitedNodes? Same argument applies there, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I want a map match, there's nothing I can do except increase maxVisitedNodes

If you have full control of the library&server then just use a maximum value, but defaults should be safe so that not a single request could occupy one CPU for dozens of seconds.

As an aside - the default GH behaviour is presumably to have no limit on maxVisitedNodes?

No. We have a max nodes limits too for non-CH routes. But the limits are much bigger because the number of points are typically a lot less.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll leave it throwing an error for now. In future, we could maybe think of adding e.g. matchResult.isFound() similar to path.isFound().

if (algo.getVisitedNodes() > algoOptions.getMaxVisitedNodes()) {
throw new RuntimeException(
"couldn't compute transition probabilities as routing failed due to too small maxVisitedNodes ("
+ algoOptions.getMaxVisitedNodes() + ")");
}
// TODO: can we somewhere record that this route failed? Currently all viterbi
// knows is that there's no transition possible, but not why. This is useful
// for e.g. explaining why a sequence broke.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split into multiple MatchResult (one per each sequence) then maybe we could add a 'sequenceStartReason' property? For the first sequence it will be 'first sequence' and for the rest it might be 'no candidates found' or 'no transitions found' etc. @karussell / @stefanholder ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, but I'm not sure if this is helpful for the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither. My thinking was that breaking into sequences is better than throwing an exception. However, I then thought that the user may want to know why the sequence broke. And it might be useful for diagnostics etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Having separated sequences is the way to go IMO. And if user requests specific reasons we can also fine tune this later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - will leave as a TODO for now.

}
}
}
}

private MatchResult computeMatchResult(List<SequenceState<GPXExtension, GPXEntry, Path>> seq,
List<GPXEntry> gpxList, List<QueryResult> allCandidates,
EdgeExplorer explorer) {
private MatchResult computeMatchResult(
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences,
List<GPXEntry> gpxList, List<QueryResult> allCandidates, EdgeExplorer explorer) {
// every virtual edge maps to its real edge where the orientation is already correct!
// TODO use traversal key instead of string!
final Map<String, EdgeIteratorState> virtualEdgesMap = new HashMap<>();
for (QueryResult candidate : allCandidates) {
fillVirtualEdges(virtualEdgesMap, explorer, candidate);
}

MatchResult matchResult = computeMatchedEdges(seq, virtualEdgesMap);
// TODO: sequences may be only single timesteps, or disconnected. So we need to support:
// - fake edges e.g. 'got from X to Y' but we don't know how ...
// - single points e.g. 'was at X from t0 to t1'
MatchResult matchResult = computeMatchedEdges(sequences, virtualEdgesMap);
computeGpxStats(gpxList, matchResult);

return matchResult;
}

private MatchResult computeMatchedEdges(List<SequenceState<GPXExtension, GPXEntry, Path>> seq,
Map<String, EdgeIteratorState> virtualEdgesMap) {
List<EdgeMatch> edgeMatches = new ArrayList<>();
private MatchResult computeMatchedEdges(
List<List<SequenceState<GPXExtension, GPXEntry, Path>>> sequences,
Map<String, EdgeIteratorState> virtualEdgesMap) {
// TODO: remove gpx extensions and just add time at start/end of edge.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karussell - can I do this? I'm not sure why else we save all the gpxExtensions to the given edge ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you show what you want to do :) ? Currently not sure what you mean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I want to remove all GPX extensions from edge matches (and hence match result), and add only the to/from time to the edge (which will be inferred from the GPX points and route times, etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that in the future it should (correctly) associate all original GPXEntry's to one edge. See e.g. https://discuss.graphhopper.com/t/map-matchresult-to-gpxentry/977

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that should (hopefully) be easy - I need to find the matching GPX entry to infer the start/end time anyway. Separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR?

Would be welcome :) ! But you would need to use some additional data and so we keep the GPXExtensions in the EdgeMatch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure - will discuss in the (hopefully soon) PR.

double distance = 0.0;
long time = 0;
EdgeIteratorState currentEdge = null;
List<EdgeMatch> edgeMatches = new ArrayList<>();
List<GPXExtension> gpxExtensions = new ArrayList<>();
GPXExtension queryResult = seq.get(0).state;
gpxExtensions.add(queryResult);
for (int j = 1; j < seq.size(); j++) {
queryResult = seq.get(j).state;
Path path = seq.get(j).transitionDescriptor;
distance += path.getDistance();
time += path.getTime();
for (EdgeIteratorState edgeIteratorState : path.calcEdges()) {
EdgeIteratorState directedRealEdge = resolveToRealEdge(virtualEdgesMap,
edgeIteratorState);
if (directedRealEdge == null) {
throw new RuntimeException("Did not find real edge for "
+ edgeIteratorState.getEdge());
}
if (currentEdge == null || !equalEdges(directedRealEdge, currentEdge)) {
if (currentEdge != null) {
EdgeMatch edgeMatch = new EdgeMatch(currentEdge, gpxExtensions);
edgeMatches.add(edgeMatch);
gpxExtensions = new ArrayList<>();
EdgeIteratorState currentEdge = null;
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) {
GPXExtension queryResult = sequence.get(0).state;
gpxExtensions.add(queryResult);
for (int j = 1; j < sequence.size(); j++) {
queryResult = sequence.get(j).state;
Path path = sequence.get(j).transitionDescriptor;
distance += path.getDistance();
time += path.getTime();
for (EdgeIteratorState edgeIteratorState : path.calcEdges()) {
EdgeIteratorState directedRealEdge =
resolveToRealEdge(virtualEdgesMap, edgeIteratorState);
if (directedRealEdge == null) {
throw new RuntimeException(
"Did not find real edge for " + edgeIteratorState.getEdge());
}
if (currentEdge == null || !equalEdges(directedRealEdge, currentEdge)) {
if (currentEdge != null) {
EdgeMatch edgeMatch = new EdgeMatch(currentEdge, gpxExtensions);
edgeMatches.add(edgeMatch);
gpxExtensions = new ArrayList<>();
}
currentEdge = directedRealEdge;
}
currentEdge = directedRealEdge;
}
gpxExtensions.add(queryResult);
}
gpxExtensions.add(queryResult);
}

// we should have some edge matches:
if (edgeMatches.isEmpty()) {
String sequenceSizes = "";
for (List<SequenceState<GPXExtension, GPXEntry, Path>> sequence : sequences) {
sequenceSizes += sequence.size() + ",";
}
throw new IllegalStateException(
"No edge matches found for path. Too short? Sequence size " + seq.size());
"No edge matches found for path. Only single-size sequences? Sequence sizes: "
+ sequenceSizes.substring(0, sequenceSizes.length() - 1));
}
EdgeMatch lastEdgeMatch = edgeMatches.get(edgeMatches.size() - 1);
if (!gpxExtensions.isEmpty() && !equalEdges(currentEdge, lastEdgeMatch.getEdgeState())) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import org.junit.Ignore;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
Expand Down Expand Up @@ -82,7 +84,7 @@ public static Collection<Object[]> algoOptions() {

// force CH
AlgorithmOptions chOpts = AlgorithmOptions.start()
.maxVisitedNodes(40)
.maxVisitedNodes(1000)
.hints(new PMap().put(Parameters.CH.DISABLE, false))
.build();

Expand Down Expand Up @@ -172,27 +174,55 @@ public void testDoWork() {
*/
@Test
public void testDistantPoints() {

// OK with 1000 visited nodes:
MapMatching mapMatching = new MapMatching(hopper, algoOptions);
List<GPXEntry> inputGPXEntries = createRandomGPXEntries(
new GHPoint(51.23, 12.18),
new GHPoint(51.45, 12.59));
MatchResult mr = mapMatching.doWork(inputGPXEntries);
assertEquals(mr.getMatchLength(), 57653, 1);
assertEquals(mr.getMatchMillis(), 2748186, 1);
}

// not OK when we only allow a small number of visited nodes:
/**
* This test is to check that an exception occurs if maxVisitedNodes is exceeded.
*/
@Test
public void testMaxVisitedNodesExceeded() {
AlgorithmOptions opts = AlgorithmOptions.start(algoOptions).maxVisitedNodes(1).build();
mapMatching = new MapMatching(hopper, opts);
MapMatching mapMatching = new MapMatching(hopper, opts);
List<GPXEntry> inputGPXEntries = createRandomGPXEntries(
new GHPoint(51.23, 12.18),
new GHPoint(51.45, 12.59));
try {
mr = mapMatching.doWork(inputGPXEntries);
fail("Expected sequence to be broken due to maxVisitedNodes being too small");
mapMatching.doWork(inputGPXEntries);
fail("Expected sequence to be broken due to maxVisitedNodes being exceeded");
} catch (RuntimeException e) {
assertTrue(e.getMessage().startsWith("Sequence is broken for submitted track"));
assertTrue(e.getMessage().startsWith(
"couldn't compute transition probabilities as routing failed due to too small"
+ " maxVisitedNodes"));
}
}

/**
* This test is to check that the track is broken into two sequences when an internal point
* has no candidates:
*/
@Test
@Ignore
public void testSequenceBreaksWhenNoCandidates() {
fail("not yet implemented");
}

/**
* This test is to check that the track is broken into two sequences when there are no possible
* routes between two timesteps
*/
@Test
@Ignore
public void testSequenceBreaksWithUnconnectedSteps() {
fail("not yet implemented");
}

/**
* This test is to check what happens when two GPX entries are on one edge
* which is longer than 'separatedSearchDistance' - which is always 66m. GPX
Expand Down