In working with the extended usarray data set I discovered a gap in our implementation of the `OriginTimeMatcher` class. I tried to run the `bulk_normalize` function on a database with around 3 million wf documents. The aim was to produce a clean database with "channel_id" and "source_id" set so I could use id matching for processing this large data set. It turned out `bulk_normalize` required `OriginTimeMatcher` to implement the "find_doc" method, which it did not have previously.

This revision removes the find_doc deficiency in `OriginTimeMatcher`, but I went one step further. I realized that a generic version of `find_doc` was possible in the base class `BasicMatcher`, and I implemented that. However, I also had to create an override of the generic method in `OriginTimeMatcher` because of several issues in that class that did not mesh with the simple concepts of the generic method.

After writing the new `find_doc` method I ended up also overriding find_one in `OriginTimeMatcher`. The previous version did not handle a common issue with this matcher: a time interval match is soft, and there is a far from zero probability that the time interval used in a match contains multiple earthquakes. find_one and find_doc now both resolve that ambiguity with the obvious choice: from the candidates, select as the unique match the one whose origin time comes closest to the time projected from the waveform start time, which is the basic idea behind this class.
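
The tie-breaking rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `select_closest_origin`, the `time` key, the `t0offset` and `tolerance` parameters, and the sign convention for the projected time are all assumptions made for the example, and the candidate documents are plain dictionaries rather than database documents.

```python
from typing import Optional, Sequence


def select_closest_origin(
    candidates: Sequence[dict],
    start_time: float,
    t0offset: float = 0.0,   # hypothetical: offset from origin time to waveform start
    tolerance: float = 4.0,  # hypothetical: half-width of the soft time window (s)
    time_key: str = "time",  # hypothetical: key holding the origin time (epoch s)
) -> Optional[dict]:
    """Return the candidate whose origin time is closest to the projected time.

    Returns None when no candidate falls inside the tolerance window.
    """
    # Assumed convention: projected origin time = waveform start time - offset.
    projected = start_time - t0offset
    # Keep only candidates inside the soft time interval.
    in_window = [
        doc for doc in candidates if abs(doc[time_key] - projected) <= tolerance
    ]
    if not in_window:
        return None
    # Several earthquakes may fall in the window; pick the nearest one.
    return min(in_window, key=lambda doc: abs(doc[time_key] - projected))


sources = [
    {"source_id": "a", "time": 1000.0},
    {"source_id": "b", "time": 1003.0},
    {"source_id": "c", "time": 1100.0},
]
best = select_closest_origin(sources, start_time=1002.5)
```

Here both "a" and "b" fall inside the window, and the rule picks "b" because its origin time is nearer the projected time. Returning `None` for an empty window leaves it to the caller to decide how to report an unmatched waveform.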