-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Detector::Journal class #58
Conversation
a59d220
to
5da76f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an exciting step forward! I agree that partial_term_match
is not likely to scale well, but it does feel like it will be useful. I'd like to leave it in for now and consider refactoring later if it becomes an important detector method but causes performance issues.
# Detectors are classes that implement various algorithms that allow us to identify patterns | ||
# within search terms. | ||
module Detector | ||
def self.table_name_prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't seen this convention before! Thanks for teaching me a new Rails trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't either! The rails generator made this syntax and made me realize I should fixup how I did the namespace for Metrics :)
private | ||
|
||
# Downcasing all names before saving allows for more efficient matching by ensuring our index is lowercase. | ||
# If we find we need the non-lowercase Journal name in the future, we could store that as `additional_info` json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like this could come up again, so I'm glad to see that you've already considered an option.
Why are these changes being introduced: * Storing journal information locally will be used for detecting journal names in incoming search terms Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TCO-38 How does this address that need: * Creates ActiveRecord model Journal inside a new Detector namespace * It is likely we will move StandardIdentifers into the Detector namespace along with any new Detectors (Hints, etc) Document any side effects to this change: * The `partial_term_match` algorithm may be unwieldy once a large number of journals are loaded. We'll need to assess once we do that whether this can be wired in as part of live matching or just as an interesting data point to help us understand if this would be useful for live detections * The GraphQL endpoints will need a refactor to consider how to model the `Detections` rather than just the `StandardIdentifers`. That is separately ticketed. * Storing everything that isn't a journal name as json is new for us. This is partially done to allow us to not decide on the full scope of what to store before we understand what we need, and partially to allow us to explore what json tables may allow for us and to learn if/how indexing strategies need to change if we use json tables. * We have not decided on where to source our journal data from yet. We may find once we move forward with that some changes will be needed here, but since this is a very simple model it felt safe enough to move foward at this time.
5da76f0
to
59e8f7a
Compare
Why are these changes being introduced:
Relevant ticket(s):
How does this address that need:
Document any side effects to this change:
partial_term_match
algorithm may be unwieldy once a large number of journals are loaded. We'll need to assess once we do that whether this can be wired in as part of live matching or just as an interesting data point to help us understand if this would be useful for live detectionsDetections
rather than just theStandardIdentifers
. That is separately ticketed.