Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Detector::Journal class #58

Merged
merged 1 commit into from
Jul 15, 2024
Merged

Conversation

JPrevost
Copy link
Member

Why are these changes being introduced:

  • Storing journal information locally will be used for detecting journal names in incoming search terms

Relevant ticket(s):

How does this address that need:

  • Creates ActiveRecord model Journal inside a new Detector namespace
  • It is likely we will move StandardIdentifers into the Detector namespace along with any new Detectors (Hints, etc)

Document any side effects to this change:

  • The partial_term_match algorithm may be unwieldy once a large number of journals are loaded. We'll need to assess once we do that whether this can be wired in as part of live matching or just as an interesting data point to help us understand if this would be useful for live detections
  • The GraphQL endpoints will need a refactor to consider how to model the Detections rather than just the StandardIdentifers. That is separately ticketed.
  • Storing everything that isn't a journal name as json is new for us. This is partially done to allow us to not decide on the full scope of what to store before we understand what we need, and partially to allow us to explore what json tables may allow for us and to learn if/how indexing strategies need to change if we use json tables.
  • We have not decided on where to source our journal data from yet. We may find once we move forward with that some changes will be needed here, but since this is a very simple model it felt safe enough to move foward at this time.

@mitlib mitlib temporarily deployed to tacos-api-pipeline-pr-58 July 12, 2024 16:59 Inactive
@JPrevost JPrevost force-pushed the tco38-journals-model-and-loader branch from a59d220 to 5da76f0 Compare July 12, 2024 17:02
@JPrevost JPrevost temporarily deployed to tacos-api-pipeline-pr-58 July 12, 2024 17:02 Inactive
Copy link
Contributor

@jazairi jazairi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an exciting step forward! I agree that partial_term_match is not likely to scale well, but it does feel like it will be useful. I'd like to leave it in for now and consider refactoring later if it becomes an important detector method but causes performance issues.

# Detectors are classes that implement various algorithms that allow us to identify patterns
# within search terms.
module Detector
def self.table_name_prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't seen this convention before! Thanks for teaching me a new Rails trick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't either! The rails generator made this syntax and made me realize I should fixup how I did the namespace for Metrics :)

private

# Downcasing all names before saving allows for more efficient matching by ensuring our index is lowercase.
# If we find we need the non-lowercase Journal name in the future, we could store that as `additional_info` json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like this could come up again, so I'm glad to see that you've already considered an option.

@jazairi jazairi self-assigned this Jul 12, 2024
Why are these changes being introduced:

* Storing journal information locally will be used for detecting
  journal names in incoming search terms

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TCO-38

How does this address that need:

* Creates ActiveRecord model Journal inside a new Detector namespace
* It is likely we will move StandardIdentifers into the Detector
  namespace along with any new Detectors (Hints, etc)

Document any side effects to this change:

* The `partial_term_match` algorithm may be unwieldy once a large
  number of journals are loaded. We'll need to assess once we do that
  whether this can be wired in as part of live matching or just as
  an interesting data point to help us understand if this would be
  useful for live detections
* The GraphQL endpoints will need a refactor to consider how to model
  the `Detections` rather than just the `StandardIdentifers`. That is
  separately ticketed.
* Storing everything that isn't a journal name as json is new for us.
  This is partially done to allow us to not decide on the full scope of
  what to store before we understand what we need, and partially to
  allow us to explore what json tables may allow for us and to learn
  if/how indexing strategies need to change if we use json tables.
* We have not decided on where to source our journal data from yet. We
  may find once we move forward with that some changes will be needed
  here, but since this is a very simple model it felt safe enough to
  move foward at this time.
@JPrevost JPrevost force-pushed the tco38-journals-model-and-loader branch from 5da76f0 to 59e8f7a Compare July 15, 2024 13:07
@JPrevost JPrevost merged commit e394f7e into main Jul 15, 2024
1 check passed
@JPrevost JPrevost deleted the tco38-journals-model-and-loader branch July 15, 2024 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants