Adds Detector::Journal class #58
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# frozen_string_literal: true | ||
|
||
# Detectors are classes that implement various algorithms that allow us to identify patterns | ||
# within search terms. | ||
module Detector | ||
def self.table_name_prefix | ||
'detector_' | ||
end | ||
end |
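For readers who have not seen this convention: Rails consults `table_name_prefix` on the enclosing module when it derives the table name of a namespaced model, which is why `Detector::Journal` maps to `detector_journals`. The sketch below mimics that derivation in plain Ruby with no Rails dependency. The `derive_table_name` helper and its naive pluralization are assumptions made for illustration, not the ActiveRecord implementation.

```ruby
# frozen_string_literal: true

# Plain-Ruby illustration only; ActiveRecord's real logic is more involved.
module Detector
  def self.table_name_prefix
    'detector_'
  end

  class Journal; end
end

# Hypothetical helper: prefix from the enclosing module plus a naive plural of the class name.
def derive_table_name(klass)
  *namespace, base = klass.name.split('::')
  parent = namespace.empty? ? nil : Object.const_get(namespace.join('::'))
  prefix = parent.respond_to?(:table_name_prefix) ? parent.table_name_prefix : ''
  "#{prefix}#{base.downcase}s"
end

puts derive_table_name(Detector::Journal) # prints "detector_journals"
```

Keeping the prefix on the module means every future model under `Detector` picks it up without setting `table_name` by hand.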
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# frozen_string_literal: true | ||
|
||
# == Schema Information | ||
# | ||
# Table name: detector_journals | ||
# | ||
# id :integer not null, primary key | ||
# name :string | ||
# additional_info :json | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
module Detector | ||
# Detector::Journal stores information about academic journals loaded from external sources to allow us to check our | ||
# incoming Terms against this information. | ||
class Journal < ApplicationRecord | ||
before_save :downcase_fields! | ||
|
||
# Identify journals in which the incoming phrase matches a Journal.name exactly | ||
# | ||
# @note We always store the Journal.name downcased, so we should also always downcase the phrase | ||
# when matching | ||
# | ||
# @note In reality, multiple Journals can exist with the same name. Therefore, we don't enforce | ||
# unique names and don't expect a single Journal to be returned. | ||
# | ||
# @param phrase [String] A string representation of a search term (not an actual Term object!) | ||
# | ||
# @return [ActiveRecord::Relation] A relation of the Detector::Journal records whose name equals the phrase. | ||
def self.full_term_match(phrase) | ||
Journal.where(name: phrase.downcase) | ||
end | ||
|
||
# Identify journals in which the incoming phrase contains one or more Journal names | ||
# | ||
# @note This likely won't scale well and may not be suitable for live detection as it loads all Journal records. | ||
# | ||
# @param phrase [String] A string representation of a search term (not an actual Term object!) | ||
# | ||
# @return [Array<Detector::Journal>] An array (not a relation) of the Detector::Journal records whose name appears in the phrase. | ||
def self.partial_term_match(phrase) | ||
Journal.all.select { |journal| phrase.downcase.include?(journal.name) } | ||
end | ||
|
||
private | ||
|
||
# Downcasing all names before saving allows for more efficient matching by ensuring our index is lowercase. | ||
# If we find we need the non-lowercase Journal name in the future, we could store that as `additional_info` json | ||
It does seem like this could come up again, so I'm glad to see that you've already considered an option. |
||
def downcase_fields! | ||
name.downcase! | ||
end | ||
end | ||
end |
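The difference between the two lookups can be tried out without a database. This is a minimal in-memory sketch, assuming a hypothetical `JournalRow` struct in place of the ActiveRecord model and using the three names from the fixtures in this PR; it reproduces the comparison logic only, not the SQL query that `full_term_match` issues.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for Detector::Journal rows; names are stored downcased.
JournalRow = Struct.new(:name)

JOURNALS = [
  JournalRow.new('nature'),
  JournalRow.new('the new england journal of medicine'),
  JournalRow.new('nature medicine')
].freeze

# Exact match: the whole phrase, downcased, must equal a stored name.
def full_term_match(phrase, journals = JOURNALS)
  needle = phrase.downcase
  journals.select { |journal| journal.name == needle }
end

# Partial match: any stored name that appears somewhere inside the phrase.
def partial_term_match(phrase, journals = JOURNALS)
  haystack = phrase.downcase
  journals.select { |journal| haystack.include?(journal.name) }
end

p full_term_match('The New England Journal of Medicine').map(&:name)
p partial_term_match('words and stuff Nature medicine, 1999').map(&:name)
```

Because the partial match is a plain substring test, a phrase such as `signature` would also match the journal `nature`; word-boundary matching may be worth considering if that turns out to be noisy.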
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
class CreateDetectorJournals < ActiveRecord::Migration[7.1] | ||
def change | ||
create_table :detector_journals do |t| | ||
t.string :name | ||
t.json :additional_info | ||
|
||
t.timestamps | ||
end | ||
add_index :detector_journals, :name | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# == Schema Information | ||
# | ||
# Table name: detector_journals | ||
# | ||
# id :integer not null, primary key | ||
# name :string | ||
# additional_info :json | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
|
||
# Note: fixtures bypass ActiveRecord callbacks, so while our model automatically downcases names, | ||
# these fixtures will be stored mixed case unless they are all manually downcased here. | ||
# Put another way, please make sure to always use downcase/lowercase for the 'name' in these fixtures | ||
# to properly match the real behavior of the application. | ||
nature: { | ||
name: nature, | ||
additional_info: {issns: ['0028-0836', '1476-4687']} | ||
} | ||
|
||
the_new_england_journal_of_medicine: { | ||
name: the new england journal of medicine, | ||
additional_info: {issns: ['0028-4793', '1533-4406']} | ||
} | ||
|
||
nature_medicine: { | ||
name: nature medicine, | ||
additional_info: {issns: ['1078-8956', '1546-170X']} | ||
} |
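Since fixtures skip the `before_save` callback, the lowercase rule in the note above is only enforced by convention. One way to guard it is a small check over the fixture data, sketched here in plain Ruby. The `mixed_case_fixture_keys` helper is hypothetical and not part of this PR, and the fixture data is rewritten in block-style YAML for readability.

```ruby
# frozen_string_literal: true

require 'yaml'

# Fixture data copied from the file above, written in block style for readability.
FIXTURES = YAML.safe_load(<<~YAML)
  nature:
    name: nature
    additional_info: {issns: ['0028-0836', '1476-4687']}
  the_new_england_journal_of_medicine:
    name: the new england journal of medicine
    additional_info: {issns: ['0028-4793', '1533-4406']}
  nature_medicine:
    name: nature medicine
    additional_info: {issns: ['1078-8956', '1546-170X']}
YAML

# Hypothetical helper: returns the keys of fixtures whose name is not already lowercase.
def mixed_case_fixture_keys(fixtures)
  fixtures.reject { |_key, attrs| attrs['name'] == attrs['name'].downcase }.keys
end

p mixed_case_fixture_keys(FIXTURES) # prints []

# A hypothetical mixed-case entry, as a fixture would look if entered carelessly.
p mixed_case_fixture_keys({ 'bad' => { 'name' => 'Nature Medicine' } }) # prints ["bad"]
```

A check like this could live in a test so that a mixed-case fixture fails loudly instead of silently never matching.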
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# frozen_string_literal: true | ||
|
||
# == Schema Information | ||
# | ||
# Table name: detector_journals | ||
# | ||
# id :integer not null, primary key | ||
# name :string | ||
# additional_info :json | ||
# created_at :datetime not null | ||
# updated_at :datetime not null | ||
# | ||
require 'test_helper' | ||
|
||
module Detector | ||
class JournalTest < ActiveSupport::TestCase | ||
test 'exact term match on journal name' do | ||
expected = detector_journals('the_new_england_journal_of_medicine') | ||
actual = Detector::Journal.full_term_match('the new england journal of medicine') | ||
|
||
assert actual.count == 1 | ||
assert_equal(expected, actual.first) | ||
end | ||
|
||
test 'mixed case exact term match on journal name' do | ||
expected = detector_journals('the_new_england_journal_of_medicine') | ||
actual = Detector::Journal.full_term_match('The New England Journal of Medicine') | ||
|
||
assert actual.count == 1 | ||
assert_equal(expected, actual.first) | ||
end | ||
|
||
test 'exact match within longer term returns no matches' do | ||
actual = Detector::Journal.full_term_match('The New England Journal of Medicine, 1999') | ||
assert actual.count.zero? | ||
end | ||
|
||
test 'phrase match within longer term returns matches' do | ||
actual = Detector::Journal.partial_term_match('words and stuff The New England Journal of Medicine, 1999') | ||
assert actual.count == 1 | ||
end | ||
|
||
test 'multiple matches can happen with phrase matching within longer terms' do | ||
actual = Detector::Journal.partial_term_match('words and stuff Nature medicine, 1999') | ||
assert actual.count == 2 | ||
end | ||
end | ||
end |
I hadn't seen this convention before! Thanks for teaching me a new Rails trick.
I hadn't either! The Rails generator produced this syntax, which made me realize I should fix up how I did the namespace for Metrics :)