Merge pull request #72 from MITLibraries/tco52-import-hints
Create rake task for bulk reloading of SuggestedResource records
matt-bernhardt authored Aug 8, 2024
2 parents 9be8b36 + 102b158 commit 9f52ce0
Showing 9 changed files with 302 additions and 0 deletions.
20 changes: 20 additions & 0 deletions README.md
@@ -6,6 +6,26 @@ There is a `Makefile` that contains some useful command shortcuts for typical de

To see a current list of commands, run `make help`.

### Generating cassettes for tests

We use [VCR](https://github.com/vcr/vcr) to record transactions with remote systems for testing. This includes the rake
task for reloading Detector::SuggestedResource records, which do not yet have a standard provider. For the initial
feature development, we have used a Lando environment with the following definition:

```yml
name: static
recipe: lamp
config:
  webroot: .
```
If you need to regenerate these cassettes, the following procedure should be sufficient:
1. Use the configuration above to ensure the needed files are visible at `http://static.lndo.site/filename.ext`.
2. Delete any existing cassette files which need to be regenerated.
3. Run the test(s).
4. Commit the resulting files along with your other work.
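
As a rough aid for step 2, the cassette names used in the tests appear to map onto file names by replacing characters such as `:` and spaces with underscores (compare the cassette name `suggested_resource:reload from remote csv` with the file `test/vcr_cassettes/suggested_resource_reload_from_remote_csv.yml`). The sketch below approximates that mapping. `cassette_path` and `CASSETTE_DIR` are hypothetical names, not part of VCR or this repository, and the substitution rule is an assumption inferred from that single example.

```ruby
# Assumed cassette directory, taken from the cassette path shown in this commit.
CASSETTE_DIR = 'test/vcr_cassettes'

# Hypothetical helper: approximates how a cassette name becomes a file path.
# Runs of characters other than letters, digits, underscores, and hyphens
# are collapsed into a single underscore (an assumption, not VCR's API).
def cassette_path(name)
  file_name = name.gsub(/[^\w\-]+/, '_')
  File.join(CASSETTE_DIR, "#{file_name}.yml")
end

puts cassette_path('suggested_resource:reload from remote csv')
```

If the printed path matches an existing file, that is the cassette to delete before re-running the test.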

## Environment Variables

### Required
23 changes: 23 additions & 0 deletions app/models/detector/suggested_resource.rb
@@ -53,5 +53,28 @@ def calculate_fingerprint(old_phrase)
      # Rejoin tokens
      tokens.join(' ')
    end

    # This replaces all current Detector::SuggestedResource records with a new set from an imported CSV.
    #
    # @note This method is called by the suggested_resources:reload rake task.
    #
    # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
    #                           at least three headers, named "Title", "URL", and "Phrase". Please note: these values
    #                           are case sensitive.
    def self.bulk_replace(input)
      raise ArgumentError, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)

      # Check that every required column exists in the input
      required_headers = %w[Title URL Phrase]
      missing_headers = required_headers - input.headers
      raise ArgumentError, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?

      Detector::SuggestedResource.delete_all

      input.each do |line|
        record = Detector::SuggestedResource.new({ title: line['Title'], url: line['URL'], phrase: line['Phrase'] })
        record.save
      end
    end
  end
end
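
The header check in `bulk_replace` can be tried in isolation with only the standard `csv` library. The following is a minimal sketch, not the model code itself: `missing_headers` and `REQUIRED_HEADERS` are hypothetical names introduced for illustration, and the sample rows are made up apart from the first, which comes from the fixture in this commit.

```ruby
require 'csv'

# Hypothetical constant mirroring the required_headers list in bulk_replace.
REQUIRED_HEADERS = %w[Title URL Phrase].freeze

# Hypothetical helper: returns the required headers absent from a CSV::Table.
def missing_headers(table)
  REQUIRED_HEADERS - table.headers
end

# CSV.parse with headers: true returns a CSV::Table.
complete  = CSV.parse("Title,URL,Phrase\nNew Example,https://example.org,new example search\n", headers: true)
partial   = CSV.parse("Title,URL\nNew Example,https://example.org\n", headers: true)
lowercase = CSV.parse("title,url,phrase\nNew Example,https://example.org,new example search\n", headers: true)

p missing_headers(complete)
p missing_headers(partial)
p missing_headers(lowercase)
```

The third table illustrates the case sensitivity noted in the method's documentation: lowercase headers do not satisfy the check, so all three columns are reported as missing.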
28 changes: 28 additions & 0 deletions lib/tasks/suggested_resources.rake
@@ -0,0 +1,28 @@
# frozen_string_literal: true

# These define tasks for managing our SuggestedResource records.
namespace :suggested_resources do
  # While we intend to use Dataclips for exporting these records when needed,
  # we do need a way to import records from a CSV file.
  desc 'Replace all Suggested Resources from CSV'
  task :reload, [:addr] => :environment do |_task, args|
    raise ArgumentError, 'URL is required' unless args.addr.present?

    raise ArgumentError, 'Local files are not supported yet' unless URI(args.addr).scheme

    Rails.logger.info('Reloading all Suggested Resource records from CSV')

    url = URI.parse(args.addr)

    raise ArgumentError, 'HTTP/HTTPS scheme is required' unless url.scheme.in?(%w[http https])

    file = url.open.read.gsub("\xEF\xBB\xBF", '').force_encoding('UTF-8').encode
    data = CSV.parse(file, headers: true)

    Rails.logger.info("Record count before we reload: #{Detector::SuggestedResource.count}")

    Detector::SuggestedResource.bulk_replace(data)

    Rails.logger.info("Record count after we reload: #{Detector::SuggestedResource.count}")
  end
end
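
The argument checks at the top of the task can be exercised without Rails or a network connection. The sketch below is an illustration under stated assumptions rather than the task's code: `checked_source` is a hypothetical helper, and it uses plain Ruby (`include?`, `strip.empty?`) in place of the ActiveSupport methods `in?` and `present?` that the task relies on.

```ruby
require 'uri'

# Hypothetical helper: applies the same three checks as the rake task, in the
# same order, and returns the parsed URI when they all pass.
def checked_source(addr)
  raise ArgumentError, 'URL is required' if addr.nil? || addr.strip.empty?

  url = URI.parse(addr)
  # A bare filesystem path parses successfully but has no scheme.
  raise ArgumentError, 'Local files are not supported yet' unless url.scheme
  raise ArgumentError, 'HTTP/HTTPS scheme is required' unless %w[http https].include?(url.scheme)

  url
end

puts checked_source('http://static.lndo.site/suggested_resources.csv').host

begin
  checked_source('/tmp/suggested_resources.csv')
rescue ArgumentError => e
  puts e.message
end
```

When running the real task, the address is passed using rake's bracket syntax, for example `rake "suggested_resources:reload[http://static.lndo.site/suggested_resources.csv]"`; the quotes keep some shells from interpreting the brackets.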
3 changes: 3 additions & 0 deletions test/fixtures/files/suggested_resources.csv
@@ -0,0 +1,3 @@
Title,URL,Phrase
New Example,https://example.org,new example search
Web of Science,https://libraries.mit.edu/webofsci,web of Science
68 changes: 68 additions & 0 deletions test/tasks/suggested_resource_rake_test.rb
@@ -0,0 +1,68 @@
# frozen_string_literal: true

require 'test_helper'
require 'rake'

class SuggestedResourceRakeTest < ActiveSupport::TestCase
  def setup
    Tacos::Application.load_tasks if Rake::Task.tasks.empty?
    Rake::Task['suggested_resources:reload'].reenable
  end

  test 'reload can accept a url' do
    records_before = Detector::SuggestedResource.count # We have three fixtures at the moment
    first_record_before = Detector::SuggestedResource.first
    VCR.use_cassette('suggested_resource:reload from remote csv') do
      remote_file = 'http://static.lndo.site/suggested_resources.csv'
      Rake::Task['suggested_resources:reload'].invoke(remote_file)
    end
    refute_equal records_before, Detector::SuggestedResource.count
    refute_equal first_record_before, Detector::SuggestedResource.first
  end

  test 'reload task errors without a file argument' do
    error = assert_raises(ArgumentError) do
      Rake::Task['suggested_resources:reload'].invoke
    end
    assert_equal 'URL is required', error.message
  end

  test 'reload errors on a local file' do
    error = assert_raises(ArgumentError) do
      local_file = Rails.root.join('test', 'fixtures', 'files', 'suggested_resources.csv').to_s
      Rake::Task['suggested_resources:reload'].invoke(local_file)
    end
    assert_equal 'Local files are not supported yet', error.message
  end

  test 'reload fails with a non-CSV file' do
    assert_raises(CSV::MalformedCSVError) do
      VCR.use_cassette('suggested_resource:reload from remote non-csv') do
        remote_file = 'http://static.lndo.site/suggested_resources.xlsx'
        Rake::Task['suggested_resources:reload'].invoke(remote_file)
      end
    end
  end

  test 'reload fails unless all three columns are present: title, url, phrase' do
    error = assert_raises(ArgumentError) do
      VCR.use_cassette('suggested_resource:reload with missing field') do
        remote_file = 'http://static.lndo.site/suggested_resources_missing_field.csv'
        Rake::Task['suggested_resources:reload'].invoke(remote_file)
      end
    end
    assert_equal 'Some CSV columns missing: ["Phrase"]', error.message
  end

  # assert_nothing_raised is viewed as an anti-pattern, but I'm leery of a test
  # with no assertions. As a result, we use a single assertion to confirm
  # something happened.
  test 'reload succeeds if extra columns are present' do
    records_before = Detector::SuggestedResource.count # We have three fixtures at the moment
    VCR.use_cassette('suggested_resource:reload with extra field') do
      remote_file = 'http://static.lndo.site/suggested_resources_extra.csv'
      Rake::Task['suggested_resources:reload'].invoke(remote_file)
    end
    refute_equal records_before, Detector::SuggestedResource.count
  end
end
40 changes: 40 additions & 0 deletions test/vcr_cassettes/suggested_resource_reload_from_remote_csv.yml
