Skip to content

Commit

Permalink
Adds a rake task to reload SuggestedResources
Browse files Browse the repository at this point in the history
** Why are these changes being introduced:

* We need a way to populate SuggestedResource records quickly,
  especially in non-production environments (for local development, or
  for review apps on Heroku). Production records will eventually be
  managed via Administrate, although in some cases we might need to
  re-load these records there as well.

** Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TCO-52

** How does this address that need:

* To allow these bulk operations, this defines a rake task to import a
  CSV file from a remote URL, which results in a CSV::Table object. That
  object is then passed to a new "bulk_replace" method on the
  Detector::SuggestedResource class, which deletes all current records
  and replaces them with the contents of that CSV.

** Document any side effects to this change:

* Initially I'd hoped that this task would support providing both a
  local file (for local development) as well as files from a URL (for
  Heroku apps). However, differences in how the URI and File libraries
  produce CSV files meant that for now everything will need to be
  done via remote URLs. A possible next step would be to build a
  user-facing administration form, rather than dealing with local
  file support.

* I tried to write suitable tests for the rake task, in addition to the
  bulk_replace method. Because we didn't have any rake tests before
  this, however, the overall test coverage is dropping by a fair amount.
  I'm not sure if our usual practice is just to never test rake tests,
  but that feels like a conversation for the entire team.

* The cassettes used for testing are generated by using a local Lando
  setup to host the imported CSV files. I'm not sure this is the most
  sustainable option for all of us, but I also don't know that we want
  to add sample files to our CDN for use in these cassettes? That's a
  question for the reviewer.
  • Loading branch information
matt-bernhardt committed Aug 7, 2024
1 parent 9be8b36 commit b1cb436
Show file tree
Hide file tree
Showing 8 changed files with 281 additions and 0 deletions.
22 changes: 22 additions & 0 deletions app/models/detector/suggested_resource.rb
Original file line number Diff line number Diff line change
Expand Up @@ -53,5 +53,27 @@ def calculate_fingerprint(old_phrase)
# Rejoin tokens
tokens.join(' ')
end

# This replaces all current Detector::SuggestedResource records with a new set from an imported CSV.
#
# @note This method is called by the suggested_resource:reload rake task.
#
# @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
# at least three headers, named "Title", "URL", and "Phrase".
def self.bulk_replace(input)
raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)

# Need to check what columns exist in input
required_headers = %w[Title URL Phrase]
missing_headers = required_headers - input.headers
raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?

Detector::SuggestedResource.delete_all

input.each do |line|
record = Detector::SuggestedResource.new({ title: line['Title'], url: line['URL'], phrase: line['Phrase'] })
record.save
end
end
end
end
28 changes: 28 additions & 0 deletions lib/tasks/suggested_resources.rake
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
# frozen_string_literal: true

# These define tasks for managing our SuggestedResource records.
namespace :suggested_resources do
# While we intend to use Dataclips for exporting these records when needed,
# we do need a way to import records from a CSV file.
desc 'Replace all Suggested Resources from CSV'
task :reload, [:addr] => :environment do |_task, args|
raise ArgumentError.new, 'URL is required' unless args.addr.present?

raise ArgumentError.new, 'Local files are not supported yet' unless URI(args.addr).scheme

Rails.logger.info('Reloading all Suggested Resource records from CSV')

url = URI.parse(args.addr)

raise ArgumentError.new, 'HTTP/HTTPS scheme is required' unless url.scheme.in?(%w[http https])

file = url.open.read.gsub("\xEF\xBB\xBF", '').force_encoding('UTF-8').encode
data = CSV.parse(file, headers: true)

Rails.logger.info("Record count before we reload: #{Detector::SuggestedResource.count}")

Detector::SuggestedResource.bulk_replace(data)

Rails.logger.info("Record count after we reload: #{Detector::SuggestedResource.count}")
end
end
3 changes: 3 additions & 0 deletions test/fixtures/files/suggested_resources.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Title,URL,Phrase
New Example,https://example.org,new example search
Web of Science,https://libraries.mit.edu/webofsci,web of Science
68 changes: 68 additions & 0 deletions test/tasks/suggested_resource_rake_test.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# frozen_string_literal: true

require 'test_helper'
require 'rake'

class SuggestedResourceRakeTest < ActiveSupport::TestCase
def setup
Tacos::Application.load_tasks if Rake::Task.tasks.empty?
Rake::Task['suggested_resources:reload'].reenable
end

test 'reload can accept a url' do
records_before = Detector::SuggestedResource.count # We have three fixtures at the moment
first_record_before = Detector::SuggestedResource.first
VCR.use_cassette('suggested_resource:reload from remote csv') do
remote_file = 'http://static.lndo.site/suggested_resources.csv'
Rake::Task['suggested_resources:reload'].invoke(remote_file)
end
refute_equal records_before, Detector::SuggestedResource.count
refute_equal first_record_before, Detector::SuggestedResource.first
end

test 'reload task errors without a file argument' do
error = assert_raises(ArgumentError) do
Rake::Task['suggested_resources:reload'].invoke
end
assert_equal 'URL is required', error.message
end

test 'reload errors on a local file' do
error = assert_raises(ArgumentError) do
local_file = Rails.root.join('test', 'fixtures', 'files', 'suggested_resources.csv').to_s
Rake::Task['suggested_resources:reload'].invoke(local_file)
end
assert_equal 'Local files are not supported yet', error.message
end

test 'reload fails with a non-CSV file' do
assert_raises(CSV::MalformedCSVError) do
VCR.use_cassette('suggested_resource:reload from remote non-csv') do
remote_file = 'http://static.lndo.site/suggested_resources.xlsx'
Rake::Task['suggested_resources:reload'].invoke(remote_file)
end
end
end

test 'reload fails unless all three columns are present: title, url, phrase' do
error = assert_raises(ArgumentError) do
VCR.use_cassette('suggested_resource:reload with missing field') do
remote_file = 'http://static.lndo.site/suggested_resources_missing_field.csv'
Rake::Task['suggested_resources:reload'].invoke(remote_file)
end
end
assert_equal 'Some CSV columns missing: ["Phrase"]', error.message
end

# assert_nothing_raised is viewed as an anti-pattern, but I'm leery of a test
# with no assertions. As a result, we use a single assertion to confirm
# something happened.
test 'reload succeeds if extra columns are present' do
records_before = Detector::SuggestedResource.count # We have three fixtures at the moment
VCR.use_cassette('suggested_resource:reload with extra field') do
remote_file = 'http://static.lndo.site/suggested_resources_extra.csv'
Rake::Task['suggested_resources:reload'].invoke(remote_file)
end
refute_equal records_before, Detector::SuggestedResource.count
end
end
40 changes: 40 additions & 0 deletions test/vcr_cassettes/suggested_resource_reload_from_remote_csv.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit b1cb436

Please sign in to comment.