☑️ Refining globbed_tail_locations for S3
Prior to this commit, we didn't have a spec for the S3 behavior.  We now
have a test for an S3 Faux Bucket.

Related to:

- https://github.com/scientist-softserv/adventist-dl/issues/330
- scientist-softserv/iiif_print#220
jeremyf committed May 30, 2023
1 parent a45e57f commit b5a16e2
Showing 5 changed files with 32 additions and 10 deletions.
lib/derivative_rodeo/storage_locations/base_location.rb (6 changes: 3 additions & 3 deletions)

```diff
@@ -208,16 +208,16 @@ def derived_file_from(template:)

 ##
 # When you have a known location and want to check for files that are within that location,
-# use the #globbed_tail_locations method. In the case of {Generators::PdfSplitGenerator} we
+# use the {#globbed_tail_locations} method. In the case of {Generators::PdfSplitGenerator} we
 # need to know the path to the all of the image files we "split" off of the given PDF.
 #
 # We can use the :file_path as the prefix the given :tail_glob as the suffix for a "fully
 # qualified" Dir.glob type search.
 #
 # @param tail_glob [String]
 #
-# @return [StorageLocations::BaseLocation] when there is one or more files at the location
-# @return [NilClass] when there are no files
+# @return [Enumerable<StorageLocations::BaseLocation>] the locations of the files; an empty
+# array when there are none.
 def globbed_tail_locations(tail_glob:)
   raise NotImplementedError, "#{self.class}#globbed_locations"
 end
```
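The base-class comment describes a "fully qualified" Dir.glob search: the location's path as the prefix and `:tail_glob` as the suffix, returning an array that is empty when nothing matches. For a location on local disk that could look roughly like the sketch below. `LocalLocationSketch` is a hypothetical stand-in, not DerivativeRodeo's own file location class, and it globs from the file's directory (`file_dir`), matching the S3 change in this commit, rather than from `file_path` itself.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical filesystem-backed location; a sketch, not DerivativeRodeo's own class.
class LocalLocationSketch
  attr_reader :file_path

  def initialize(file_path)
    @file_path = file_path
  end

  # The directory holding the file; it serves as the glob's prefix.
  def file_dir
    File.dirname(file_path)
  end

  # Prefix (file_dir) + suffix (tail_glob) gives a "fully qualified" pattern for Dir.glob.
  # Always returns an Array of locations, empty when nothing matches.
  def globbed_tail_locations(tail_glob:)
    Dir.glob(File.join(file_dir, tail_glob)).sort.map { |path| self.class.new(path) }
  end
end

# Usage: a PDF with split pages written to a sibling "pages" directory.
Dir.mktmpdir do |dir|
  pdf = File.join(dir, "hello-world.pdf")
  FileUtils.mkdir_p(File.join(dir, "pages"))
  FileUtils.touch([pdf, File.join(dir, "pages", "hello-world-1.tiff"), File.join(dir, "pages", "notes.txt")])

  found = LocalLocationSketch.new(pdf).globbed_tail_locations(tail_glob: "pages/*.tiff")
  found.map { |location| File.basename(location.file_path) } # => ["hello-world-1.tiff"]
end
```

Returning an empty array instead of `nil` lets callers iterate without a nil check, which is the contract the updated `@return` tag describes.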
lib/derivative_rodeo/storage_locations/s3_location.rb (16 changes: 13 additions & 3 deletions)

```diff
@@ -66,6 +66,8 @@ def exist?
 #
 # @note S3 allows searching on a prefix but does not allow for "wildcard" searches. We can
 # use the components of the file_path to fake that behavior.
+#
+# @see Generators::PdfSplitGenerator#image_file_basename_template
 def globbed_tail_locations(tail_glob:)
   # file_path = "s3://blah/1234/hello-world/pages/*.tiff"
   #
@@ -75,11 +77,19 @@ def globbed_tail_locations(tail_glob:)
   # and miscolate two PDFs.
   #
   # file_path = "s3://blah/1234/hello-world/hello-world.pdf
-  # TODO: Should file_path be file_dir?
-  globname = File.join(file_path, tail_glob)
+  globname = File.join(file_dir, tail_glob)
   regexp = %r{#{File.extname(globname)}$}
+
+  # NOTE: We're making some informed guesses, needing to include the fully qualified template
+  # based on both the key of the item in the bucket as well as the bucket's host.
+  uri = URI.parse(file_uri)
+  scheme_and_host = "#{uri.scheme}://#{uri.host}"
+
   bucket.objects(prefix: File.dirname(globname)).flat_map do |object|
-    derived_file_from(object.key) if object.key.match(regexp)
+    if object.key.match(regexp)
+      template = File.join(scheme_and_host, object.key)
+      derived_file_from(template: template)
+    end
   end
 end

```
File renamed without changes.
spec/derivative_rodeo/storage_locations/s3_location_spec.rb (18 changes: 15 additions & 3 deletions)

```diff
@@ -10,6 +10,7 @@
   before do
     # Let's use a FakeBucket instead!
     subject.use_actual_s3_bucket = false
+
     DerivativeRodeo.config do |config|
       config.aws_s3_bucket = 'fake-bucket'
       config.aws_s3_access_key_id = "FAKEFAKEFAKE"
@@ -66,11 +67,22 @@

   describe '#globbed_tail_locations' do
     it 'searched the bucket' do
-      basename_ish = short_path.split(".").first
-      key = File.join(basename_ish, File.basename(__FILE__))
+      # Because we instantiated the subject as a location to the :file_path (e.g. let(:file_path))
+      # we are encoding where things are relative to this file. In other words, this logic is
+      # mirroring the generator logic that says where we're writing derivatives relative to their
+      # original file/input file.
+      bucket_dir = "files/#{File.basename(file_path, '.tiff')}"
+
+      basename = File.basename(__FILE__)
+      key = File.join(bucket_dir, "pages", basename)
       subject.bucket.object(key).upload_file(__FILE__)

-      subject.globbed_tail_locations(tail_glob: "*.rb")
+      non_matching_key = File.join(bucket_dir, "missing", basename)
+      subject.bucket.object(non_matching_key).upload_file(__FILE__)
+
+      locations = subject.globbed_tail_locations(tail_glob: "ocr_color/pages/*.rb")
+
+      expect(locations.size).to eq(1)
     end
   end
 end
```
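The spec swaps in a fake bucket (`use_actual_s3_bucket = false`) and, as far as this diff shows, relies on only three calls: `bucket.object(key)`, `#upload_file(path)` on the returned handle, and `bucket.objects(prefix:)` yielding objects that respond to `#key`. A minimal in-memory stand-in with just that surface might look like the following. `InMemoryBucketSketch` is hypothetical and is not the FakeBucket that ships with DerivativeRodeo; the usage replays the spec's idea of one key under `pages/` and one under `missing/`, so only one survives a `pages` prefix listing.

```ruby
require "tempfile"

# Hypothetical in-memory bucket exposing only the calls the spec and S3 location use.
class InMemoryBucketSketch
  # Shaped like an S3 object handle: it knows its key and can receive an upload.
  ObjectSketch = Struct.new(:key, :store) do
    def upload_file(path)
      store[key] = File.binread(path) # keep the bytes in memory instead of on S3
      true
    end
  end

  def initialize
    @store = {}
  end

  # Returns a handle whether or not the key has been uploaded yet.
  def object(key)
    ObjectSketch.new(key, @store)
  end

  # Prefix listing: every stored key starting with the prefix, in sorted order.
  def objects(prefix: "")
    @store.keys.select { |k| k.start_with?(prefix) }.sort.map { |k| ObjectSketch.new(k, @store) }
  end
end

Tempfile.create("page") do |file|
  file.write("page bytes")
  file.flush

  bucket = InMemoryBucketSketch.new
  bucket.object("files/example/pages/example.rb").upload_file(file.path)
  bucket.object("files/example/missing/example.rb").upload_file(file.path)
  keys = bucket.objects(prefix: "files/example/pages").map(&:key) # => ["files/example/pages/example.rb"]
end
```

Because a prefix listing is recursive and string based, the `missing/` key is excluded by the prefix alone, which is the behavior the spec's `non_matching_key` is there to exercise.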
spec/spec_helper.rb (2 changes: 1 addition & 1 deletion)

```diff
@@ -36,7 +36,7 @@
 # of increasing the boot-up time by auto-requiring all files in the support
 # directory. Alternatively, in the individual `*_spec.rb` files, manually
 # require only the support files necessary.
-Dir.glob(File.expand_path("../lib/spec/support/**/*.rb", __dir__)).each { |f| require f }
+Dir.glob(File.expand_path("../lib/spec_support/**/*.rb", __dir__)).each { |f| require f }
 Dir.glob(File.expand_path('./support/**/*.rb', __dir__)).each { |f| require f }

 RSpec.configure do |config|
```
