Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎁 Adjust Hyrax::Actors::CreateWithRemoteFilesActor to Leverage the Derivative::Rodeo #218

Closed
Tracked by #421 ...
jeremyf opened this issue Apr 14, 2023 · 1 comment
Closed
Tracked by #421 ...

Comments

@jeremyf
Copy link
Contributor

jeremyf commented Apr 14, 2023

From the scientist-softserv/adventist_knapsack#406

In the DerivativeRodeo, we will be storing the original file in the following S3 Bucket format: :s3_bucket_preamble/:aark_id/:original_basename

For Adventist, we are looking at the fastest pathway to begin ingesting the pre-processed files.

In the CreateWithRemoteFilesActor we are downloading the original file. Whether that comes from the DerivativeRodeo preprocess space or the original location is not relevant, as they "should" be the same files. Later on we'll want to be checking if the derivatives exist in the DerivativeRodeo.

With that information, this ticket can be considered closed for the scope of Adventist.

Past Recommendation and Discussion based on "Dash" Rodeo

Recommendation

The Hyrax::Actors::CreateWithRemoteFilesActor#attach_files (see below for the v2.9.6 code) is the place where the code begins attaching the remote files to the works. At this point, we should instead demand from the rodeo the original file and attach accordingly.

def attach_files(env, remote_files)
  return true unless remote_files
  remote_files.each do |file_info|
    next if file_info.blank? || file_info[:url].blank?
    # Escape any space characters, so that this is a legal URI
    uri = URI.parse(Addressable::URI.escape(file_info[:url]))
    unless validate_remote_url(uri)
      Rails.logger.error "User #{env.user.user_key} attempted to ingest file from url #{file_info[:url]}, which doesn't pass
      return false
    end
    auth_header = file_info.fetch(:auth_header, {})
    create_file_from_url(env, uri, file_info[:file_name], auth_header)
  end
  true
end

Discussion

The attributes that we have for identifying an original file in the rodeo are:

  • Parent identifier
  • Path to the original
  • Original filename

In Adventist the <original filename> is calculated from the path to the original; so we can eliminate <original filename> as it is less precise than the path to the original.

In Hyrax::Actors::CreateWithRemoteFilesActor we would want to change the logic for the URL. What would that look like?

By design, the Derivative::Rodeo is about moving and providing files in a local context. That is we can ask if the Derivative::Rodeo has the file locally. Or we could demand that the Derivative::Rodeo make the file local (e.g. download if missing and generate if still not present).

Further, by design, the Derivative::Rodeo has no prerequisites for the original file. In other words, there would be no pre-requisite steps that would need to be run to “produce” the original file.

Exploring two pathways:

  1. Pull from the rodeo if it exists, else leverage existing Hyrax::Actors::CreateWithRemoteFilesActor.
  2. Demand from the rodeo, ignoring the existing process.

Regardless of pathway, we would still need to write the code for dealing with the rodeo file. It would likely look more like the uploaded file logic.

By favoring demand from the rodeo, we would have one logical pathway that replaces Hyrax logic. Meaning we would not have some Hyrax logic for one conditional and some Derivative::Rodeo (or other gem) with logic for the other pathway.

By favoring the pull from the rodeo, else strategy, we would introduce a contained branching condition. I suspect we’d want to refactor the logic of Hyrax’s actor to extract some common “service object” like behavior.

My inclination is to lean further on the Derivative::Rodeo. The actual Hyrax-based logic might make more sense in something like the IIIF Print gem. That is to say, don’t make Hyrax a dependency of the Derivative::Rodeo when we already have the Hyrax dependency in IIIF Print gem.

In other words, the rodeo knows of the processing, locating, and generating of files; but it does not know what is done with those files.

@jeremyf jeremyf changed the title Adjust Hyrax::Actors::CreateWithRemoteFilesActor to Leverage the Derivative::Rodeo 🎁 Adjust Hyrax::Actors::CreateWithRemoteFilesActor to Leverage the Derivative::Rodeo May 22, 2023
@jeremyf
Copy link
Contributor Author

jeremyf commented May 23, 2023

See the notes

@jeremyf jeremyf closed this as not planned Won't fix, can't repro, duplicate, stale May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants