You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the DerivativeRodeo, we will be storing the original file in the following S3 Bucket format: :s3_bucket_preamble/:aark_id/:original_basename
For Adventist, we are looking at the fastest pathway to begin ingesting the pre-processed files.
In the CreateWithRemoteFilesActor we are downloading the original file. Whether that comes from the DerivativeRodeo preprocess space or the original location is not relevant, as they "should" be the same files. Later on we'll want to be checking if the derivatives exist in the DerivativeRodeo.
With that information, this ticket can be considered closed for the scope of Adventist.
Past Recommendation and Discussion based on "Dash" Rodeo
Recommendation
The Hyrax::Actors::CreateWithRemoteFilesActor#attach_files (see below for the v2.9.6 code) is the place where the code begins attaching the remote files to the works. At this point, we should instead demand from the rodeo the original file and attach accordingly.
defattach_files(env,remote_files)returntrueunlessremote_filesremote_files.eachdo |file_info|
nextiffile_info.blank? || file_info[:url].blank?# Escape any space characters, so that this is a legal URIuri=URI.parse(Addressable::URI.escape(file_info[:url]))unlessvalidate_remote_url(uri)Rails.logger.error"User #{env.user.user_key} attempted to ingest file from url #{file_info[:url]},whichdoesn'tpassreturnfalseendauth_header=file_info.fetch(:auth_header,{})create_file_from_url(env,uri,file_info[:file_name],auth_header)endtrueend
Discussion
The attributes that we have for identifying an original file in the rodeo are:
Parent identifier
Path to the original
Original filename
In Adventist the <original filename> is calculated from the path to the original; so we can eliminate <original filename> as it is less precise than the path to the original.
By design, the Derivative::Rodeo is about moving and providing files in a local context. That is we can ask if the Derivative::Rodeo has the file locally. Or we could demand that the Derivative::Rodeo make the file local (e.g. download if missing and generate if still not present).
Further, by design, the Derivative::Rodeo has no prerequisites for the original file. In other words, there would be no pre-requisite steps that would need to be run to “produce” the original file.
Demand from the rodeo, ignoring the existing process.
Regardless of pathway, we would still need to write the code for dealing with the rodeo file. It would likely look more like the uploaded file logic.
By favoring demand from the rodeo, we would have one logical pathway that replaces Hyrax logic. Meaning we would not have some Hyrax logic for one conditional and some Derivative::Rodeo (or other gem) with logic for the other pathway.
By favoring the pull from the rodeo, else strategy, we would introduce a contained branching condition. I suspect we’d want to refactor the logic of Hyrax’s actor to extract some common “service object” like behavior.
My inclination is to lean further on the Derivative::Rodeo. The actual Hyrax-based logic might make more sense in something like the IIIF Print gem. That is to say, don’t make Hyrax a dependency of the Derivative::Rodeo when we already have the Hyrax dependency in IIIF Print gem.
In other words, the rodeo knows of the processing, locating, and generating of files; but it does not know what is done with those files.
The text was updated successfully, but these errors were encountered:
jeremyf
changed the title
Adjust Hyrax::Actors::CreateWithRemoteFilesActor to Leverage the Derivative::Rodeo
🎁 Adjust Hyrax::Actors::CreateWithRemoteFilesActor to Leverage the Derivative::Rodeo
May 22, 2023
From the scientist-softserv/adventist_knapsack#406
In the DerivativeRodeo, we will be storing the original file in the following S3 Bucket format:
:s3_bucket_preamble/:aark_id/:original_basename
For Adventist, we are looking at the fastest pathway to begin ingesting the pre-processed files.
In the
CreateWithRemoteFilesActor
we are downloading the original file. Whether that comes from the DerivativeRodeo preprocess space or the original location is not relevant, as they "should" be the same files. Later on we'll want to be checking if the derivatives exist in the DerivativeRodeo.With that information, this ticket can be considered closed for the scope of Adventist.
Past Recommendation and Discussion based on "Dash" Rodeo
Recommendation
The Hyrax::Actors::CreateWithRemoteFilesActor#attach_files (see below for the v2.9.6 code) is the place where the code begins attaching the remote files to the works. At this point, we should instead demand from the rodeo the original file and attach accordingly.
Discussion
The attributes that we have for identifying an original file in the rodeo are:
Parent identifier
Path to the original
Original filename
In Adventist the
<original filename>
is calculated from the path to the original; so we can eliminate<original filename>
as it is less precise than the path to the original.In Hyrax::Actors::CreateWithRemoteFilesActor we would want to change the logic for the URL. What would that look like?
By design, the Derivative::Rodeo is about moving and providing files in a local context. That is we can ask if the Derivative::Rodeo has the file locally. Or we could demand that the Derivative::Rodeo make the file local (e.g. download if missing and generate if still not present).
Further, by design, the Derivative::Rodeo has no prerequisites for the original file. In other words, there would be no pre-requisite steps that would need to be run to “produce” the original file.
Exploring two pathways:
Regardless of pathway, we would still need to write the code for dealing with the rodeo file. It would likely look more like the uploaded file logic.
By favoring demand from the rodeo, we would have one logical pathway that replaces Hyrax logic. Meaning we would not have some Hyrax logic for one conditional and some Derivative::Rodeo (or other gem) with logic for the other pathway.
By favoring the pull from the rodeo, else strategy, we would introduce a contained branching condition. I suspect we’d want to refactor the logic of Hyrax’s actor to extract some common “service object” like behavior.
My inclination is to lean further on the Derivative::Rodeo. The actual Hyrax-based logic might make more sense in something like the IIIF Print gem. That is to say, don’t make Hyrax a dependency of the Derivative::Rodeo when we already have the Hyrax dependency in IIIF Print gem.
In other words, the rodeo knows of the processing, locating, and generating of files; but it does not know what is done with those files.
The text was updated successfully, but these errors were encountered: