🎁 Add ability to pass search term to PDF.js
This commit adds a PDF's extracted text to the file set's Solr document so that the text can be searched in the catalog. With the text indexed, the search term can be passed to the viewer, which highlights the term when the PDF loads.
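The files that hand the search term to the viewer are not shown on this page, so the following is only a minimal sketch of one way to do it. It assumes the stock PDF.js `viewer.html`, which reads `search` and `phrase` from the URL fragment. The helper name `pdf_viewer_url`, the viewer path, and the download URL are hypothetical and not taken from the commit.

```ruby
require 'erb'

# Hypothetical helper (not from the commit): builds a PDF.js viewer URL for
# `file_url` and, when a search term is given, asks the viewer to highlight it.
# The default viewer path is an assumption about where PDF.js is mounted.
def pdf_viewer_url(file_url, query = nil, viewer_path: '/pdf.js/viewer.html')
  url = "#{viewer_path}?file=#{ERB::Util.url_encode(file_url)}"
  term = query.to_s.strip
  return url if term.empty?

  # PDF.js reads `search` (and `phrase`) from the fragment, not the query string.
  "#{url}#search=#{ERB::Util.url_encode(term)}&phrase=true"
end

puts pdf_viewer_url('/downloads/abc123', 'solar panel')
```

Putting the term in the fragment keeps it out of the request sent to the server, and a blank term falls back to the plain viewer URL.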
Showing 3 changed files with 45 additions and 2 deletions.
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+
+# OVERRIDE Hyrax 3.5.0 to add PDF text to solr document when using the default PDF viewer (PDF.js)
+
+module Hyrax
+  module FileSetIndexerDecorator
+    def generate_solr_document
+      return super unless Flipflop.default_pdf_viewer?
+
+      super.tap do |solr_doc|
+        solr_doc['all_text_timv'] = solr_doc['all_text_tsimv'] = pdf_text
+      end
+    end
+
+    private
+
+    def pdf_text
+      return unless object.pdf?
+      return unless object.original_file&.content.is_a? String
+
+      text = IO.popen(['pdftotext', '-', '-'], 'r+b') do |pdftotext|
+        pdftotext.write(object.original_file.content)
+        pdftotext.close_write
+        pdftotext.read
+      end
+
+      text.tr("\n", ' ')
+          .squeeze(' ')
+          .encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') # drops every non-ASCII byte, valid UTF-8 included
+    end
+  end
+end
+
+Hyrax::FileSetIndexer.prepend(Hyrax::FileSetIndexerDecorator)
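
The cleanup chain at the end of `pdf_text` is worth seeing in isolation, because transcoding from `binary` to UTF-8 discards every byte at or above 0x80, so accented and other non-ASCII text is dropped along with genuinely invalid bytes. The sketch below reproduces that chain on a sample string and contrasts it with an assumed alternative that is not part of the commit: treating the bytes as UTF-8 and scrubbing only the invalid sequences. Both method names are hypothetical.

```ruby
# Same chain as the commit, applied to a binary (ASCII-8BIT) string.
def normalize_ascii_only(raw)
  raw.tr("\n", ' ')
     .squeeze(' ')
     .encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Assumed alternative (not from the commit): keep valid UTF-8 characters and
# remove only the byte sequences that are not valid UTF-8.
def normalize_keep_utf8(raw)
  raw.dup
     .force_encoding(Encoding::UTF_8)
     .scrub('')
     .tr("\n", ' ')
     .squeeze(' ')
end

# "café" encoded as UTF-8, two newlines, then a stray invalid byte (0xFF).
raw = "caf\xC3\xA9\n\nmenu \xFF".b

p normalize_ascii_only(raw) # the "é" is lost together with the stray byte
p normalize_keep_utf8(raw)  # the "é" survives; only the stray byte is removed
```

Which variant is appropriate depends on whether the indexed PDFs are expected to contain non-ASCII text that users will search for.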