Add contents of text like files to search engine #757

Open
johnf opened this issue May 8, 2024 · 3 comments

@johnf
Member

johnf commented May 8, 2024

Now that we've switched to OpenSearch, we should start indexing the contents of text-like files/essences (e.g. ELAN, PDF, RTF, DOC) and add these to the search engine.

@nthieberger
Collaborator

Does that mean a search within NABU would also find text within files? That is the virtue of the RO-Crate solution, so it would be interesting if it could also be available in NABU ... but maybe it is best left to the new version if it will take effort to include in NABU?

@johnf
Member Author

johnf commented May 9, 2024

Yes, my thinking is that whenever an item that can be converted to text is uploaded, we add it to Elasticsearch and make the text searchable. I suspect I'd add it as part of the ingest pipeline.
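
As a rough illustration of what such an ingest step could look like, here is a minimal Python sketch. It is only an assumption about the shape of the pipeline, not how Nabu does it: the function names, the `essences` index name, and the `filename`/`content` fields are all hypothetical. It handles only ELAN (`.eaf`) files, which are XML with the transcribed text held in `ANNOTATION_VALUE` elements, and it builds the two newline-delimited JSON lines that the bulk API expects for one document rather than calling a client library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical list of extensions treated as "text-like".
# Only ELAN (.eaf) extraction is sketched below.
TEXT_LIKE_EXTENSIONS = {".eaf", ".pdf", ".rtf", ".doc"}


def is_text_like(filename: str) -> bool:
    """Return True if the filename ends in one of the assumed extensions."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in TEXT_LIKE_EXTENSIONS


def extract_elan_text(eaf_xml: str) -> str:
    """Join every non-empty ANNOTATION_VALUE in an ELAN document."""
    root = ET.fromstring(eaf_xml)
    values = []
    for element in root.iter("ANNOTATION_VALUE"):
        if element.text and element.text.strip():
            values.append(element.text.strip())
    return "\n".join(values)


def bulk_index_lines(index: str, doc_id: str, filename: str, text: str) -> str:
    """Build the action line and source line for one bulk 'index' operation."""
    action = {"index": {"_index": index, "_id": doc_id}}
    source = {"filename": filename, "content": text}
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


# Tiny made-up ELAN fragment used only to exercise the functions above.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="utterance">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1">
      <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2">
      <ANNOTATION_VALUE></ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a3">
      <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    text = extract_elan_text(SAMPLE_EAF)
    print(bulk_index_lines("essences", "item-1", "sample.eaf", text), end="")
```

PDF, RTF and DOC would each need their own extractor, most likely via an external tool or library, which is where most of the workflow thinking would go.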

It will need some thought about the proper workflow and how to get it right, but it wouldn't be particularly difficult.

Given that we are probably only talking about gigabytes of text, it might even make sense to add it to the database to make reindexing trivial.
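
To make the "store it in the database" idea concrete, here is a small Python sketch under stated assumptions: it uses SQLite from the standard library purely as a stand-in for the real database, and the `extracted_text` table, its columns, and the function names are hypothetical. The point is only that once the extracted text sits in a table, a reindex is a plain read over that table with no need to re-run extraction on the original files.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_text ("
        "essence_id TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    return conn


def save_text(conn: sqlite3.Connection, essence_id: str, content: str) -> None:
    """Insert the text for an essence, replacing any earlier version."""
    conn.execute(
        "INSERT OR REPLACE INTO extracted_text (essence_id, content) VALUES (?, ?)",
        (essence_id, content),
    )
    conn.commit()


def documents_for_reindex(conn: sqlite3.Connection):
    """Yield one search document per stored row, ordered by essence id."""
    rows = conn.execute(
        "SELECT essence_id, content FROM extracted_text ORDER BY essence_id"
    )
    for essence_id, content in rows:
        yield {"_id": essence_id, "content": content}


if __name__ == "__main__":
    conn = open_store()
    save_text(conn, "item-1", "first utterance")
    save_text(conn, "item-1", "first utterance, corrected")
    save_text(conn, "item-2", "second utterance")
    for doc in documents_for_reindex(conn):
        print(doc)
```

Re-uploading a file simply overwrites its row, so the table always holds the latest extracted text for each essence.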

@johnf
Member Author

johnf commented May 9, 2024

Probably not something to work on straight away, but I created some new issues as I was cleaning up GitHub yesterday.
