
Onboard neural sparse search #141

Merged: 6 commits merged into opensearch-project:main on Apr 21, 2024

ohltyler (Member) commented Apr 20, 2024

Description

This PR onboards an out-of-the-box neural sparse search workflow configuration. It adds a base template and the base set of workspace nodes/edges, and updates and refactors the parsing logic so that it produces a usable workflow template that can provision an ingest pipeline and an index (and, optionally, a pretrained neural sparse model). More specifically:

  • adds out-of-the-box (OOTB) pretrained sparse encoding models and exposes them as options in ModelField
  • adds a SparseEncoderTransformer ML transformer component
  • makes minor refactors in Ingestor and QueryExecutor to handle the neural sparse use case
  • refactors workflow_to_template_utils to handle the neural sparse use case
  • adds a neural sparse template so that it appears in the "Create new workflow" tab
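To make "provision an ingest pipeline and index" concrete, here is a minimal sketch of the two request bodies a neural sparse workflow typically needs in OpenSearch: an ingest pipeline with a `sparse_encoding` processor, and an index whose output field is mapped as `rank_features` with the pipeline set as `default_pipeline`. The helper names, field names (`passage_text`, `passage_embedding`), pipeline name, and `<model-id>` placeholder are hypothetical; the template actually generated by `workflow_to_template_utils` may be structured differently.

```typescript
// Hypothetical shapes for the request bodies; these are not the plugin's own types.
interface SparseEncodingProcessor {
  sparse_encoding: {
    model_id: string;
    // Maps a source text field to the field that receives the token/weight map.
    field_map: Record<string, string>;
  };
}

interface IngestPipelineBody {
  description: string;
  processors: SparseEncodingProcessor[];
}

interface IndexBody {
  settings: { default_pipeline: string };
  mappings: { properties: Record<string, { type: string }> };
}

// Hypothetical helper: a pipeline that runs the sparse encoding model over
// `inputField` and writes the resulting token weights into `outputField`.
function buildSparseIngestPipeline(
  modelId: string,
  inputField: string,
  outputField: string
): IngestPipelineBody {
  return {
    description: "Neural sparse ingest pipeline (sketch)",
    processors: [
      {
        sparse_encoding: {
          model_id: modelId,
          field_map: { [inputField]: outputField },
        },
      },
    ],
  };
}

// Hypothetical helper: an index that routes documents through the pipeline by
// default and stores the sparse output as rank_features.
function buildSparseIndex(
  pipelineName: string,
  inputField: string,
  outputField: string
): IndexBody {
  return {
    settings: { default_pipeline: pipelineName },
    mappings: {
      properties: {
        [inputField]: { type: "text" },
        [outputField]: { type: "rank_features" },
      },
    },
  };
}

const pipelineBody = buildSparseIngestPipeline("<model-id>", "passage_text", "passage_embedding");
const indexBody = buildSparseIndex("neural-sparse-pipeline", "passage_text", "passage_embedding");
console.log(JSON.stringify(pipelineBody, null, 2));
console.log(JSON.stringify(indexBody, null, 2));
```

The pipeline body would be sent with `PUT /_ingest/pipeline/<name>` and the index body with `PUT /<index>`. Setting `default_pipeline` means documents indexed without an explicit `pipeline` parameter are still encoded.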

Other minor changes, mostly related to readability:

  • adds a Document UI component to make the end-to-end ingest data flow easier to read and understand
  • adds logic to parse the edges and include only the relevant ones in the downstream workflow template (e.g., an edge to or from the UI-specific Document component is ignored in the backend template)
  • makes minor changes to component names and inputs/outputs to better match the data flow
  • removes the create vs. existing tabs in the component details component; for now, the scope is creation only
  • removes the 'Search' meta block in the DnD workspace
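The edge-parsing change can be illustrated with a small sketch: drop any edge whose source or target is a UI-only node (such as the Document component) before building the backend template. The `WorkspaceNode`/`WorkspaceEdge` shapes, the `"document"` type string, and `filterBackendEdges` are hypothetical simplifications, not the plugin's actual types or functions.

```typescript
// Hypothetical, simplified node/edge shapes; the plugin's real types differ.
interface WorkspaceNode {
  id: string;
  type: string; // e.g. "document", "sparse_encoder", "indexer"
}

interface WorkspaceEdge {
  source: string; // id of the source node
  target: string; // id of the target node
}

// Node types that exist only for UI readability and map to no backend step.
const UI_ONLY_NODE_TYPES = new Set<string>(["document"]);

// Keeps only edges whose endpoints are both backend workflow steps.
function filterBackendEdges(nodes: WorkspaceNode[], edges: WorkspaceEdge[]): WorkspaceEdge[] {
  const uiOnlyIds = new Set(
    nodes.filter((n) => UI_ONLY_NODE_TYPES.has(n.type)).map((n) => n.id)
  );
  return edges.filter((e) => !uiOnlyIds.has(e.source) && !uiOnlyIds.has(e.target));
}

// Usage example: the Document -> encoder edge is dropped, encoder -> index is kept.
const nodes: WorkspaceNode[] = [
  { id: "doc", type: "document" },
  { id: "encoder", type: "sparse_encoder" },
  { id: "index", type: "indexer" },
];
const edges: WorkspaceEdge[] = [
  { source: "doc", target: "encoder" },
  { source: "encoder", target: "index" },
];
const backendEdges = filterBackendEdges(nodes, edges);
console.log(JSON.stringify(backendEdges)); // [{"source":"encoder","target":"index"}]
```

Keeping the UI-only types in one set means a future display-only component only needs to be registered there to be excluded from the template.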

Testing:

  • fixed a bug where the model ID did not propagate correctly to the ingest pipeline in certain edge cases
  • ensured both pretrained and existing models work for the semantic search case
  • ensured both pretrained and existing models work for the neural sparse search case

Demo video:

  • creating and provisioning a neural sparse workflow using a pretrained sparse encoding model provided by the ML Commons plugin
  • ingesting some sample data
  • querying using a valid neural_sparse query clause
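For reference, a `neural_sparse` query clause targets the `rank_features` field written at ingest time and names the model used to encode the query text. The sketch below builds such a search body; the helper name, field name, query text, and `<model-id>` placeholder are hypothetical, and the exact query used in the demo is not shown here.

```typescript
// Hypothetical shape of a search body containing a neural_sparse clause.
interface NeuralSparseQueryBody {
  query: {
    // Keyed by the rank_features field that holds the document token weights.
    neural_sparse: Record<string, { query_text: string; model_id: string }>;
  };
}

// Hypothetical helper: encodes `queryText` with `modelId` at search time and
// matches it against the sparse vectors stored in `field`.
function buildNeuralSparseQuery(
  field: string,
  queryText: string,
  modelId: string
): NeuralSparseQueryBody {
  return {
    query: {
      neural_sparse: {
        [field]: { query_text: queryText, model_id: modelId },
      },
    },
  };
}

const searchBody = buildNeuralSparseQuery("passage_embedding", "sample query", "<model-id>");
console.log(JSON.stringify(searchBody, null, 2));
```

This body would be sent as `GET /<index>/_search`. The field must be the pipeline's output field, not the raw text field.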

Issues Resolved

Makes progress on #68

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tyler Ohlsen <[email protected]>
ohltyler merged commit 4d5f50c into opensearch-project:main on Apr 21, 2024 (6 checks passed)
ohltyler deleted the neural-sparse branch on April 21, 2024 at 18:12
opensearch-trigger-bot pushed a commit that referenced this pull request on Apr 21, 2024
Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit 4d5f50c)
ohltyler added a commit that referenced this pull request Apr 22, 2024
Signed-off-by: Tyler Ohlsen <[email protected]>
(cherry picked from commit 4d5f50c)

Co-authored-by: Tyler Ohlsen <[email protected]>