Onboard neural sparse search #141

ohltyler · 2024-04-20T00:57:55Z

Description

This PR onboards an out-of-the-box (OOTB) neural sparse search workflow configuration. It adds a base template, the base set of workspace nodes/edges, and refactored parsing logic that produces a usable workflow template, which can provision an ingest pipeline and index (and, optionally, a pretrained neural sparse model). More specifically:

adds OOTB pretrained sparse encoder models and exposes them as options in ModelField
adds a SparseEncoderTransformer ML transformer component
minor refactoring in Ingestor and QueryExecutor to handle the neural sparse use case
refactoring in workflow_to_template_utils to handle the neural sparse use case
adds the neural sparse template so that it is available in the "Create new workflow" tab
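For reviewers less familiar with neural sparse search, here is a rough sketch of the kind of ingest pipeline and index configuration the provisioned template is meant to produce. This is not the code from this PR: the helper names, field names, pipeline name, and model ID are all hypothetical. Only the `sparse_encoding` processor, the `rank_features` field type, and the `default_pipeline` index setting are taken from OpenSearch itself.

```typescript
// Hypothetical names throughout; substitute your own pipeline, index, field, and model values.
interface SparseEncodingPipeline {
  description: string;
  processors: Array<{
    sparse_encoding: {
      model_id: string;
      // Maps the input text field to the output sparse vector field.
      field_map: Record<string, string>;
    };
  }>;
}

interface SparseIndexBody {
  settings: { default_pipeline: string };
  mappings: { properties: Record<string, { type: string }> };
}

// Builds an ingest pipeline body with a single sparse_encoding processor.
function buildSparseIngestPipeline(
  modelId: string,
  inputField: string,
  vectorField: string
): SparseEncodingPipeline {
  return {
    description: 'Neural sparse ingest pipeline (illustrative)',
    processors: [
      {
        sparse_encoding: {
          model_id: modelId,
          field_map: { [inputField]: vectorField },
        },
      },
    ],
  };
}

// Builds an index body that routes documents through the pipeline and
// stores the generated token weights in a rank_features field.
function buildSparseIndexBody(
  pipelineName: string,
  inputField: string,
  vectorField: string
): SparseIndexBody {
  return {
    settings: { default_pipeline: pipelineName },
    mappings: {
      properties: {
        [inputField]: { type: 'text' },
        [vectorField]: { type: 'rank_features' },
      },
    },
  };
}

const examplePipeline = buildSparseIngestPipeline(
  'my-model-id',
  'passage_text',
  'passage_embedding'
);
const exampleIndexBody = buildSparseIndexBody(
  'my-sparse-pipeline',
  'passage_text',
  'passage_embedding'
);
console.log(JSON.stringify({ examplePipeline, exampleIndexBody }, null, 2));
```

When a pretrained model is selected, the model ID is only known after the model is registered and deployed, so in practice the template would have to reference it indirectly rather than hard-code it as done here.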

Other minor changes, mostly related to readability:

adds a Document UI component for general readability and a clearer understanding of the end-to-end ingest data flow
adds logic to parse the edges and include only the relevant ones in the downstream Workflow template (e.g., an edge to/from the UI-specific Document component should be ignored in the backend template)
minor changes to component names and inputs/outputs to better match the data flow
removes the create vs. existing tabs in the component details component; for now, the scope is creation only
removes the 'Search' meta block in the DnD workspace
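The edge filtering described above can be sketched roughly as follows. This is a simplified illustration, not the implementation in this PR: the `WorkspaceNode` and `WorkspaceEdge` shapes, the node type strings, and `filterTemplateEdges` are hypothetical names chosen for the example.

```typescript
// Hypothetical, simplified shapes; the real workspace types in the repo may differ.
interface WorkspaceNode {
  id: string;
  type: string; // e.g. 'document', 'sparse_encoder_transformer', 'indexer'
}

interface WorkspaceEdge {
  id: string;
  source: string; // id of the source node
  target: string; // id of the target node
}

// Assumption: node types that exist only for readability in the UI.
const UI_ONLY_NODE_TYPES = new Set<string>(['document']);

// Keeps only edges whose endpoints both exist and are both backend-relevant.
function filterTemplateEdges(
  nodes: WorkspaceNode[],
  edges: WorkspaceEdge[]
): WorkspaceEdge[] {
  const backendNodeIds = new Set(
    nodes.filter((n) => !UI_ONLY_NODE_TYPES.has(n.type)).map((n) => n.id)
  );
  return edges.filter(
    (e) => backendNodeIds.has(e.source) && backendNodeIds.has(e.target)
  );
}

const sampleNodes: WorkspaceNode[] = [
  { id: 'doc', type: 'document' },
  { id: 'encoder', type: 'sparse_encoder_transformer' },
  { id: 'index', type: 'indexer' },
];
const sampleEdges: WorkspaceEdge[] = [
  { id: 'e1', source: 'doc', target: 'encoder' }, // touches a UI-only node
  { id: 'e2', source: 'encoder', target: 'index' },
];

const templateEdges = filterTemplateEdges(sampleNodes, sampleEdges);
console.log(templateEdges.map((e) => e.id));
```

Filtering by node type, rather than by edge ID, means any future UI-only component only needs to be added to the set in one place.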

Testing:

fixed a bug where the model ID did not propagate to the ingest pipeline correctly in certain edge cases
ensured both pretrained and existing models work for the semantic search case
ensured both pretrained and existing models work for the neural sparse search case

Demo video:

creating and provisioning a neural sparse workflow using a pretrained sparse encoding model provided by the ML Commons plugin
ingesting some sample data
querying using a valid neural_sparse query clause

screen-capture.27.webm
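For reference, a `neural_sparse` query clause like the one used in the demo has roughly the shape below. This is a minimal sketch: the builder function, the field name, the query text, and the model ID are placeholders, not values from the video.

```typescript
// Request body shape for a neural_sparse search (simplified).
interface NeuralSparseSearchBody {
  query: {
    neural_sparse: Record<string, { query_text: string; model_id: string }>;
  };
}

// Hypothetical helper: builds a neural_sparse query against one rank_features field.
function buildNeuralSparseQuery(
  vectorField: string,
  queryText: string,
  modelId: string
): NeuralSparseSearchBody {
  return {
    query: {
      neural_sparse: {
        [vectorField]: { query_text: queryText, model_id: modelId },
      },
    },
  };
}

// Placeholder values; use the vector field and deployed model ID from your own workflow.
const searchBody = buildNeuralSparseQuery(
  'passage_embedding',
  'hello world',
  'my-model-id'
);
console.log(JSON.stringify(searchBody, null, 2));
```

The field named in the clause should be the same `rank_features` field that the ingest pipeline writes to, and the model should be the same sparse encoding model used at ingest time.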

Issues Resolved

Makes progress on #68

Check List

Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tyler Ohlsen <[email protected]>

ohltyler added 5 commits April 19, 2024 11:31

Add Document component; filter out invalid edges during template cx

dca2c2c

Signed-off-by: Tyler Ohlsen <[email protected]>

Onboard neural sparse search use case

0021a04

Signed-off-by: Tyler Ohlsen <[email protected]>

refactor util fns

7e840b2

Signed-off-by: Tyler Ohlsen <[email protected]>

Finish onboarding all provisioning for neural sparse search

f215a29

Signed-off-by: Tyler Ohlsen <[email protected]>

Finish onboarding and edge case handling

fc83dad

Signed-off-by: Tyler Ohlsen <[email protected]>

ohltyler added rapid workflow prototyping labels Apr 20, 2024

ohltyler requested review from dbwiddis, owaiskazi19, joshpalis, amitgalitz and jackiehanyang as code owners April 20, 2024 00:57

opensearch-trigger-bot bot added the backport 2.x label Apr 20, 2024

Minor cleanup

ff09360

Signed-off-by: Tyler Ohlsen <[email protected]>

dbwiddis approved these changes Apr 20, 2024

minalsha approved these changes Apr 20, 2024

ohltyler merged commit 4d5f50c into opensearch-project:main Apr 21, 2024
6 checks passed

ohltyler deleted the neural-sparse branch April 21, 2024 18:12

opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 21, 2024

Onboard neural sparse search (#141)

0bcb09f

Signed-off-by: Tyler Ohlsen <[email protected]> (cherry picked from commit 4d5f50c)

opensearch-trigger-bot bot mentioned this pull request Apr 21, 2024

[Backport 2.x] Onboard neural sparse search #142

Merged

ohltyler added a commit that referenced this pull request Apr 22, 2024

Onboard neural sparse search (#141) (#142)

c3e4586

Signed-off-by: Tyler Ohlsen <[email protected]> (cherry picked from commit 4d5f50c) Co-authored-by: Tyler Ohlsen <[email protected]>
