Add --pinecone-dataset-limit option #44

Merged
merged 1 commit into from
Feb 29, 2024

daverigby
Collaborator

Feature

Add a new option to limit the number of documents loaded from a
dataset. This allows a workload to be generated from a given dataset
but at a reduced document count.

Note: If the dataset includes an explicit 'queries' set then that
queries set is used unchanged, and hence Recall may be significantly
reduced, as the vectors each query expects to find nearby may not
have been loaded. As such, it may be desirable to ignore the query set
and instead randomly sample queries from the (limited) documents set
using the --pinecone-dataset-ignore-queries option.
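
To illustrate the note above, here is a minimal, self-contained sketch of why Recall drops when the documents are limited but the original queries set is kept. It is not the code from this PR: the helper names (limit_documents, sample_queries, recall) and the toy data are hypothetical, and it assumes the limit simply keeps the first N documents, which the description does not specify.

```python
import random


def limit_documents(documents, limit):
    """Keep only the first `limit` documents (assumed behaviour; 0/None = no limit)."""
    return documents[:limit] if limit else documents


def sample_queries(documents, num_queries, seed=0):
    """Draw queries from the (limited) documents instead of a fixed queries set."""
    rng = random.Random(seed)
    return rng.sample(documents, min(num_queries, len(documents)))


def recall(expected_ids, returned_ids):
    """Fraction of the expected neighbour ids that appear in the returned ids."""
    if not expected_ids:
        return 1.0
    return len(set(expected_ids) & set(returned_ids)) / len(expected_ids)


# Ten toy documents with ids 0..9; only the first four are loaded.
documents = [{"id": i, "vector": [float(i)]} for i in range(10)]
limited = limit_documents(documents, 4)
present = {d["id"] for d in limited}

# A query from the original queries set expects neighbours 3, 4 and 5,
# but ids 4 and 5 were never loaded, so they can never be returned.
expected = [3, 4, 5]
best_possible = [i for i in expected if i in present]
print(recall(expected, best_possible))

# Sampling queries from the limited set keeps every query answerable.
queries = sample_queries(limited, 2)
print([q["id"] for q in queries])
```

In this sketch the best achievable recall for the original query is capped by how many of its expected neighbours survived the limit, whereas queries sampled from the limited set only ever refer to documents that were actually loaded.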

Type of Change

  • New feature (non-breaking change which adds functionality)

Test Plan

New unit and integration tests added.

@daverigby daverigby merged commit 5b2e646 into main Feb 29, 2024
7 checks passed
@daverigby daverigby deleted the dataset_limit branch February 29, 2024 12:10