Is your feature request related to a problem? Please describe.
It's quite frustrating that a new contributor needs to create embeddings for all the chunks stored in scripts/assets/chunks.json. This means they have to spend a few cents on inference with either OpenAI's text-embedding-3-small or Together AI's BAAI/bge-large-en-v1.5 to embed every chunk there.
Describe the solution you'd like
I'd prefer the data to be stored already embedded (with both OpenAI's and Together AI's embedders) so that there are no setup costs for the user. Setup would also be quicker.
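As a rough illustration of the idea, the sketch below assumes each chunk in the JSON file carries an `embeddings` dict keyed by model name. That schema, and the helper names `load_chunks` and `split_by_embedding`, are hypothetical and not taken from the repository. It only shows how a setup script could skip inference for chunks that already ship with a vector.

```python
import json
from pathlib import Path

# Model names taken from the issue text; using them as dict keys is an assumption.
MODELS = ("text-embedding-3-small", "BAAI/bge-large-en-v1.5")


def load_chunks(path):
    """Read the chunk list from a JSON file (assumed to be a list of dicts)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def split_by_embedding(chunks, model):
    """Split chunks into those that already have a vector for `model` and those that do not.

    Assumes a hypothetical layout: chunk["embeddings"][model] -> list of floats.
    """
    ready, missing = [], []
    for chunk in chunks:
        vector = chunk.get("embeddings", {}).get(model)
        if vector:
            ready.append(chunk)
        else:
            missing.append(chunk)
    return ready, missing
```

With a layout like this, a contributor's setup would only call an embedding API for the `missing` list, which would be empty if the file is shipped fully pre-embedded.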
Describe alternatives you've considered
The alternatives are to keep what we currently have, or to give contributors access to the Neon database, which already has the embedded data. However, Neon branches should be reserved for internal collaborators.
Additional context
We will also want this for the dockerized Postgres database.
@alvaro-mazcu @louhelhir @iamrobzy I added this to your resource ingestion pipeline project since I think it's relevant to what you're working on. It's a much smaller task than the pipeline itself though, more of an afternoon implementation.