Integrate Apify Actors with Pinecone to seamlessly transfer and store data as vectors.
Explore how to utilize vector stores on the Apify platform by reading our blog post: Understanding Pinecone and Its Importance for Your LLMs.
This integration is designed to process data from various Apify Actors and store it as vectors. It interfaces with OpenAI and Pinecone through langchain to perform the following steps:
- Retrieve the Actor's dataset using dataset_id (automatically passed in the integration).
- Fetch the dataset using the Apify SDK.
- [Optional] Segment text data into chunks with langchain's RecursiveCharacterTextSplitter (parameters like chunk_size and chunk_overlap are customizable).
- Compute embeddings via OpenAI.
- Store the resulting vectors in Pinecone.
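To give a feel for how chunk_size and chunk_overlap interact in the optional chunking step, here is a minimal sketch in plain Python. It is a simplified fixed-width splitter written for illustration, not langchain's RecursiveCharacterTextSplitter (which prefers to split on separators such as paragraphs and sentences); the function name split_text is hypothetical.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Illustrative fixed-width splitter (hypothetical helper, not the langchain API).

    Each chunk holds at most chunk_size characters, and each chunk starts
    chunk_size - chunk_overlap characters after the previous one, so
    consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

With these example values, every chunk repeats the last character of the previous chunk. A larger chunk_overlap keeps more shared context between neighboring chunks at the cost of producing more vectors to embed and store.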
Ensure you have the following prerequisites for this integration:
- An OpenAI account and API token. Sign up for a free account at OpenAI.
- A Pinecone database with a valid API key (pinecone_token).
Refer to the input schema for detailed information:
- index_name: Name of the Pinecone index.
- pinecone_token: Your Pinecone access token (API key).
- openai_token: Your OpenAI API token.
- fields: Array of fields you want to save. For example, to push the name and user.description fields, set this field to ["name", "user.description"].
- metadata_values: Object of metadata values you want to save. For example, to push url and createdAt values to Pinecone, set this field to {"url": "https://www.apify.com", "createdAt": "2021-09-01"}.
- metadata_fields: Object of metadata fields you want to save. For example, to push the url and createdAt fields, set this field to {"url": "url", "createdAt": "createdAt"}. If it has the same key as metadata_values, it's replaced.
- chunk_size: Maximum character length for each text chunk.
- chunk_overlap: Overlap in characters between consecutive text chunks.
The fields, metadata_values, and metadata_fields parameters support dot notation for nested data.
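As an illustration of dot notation, the sketch below resolves entries of fields and metadata_fields against a single dataset item. The sample item, the index name, the token placeholders, the chunk values, and the helper names get_nested, select_fields, and resolve_metadata_fields are all hypothetical; this shows one plausible way to read nested values, not the integration's actual implementation.

```python
from typing import Any

# Hypothetical input mirroring the schema above; tokens are placeholders.
actor_input = {
    "index_name": "my-index",
    "pinecone_token": "<PINECONE_API_KEY>",
    "openai_token": "<OPENAI_API_TOKEN>",
    "fields": ["name", "user.description"],
    "metadata_values": {"url": "https://www.apify.com", "createdAt": "2021-09-01"},
    "metadata_fields": {"author": "user.name"},
    "chunk_size": 1000,
    "chunk_overlap": 100,
}

# Hypothetical dataset item produced by an Actor.
item = {
    "name": "Example page",
    "user": {"name": "Jane", "description": "Writes about web scraping"},
}


def get_nested(data: dict, path: str) -> Any:
    """Follow a dotted path such as 'user.description'; return None if any key is missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select_fields(data: dict, fields: list[str]) -> dict[str, Any]:
    """Pick the requested (possibly nested) fields, keyed by their dotted path."""
    return {path: get_nested(data, path) for path in fields}


def resolve_metadata_fields(data: dict, metadata_fields: dict[str, str]) -> dict[str, Any]:
    """Map each metadata key to the value found at its dotted path in the item."""
    return {key: get_nested(data, path) for key, path in metadata_fields.items()}


selected = select_fields(item, actor_input["fields"])
metadata = resolve_metadata_fields(item, actor_input["metadata_fields"])
```

Here selected holds the text values for name and user.description, and metadata maps author to the nested user.name value. How metadata_values and metadata_fields are merged when they share a key is left out of the sketch on purpose.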
This integration saves selected fields from your Actor's output into your Pinecone database.
- Join our developer community on Discord to connect with other developers and discuss integrations.
- Visit Apify for the data needs of your LLMs: it offers tools to ingest comprehensive datasets from various sources and enrich your large language models.