RedisVectorStore: vector data added to an existing indexName after the first insertion is not indexed #6644

Open
zandko opened this issue Aug 27, 2024 · 1 comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

zandko (Contributor) commented Aug 27, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

RedisVectorStore.fromDocuments(documents, embeddings, {
   redisClient,
   indexName,
   keyPrefix,
});

Error Message and Stack Trace (if applicable)

No response

Description

When I add more vector data to the same index, the new data is written to Redis as expected, but it is not associated with the index. Only the first batch of data added is indexed correctly.
[Two screenshots attached (IMG-2024-08-27-19 26 37@2x, IMG-2024-08-27-19 27 22@2x)]

System Info

node: v18.20.2
yarn: 1.22.22

dosubot added the auto:bug label Aug 27, 2024

dosubot bot commented Aug 27, 2024

The issue with RedisVectorStore losing the index relationship when adding vector data to the same indexName after the first insertion is likely due to the way the addVectors method handles the creation of the index. Specifically, the createIndex method is called every time addVectors is invoked, but it only creates the index if it does not already exist.

Here is the relevant part of the addVectors method:

async addVectors(
  vectors: number[][],
  documents: Document[],
  { keys, batchSize = 1000 }: RedisAddOptions = {}
) {
  if (!vectors.length || !vectors[0].length) {
    throw new Error("No vectors provided");
  }
  // check if the index exists and create it if it doesn't
  await this.createIndex(vectors[0].length);

  const info = await this.redisClient.ft.info(this.indexName);
  const lastKeyCount = parseInt(info.numDocs, 10) || 0;
  const multi = this.redisClient.multi();

  vectors.map(async (vector, idx) => {
    const key =
      keys && keys.length
        ? keys[idx]
        : `${this.keyPrefix}${idx + lastKeyCount}`;
    const metadata =
      documents[idx] && documents[idx].metadata
        ? documents[idx].metadata
        : {};

    multi.hSet(key, {
      [this.vectorKey]: this.getFloat32Buffer(vector),
      [this.contentKey]: documents[idx].pageContent,
      [this.metadataKey]: this.escapeSpecialChars(JSON.stringify(metadata)),
    });

    // write batch
    if (idx % batchSize === 0) {
      await multi.exec();
    }
  });

  // insert final batch
  await multi.exec();
}
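The key naming and batching in the loop above can be modelled without Redis. The sketch below uses hypothetical helpers (`generateKeys`, `toBatches`, `writeInBatches`). They are not LangChain.js functions. It illustrates two properties of the quoted code.

First, new keys are numbered from `numDocs`, which counts only documents that were actually indexed. If earlier hashes were stored but not indexed, the next insertion starts numbering too low and overwrites existing keys.

Second, `vectors.map(async ...)` discards the promises it creates, so `addVectors` never waits for the per-batch `exec()` calls. In addition, `idx % batchSize === 0` is already true at `idx` 0.

```typescript
// Sketch only: hypothetical helpers mirroring the key naming and batching of
// the quoted addVectors. They are NOT part of the LangChain.js API.

// Same rule as the quoted code: `${keyPrefix}${idx + lastKeyCount}`, where
// lastKeyCount is the index's numDocs (documents that were actually indexed).
function generateKeys(keyPrefix: string, count: number, lastKeyCount: number): string[] {
  return Array.from({ length: count }, (_, idx) => `${keyPrefix}${idx + lastKeyCount}`);
}

// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// Await each batch before starting the next one. By contrast,
// `vectors.map(async ...)` creates one promise per element and discards them,
// so the caller never waits for the exec() calls made inside the callbacks.
async function writeInBatches<T>(
  items: T[],
  batchSize: number,
  execBatch: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (const batch of toBatches(items, batchSize)) {
    await execBatch(batch);
  }
}

// Hypothetical scenario: 5 hashes exist under the prefix, but only 3 were
// indexed, so numDocs is 3 and the next insertion starts numbering at 3.
const storedKeys = new Set(generateKeys("doc:my-index:", 5, 0));
const nextKeys = generateKeys("doc:my-index:", 3, 3);
const reusedKeys = nextKeys.filter((key) => storedKeys.has(key));
console.log(reusedKeys.join(", ")); // doc:my-index:3, doc:my-index:4
```

Awaiting inside a `for...of` loop keeps at most one batch in flight. `Promise.all` would instead start every batch concurrently.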

The createIndex method is designed to create the index only if it does not already exist:

async createIndex(dimensions = 1536): Promise<void> {
  if (await this.checkIndexExists()) {
    return;
  }

  const schema: RediSearchSchema = {
    [this.vectorKey]: {
      type: SchemaFieldTypes.VECTOR,
      TYPE: "FLOAT32",
      DIM: dimensions,
      ...this.indexOptions,
    },
    [this.contentKey]: SchemaFieldTypes.TEXT,
    [this.metadataKey]: SchemaFieldTypes.TEXT,
  };

  await this.redisClient.ft.create(
    this.indexName,
    schema,
    this.createIndexOptions
  );
}
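The schema above fixes the vector field's `DIM` the first time the index is created, using `vectors[0].length`. On later calls `createIndex` returns early, so vectors of a different length are checked against the old `DIM`.

The sketch below makes the byte-length rule concrete. It assumes vectors are stored as raw FLOAT32 bytes, as `TYPE: "FLOAT32"` implies. `toFloat32Bytes` and `fitsIndexDim` are hypothetical stand-ins, not the actual source of `getFloat32Buffer`.

```typescript
// Sketch under stated assumptions: vectors are stored as raw FLOAT32 bytes,
// as the schema's `TYPE: "FLOAT32"` implies. `toFloat32Bytes` is a
// hypothetical stand-in for getFloat32Buffer, not its actual source.
const FLOAT32_BYTES = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per element

function toFloat32Bytes(vector: number[]): Uint8Array {
  // View the Float32Array's underlying buffer as bytes (platform byte order).
  return new Uint8Array(new Float32Array(vector).buffer);
}

// A FLOAT32 vector field declared with DIM dimensions holds DIM * 4 bytes;
// a blob of any other length cannot be indexed under that field.
function fitsIndexDim(blob: Uint8Array, dim: number): boolean {
  return blob.byteLength === dim * FLOAT32_BYTES;
}

// Hypothetical scenario: the index was created from 1536-dimensional vectors,
// and a later batch comes from a 768-dimensional embedding model.
const indexDim = 1536;
const laterVector = new Array<number>(768).fill(0.1);
console.log(fitsIndexDim(toFloat32Bytes(laterVector), indexDim)); // false
console.log(fitsIndexDim(toFloat32Bytes(new Array<number>(1536).fill(0.1)), indexDim)); // true
```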

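If the index already exists, createIndex returns immediately, so the existing index definition is reused without being compared against the current call's settings. RediSearch indexes a new hash only if its key matches one of the index's PREFIX values and its fields fit the schema. For example, a FLOAT32 vector whose length differs from the index's DIM cannot be indexed. In either case the hash is still written to Redis but never becomes searchable, which could explain the reported behaviour.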
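RediSearch only indexes hashes whose keys begin with one of the index's PREFIX values. A hash written under a different prefix is therefore stored but ignored by that index.

The sketch below checks key coverage against a list of prefixes. `isCoveredByIndex`, the prefix, and the key names are hypothetical illustrations, not LangChain.js API.

```typescript
// Sketch: RediSearch indexes a hash only when its key starts with one of the
// index's PREFIX values. `isCoveredByIndex` and the key names are hypothetical.
function isCoveredByIndex(key: string, indexPrefixes: string[]): boolean {
  return indexPrefixes.some((prefix) => key.startsWith(prefix));
}

// Prefixes recorded when the index was first created (illustrative values).
const prefixesAtCreation = ["doc:my-index:"];

// Keys written by a later call: the second uses a different keyPrefix, so it is
// stored in Redis but not covered by this index.
const laterKeys = ["doc:my-index:42", "doc:other-prefix:0"];
for (const key of laterKeys) {
  console.log(`${key} -> ${isCoveredByIndex(key, prefixesAtCreation) ? "covered" : "ignored"}`);
}
// doc:my-index:42 -> covered
// doc:other-prefix:0 -> ignored
```

Prefix coverage is necessary but not sufficient. A covered hash can still fail to index if its fields do not fit the schema.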

To keep new documents indexed, inspect the existing index with FT.INFO before adding vectors. Confirm that its key prefixes and vector field settings match what the current call uses, and check whether its hash_indexing_failures counter is non-zero. Also pass the same keyPrefix, indexOptions, and createIndexOptions on every insertion so that new hashes match the schema the index was created with [1][2].
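One way to keep settings consistent is to define them once and compare any later call against them. In the sketch below, `IndexSettings` and `settingMismatches` are hypothetical. `indexName` and `keyPrefix` echo options from this thread, `dimensions` corresponds to the schema's DIM, and `distanceMetric` is an illustrative extra field.

```typescript
// Sketch: compare the settings a call intends to use with those the existing
// index was created with. `IndexSettings` and `settingMismatches` are
// hypothetical; `distanceMetric` is an illustrative field only.
interface IndexSettings {
  indexName: string;
  keyPrefix: string;
  dimensions: number;
  distanceMetric: string;
}

// Return the names of fields whose values differ between the two settings.
function settingMismatches(existing: IndexSettings, requested: IndexSettings): (keyof IndexSettings)[] {
  const fields = Object.keys(existing) as (keyof IndexSettings)[];
  return fields.filter((field) => existing[field] !== requested[field]);
}

// Defining the settings once and reusing them for every insertion avoids drift.
const sharedSettings: IndexSettings = {
  indexName: "my-index",
  keyPrefix: "doc:my-index:",
  dimensions: 1536,
  distanceMetric: "COSINE",
};

// A later call that changed the prefix and the embedding size by accident.
const driftedSettings: IndexSettings = { ...sharedSettings, keyPrefix: "doc:my-index-v2:", dimensions: 768 };

console.log(settingMismatches(sharedSettings, sharedSettings).length); // 0
console.log(settingMismatches(sharedSettings, driftedSettings).join(", ")); // keyPrefix, dimensions
```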

