Replies: 2 comments
-
🤖 Based on your code snippet, you are already using the `insert` method. However, to ensure that the new data is saved and the existing data is not overwritten, you need to persist the index after inserting the new documents. You can do this by calling the `persist` method:

```python
for doc in docs:
    new_index.insert(doc)
    # Persist the index after inserting the new document
    new_index.storage_context.persist("./stor")
    print(f"Moving to already indexed folder {doc}")
    new_path = os.path.join(indexed_dir_path, f)
    shutil.move(new_path_for_index, new_path)
```

Please note that persisting the index after inserting each document might be slow if you have a large number of documents. If that is the case, consider persisting the index after inserting a batch of documents instead of after each one. I hope this helps! If you have any other questions, feel free to ask.
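To illustrate the batching suggestion above, here is a minimal, self-contained sketch. `FakeIndex` and `index_in_batches` are hypothetical stand-ins for illustration only; the real object would be a llama_index `VectorStoreIndex`, whose persist call is `index.storage_context.persist(path)`.

```python
class FakeIndex:
    """Hypothetical stand-in for a vector index (not the llama_index API)."""
    def __init__(self):
        self.docs = []
        self.persist_calls = 0

    def insert(self, doc):
        self.docs.append(doc)

    def persist(self, path):
        # In llama_index this would be index.storage_context.persist(path)
        self.persist_calls += 1

def index_in_batches(index, docs, batch_size, persist_dir="./stor"):
    """Insert docs one by one, but persist only once per batch."""
    for i in range(0, len(docs), batch_size):
        for doc in docs[i:i + batch_size]:
            index.insert(doc)
        index.persist(persist_dir)  # one persist per batch, not per doc

index = FakeIndex()
index_in_batches(index, [f"doc{n}" for n in range(12)], batch_size=5)
print(len(index.docs), index.persist_calls)  # 12 docs inserted, 3 persists
```

With 12 documents and a batch size of 5, this persists only 3 times instead of 12, which is the point of the batching advice.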
-
May I know how to have the documents added to the current index without overwriting it? I am using:

```python
new_index = VectorStoreIndex.from_documents([], vectorstore=vector_store,
                                            storage_context=storage_context,
                                            service_context=service_context,
                                            show_progress=True)
files = os.listdir(dir_path)
num_batches = len(files) // batch_size + (len(files) % batch_size > 0)
for i in range(num_batches):
```
-
Question Validation
Question
```python
import os
from pathlib import Path
import shutil
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)
from IPython.display import Markdown, display

new_index = VectorStoreIndex.from_documents([], vectorstore=vector_store,
                                            storage_context=storage_context,
                                            service_context=service_context,
                                            show_progress=True)
new_index.set_index_id("vector_index")
new_index.storage_context.persist("./stor")

batch_size = 5
# Get the list of files from the directory
files = os.listdir(dir_path)
# Calculate the number of batches
num_batches = len(files) // batch_size + (len(files) % batch_size > 0)
for i in range(num_batches):
    # Select files for the current batch
    batch_files = files[i*batch_size : (i+1)*batch_size]
```

Here I try to index one file at a time using `index.insert`. My question is: how do I save the new data into the current index instead of overwriting it?
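The `num_batches` expression in the snippet is a ceiling division: the integer division rounds down, and the boolean term adds 1 when a partial final batch remains. A quick standalone check, using made-up file names:

```python
# Hypothetical directory listing, for illustration only.
files = [f"file{n}.txt" for n in range(13)]
batch_size = 5

# len(files) // batch_size rounds down; (len(files) % batch_size > 0)
# contributes 1 when there is a leftover partial batch.
num_batches = len(files) // batch_size + (len(files) % batch_size > 0)
print(num_batches)  # 13 files in batches of 5 -> 3 batches

batches = [files[i*batch_size : (i+1)*batch_size] for i in range(num_batches)]
print([len(b) for b in batches])  # [5, 5, 3]
```

The same result could be written `-(-len(files) // batch_size)` or `math.ceil(len(files) / batch_size)`; the boolean-addition form works because Python's `True` equals 1 in arithmetic.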