Commit
Enhance installation documentation with critical prerequisites and add embeddings setup guide
1 parent 7a6848b commit f21761f
Showing 4 changed files with 191 additions and 3 deletions.
@@ -0,0 +1,100 @@
# Embeddings Setup Guide

## Prerequisites

Before using Narr_ai_tive, you need document embeddings prepared in an `embeddings.json` file. This file contains vector representations of your documents, which enable semantic search and context-aware story generation.

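To make the role of these vectors concrete, here is a minimal sketch of semantic search as a cosine-similarity ranking over stored embeddings. It is an illustration only: this guide does not say how Narr_ai_tive performs retrieval, and the names `cosine_similarity` and `rank_documents` are hypothetical. The toy vectors have three dimensions for readability.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, documents, top_k=3):
    """Return (score, id) pairs for the top_k most similar documents."""
    scored = [
        (cosine_similarity(query_embedding, doc["embedding"]), doc["id"])
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
docs = [
    {"id": "doc1", "embedding": [1.0, 0.0, 0.0]},
    {"id": "doc2", "embedding": [0.0, 1.0, 0.0]},
    {"id": "doc3", "embedding": [0.7, 0.7, 0.0]},
]
ranking = rank_documents([1.0, 0.1, 0.0], docs, top_k=2)
```

Here `ranking` lists `doc1` first and `doc3` second, because their vectors point most nearly in the same direction as the query.
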
## Embeddings File Structure

Place the `embeddings.json` file in the `data/` directory. It should follow this structure:

```json
{
  "documents": [
    {
      "id": "doc1",
      "text": "Original document text",
      "embedding": [0.1, 0.2, ...],  // documented as 1536-dimensional, but all-MiniLM-L6-v2 (used below) outputs 384
      "metadata": {
        "title": "Document Title",
        "type": "character_profile",
        "tags": ["fantasy", "character"]
      }
    }
  ]
}
```

The `...` and the `//` comment are illustrative only; a real JSON file cannot contain either.

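As a quick way to see this structure in practice, the sketch below reads an embeddings file and lists each record's id, title, and vector length. The helper names `load_documents` and `summarize` are hypothetical, and the demo writes a throwaway file with a toy 4-dimensional embedding rather than touching `data/embeddings.json`.

```python
import json
import tempfile
from pathlib import Path


def load_documents(path):
    """Read an embeddings file and return its list of document records."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["documents"]


def summarize(documents):
    """Return (id, title, dimension) for each record."""
    return [
        (doc["id"], doc["metadata"]["title"], len(doc["embedding"]))
        for doc in documents
    ]


# Demo data shaped like the structure above (toy 4-dimensional vector).
sample = {
    "documents": [
        {
            "id": "doc1",
            "text": "Original document text",
            "embedding": [0.1, 0.2, 0.3, 0.4],
            "metadata": {
                "title": "Document Title",
                "type": "character_profile",
                "tags": ["fantasy", "character"],
            },
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "embeddings.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize(load_documents(path))
```

Running `summarize(load_documents("data/embeddings.json"))` on a real file is a fast way to confirm that every record has the same vector length.
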
## Generating Embeddings

1. **Prepare Your Documents**
   - Gather all documents you want to use for story generation
   - Supported formats: `.txt`, `.md`, `.pdf`, `.docx`

2. **Install Required Tools**
   ```bash
   pip install sentence-transformers
   ```

3. **Generate Embeddings**
   ```python
   import json
   from pathlib import Path

   from sentence_transformers import SentenceTransformer

   # Initialize the model (downloaded on first use; outputs 384-dimensional vectors)
   model = SentenceTransformer('all-MiniLM-L6-v2')

   # Documents to embed, shaped like the structure shown above
   documents = [
       {"id": "doc1", "text": "your document text", "metadata": {"title": "Doc Title"}},
       # Add more documents...
   ]

   # Generate one embedding per document
   for doc in documents:
       embedding = model.encode(doc["text"])
       doc["embedding"] = embedding.tolist()  # NumPy array -> plain list for JSON

   # Save to JSON, creating data/ if it does not exist yet
   Path('data').mkdir(parents=True, exist_ok=True)
   with open('data/embeddings.json', 'w', encoding='utf-8') as f:
       json.dump({"documents": documents}, f)
   ```

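Step 1 lists four supported formats, while the script in step 3 starts from plain strings. The sketch below shows one way to bridge that gap. It assumes `pdfminer.six` and `python-docx` (both appear in this commit's requirements file) are used for extraction; the guide itself does not say how Narr_ai_tive loads files. The helper names and the choice of the file stem as `id` and `title` are hypothetical.

```python
import tempfile
from pathlib import Path

SUPPORTED = {".txt", ".md", ".pdf", ".docx"}


def load_text(path):
    """Return the plain text of a .txt, .md, .pdf, or .docx file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in {".txt", ".md"}:
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        # Imported lazily so plain-text use needs no extra packages.
        from pdfminer.high_level import extract_text  # pip install pdfminer.six
        return extract_text(str(path))
    if suffix == ".docx":
        from docx import Document  # pip install python-docx
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    raise ValueError(f"unsupported format: {suffix}")


def collect_documents(folder):
    """Build {id, text, metadata} records for every supported file in folder."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            records.append({
                "id": path.stem,  # assumption: file stem doubles as id and title
                "text": load_text(path),
                "metadata": {"title": path.stem},
            })
    return records


# Demo: one supported file and one file that is skipped.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hero.md").write_text("# Hero\nA brave knight.", encoding="utf-8")
    (Path(tmp) / "notes.log").write_text("ignored", encoding="utf-8")
    records = collect_documents(tmp)
```

The resulting list can be used in place of the hand-written `documents` list before the embeddings are generated.
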
## Best Practices

- Keep document chunks between 100 and 1000 words. Note that `all-MiniLM-L6-v2` truncates input longer than 256 word pieces by default, so with that model only shorter chunks are embedded in full
- Include relevant metadata for better context handling
- Regenerate embeddings whenever the source documents change
- Use consistent document formatting

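The chunk-size advice can be sketched as a simple word-based splitter with overlap between neighbouring chunks. The function name `chunk_words` and the defaults (`max_words=200`, `overlap=20`) are illustrative assumptions, not values taken from Narr_ai_tive; adjust them to your model and material.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Ten words, chunks of four, one word of overlap.
demo = chunk_words(" ".join(f"w{i}" for i in range(10)), max_words=4, overlap=1)
```

Each chunk can then become its own record in `embeddings.json`, with metadata pointing back to the source document.
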
## Validation

You can validate your embeddings file using our utility:

```bash
narr_ai_tive validate-embeddings data/embeddings.json
```

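For a check that does not depend on the CLI, the sketch below inspects a parsed embeddings file against the structure documented in this guide. It is not the implementation of `validate-embeddings`: the function name `find_problems`, the rule that ids must be unique, and the optional `expected_dim` argument are all assumptions.

```python
def find_problems(payload, expected_dim=None):
    """Return a list of human-readable problems; an empty list means none found."""
    docs = payload.get("documents") if isinstance(payload, dict) else None
    if not isinstance(docs, list):
        return ['top level must be an object with a "documents" list']

    problems = []
    seen_ids = set()
    dims = set()
    for i, doc in enumerate(docs):
        label = f"documents[{i}]"
        if not isinstance(doc, dict):
            problems.append(f"{label}: not an object")
            continue
        for key in ("id", "text", "embedding"):
            if key not in doc:
                problems.append(f"{label}: missing {key!r}")

        doc_id = doc.get("id")
        if isinstance(doc_id, str):
            if doc_id in seen_ids:
                problems.append(f"{label}: duplicate id {doc_id!r}")
            seen_ids.add(doc_id)

        emb = doc.get("embedding")
        if emb is None:
            continue
        is_numeric_list = isinstance(emb, list) and all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in emb
        )
        if not is_numeric_list:
            problems.append(f"{label}: embedding must be a list of numbers")
            continue
        dims.add(len(emb))
        if expected_dim is not None and len(emb) != expected_dim:
            problems.append(
                f"{label}: expected {expected_dim} dimensions, got {len(emb)}"
            )

    if len(dims) > 1:
        problems.append(f"inconsistent embedding dimensions: {sorted(dims)}")
    return problems


good = {"documents": [{"id": "a", "text": "t", "embedding": [0.1, 0.2]}]}
bad = {"documents": [
    {"id": "a", "text": "t", "embedding": [0.1, 0.2]},
    {"id": "a", "embedding": [0.1]},
]}
good_report = find_problems(good)
bad_report = find_problems(bad)
```

Passing `expected_dim` lets the same function flag vectors whose length differs from the size your setup requires.
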
## Common Issues

1. **Missing Embeddings**
   - Error: `FileNotFoundError: embeddings.json not found`
   - Solution: Ensure the file exists in the `data/` directory

2. **Invalid Format**
   - Error: `InvalidEmbeddingsFormat: Invalid embeddings structure`
   - Solution: Verify that the JSON structure matches the required format

3. **Incorrect Dimensions**
   - Error: `EmbeddingDimensionError: Expected 1536 dimensions`
   - Solution: Use an embedding model whose output size matches the expected dimension, and generate every embedding with the same model. Note that `all-MiniLM-L6-v2`, the model this guide recommends, outputs 384-dimensional vectors rather than 1536

## Support

For more help with embeddings:
- Check our [FAQ](docs/faq.md)
- Join our [Discord community](#)
- File an issue on GitHub

---

**Note**: The quality of your story generation depends heavily on the quality and relevance of your document embeddings. Take time to prepare them properly.

@@ -0,0 +1,47 @@
# Core AI and ML
google-cloud-aiplatform>=1.36.0
sentence-transformers>=2.2.2
torch>=2.1.0
transformers>=4.36.0
numpy>=1.24.0

# Text Processing
nltk>=3.8.1
spacy>=3.7.2
textblob>=0.17.1
python-docx>=1.0.0
markdown>=3.5.1
pdfminer.six>=20221105

# Semantic Search
faiss-cpu>=1.7.4
annoy>=1.17.3

# UI and Terminal Interface
rich>=13.7.0
textual>=0.44.1
prompt-toolkit>=3.0.41
tqdm>=4.66.1

# Data Handling
pyyaml>=6.0.1
python-dotenv>=1.0.0
jsonschema>=4.20.0

# Development Tools
hatch>=1.7.0
black>=23.12.0
mypy>=1.7.1
pytest>=7.4.3
pytest-cov>=4.1.0

# Export Formats
pypdf>=3.17.1
python-docx>=1.0.0
ebooklib>=0.18

# Utilities
requests>=2.31.0
# pathlib is part of the standard library since Python 3.4; no PyPI package is needed
typing-extensions>=4.8.0
loguru>=0.7.2