Enhance installation documentation with critical prerequisites and add embeddings setup guide
wh0isdsmith · Dec 23, 2024 · f21761f
1 parent 7a6848b
commit f21761f

Showing 4 changed files with 191 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -98,10 +98,33 @@
 
 ### Prerequisites
 
+Before installing Narr_ai_tive, ensure you have:
+
+#### Required Software
 - Python 3.8+
 - [Hatchling](https://hatch.pypa.io/latest/)
 - Google API key (Gemini access)
 
+#### Required Data Setup
+- Document embeddings in `data/embeddings.json`
+  - See [Embeddings Setup Guide](docs/embeddings_setup.md)
+  - Must be generated before running the application
+  - Generated with a sentence-transformers model
+  - Required for semantic search functionality
+
+#### Required File Structure
+```
+data/
+├── embeddings.json     # Your document embeddings
+├── character_profiles.json
+└── world_details.json
+```
+
+#### API Keys
+- Google Gemini API key
+- Store in `secrets.yaml`
+- Never commit this file to version control
+
 ### Installation
 
 ```bash

diff --git a/docs/embeddings_setup.md b/docs/embeddings_setup.md
@@ -0,0 +1,100 @@
+# Embeddings Setup Guide
+
+## Prerequisites
+
+Before using Narr_ai_tive, you need to have document embeddings prepared in an `embeddings.json` file. This file contains vector representations of your documents that enable semantic search and context-aware story generation.
+
+## Embeddings File Structure
+
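+The `embeddings.json` file should be placed in the `data/` directory and follow this structure. The `embedding` array is shortened here for readability. In a real file it holds the full vector, with one value per dimension of the embedding model (`all-MiniLM-L6-v2`, for example, produces 384-dimensional vectors). Use the same model for every document so that all vectors have the same length.
+
+```json
+{
+  "documents": [
+    {
+      "id": "doc1",
+      "text": "Original document text",
+      "embedding": [0.1, 0.2, 0.3],
+      "metadata": {
+        "title": "Document Title",
+        "type": "character_profile",
+        "tags": ["fantasy", "character"]
+      }
+    }
+  ]
+}
+```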
+
+## Generating Embeddings
+
+1. **Prepare Your Documents**
+   - Gather all documents you want to use for story generation
+   - Supported formats: `.txt`, `.md`, `.pdf`, `.docx`
+
+2. **Install Required Tools**
+   ```bash
+   pip install sentence-transformers
+   ```
+
+3. **Generate Embeddings**
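+   ```python
+   import json
+   from pathlib import Path
+
+   from sentence_transformers import SentenceTransformer
+
+   # Initialize the model (all-MiniLM-L6-v2 produces 384-dimensional vectors)
+   model = SentenceTransformer('all-MiniLM-L6-v2')
+
+   # One entry per document chunk, using the fields shown in the structure above
+   documents = [
+       {
+           "id": "doc1",
+           "text": "your document text",
+           "metadata": {"title": "Doc Title", "type": "character_profile", "tags": []},
+       },
+       # Add more documents...
+   ]
+
+   # Encode all texts in one batch; the result is a NumPy array with one row per document
+   embeddings = model.encode([doc["text"] for doc in documents])
+   for doc, embedding in zip(documents, embeddings):
+       doc["embedding"] = embedding.tolist()  # convert to plain floats for JSON
+
+   # Save to data/embeddings.json, creating the data/ directory if needed
+   out_path = Path('data/embeddings.json')
+   out_path.parent.mkdir(parents=True, exist_ok=True)
+   with out_path.open('w', encoding='utf-8') as f:
+       json.dump({"documents": documents}, f)
+   ```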
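Once `data/embeddings.json` exists, the semantic search these embeddings enable comes down to ranking the stored vectors by cosine similarity to a query vector. The following is a minimal, dependency-free sketch, assuming the file follows the structure shown earlier in this guide. `cosine_similarity`, `top_k_documents`, and `load_documents` are hypothetical helpers for illustration, not part of Narr_ai_tive's API, and the application may search differently (for example with FAISS or Annoy, both listed in `requirements.txt`).

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(
    documents: list[dict], query_embedding: list[float], k: int = 3
) -> list[tuple[str, float]]:
    """Return (id, score) pairs for the k documents most similar to the query."""
    scored = [
        (doc["id"], cosine_similarity(doc["embedding"], query_embedding))
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


def load_documents(path: str | Path = "data/embeddings.json") -> list[dict]:
    """Read the "documents" list from an embeddings file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["documents"]
```

The query vector must come from the same model as the stored vectors, for example `model.encode("the dragon's lair").tolist()` with the `all-MiniLM-L6-v2` model loaded above. A linear scan like this is fine for a few thousand chunks; larger collections are what approximate-nearest-neighbour indexes are for.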
+
+## Best Practices
+
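+- Keep document chunks between 100 and 1,000 words; with `all-MiniLM-L6-v2`, stay near the lower end, because the model truncates input longer than 256 word pieces and ignores the rest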
+- Include relevant metadata for better context handling
+- Update embeddings when documents change
+- Use consistent document formatting
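The chunking advice above can be applied before encoding. This is a minimal sketch that splits text into word-based chunks with a small overlap, so that text near a chunk boundary appears in two neighbouring chunks. `chunk_words` is a hypothetical helper, and its defaults (150 words, 20 words of overlap) are illustrative assumptions, not values from Narr_ai_tive. Whitespace-separated words are only an approximation of the model's word pieces.

```python
from __future__ import annotations


def chunk_words(text: str, chunk_size: int = 150, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words.

    Consecutive chunks share `overlap` words so context at a boundary
    is not lost entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

Each returned chunk would then become its own entry in `documents`, with an `id` derived from the source document (for example `doc1-0`, `doc1-1`) and the source's metadata copied over.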
+
+## Validation
+
+You can validate your embeddings file using our utility:
+
+```bash
+narr_ai_tive validate-embeddings data/embeddings.json
+```
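To see what such a check covers, here is a minimal sketch of a structural check for the format described in this guide. It is an illustration under assumptions, not the logic behind `narr_ai_tive validate-embeddings`: the hypothetical `find_embedding_problems` returns human-readable messages instead of raising the errors listed under Common Issues, treats `metadata` as optional, and only requires that all vectors share one length rather than checking a specific dimension.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def find_embedding_problems(data: dict) -> list[str]:
    """Return problems found in parsed embeddings.json content (empty list if none)."""
    documents = data.get("documents") if isinstance(data, dict) else None
    if not isinstance(documents, list):
        return ['top level must be an object with a "documents" list']

    problems: list[str] = []
    seen_ids: set[str] = set()
    expected_dim: int | None = None
    for index, doc in enumerate(documents):
        where = f"documents[{index}]"
        if not isinstance(doc, dict):
            problems.append(f"{where}: must be an object")
            continue
        doc_id = doc.get("id")
        if not isinstance(doc_id, str):
            problems.append(f"{where}: missing string 'id'")
        elif doc_id in seen_ids:
            problems.append(f"{where}: duplicate id {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        if not isinstance(doc.get("text"), str):
            problems.append(f"{where}: missing string 'text'")
        embedding = doc.get("embedding")
        if not isinstance(embedding, list) or not embedding:
            problems.append(f"{where}: 'embedding' must be a non-empty list")
            continue
        if not all(
            isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
            for x in embedding
        ):
            problems.append(f"{where}: 'embedding' must contain only finite numbers")
        # Vectors from one model share a length; the first document sets the expectation
        if expected_dim is None:
            expected_dim = len(embedding)
        elif len(embedding) != expected_dim:
            problems.append(f"{where}: expected {expected_dim} dimensions, got {len(embedding)}")
    return problems


def validate_file(path: str | Path = "data/embeddings.json") -> list[str]:
    """Load an embeddings file and return its problems."""
    with open(path, encoding="utf-8") as f:
        return find_embedding_problems(json.load(f))
```

An empty list from `validate_file()` means the file matches the documented structure; it says nothing about whether the embeddings are good for your stories.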
+
+## Common Issues
+
+1. **Missing Embeddings**
+   - Error: `FileNotFoundError: embeddings.json not found`
+   - Solution: Ensure the file exists in the `data/` directory
+
+2. **Invalid Format**
+   - Error: `InvalidEmbeddingsFormat: Invalid embeddings structure`
+   - Solution: Verify JSON structure matches the required format
+
+3. **Incorrect Dimensions**
+   - Error: `EmbeddingDimensionError: Expected 1536 dimensions`
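+   - Solution: Regenerate every embedding with a single model whose output dimension matches what the application expects. Note that `all-MiniLM-L6-v2` produces 384-dimensional vectors, so it will not satisfy a check that expects 1536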
+
+## Support
+
+For more help with embeddings:
+- Check our [FAQ](faq.md)
+- Join our [Discord community](#)
+- File an issue on GitHub
+
+---
+
+**Note**: The quality of your story generation heavily depends on the quality and relevance of your document embeddings. Take time to prepare them properly.
diff --git a/docs/installation.md b/docs/installation.md
@@ -1,12 +1,30 @@
 # Installation Guide
 
-## Prerequisites
+## Critical Prerequisites
 
+### 1. Document Embeddings
+Before installation, you **must** have:
+- Generated document embeddings using sentence-transformers
+- Placed them in `data/embeddings.json`
+- See [Embeddings Setup Guide](embeddings_setup.md) for details
+
+### 2. System Requirements
 - Python 3.8+
 - [Hatchling](https://hatch.pypa.io/latest/)
 - Google API key (Gemini access)
-
-## Steps
+- 16GB RAM recommended
+- 2GB free disk space
+
+### 3. Required Files
+Ensure these files exist:
+```
+data/
+├── embeddings.json           # Required before first run
+├── character_profiles.json   # Can be empty, but must exist
+└── world_details.json        # Can be empty, but must exist
+```
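A quick way to confirm these files are in place before the first run is a small pre-flight check. This is a sketch under assumptions: `missing_data_files` is a hypothetical helper, not part of Narr_ai_tive. It checks only that each file exists and that `embeddings.json` is not an empty file; it does not validate the contents.

```python
from __future__ import annotations

from pathlib import Path

# Files listed above under data/
REQUIRED_FILES = ("embeddings.json", "character_profiles.json", "world_details.json")
# Of those, only embeddings.json must have content before the first run
MUST_BE_NON_EMPTY = {"embeddings.json"}


def missing_data_files(data_dir: str | Path = "data") -> list[str]:
    """Describe each required file that is absent or unexpectedly empty."""
    data_dir = Path(data_dir)
    problems = []
    for name in REQUIRED_FILES:
        path = data_dir / name
        if not path.is_file():
            problems.append(f"{path}: missing")
        elif name in MUST_BE_NON_EMPTY and path.stat().st_size == 0:
            problems.append(f"{path}: empty")
    return problems
```

Called from the repository root, `missing_data_files()` returns an empty list when everything is in place.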
+
+## Installation Steps
 
 1. **Clone the repository**
 

diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,47 @@
+# Core AI and ML
+google-cloud-aiplatform>=1.36.0
+sentence-transformers>=2.2.2
+torch>=2.1.0
+transformers>=4.36.0
+numpy>=1.24.0
+
+# Text Processing
+nltk>=3.8.1
+spacy>=3.7.2
+textblob>=0.17.1
+python-docx>=1.0.0
+markdown>=3.5.1
+pdfminer.six>=20221105
+
+# Semantic Search
+faiss-cpu>=1.7.4
+annoy>=1.17.3
+
+# UI and Terminal Interface
+rich>=13.7.0
+textual>=0.44.1
+prompt-toolkit>=3.0.41
+tqdm>=4.66.1
+
+# Data Handling
+pyyaml>=6.0.1
+python-dotenv>=1.0.0
+jsonschema>=4.20.0
+
+# Development Tools
+hatch>=1.7.0
+black>=23.12.0
+mypy>=1.7.1
+pytest>=7.4.3
+pytest-cov>=4.1.0
+
+# Export Formats
+pypdf>=3.17.1
+# python-docx (also used for export) is listed under Text Processing
+ebooklib>=0.18
+
+# Utilities
+requests>=2.31.0
+# pathlib is in the standard library (Python 3.4+); the PyPI backport is not needed
+typing-extensions>=4.8.0
+loguru>=0.7.2