From f21761f475d77e11c6680706918b138289a0994f Mon Sep 17 00:00:00 2001
From: Dustin Smith
Date: Mon, 23 Dec 2024 15:15:56 -0500
Subject: [PATCH] Enhance installation documentation with critical prerequisites and add embeddings setup guide

---
 README.md                |  23 +++++++++
 docs/embeddings_setup.md | 100 +++++++++++++++++++++++++++++++++++++++
 docs/installation.md     |  24 ++++++++--
 requirements.txt         |  47 ++++++++++++++++++
 4 files changed, 191 insertions(+), 3 deletions(-)
 create mode 100644 docs/embeddings_setup.md
 create mode 100644 requirements.txt

diff --git a/README.md b/README.md
index 4b93769..3e3ddb4 100644
--- a/README.md
+++ b/README.md
@@ -98,10 +98,33 @@
 
 ### Prerequisites
 
+Before installing Narr_ai_tive, ensure you have:
+
+#### Required Software
 - Python 3.8+
 - [Hatchling](https://hatch.pypa.io/latest/)
 - Google API key (Gemini access)
 
+#### Required Data Setup
+- Document embeddings in `data/embeddings.json`
+  - See [Embeddings Setup Guide](docs/embeddings_setup.md)
+  - Must be generated before running the application
+  - Uses sentence-transformers format
+  - Required for semantic search functionality
+
+#### Required Files Structure
+```
+data/
+├── embeddings.json          # Your document embeddings
+├── character_profiles.json
+└── world_details.json
+```
+
+#### API Keys
+- Google Gemini API key
+- Store in `secrets.yaml`
+- Never commit this file to version control
+
 ### Installation
 
 ```bash
diff --git a/docs/embeddings_setup.md b/docs/embeddings_setup.md
new file mode 100644
index 0000000..4e89a20
--- /dev/null
+++ b/docs/embeddings_setup.md
@@ -0,0 +1,100 @@
+# Embeddings Setup Guide
+
+## Prerequisites
+
+Before using Narr_ai_tive, you need to have document embeddings prepared in an `embeddings.json` file. This file contains vector representations of your documents that enable semantic search and context-aware story generation.
+
+## Embeddings File Structure
+
+The `embeddings.json` file should be placed in the `data/` directory and follow this structure:
+
+```json
+{
+  "documents": [
+    {
+      "id": "doc1",
+      "text": "Original document text",
+      "embedding": [0.1, 0.2, ...],  // 1536-dimensional vector
+      "metadata": {
+        "title": "Document Title",
+        "type": "character_profile",
+        "tags": ["fantasy", "character"]
+      }
+    }
+  ]
+}
+```
+
+## Generating Embeddings
+
+1. **Prepare Your Documents**
+   - Gather all documents you want to use for story generation
+   - Supported formats: `.txt`, `.md`, `.pdf`, `.docx`
+
+2. **Install Required Tools**
+   ```bash
+   pip install sentence-transformers
+   ```
+
+3. **Generate Embeddings**
+   ```python
+   from sentence_transformers import SentenceTransformer
+   import json
+
+   # Initialize the model
+   model = SentenceTransformer('all-MiniLM-L6-v2')
+
+   # Generate embeddings
+   documents = [
+       {"text": "your document text", "title": "Doc Title"}
+       # Add more documents...
+   ]
+
+   for doc in documents:
+       embedding = model.encode(doc["text"])
+       doc["embedding"] = embedding.tolist()
+
+   # Save to JSON
+   with open('data/embeddings.json', 'w') as f:
+       json.dump({"documents": documents}, f)
+   ```
+
+## Best Practices
+
+- Keep document chunks between 100-1000 words for optimal performance
+- Include relevant metadata for better context handling
+- Update embeddings when documents change
+- Use consistent document formatting
+
+## Validation
+
+You can validate your embeddings file using our utility:
+
+```bash
+narr_ai_tive validate-embeddings data/embeddings.json
+```
+
+## Common Issues
+
+1. **Missing Embeddings**
+   - Error: `FileNotFoundError: embeddings.json not found`
+   - Solution: Ensure the file exists in the `data/` directory
+
+2. **Invalid Format**
+   - Error: `InvalidEmbeddingsFormat: Invalid embeddings structure`
+   - Solution: Verify JSON structure matches the required format
+
+3. **Incorrect Dimensions**
+   - Error: `EmbeddingDimensionError: Expected 1536 dimensions`
+   - Solution: Use compatible embedding model (we recommend all-MiniLM-L6-v2)
+
+## Support
+
+For more help with embeddings:
+- Check our [FAQ](docs/faq.md)
+- Join our [Discord community](#)
+- File an issue on GitHub
+
+---
+
+**Note**: The quality of your story generation heavily depends on the quality and relevance of your document embeddings. Take time to prepare them properly.
diff --git a/docs/installation.md b/docs/installation.md
index f3d44c4..2724240 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,12 +1,30 @@
 # Installation Guide
 
-## Prerequisites
+## Critical Prerequisites
 
+### 1. Document Embeddings
+Before installation, you **must** have:
+- Generated document embeddings using sentence-transformers
+- Placed them in `data/embeddings.json`
+- See [Embeddings Setup Guide](embeddings_setup.md) for details
+
+### 2. System Requirements
 - Python 3.8+
 - [Hatchling](https://hatch.pypa.io/latest/)
 - Google API key (Gemini access)
-
-## Steps
+- 16GB RAM recommended
+- 2GB free disk space
+
+### 3. Required Files
+Ensure these files exist:
+```
+data/
+├── embeddings.json          # Required before first run
+├── character_profiles.json  # Can be empty, but must exist
+└── world_details.json       # Can be empty, but must exist
+```
+
+## Installation Steps
 
 1. **Clone the repository**
 
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..741a202
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,47 @@
+# Core AI and ML
+google-cloud-aiplatform>=1.36.0
+sentence-transformers>=2.2.2
+torch>=2.1.0
+transformers>=4.36.0
+numpy>=1.24.0
+
+# Text Processing
+nltk>=3.8.1
+spacy>=3.7.2
+textblob>=0.17.1
+python-docx>=1.0.0
+markdown>=3.5.1
+pdfminer.six>=20221105
+
+# Semantic Search
+faiss-cpu>=1.7.4
+annoy>=1.17.3
+
+# UI and Terminal Interface
+rich>=13.7.0
+textual>=0.44.1
+prompt-toolkit>=3.0.41
+tqdm>=4.66.1
+
+# Data Handling
+pyyaml>=6.0.1
+python-dotenv>=1.0.0
+jsonschema>=4.20.0
+
+# Development Tools
+hatch>=1.7.0
+black>=23.12.0
+mypy>=1.7.1
+pytest>=7.4.3
+pytest-cov>=4.1.0
+
+# Export Formats
+pypdf>=3.17.1
+python-docx>=1.0.0
+ebooklib>=0.18
+
+# Utilities
+requests>=2.31.0
+pathlib>=1.0.1
+typing-extensions>=4.8.0
+loguru>=0.7.2
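
Review note on docs/embeddings_setup.md: the guide recommends `all-MiniLM-L6-v2`, which produces 384-dimensional vectors, while the JSON example and the `EmbeddingDimensionError` message both mention 1536 dimensions, so it may be worth confirming which size the application actually expects. Until then, a structure check along the following lines could help catch mismatches early. This is only a minimal sketch using the standard library; `check_embeddings` and `check_file` are hypothetical names, the rules are inferred from the JSON layout shown in the guide, and it is not the implementation of the `narr_ai_tive validate-embeddings` command, which this patch does not show.

```python
import json
from numbers import Real


def check_embeddings(data, expected_dim=None):
    """Return a list of problems found in a parsed embeddings.json payload.

    Hypothetical helper: the rules mirror the layout shown in the guide
    (a top-level "documents" list whose items carry "text" and "embedding").
    """
    docs = data.get("documents") if isinstance(data, dict) else None
    if not isinstance(docs, list):
        return ["top-level object must contain a 'documents' list"]

    problems = []
    dim = expected_dim  # None means "infer from the first valid vector"
    for i, doc in enumerate(docs):
        if not isinstance(doc, dict):
            problems.append(f"documents[{i}]: not an object")
            continue
        if not isinstance(doc.get("text"), str):
            problems.append(f"documents[{i}]: missing or non-string 'text'")
        vec = doc.get("embedding")
        if not isinstance(vec, list) or not vec:
            problems.append(f"documents[{i}]: missing or empty 'embedding'")
            continue
        # bool is a subclass of int, so exclude it explicitly
        if not all(isinstance(x, Real) and not isinstance(x, bool) for x in vec):
            problems.append(f"documents[{i}]: non-numeric value in 'embedding'")
            continue
        if dim is None:
            dim = len(vec)
        elif len(vec) != dim:
            problems.append(f"documents[{i}]: dimension {len(vec)} != {dim}")
    return problems


def check_file(path, expected_dim=None):
    """Load a JSON file from disk and run check_embeddings on it."""
    with open(path, encoding="utf-8") as f:
        return check_embeddings(json.load(f), expected_dim)
```

Calling `check_file("data/embeddings.json", expected_dim=384)` after running the generation script from the guide would return an empty list if every vector has the size `all-MiniLM-L6-v2` emits; passing `expected_dim=1536` instead would show whether the documented dimension is reachable with that model.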