Enhance installation documentation with critical prerequisites and add embeddings setup guide
whoisdsmith committed Dec 23, 2024
1 parent 7a6848b commit f21761f
Showing 4 changed files with 191 additions and 3 deletions.
23 changes: 23 additions & 0 deletions README.md
@@ -98,10 +98,33 @@

### Prerequisites

Before installing Narr_ai_tive, ensure you have:

#### Required Software
- Python 3.8+
- [Hatchling](https://hatch.pypa.io/latest/)
- Google API key (Gemini access)

#### Required Data Setup
- Document embeddings in `data/embeddings.json`
  - See [Embeddings Setup Guide](docs/embeddings_setup.md)
  - Must be generated before running the application
  - Uses sentence-transformers format
  - Required for semantic search functionality

#### Required Files Structure
```
data/
├── embeddings.json # Your document embeddings
├── character_profiles.json
└── world_details.json
```

#### API Keys
- Google Gemini API key
  - Store in `secrets.yaml`
  - Never commit this file to version control

### Installation

```bash
# …
```
100 changes: 100 additions & 0 deletions docs/embeddings_setup.md
@@ -0,0 +1,100 @@
# Embeddings Setup Guide

## Prerequisites

Before using Narr_ai_tive, you need to have document embeddings prepared in an `embeddings.json` file. This file contains vector representations of your documents that enable semantic search and context-aware story generation.
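
To make the idea of semantic search over stored vectors concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It is an illustration only, not Narr_ai_tive's actual search code: the function names `cosine_similarity` and `rank_documents` are hypothetical, and the sketch assumes each document is a dict with `id` and `embedding` fields.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vector: Sequence[float], documents: List[dict], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs for the top_k documents most similar to the query."""
    scored = [
        (doc["id"], cosine_similarity(query_vector, doc["embedding"]))
        for doc in documents
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

In practice the query vector would come from the same embedding model as the documents, since scores are only meaningful when both sides share one vector space.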

## Embeddings File Structure

The `embeddings.json` file should be placed in the `data/` directory and follow this structure:

```json
{
  "documents": [
    {
      "id": "doc1",
      "text": "Original document text",
      "embedding": [0.1, 0.2, ...], // length is set by the model (384 for all-MiniLM-L6-v2)
      "metadata": {
        "title": "Document Title",
        "type": "character_profile",
        "tags": ["fantasy", "character"]
      }
    }
  ]
}
```
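
As a quick sanity check on this layout, the sketch below inspects a parsed file for the fields shown above. It is not the project's own validator: `check_structure`, `check_file`, and the assumption that all four keys (`id`, `text`, `embedding`, `metadata`) are mandatory are illustrative guesses based on the example structure.

```python
import json
from pathlib import Path
from typing import List

# Assumption: every key in the example structure is required.
REQUIRED_KEYS = {"id", "text", "embedding", "metadata"}


def check_structure(data: object) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    if not isinstance(data, dict) or not isinstance(data.get("documents"), list):
        return ["top-level 'documents' key must hold a list"]
    problems = []
    for index, doc in enumerate(data["documents"]):
        if not isinstance(doc, dict):
            problems.append(f"documents[{index}] is not an object")
            continue
        missing = REQUIRED_KEYS - doc.keys()
        if missing:
            problems.append(f"documents[{index}] is missing {sorted(missing)}")
        embedding = doc.get("embedding")
        if not isinstance(embedding, list) or not all(
            isinstance(value, (int, float)) for value in embedding
        ):
            problems.append(f"documents[{index}].embedding must be a list of numbers")
    return problems


def check_file(path: str) -> List[str]:
    """Load a JSON file from disk and check its structure."""
    return check_structure(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `check_file("data/embeddings.json")` returns a list of messages that can be printed or logged before starting the application.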

## Generating Embeddings

1. **Prepare Your Documents**
   - Gather all documents you want to use for story generation
   - Supported formats: `.txt`, `.md`, `.pdf`, `.docx`

2. **Install Required Tools**
```bash
pip install sentence-transformers
```

3. **Generate Embeddings**
```python
import json
import os

from sentence_transformers import SentenceTransformer

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
documents = [
    {"text": "your document text", "title": "Doc Title"},
    # Add more documents...
]

for doc in documents:
    # encode() returns a NumPy array; convert it to a plain list for JSON
    embedding = model.encode(doc["text"])
    doc["embedding"] = embedding.tolist()

# Save to JSON, creating the data/ directory if it does not exist yet
os.makedirs('data', exist_ok=True)
with open('data/embeddings.json', 'w') as f:
    json.dump({"documents": documents}, f)
```

## Best Practices

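- Keep document chunks between 100 and 1000 words; note that all-MiniLM-L6-v2 truncates input longer than 256 word pieces by default, so shorter chunks are embedded more completely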
- Include relevant metadata for better context handling
- Update embeddings when documents change
- Use consistent document formatting
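
The chunk-size guidance above can be sketched as a simple word-window splitter. This is one possible approach, not the project's own preprocessing: the function name `chunk_words` and the default values for `max_words` and `overlap` are assumptions chosen for illustration.

```python
from typing import List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each returned chunk can then be stored as its own entry in `documents`, with metadata pointing back to the source file.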

## Validation

You can validate your embeddings file using our utility:

```bash
narr_ai_tive validate-embeddings data/embeddings.json
```

## Common Issues

1. **Missing Embeddings**
   - Error: `FileNotFoundError: embeddings.json not found`
   - Solution: Ensure the file exists in the `data/` directory

2. **Invalid Format**
   - Error: `InvalidEmbeddingsFormat: Invalid embeddings structure`
   - Solution: Verify that the JSON structure matches the required format

3. **Incorrect Dimensions**
   - Error: `EmbeddingDimensionError: Expected 1536 dimensions`
   - Solution: Use an embedding model whose output size matches the expected dimension. Note that all-MiniLM-L6-v2, the model used in this guide, produces 384-dimensional vectors, so confirm which dimension your installation expects before regenerating the file
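
When a dimension error appears, it can help to inspect the vector lengths in the file directly. The following is a minimal sketch under the assumption that documents follow the structure in this guide; `dimension_counts` and `ids_with_wrong_dimension` are hypothetical helpers, not part of Narr_ai_tive, and `expected` is whatever dimension your installation reports.

```python
from collections import Counter
from typing import Dict, List


def dimension_counts(documents: List[dict]) -> Dict[int, int]:
    """Map each embedding length to the number of documents that have it."""
    return dict(Counter(len(doc["embedding"]) for doc in documents))


def ids_with_wrong_dimension(documents: List[dict], expected: int) -> List[str]:
    """Return the ids of documents whose embedding length differs from expected."""
    return [doc["id"] for doc in documents if len(doc["embedding"]) != expected]
```

A file produced by a single model should yield exactly one entry in `dimension_counts`; more than one usually means embeddings from different models were mixed.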

## Support

For more help with embeddings:
- Check our [FAQ](faq.md)
- Join our [Discord community](#)
- File an issue on GitHub

---

**Note**: The quality of your story generation heavily depends on the quality and relevance of your document embeddings. Take time to prepare them properly.
24 changes: 21 additions & 3 deletions docs/installation.md
@@ -1,12 +1,30 @@
# Installation Guide

-## Prerequisites
## Critical Prerequisites

### 1. Document Embeddings
Before installation, you **must** have:
- Generated document embeddings using sentence-transformers
- Placed them in `data/embeddings.json`
- See [Embeddings Setup Guide](embeddings_setup.md) for details

### 2. System Requirements
- Python 3.8+
- [Hatchling](https://hatch.pypa.io/latest/)
- Google API key (Gemini access)

-## Steps
- 16GB RAM recommended
- 2GB free disk space

### 3. Required Files
Ensure these files exist:
```
data/
├── embeddings.json # Required before first run
├── character_profiles.json # Can be empty, but must exist
└── world_details.json # Can be empty, but must exist
```
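
A small preflight check can confirm that these files are in place before the first run. This is a sketch only: `missing_data_files` is a hypothetical helper, not a function provided by the project, and it checks for existence without inspecting file contents.

```python
from pathlib import Path
from typing import List

# File names taken from the tree above.
REQUIRED_FILES = ("embeddings.json", "character_profiles.json", "world_details.json")


def missing_data_files(data_dir: str = "data") -> List[str]:
    """Return the required file names that are not present in data_dir."""
    base = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing required files in data/: " + ", ".join(missing))
    else:
        print("All required data files are present.")
```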

## Installation Steps

1. **Clone the repository**

47 changes: 47 additions & 0 deletions requirements.txt
@@ -0,0 +1,47 @@
# Core AI and ML
google-cloud-aiplatform>=1.36.0
sentence-transformers>=2.2.2
torch>=2.1.0
transformers>=4.36.0
numpy>=1.24.0

# Text Processing
nltk>=3.8.1
spacy>=3.7.2
textblob>=0.17.1
python-docx>=1.0.0
markdown>=3.5.1
pdfminer.six>=20221105

# Semantic Search
faiss-cpu>=1.7.4
annoy>=1.17.3

# UI and Terminal Interface
rich>=13.7.0
textual>=0.44.1
prompt-toolkit>=3.0.41
tqdm>=4.66.1

# Data Handling
pyyaml>=6.0.1
python-dotenv>=1.0.0
jsonschema>=4.20.0

# Development Tools
hatch>=1.7.0
black>=23.12.0
mypy>=1.7.1
pytest>=7.4.3
pytest-cov>=4.1.0

# Export Formats
pypdf>=3.17.1
# python-docx is already listed under Text Processing
ebooklib>=0.18

# Utilities
requests>=2.31.0
# pathlib omitted: it ships with the Python 3 standard library, and the PyPI package is an obsolete backport
typing-extensions>=4.8.0
loguru>=0.7.2
