An automated document analyzer for Paperless-ngx that uses the OpenAI API, Ollama, or any OpenAI-API-compatible service to analyze and tag your documents automatically.
It features an automatic mode, a manual mode, support for Ollama and OpenAI, a chat function for querying your documents with AI, and a modern, intuitive web interface.
The following services, including OpenAI-API-compatible ones, have been tested successfully:
- Ollama
- OpenAI
- DeepSeek.ai
- OpenRouter.ai
- Perplexity.ai
- Together.ai
- vLLM
- LiteLLM
- FastChat
- Gemini (Google)
- ... and possibly many more
paperless-ai changes documents in your production Paperless-ngx instance (the metadata, not the content), and these changes cannot easily be undone. Configure it carefully. Test the results first in a separate development environment, and back up your documents and metadata beforehand.
Download the Chrome extension here: https://github.com/clusterzx/paperless-ai/blob/main/paperless-ai-chrome-extension.zip
Thank you for all your support, bug reports, and feature requests!
If you upgrade from 1.x to 2.1.x or later:
- You must now set up a user, because the web app requires authentication. I know many of you only use it in secured, isolated networks and don't care about authentication, but I think we all benefit when we secure our data as much as possible.
- You have to set the username that the API token belongs to. Many bugs and issues were traced back to documents having different user/access rights than the API key could provide.
- Thanks for listening, love ya!
- Automatic Scanning: Identifies and processes new documents within Paperless-ngx.
- AI-Powered Analysis: Leverages OpenAI API and Ollama (Mistral, Llama, Phi 3, Gemma 2) for precise document analysis.
- Metadata Assignment: Automatically assigns titles, tags, document_type and correspondent details.
- Predefined Processing Rules: Specify which documents to process based on existing tags. (Optional)
- Selective Tag Assignment: Use only selected tags for processing. (Disables the prompt dialog)
- Custom Tagging: Assign a specific tag (of your choice) to AI-processed documents for easy identification.
- AI-Assisted Analysis: Manually analyze documents with AI support in a modern web interface. (Accessible via the /manual endpoint)
- Document Querying: Ask questions about your documents and receive accurate, AI-generated answers.
- Streamlined Configuration: Easy-to-use setup interface available at /setup.
- Dashboard Overview: A clean and organized dashboard for monitoring and managing document processing.
- Error Handling: Automatic restarts and health monitoring for improved stability.
- Health Checks: Ensures system integrity and alerts on potential issues.
- Docker Integration: Full Docker support, including health checks, resource management, and persistent data storage.
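The tag-based processing rule in the list above can be pictured as a simple filter. The sketch below is illustrative only: the `selectDocuments` helper, the `{ id, tags }` document shape, and the "any tag matches, case-insensitively" rule are assumptions, not paperless-ai's actual implementation.

```javascript
// Hypothetical helper: keep only documents that carry at least one of the
// configured tags. An empty tag list means "process everything".
function selectDocuments(documents, requiredTags) {
  if (requiredTags.length === 0) return documents;
  const wanted = new Set(requiredTags.map((tag) => tag.toLowerCase()));
  return documents.filter((doc) =>
    doc.tags.some((tag) => wanted.has(tag.toLowerCase()))
  );
}

// Example data (invented for illustration).
const sampleDocuments = [
  { id: 1, tags: ["Invoice", "2024"] },
  { id: 2, tags: ["Private"] },
];

console.log(selectDocuments(sampleDocuments, ["invoice"]).map((doc) => doc.id));
```

Whether the real rule requires one matching tag or all of them is not stated here, so treat the matching logic as a placeholder.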
- Docker and Docker Compose
- Access to a Paperless-ngx installation
- Access to a cloud-based LLM (see the list above), or your own Ollama or other self-hosted LLM instance with your chosen model running and reachable.
- Basic understanding of cron syntax (for scan interval configuration)
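As a quick refresher on the cron syntax mentioned above, the sketch below splits an expression into its parts. It assumes the standard five-field layout (minute, hour, day of month, month, day of week); the `splitCron` helper is hypothetical, and the example expressions are illustrations rather than paperless-ai defaults.

```javascript
// Field names of a standard five-field cron expression, in order.
const CRON_FIELDS = ["minute", "hour", "dayOfMonth", "month", "dayOfWeek"];

// Hypothetical helper: label each field of a cron expression.
// It only checks the field count; it does not validate field values.
function splitCron(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`Expected 5 fields, got ${parts.length}`);
  }
  return Object.fromEntries(CRON_FIELDS.map((name, i) => [name, parts[i]]));
}

console.log(splitCron("*/30 * * * *")); // every 30 minutes
console.log(splitCron("0 2 * * 1")); // 02:00 every Monday
```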
Visit the Wiki for installation instructions.
- Document Discovery
  - Periodically scans Paperless-ngx for new documents
  - Tracks processed documents in a local SQLite database
- AI Analysis
  - Sends document content to OpenAI API or Ollama for analysis
  - Extracts relevant tags and correspondent information
  - Uses GPT-4o-mini or your custom Ollama model for accurate document understanding
- Automatic Organization
  - Creates new tags if they don't exist
  - Creates new correspondents if they don't exist
  - Updates documents with analyzed information
  - Marks documents as processed to avoid duplicate analysis
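The bookkeeping in the steps above (skip what was already processed, create only the tags that are missing) can be sketched as follows. This is a simplified illustration, not the project's code: an in-memory `Set` stands in for the SQLite database, the helper names `pendingDocuments` and `planTags` are hypothetical, and the case-insensitive tag comparison is an assumption.

```javascript
// Return only the documents whose IDs have not been processed yet.
function pendingDocuments(documents, processedIds) {
  return documents.filter((doc) => !processedIds.has(doc.id));
}

// Split AI-suggested tags into ones that already exist and ones to create.
function planTags(existingTags, suggestedTags) {
  const known = new Set(existingTags.map((tag) => tag.toLowerCase()));
  const toReuse = [];
  const toCreate = [];
  for (const tag of suggestedTags) {
    (known.has(tag.toLowerCase()) ? toReuse : toCreate).push(tag);
  }
  return { toReuse, toCreate };
}

// Example run with invented data: document 1 was handled in an earlier scan.
const processedIds = new Set([1]);
const scannedDocuments = [{ id: 1 }, { id: 2 }];

for (const doc of pendingDocuments(scannedDocuments, processedIds)) {
  // In the real application the suggestions would come from the LLM.
  const plan = planTags(["Invoice"], ["invoice", "Insurance"]);
  console.log(doc.id, plan);
  processedIds.add(doc.id); // mark as processed to avoid duplicate analysis
}
```

The same split would apply to correspondents: look up the suggested name, and create it only when no match exists.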
You can now analyze your documents manually with the help of AI in the web interface, reachable via the /manual endpoint.
The application is configured through the web interface on the /setup route.
You neither need to nor can set the environment variables through Docker.
The application comes with full Docker support:
- Automatic container restart on failure
- Health monitoring
- Volume persistence for database
- Resource management
- Graceful shutdown handling
# Start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Restart container
docker-compose restart
# Stop container
docker-compose down
# Rebuild and start
docker-compose up -d --build
The application provides a health check endpoint at /health that returns:
# Healthy system
{
"status": "healthy"
}
# System not configured
{
"status": "not_configured",
"message": "Application setup not completed"
}
# Database error
{
"status": "database_error",
"message": "Database check failed"
}
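A monitoring script could interpret these responses roughly as shown below. This is a sketch under assumptions: the helper names are hypothetical, the base URL is a placeholder, and it assumes the endpoint always answers with a JSON body regardless of HTTP status. `checkHealth` relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Human-readable summaries for the three documented status values.
const HEALTH_SUMMARIES = new Map([
  ["healthy", "OK"],
  ["not_configured", "Setup not completed; open /setup"],
  ["database_error", "Database check failed"],
]);

// Turn a parsed /health response body into a small result object.
function describeHealth(body) {
  const summary = HEALTH_SUMMARIES.has(body.status)
    ? HEALTH_SUMMARIES.get(body.status)
    : `Unknown status: ${body.status}`;
  return { ok: body.status === "healthy", summary };
}

// Fetch and interpret the health endpoint (performs a network request).
async function checkHealth(baseUrl) {
  const response = await fetch(new URL("/health", baseUrl));
  return describeHealth(await response.json());
}

console.log(describeHealth({ status: "healthy" }));
console.log(describeHealth({ status: "not_configured" }));
```

Usage against a running instance would look like `checkHealth("http://your-instance:3000").then(console.log)`.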
The application includes a debug interface, accessible via /debug, that helps administrators monitor and troubleshoot the system's data:
- View all system tags
- Inspect processed documents
- Review correspondent information
- Navigate to: http://your-instance:3000/debug
- The interface provides:
  - Interactive dropdown to select data category
  - Tree view visualization of JSON responses
  - Color-coded data representation
  - Collapsible/expandable data nodes
Endpoint | Description |
---|---|
/debug/tags | Lists all tags in the system |
/debug/documents | Shows processed document information |
/debug/correspondents | Displays correspondent data |
The debug interface also integrates with the health check system, showing a configuration warning if the system is not properly set up.
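For scripted access, the endpoint paths from the table can be assembled like this. The `debugUrl` helper is hypothetical, the host is the placeholder used above, and requests may additionally need authentication, which this sketch does not cover.

```javascript
// The three data categories listed in the endpoint table.
const DEBUG_CATEGORIES = ["tags", "documents", "correspondents"];

// Hypothetical helper: build the full URL for one debug category.
function debugUrl(baseUrl, category) {
  if (!DEBUG_CATEGORIES.includes(category)) {
    throw new Error(`Unknown debug category: ${category}`);
  }
  return new URL(`/debug/${category}`, baseUrl).toString();
}

for (const category of DEBUG_CATEGORIES) {
  console.log(debugUrl("http://your-instance:3000", category));
}
```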
To run the application locally without Docker:
- Install dependencies:
npm install
- Start the development server:
npm run test
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Store API keys securely
- Restrict container access
- Monitor API usage
- Regularly update dependencies
- Back up your database
This project is licensed under the MIT License - see the LICENSE file for details.
- Paperless-ngx for the amazing document management system
- OpenAI API
- The Express.js and Node.js communities for their excellent tools
If you encounter any issues or have questions:
- Check the Issues section
- Create a new issue if yours isn't already listed
- Provide detailed information about your setup and the problem
- Support for custom AI models
- Support for multiple language analysis
- Advanced tag matching algorithms
- Custom rules for document processing
- Enhanced web interface with statistics