A REST API service that converts documents and web content to Markdown. It converts files with Microsoft's MarkItDown and extracts web content with Trafilatura, falling back to python-readability.
- Document-to-Markdown conversion via file upload or URL
  - Office documents (DOCX, XLSX, PPTX)
  - PDF files
  - Images (PNG, JPEG, GIF, WebP)
  - Data files (CSV, JSON, XML)
- Web content extraction and conversion
  - Primary extraction using Trafilatura
  - Fallback to python-readability for robust content extraction
  - Intelligent character encoding detection
- Clean Markdown output with preserved formatting
- Rich metadata for conversion results
- API Key-based authentication and OpenAPI documentation
- Robust content type detection and handling
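The primary-then-fallback extraction listed above can be sketched as a small chain of extractors. This is a minimal illustration, not the service's actual code: `extract_with_fallback` and the `(name, callable)` pair format are hypothetical, and the real callables would presumably be thin wrappers around Trafilatura and python-readability.

```python
from typing import Callable, Optional, Sequence, Tuple

# An extractor takes raw HTML and returns extracted text, or None on failure.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    html: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each extractor in order; return (text, extractor_name) for the first usable result."""
    for name, extractor in extractors:
        try:
            text = extractor(html)
        except Exception:
            # A failing extractor should not abort the chain; try the next one.
            continue
        if text and text.strip():
            return text, name
    # Every extractor failed or returned empty output.
    return None, None


# Stand-in extractors for demonstration only (hypothetical, no third-party imports).
def primary(html: str) -> Optional[str]:
    return None  # simulate the primary extractor finding nothing


def fallback(html: str) -> Optional[str]:
    return "Example body text"


text, method = extract_with_fallback(
    "<html>...</html>", [("primary", primary), ("fallback", fallback)]
)
```

Returning the extractor name alongside the text makes it easy to record which method produced the output in the result metadata.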
- Python 3.8 or higher
- pip (Python package installer)
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/9bow/markitdown-api-fly-io.git
cd markitdown-api-fly-io
- Create and activate virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Configure environment variables
# Create a .env file by copying the provided example file
cp .env.example .env
# Update the following variables in .env
VERSION=0.0.1
MAX_DOWNLOAD_SIZE=52428800 # 50MB in bytes
TIMEOUT_SECONDS=30
- Run development server
cd app/
uvicorn main:app --reload
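As an illustration of how the variables from the configuration step might be read at startup, here is a minimal standard-library sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and the defaults simply mirror the example values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    version: str
    max_download_size: int  # bytes
    timeout_seconds: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the example values."""
    env = os.environ if env is None else env
    return Settings(
        version=env.get("VERSION", "0.0.1"),
        # 50 MB expressed in bytes: 50 * 1024 * 1024 == 52428800
        max_download_size=int(env.get("MAX_DOWNLOAD_SIZE", str(50 * 1024 * 1024))),
        timeout_seconds=int(env.get("TIMEOUT_SECONDS", "30")),
    )


settings = load_settings({"TIMEOUT_SECONDS": "10"})
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to exercise in tests.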
- Install Fly.io CLI
curl -L https://fly.io/install.sh | sh
- Login and deploy
flyctl auth login
flyctl launch
flyctl secrets set API_KEY="your-secure-api-key"
flyctl deploy
All API endpoints require authentication using either:
- API key in the `X-API-Key` header
- Bearer token in the `Authorization` header
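A server-side check for the two header forms could look roughly like the sketch below. It assumes the Bearer token is the same value as the API key, which the text above does not state explicitly; `extract_api_key` and `is_authorized` are hypothetical helper names.

```python
import hmac
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the credential from X-API-Key or an 'Authorization: Bearer ...' header."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    key = lowered.get("x-api-key", "").strip()
    if key:
        return key
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Compare the presented credential with the expected key in constant time."""
    presented = extract_api_key(headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected_key.encode("utf-8"))


ok = is_authorized({"Authorization": "Bearer your-secure-api-key"}, "your-secure-api-key")
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters matched.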
curl -X GET \
-H "X-API-Key: your-secure-api-key" \
http://localhost:8000/health
# via file upload
curl -X POST \
-H "X-API-Key: your-secure-api-key" \
-F "file=@document.pdf" \
http://localhost:8000/convert
# via file URL
curl -X POST \
-H "X-API-Key: your-secure-api-key" \
-F "url=https://example.com/document.pdf" \
http://localhost:8000/convert
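The URL variant of the request above can also be built from Python using only the standard library. This sketch constructs the `multipart/form-data` body by hand to mirror what `curl -F` sends; `build_convert_request` is a hypothetical helper, and the base URL and key are the placeholder values from the curl examples.

```python
import urllib.request
import uuid


def build_convert_request(base_url: str, api_key: str, source_url: str) -> urllib.request.Request:
    """Build a POST /convert request carrying a single 'url' form field."""
    boundary = uuid.uuid4().hex
    # One multipart part for the 'url' field, followed by the closing boundary.
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="url"\r\n'
        "\r\n"
        f"{source_url}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/convert",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_convert_request(
    "http://localhost:8000", "your-secure-api-key", "https://example.com/document.pdf"
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=30)
```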
Successful conversions return a JSON object with the following structure:
{
  "result": "# Converted Markdown Content...",
  "metadata": {
    "content_type": "application/pdf",
    "file_size": 12345,
    "processing_time": 0.532,
    "original_url": "https://example.com/document.pdf",
    "conversion_method": "markitdown"
  }
}
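On the client side, a response of this shape can be parsed into typed objects. The sketch below is one possible approach; the class names are hypothetical, and treating `original_url` as optional (for example, for file uploads) is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionMetadata:
    content_type: str
    file_size: int
    processing_time: float
    original_url: Optional[str]  # assumed to be absent for uploads
    conversion_method: str


@dataclass
class ConversionResult:
    markdown: str
    metadata: ConversionMetadata


def parse_conversion_response(payload: str) -> ConversionResult:
    """Parse the JSON body of a successful conversion response."""
    data = json.loads(payload)
    meta = data["metadata"]
    return ConversionResult(
        markdown=data["result"],
        metadata=ConversionMetadata(
            content_type=meta["content_type"],
            file_size=int(meta["file_size"]),
            processing_time=float(meta["processing_time"]),
            original_url=meta.get("original_url"),
            conversion_method=meta["conversion_method"],
        ),
    )


sample = """{
  "result": "# Converted Markdown Content...",
  "metadata": {
    "content_type": "application/pdf",
    "file_size": 12345,
    "processing_time": 0.532,
    "original_url": "https://example.com/document.pdf",
    "conversion_method": "markitdown"
  }
}"""
parsed = parse_conversion_response(sample)
```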
The API returns appropriate HTTP status codes and error messages:
- 400: Bad Request (invalid input)
  - Unsupported file format
  - Invalid URL
  - Missing file/URL
- 401: Unauthorized (invalid API key)
- 408: Request Timeout
- 413: Payload Too Large (file size exceeds limit)
- 500: Internal Server Error
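A client might translate these status codes into a small error object, as in the sketch below. The `retryable` flag is an assumption added for illustration (timeouts and server errors are often worth retrying, client errors usually are not); `ApiError` and `classify_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    reason: str
    retryable: bool


# Status codes listed above, paired with an assumed retry hint.
_KNOWN_ERRORS = {
    400: ("Bad Request", False),
    401: ("Unauthorized", False),
    408: ("Request Timeout", True),
    413: ("Payload Too Large", False),
    500: ("Internal Server Error", True),
}


def classify_status(status: int) -> Optional[ApiError]:
    """Return None for 2xx responses, otherwise an ApiError describing the failure."""
    if 200 <= status < 300:
        return None
    reason, retryable = _KNOWN_ERRORS.get(status, ("Unexpected status", False))
    return ApiError(status, reason, retryable)


error = classify_status(413)
```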
- PDF (`.pdf`)
- Microsoft Word (`.docx`)
- Microsoft Excel (`.xlsx`)
- Microsoft PowerPoint (`.pptx`)

- HTML pages (`.html`, `.htm`)
- XML documents (`.xml`)

- CSV (`.csv`)
- JSON (`.json`)
- XML (`.xml`)

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- GIF (`.gif`)
- WebP (`.webp`)
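For a quick client-side pre-check, the extensions above can be kept in a lookup table. This is only a sketch: the category labels are hypothetical, and checking the extension alone is a simplification, since the service itself is described as detecting content types.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Extensions from the lists above; the category names are illustrative labels.
SUPPORTED_EXTENSIONS = {
    "documents": {".pdf", ".docx", ".xlsx", ".pptx"},
    "web": {".html", ".htm", ".xml"},
    "data": {".csv", ".json", ".xml"},
    "images": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
}


def categories_for(name_or_url: str) -> List[str]:
    """Return every category whose extension set matches the file name or URL path."""
    # urlparse drops any query string or fragment before the suffix is read.
    suffix = PurePosixPath(urlparse(name_or_url).path).suffix.lower()
    return [
        category
        for category, extensions in SUPPORTED_EXTENSIONS.items()
        if suffix in extensions
    ]


def is_supported(name_or_url: str) -> bool:
    return bool(categories_for(name_or_url))


pdf_categories = categories_for("https://example.com/document.pdf")
```

Note that `.xml` appears in two lists, so the lookup returns a list of categories rather than a single one.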
pytest
mypy app/
This project is licensed under the MIT License - see the LICENSE file for details.