AutoDocMark uses Microsoft's MarkItDown library to convert documents into Markdown files. It is intended to be run inside a Python virtual environment.
To use MarkItDown in a virtual environment, follow these steps:
- Create and activate a virtual environment:

```bash
python3 -m venv my_markitdown_env
source my_markitdown_env/bin/activate
# On Windows, activate with: my_markitdown_env\Scripts\activate
```
- Install MarkItDown directly from its GitHub repository:

```bash
pip install git+https://github.com/microsoft/markitdown.git
```
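To confirm that the setup steps above worked, a small check like the following can be run with the environment's `python`. It is a sketch, not part of AutoDocMark: the function names `in_virtualenv` and `markitdown_installed` are hypothetical, and it relies only on the standard library (a venv is detected by `sys.prefix` differing from `sys.base_prefix`).

```python
"""Sanity-check the virtual environment and MarkItDown install (a sketch)."""
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def markitdown_installed() -> bool:
    # find_spec returns None when the top-level package cannot be found.
    return importlib.util.find_spec("markitdown") is not None


if __name__ == "__main__":
    print(f"virtual environment active: {in_virtualenv()}")
    print(f"markitdown importable:      {markitdown_installed()}")
```

If either line prints `False`, re-run the activation command or the `pip install` step before converting files.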
Convert a document to Markdown with the provided `convert_to_markdown.py` script, passing the input file path as an argument:

```bash
python convert_to_markdown.py <file_path>
```

For example:

```bash
python convert_to_markdown.py example.docx
```
After the script finishes, a Markdown file with the same base name (for example, `example.md`) is created in the same directory as the input file.
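For readers curious how a script with this behavior can be built on MarkItDown, here is a minimal sketch. It is not the project's actual `convert_to_markdown.py`; it assumes MarkItDown's Python API as shown in the MarkItDown README (`MarkItDown().convert(path)` returning a result with a `text_content` attribute), and the helper names `output_path_for`, `convert`, and `main` are hypothetical.

```python
"""A minimal sketch of a convert_to_markdown.py-style script.

Assumption: this is NOT the project's real script. It only illustrates the
described behavior using MarkItDown's convert() call and text_content.
"""
import sys
from pathlib import Path


def output_path_for(input_path: str) -> Path:
    # Same directory and base name as the input, with a .md extension,
    # e.g. docs/example.docx -> docs/example.md
    return Path(input_path).with_suffix(".md")


def convert(input_path: str) -> Path:
    # Imported lazily so the path helper works even without markitdown.
    from markitdown import MarkItDown

    result = MarkItDown().convert(input_path)
    out = output_path_for(input_path)
    out.write_text(result.text_content, encoding="utf-8")
    return out


def main(argv: list) -> int:
    # Expect exactly one argument: the input file path.
    if len(argv) != 2:
        print(f"usage: python {Path(argv[0]).name} <file_path>", file=sys.stderr)
        return 2
    if not Path(argv[1]).is_file():
        print(f"error: file not found: {argv[1]}", file=sys.stderr)
        return 1
    print(f"wrote {convert(argv[1])}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Importing `markitdown` inside `convert()` rather than at the top keeps the output-path logic usable and testable on machines where the library is not installed.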
Key features of the script:
- Flexible Input: Accepts any format MarkItDown supports (PDF, DOCX, XLSX, etc.).
- Simple Interface: A single command converts a file; the only argument is the input path.
- Automatic Output: Saves the converted Markdown file in the same directory as the input file.
MarkItDown supports the following file types, among others:
- PDF
- Word (DOCX)
- Excel (XLSX)
- PowerPoint (PPTX)
- Images (with OCR capabilities)
- Audio files (with speech transcription)
- HTML
- CSV, JSON, XML
- ZIP files (iterates over and converts the archive's contents)
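Before calling the script on many files, it can help to pre-filter inputs by extension. The sketch below is illustrative only: the `SUPPORTED_EXTENSIONS` table and `file_category` helper are hypothetical, and the image and audio extensions chosen are assumptions. MarkItDown performs its own format detection, so its real coverage may differ from this table.

```python
"""Pre-check whether a file's extension matches the formats named above.

Assumption: this mapping is illustrative, not MarkItDown's own table.
"""
from pathlib import Path
from typing import Optional

# Hypothetical extension -> category mapping; image/audio entries are guesses.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".pptx": "PowerPoint",
    ".jpg": "Image", ".jpeg": "Image", ".png": "Image",
    ".wav": "Audio", ".mp3": "Audio",
    ".html": "HTML", ".htm": "HTML",
    ".csv": "CSV", ".json": "JSON", ".xml": "XML",
    ".zip": "ZIP",
}


def file_category(path: str) -> Optional[str]:
    # Compare case-insensitively so "Report.DOCX" is recognized too;
    # returns None for extensions not in the table.
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())


if __name__ == "__main__":
    for name in ["slides.pptx", "Report.DOCX", "setup.exe"]:
        print(name, "->", file_category(name))
```

For example, `file_category("slides.pptx")` returns `"PowerPoint"`, while `file_category("setup.exe")` returns `None`.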