This Python script extracts text from images using OpenAI's GPT-4 API. The script supports both image URLs and base64-encoded images as inputs. Extracted text can be saved to a local file for further use.
- Text Extraction from Image URLs: Extract text directly from an image available online.
- Text Extraction from Base64 Images: Extract text from locally stored images by converting them into base64 encoding.
- Accurate Text Parsing: Ensures completeness by carefully extracting all text elements, including faint or small text, and preserving the original structure.
- Output to File: Saves extracted text to a specified file.
- Python 3.7+
- OpenAI Python SDK
- Base64 library (standard library)
- Clone this repository or download the script.
- Install the required dependencies:
pip install openai
- Ensure you have an OpenAI API key.
- Uncomment the relevant section in the script:
# image_url = "https://example.com/image.jpg" # extracted_text = image_to_text_from_url(image_url) # output_file_path = "path/to/extracted_text.txt" # with open(output_file_path, "a", encoding="utf-8") as text_file: # text_file.write(extracted_text) # text_file.write("\n")
- Replace the
image_url
with the URL of your image. - Run the script and view the extracted text in the specified file.
- Provide the path to your local image:
local_image_path = "path/to/image.png"
- The script will automatically convert the image to base64 encoding and extract text.
- View the extracted text in the console or in the specified output file.
-
image_to_base64(image_path)
- Converts a local image to base64 encoding.
-
image_to_text_from_url(image_url)
- Extracts text from an image using its URL.
-
image_to_text_from_base64(image_base64)
- Extracts text from an image provided in base64 format.
- Input: Image URL or local image file.
- Process:
- For URLs: Direct API call.
- For local files: Convert to base64, then API call.
- Output: Extracted text saved to a file.
- Update your OpenAI API key in the
client
initialization:client = OpenAI( api_key="your-openai-api-key" )
- Specify the file path for saving the extracted text.
- The script requires an active OpenAI API key with appropriate permissions.
- Limited by the capabilities of the GPT-4 model for OCR tasks.
Feel free to fork this repository and submit pull requests for new features or improvements.
This project is licensed under the MIT License. See the LICENSE
file for details.