An innovative AI tool that generates narrated short stories based on the content of uploaded images. It leverages GenAI models from Hugging Face and OpenAI, orchestrated with LangChain, and is deployed on both Streamlit Cloud and Hugging Face Spaces.
- **Image to Text**: Uses Hugging Face's image-to-text transformer model (`Salesforce/blip-image-captioning-base`) to analyze the uploaded image and generate descriptive text.
- **Text to Story**: Uses OpenAI's GPT-3.5-Turbo model to create a short, imaginative story (default: 50 words) from the descriptive text.
- **Story to Speech**: Converts the story into a narrated audio file using Hugging Face's text-to-speech model (`espnet/kan-bayashi_ljspeech_vits`).
- **User-Friendly Interface**: Built with Streamlit for easy image uploading and playback of the generated audio.

Code sketches of the three model stages are shown below.
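First, the captioning stage. This is a minimal sketch using the `transformers` pipeline API with the model named in the feature list; the function name `image_to_text` and the sample file name are illustrative, not taken from the project code.

```python
# Minimal captioning sketch using the transformers pipeline API.
# The model name comes from the feature list; everything else is illustrative.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def image_to_text(image_path: str) -> str:
    # The pipeline returns a list of dicts, e.g. [{"generated_text": "a dog ..."}].
    return captioner(image_path)[0]["generated_text"]

if __name__ == "__main__":
    print(image_to_text("example.jpg"))  # path to any local image
```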
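Next, the story stage. This sketch uses the classic (pre-0.1) LangChain API, since LangChain is named in the intro; the prompt wording, temperature, and function name are assumptions, not the project's actual values.

```python
# Story-generation sketch using the classic LangChain API.
# Prompt wording, temperature, and function name are assumptions.
import os
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["scenario"],
    template=(
        "You are a storyteller. Write a short, imaginative story of about "
        "50 words based on this scene: {scenario}"
    ),
)
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.9,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
chain = LLMChain(llm=llm, prompt=prompt)

def text_to_story(scenario: str) -> str:
    return chain.run(scenario=scenario)
```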
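Finally, the speech stage. Since `requests` appears in the dependency list, one plausible reading is a call to the Hugging Face Inference API; the endpoint URL pattern, the FLAC output format, and the output file name are assumptions.

```python
# Text-to-speech sketch via the Hugging Face Inference API (an assumption;
# the project may invoke the model differently). The output path mirrors
# the img-audio folder mentioned below.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"

def story_to_speech(story: str, out_path: str = "img-audio/story.flac") -> str:
    headers = {"Authorization": f"Bearer {os.getenv('HUGGINGFACE_API_TOKEN')}"}
    response = requests.post(API_URL, headers=headers, json={"inputs": story})
    response.raise_for_status()
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "wb") as f:
        f.write(response.content)  # the API returns raw audio bytes
    return out_path
```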
The generated audio file is available in the `img-audio` folder.
The following libraries and tools are required:

- `os` (Python standard library)
- `python-dotenv`
- `transformers`
- `torch`
- `langchain`
- `openai`
- `requests`
- `streamlit`
- Obtain personal API tokens for Hugging Face and OpenAI.
- Save the tokens in a `.env` file with the following format:

```
OPENAI_API_KEY=<your-api-key-here>
HUGGINGFACE_API_TOKEN=<your-access-token-here>
```
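Once the `.env` file exists, the tokens can be loaded at startup with `python-dotenv`, which is already in the dependency list. A minimal sketch:

```python
# Minimal sketch of loading the API tokens with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACE_API_TOKEN")
```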
Set up a virtual environment (venv) and install the dependencies:

```bash
pip install -r requirements.txt
```

Then run the app:

```bash
streamlit run app.py
```
The app will:

- Generate descriptive text for the uploaded image.
- Turn the description into a short story.
- Provide a playable audio file of the narrated story.
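A minimal sketch of how such a Streamlit front end can be wired together, assuming helper functions like those sketched above; the names and stub bodies here are illustrative, not the project's actual code.

```python
# Illustrative Streamlit front end; the helper stubs stand in for the
# captioning, story, and text-to-speech stages sketched earlier.
import streamlit as st

def image_to_text(image_path: str) -> str:
    return "a dog running through a sunny field"  # stub for the BLIP stage

def text_to_story(scenario: str) -> str:
    return f"Once upon a time, there was {scenario}."  # stub for GPT-3.5-Turbo

def story_to_speech(story: str) -> bytes:
    return b""  # stub for the VITS stage; real code would return audio bytes

st.title("Image to Story Converter")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    st.image(uploaded)
    scenario = image_to_text(uploaded.name)
    st.write("Caption:", scenario)
    story = text_to_story(scenario)
    st.write("Story:", story)
    audio = story_to_speech(story)
    if audio:
        st.audio(audio, format="audio/flac")
```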
Clone the repository, install the dependencies, and launch the app:

```bash
git clone https://github.com/alimdsaif3/Image-to-Story-Converter.git
cd Image-to-Story-Converter
pip install -r requirements.txt
streamlit run app.py
```
## ©️ License

This project is distributed under the MIT License. For details, see the LICENSE file in the repository.

## 🤝 Contributions

If you like this project, please ⭐ the repository! Contributions are welcome. Submit a pull request if you have suggestions or enhancements.