Gemini Image Demo: AI-powered Vision Q&A Chatbot

Welcome to the Gemini Image Demo! This project is a Q&A chatbot powered by Google's Gemini-1.5-flash Generative AI model. It interprets images, specifically invoices, and responds to user queries based on the uploaded image content. The app is built with Streamlit for an interactive, easy-to-use experience.

Overview

The Gemini Image Demo project shows how generative AI can be applied to document interpretation. By integrating Google Generative AI (Gemini) with Streamlit, the application lets users upload an invoice image, type a question, and receive answers extracted directly from that image. This suits use cases such as automated invoice processing, financial data extraction, and interactive Q&A over documents.

Features

Intelligent Image Analysis: Uses Google's Gemini-1.5-flash model to analyze and interpret uploaded images such as invoices.
Text-based Q&A: Users can ask questions about the uploaded image, and the model provides relevant answers.
Simple User Interface: Built with Streamlit, ensuring an intuitive and interactive experience.
Error Handling: Provides user-friendly error messages if the image is missing or if any issues arise during processing.

Project Motivation

This project demonstrates my proficiency in:

Working with advanced AI models (Google Generative AI)
Deploying interactive web applications using Streamlit
Handling real-world data in image form and processing it for meaningful insights

Requirements

Ensure you have the following prerequisites installed:

Python 3.8 or higher
Google Generative AI SDK (the google-generativeai package)
Streamlit
Pillow (the maintained fork of PIL, the Python Imaging Library)

Install the required libraries with:

```
pip install -r requirement.txt
```
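After installing, a quick way to confirm the prerequisites is to check that the interpreter can find each package. This is a minimal sketch, not part of the repository: it assumes the import names streamlit, google.generativeai (installed by google-generativeai), and PIL (installed by Pillow), and the helper names module_available and missing_prerequisites are hypothetical.

```python
import importlib.util
import sys

# Assumed import names: google-generativeai provides google.generativeai,
# and Pillow provides the PIL package.
REQUIRED_MODULES = ("streamlit", "google.generativeai", "PIL")


def module_available(name):
    """Return True if `name` is importable in the current interpreter."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a list of unmet prerequisites (empty if everything is present)."""
    missing = [name for name in modules if not module_available(name)]
    if sys.version_info < min_python:
        missing.insert(0, "Python >= %d.%d" % min_python)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else "Missing: " + ", ".join(problems))
```

Run it with the same interpreter (or virtual environment) you use for streamlit run, so it checks the packages the app will actually see.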

Setup and Running the Application

Clone this repository:

```
git clone https://github.com/jddotcom/GEN-AI-Google-Gemini-Text-Extractor.git
cd GEN-AI-Google-Gemini-Text-Extractor
```

Set up your environment variables. Create a .env file and add your Google API key:
```
GOOGLE_API_KEY=your_google_api_key_here
```
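The README does not show how app.py reads this file. A common pattern is python-dotenv's load_dotenv() followed by genai.configure(api_key=os.getenv("GOOGLE_API_KEY")). The sketch below is a stdlib-only stand-in for that loading step, so parse_env_lines and load_api_key are hypothetical helpers; it only handles simple KEY=VALUE lines, not the full .env syntax python-dotenv supports.

```python
import os


def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Trim whitespace and one layer of matching single or double quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_api_key(env_path=".env"):
    """Return GOOGLE_API_KEY from the shell environment or, failing that, from env_path."""
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as fh:
            for key, value in parse_env_lines(fh).items():
                # Like python-dotenv's default, keep variables already set in the shell.
                os.environ.setdefault(key, value)
    api_key = os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or export it in your shell.")
    return api_key
```

The returned key would then be passed to genai.configure(api_key=...). Because the key is a secret, keep .env out of version control, for example by listing it in .gitignore.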
Run the application:
```
streamlit run app.py
```
Open the local URL provided by Streamlit in your browser to use the app.

Usage

Enter a Prompt: This prompt can be a specific question you want the AI to answer based on the uploaded image (e.g., "What is the total amount on this invoice?").
Upload an Invoice Image: Click "Choose an image..." and upload an invoice image in JPG, JPEG, or PNG format.
Click "Tell me about the image": The AI model will analyze the image and provide an answer based on the prompt.

Code Explanation

Core Functions

get_gemini_response(input_text, image_data, prompt): Sends the user's input, the image data, and the prompt to the Google Generative AI (Gemini) model and returns the model's response.
input_image_setup(uploaded_file): Prepares the uploaded image file by converting it into bytes and formatting it for the model.
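The two core functions might look roughly like the sketch below. It assumes app.py follows the usual google-generativeai pattern of passing an inline image part ({"mime_type": ..., "data": bytes}) to GenerativeModel.generate_content, and that genai.configure(api_key=...) has already been called at startup. The FileNotFoundError message and the optional model parameter (added so the function can be exercised offline) are assumptions, not the repository's code.

```python
def input_image_setup(uploaded_file):
    """Turn a Streamlit UploadedFile into the inline image part Gemini accepts."""
    if uploaded_file is None:
        raise FileNotFoundError("No file uploaded")
    # getvalue() returns the whole file as bytes; .type holds its MIME type,
    # e.g. "image/png" for a PNG invoice.
    return [{"mime_type": uploaded_file.type, "data": uploaded_file.getvalue()}]


def get_gemini_response(input_text, image_data, prompt, model=None):
    """Send the user's input, the first image part, and the prompt; return the answer text."""
    if model is None:
        # Imported lazily so input_image_setup works without the SDK installed.
        # Assumes genai.configure(api_key=...) has already been called.
        import google.generativeai as genai

        model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([input_text, image_data[0], prompt])
    return response.text
```

In the app, a call would look like get_gemini_response(question, input_image_setup(uploaded_file), prompt). Passing any object with a generate_content method as model lets you test the flow without network access.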

Streamlit Components

User Input: Collects the user's prompt and image file.
Response Display: Shows the AI-generated response or error messages if applicable.
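The Streamlit components above could be wired together along these lines. This is an illustrative sketch rather than the app.py source: only the "Choose an image..." uploader label, the accepted file types, and the "Tell me about the image" button text come from this README; the other labels and the helper names handle_submit and ask_gemini are hypothetical. ask_gemini is a compact, self-contained stand-in for input_image_setup plus get_gemini_response. Validation lives in a plain function so the error-handling path can be checked without a browser.

```python
import os


def handle_submit(question, uploaded_file, answer_fn):
    """Validate the inputs, call answer_fn, and return (is_error, message)."""
    if uploaded_file is None:
        return True, "Please upload an invoice image (JPG, JPEG, or PNG) first."
    if not question.strip():
        return True, "Please enter a question about the image."
    try:
        return False, answer_fn(question, uploaded_file)
    except Exception as exc:  # show SDK/network failures instead of a traceback
        return True, f"Something went wrong while processing the image: {exc}"


def ask_gemini(question, uploaded_file):
    """Minimal model call standing in for input_image_setup + get_gemini_response."""
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-flash")
    image_part = {"mime_type": uploaded_file.type, "data": uploaded_file.getvalue()}
    return model.generate_content([question, image_part]).text


def main():
    # Imported here so handle_submit can be tested without Streamlit installed.
    import streamlit as st
    from PIL import Image

    st.header("Gemini Image Demo")
    question = st.text_input("Your question about the invoice:")
    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
    if uploaded_file is not None:
        st.image(Image.open(uploaded_file), caption="Uploaded image")

    if st.button("Tell me about the image"):
        is_error, message = handle_submit(question, uploaded_file, ask_gemini)
        if is_error:
            st.error(message)
        else:
            st.subheader("Response")
            st.write(message)
```

Because streamlit run executes the script from top to bottom, an app file built this way would end with a plain main() call.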

Demo

Include a gif or screenshot of the app in action for a better impression.

Future Enhancements

Support for a wider range of documents beyond invoices.
Enhanced error handling and support for non-English invoices.
Deployment as a standalone web service for scalability.

Why This Project is Impressive

This project highlights advanced skills in AI and ML, showcasing the ability to:

Integrate cutting-edge generative AI models into a working application.
Process and extract insights from image data.
Build interactive applications to enhance user engagement.

Contact

Feel free to connect with me on LinkedIn or check out my other projects on GitHub.
