Add support for multimodal openai - early version #313

fm1320 · 2025-01-06T20:30:35Z

The PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. I also adds image generation with Dall E 2 and 3. It also adds tests and updates docstring

Unified Input Handling

def convert_inputs_to_api_kwargs(self, input, model_kwargs, model_type):
    # Handles both text-only and image+text inputs in one place
    # Supports both simple text and structured messages

Image Processing

def _prepare_image_content(self, image_source, detail="auto"):
    # Supports multiple image input types:
    # - Local files (converts to base64)
    # - URLs (direct use)
    # - Pre-formatted content

Add DALL-E Image Generation Support

Added DALL-E 2 & 3 support to OpenAI client for image generation, variation, and editing. Users can now:

Generate images from text prompts
Create variations of existing images
Edit images using masks
Get results as URLs or base64

Key Changes

Added IMAGE_GENERATION model type
Enhanced client with DALL-E API integration
Added response parsing for image operations
Maintained existing error handling pattern

Example use:

Text only:

client = OpenAIClient()
response = client.call(
    api_kwargs={"input": "Hello", "model": "gpt-3.5-turbo"}
)

Multimodal:

client = OpenAIClient()
response = client.call(
    api_kwargs={
        "input": "Describe this",
        "model": "gpt-4o",
        "images": "path/to/image.jpg"
    }
)

Image generation:

class ImageGenerator(Generator):
    """Generator subclass for image generation."""
    model_type = ModelType.IMAGE_GENERATION
    
       dalle_gen = ImageGenerator(
        model_client=client,
        model_kwargs={
            "model": "dall-e-3",
            "size": "1024x1024",
            "quality": "standard",
            "n": 1
        }
    )
    
    # For image generation, input_str becomes the prompt
    response = dalle_gen({"input_str": "A happy siamese cat playing with a red ball of yarn"})
    print("\n=== DALL-E Generation ===")
    print(f"Generated Image URL: {response.data}")

TODO:

Everything shout be an Output Generator type - Generator cant raise error but put the error in error field
Image generation
How to raise and catch the error
parsed chat completion has to be a generator output - inside chat completion parser

Fixes #<issue_number>

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?

review-notebook-app · 2025-01-06T20:30:40Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

adalflow/adalflow/components/model_client/openai_client.py

adalflow/adalflow/utils/lazy_import.py

liyin2015

(1) image output potentially (2) test generator (3) test using real api by yourself

liyin2015 · 2025-01-13T19:15:54Z

adalflow/adalflow/components/model_client/openai_client.py

+            # For image generation, input is the prompt
+            final_model_kwargs["prompt"] = input
+            # Set defaults for DALL-E 3 if not specified
+            if "model" not in final_model_kwargs:


"model" must be set. We dont have to do default, because in the future, a model can be deprecated.

liyin2015 · 2025-01-13T19:17:02Z

adalflow/adalflow/components/model_client/openai_client.py

+            if "size" not in final_model_kwargs:
+                final_model_kwargs["size"] = "1024x1024"
+            if "quality" not in final_model_kwargs:
+                final_model_kwargs["quality"] = "standard"


instead of using so many if not, use final_model_kwargs["quality"] = final_model_kwargs.get("quality", "standard")

liyin2015 · 2025-01-13T19:17:54Z

adalflow/adalflow/components/model_client/openai_client.py

+                final_model_kwargs["response_format"] = "url"
+
+            # Handle image edits and variations
+            if "image" in final_model_kwargs:


these code is kind of ugly, might want to avoid so many embeded if

liyin2015 · 2025-01-13T19:22:15Z

overall it looks great @fm1320

liyin2015 · 2025-01-13T19:23:26Z

@fm1320 i need to see the testing of the generator using openai client, did you test it and add examples in the generator rst and ipynb

liyin2015 · 2025-01-13T19:26:24Z

notebooks/tutorials/adalflow_modelclient.ipynb

@@ -2043,6 +2043,272 @@
    "build_custom_model_client()"
   ]
  },
+  {


the examples you add fit more into the generator.ipynb, please move there. The modelclient layer is not really user facing layer but more for contributors.

Testing with new changes now

liyin2015 · 2025-01-13T21:29:24Z

docs/source/tutorials/generator.rst

@@ -106,6 +106,161 @@ In particular, we created :class:`GeneratorOutput<core.types.GeneratorOutput>` t
 Whether to do further processing or terminate the pipeline whenever an error occurs is up to the user from here on.


+Basic Generator Tutorial


add this in the genearator colab. They all go together. i can approve now, but you would need that for easy sharing

add multi modal support for openai draft

73089ff

fm1320 added 2 commits January 6, 2025 22:46

Change multimodal to one client

c6c4663

remove separate file refs

b0a473b

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/components/model_client/openai_client.py Outdated Show resolved Hide resolved

liyin2015 reviewed Jan 7, 2025

View reviewed changes

adalflow/adalflow/utils/lazy_import.py Outdated Show resolved Hide resolved

Single function openaiclient and test

00ea1d5

fm1320 marked this pull request as ready for review January 8, 2025 09:45

fm1320 added 2 commits January 8, 2025 16:31

add more tests with mock

578a165

add more tests with mock

852c212

liyin2015 reviewed Jan 8, 2025

View reviewed changes

fm1320 added 2 commits January 9, 2025 11:28

add image gen

ff1060a

Update .rst file and colab

5144fc4

fm1320 requested a review from liyin2015 January 10, 2025 01:47

liyin2015 reviewed Jan 13, 2025

View reviewed changes

Simplify nested ifs and add few more test examples

d8aa41c

fm1320 requested a review from liyin2015 January 13, 2025 21:13

liyin2015 approved these changes Jan 13, 2025

View reviewed changes

liyin2015 merged commit 98177d9 into main Jan 13, 2025
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for multimodal openai - early version #313

Add support for multimodal openai - early version #313

fm1320 commented Jan 6, 2025 •

edited

Loading

review-notebook-app bot commented Jan 6, 2025

liyin2015 left a comment

liyin2015 Jan 13, 2025

liyin2015 Jan 13, 2025

liyin2015 Jan 13, 2025

liyin2015 commented Jan 13, 2025

liyin2015 commented Jan 13, 2025

liyin2015 Jan 13, 2025

fm1320 Jan 13, 2025

liyin2015 Jan 13, 2025

		@@ -106,6 +106,161 @@ In particular, we created :class:`GeneratorOutput<core.types.GeneratorOutput>` t
		Whether to do further processing or terminate the pipeline whenever an error occurs is up to the user from here on.


		Basic Generator Tutorial

Add support for multimodal openai - early version #313

Add support for multimodal openai - early version #313

Conversation

fm1320 commented Jan 6, 2025 • edited Loading

The PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. I also adds image generation with Dall E 2 and 3. It also adds tests and updates docstring

Add DALL-E Image Generation Support

Key Changes

review-notebook-app bot commented Jan 6, 2025

liyin2015 left a comment

Choose a reason for hiding this comment

liyin2015 Jan 13, 2025

Choose a reason for hiding this comment

liyin2015 Jan 13, 2025

Choose a reason for hiding this comment

liyin2015 Jan 13, 2025

Choose a reason for hiding this comment

liyin2015 commented Jan 13, 2025

liyin2015 commented Jan 13, 2025

liyin2015 Jan 13, 2025

Choose a reason for hiding this comment

fm1320 Jan 13, 2025

Choose a reason for hiding this comment

liyin2015 Jan 13, 2025

Choose a reason for hiding this comment

fm1320 commented Jan 6, 2025 •

edited

Loading