Add support for multimodal openai - early version #313

Merged: 9 commits, Jan 13, 2025

@fm1320 (Collaborator) commented Jan 6, 2025

This PR adds multimodal (text + image) support to the existing OpenAI client while maintaining backward compatibility with text-only operations. It also adds image generation with DALL-E 2 and 3, adds tests, and updates the docstrings.

  1. Unified Input Handling
def convert_inputs_to_api_kwargs(self, input, model_kwargs, model_type):
    # Handles both text-only and image+text inputs in one place
    # Supports both simple text and structured messages
  2. Image Processing
def _prepare_image_content(self, image_source, detail="auto"):
    # Supports multiple image input types:
    # - Local files (converts to base64)
    # - URLs (direct use)
    # - Pre-formatted content
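
A minimal sketch of the three input cases listed above, not the code from this PR. The standalone function name prepare_image_content, the MIME-type fallback, and the exact return shape are assumptions for illustration; the returned dictionary follows the image_url content part used by the OpenAI chat completions API.

```python
import base64
import mimetypes
from typing import Any, Dict, Union


def prepare_image_content(
    image_source: Union[str, Dict[str, Any]], detail: str = "auto"
) -> Dict[str, Any]:
    """Hypothetical stand-in for _prepare_image_content (illustration only)."""
    # Pre-formatted content: pass it through unchanged.
    if isinstance(image_source, dict):
        return image_source

    if image_source.startswith(("http://", "https://")):
        # URLs are used directly.
        url = image_source
    else:
        # Local file: read the bytes and embed them as a base64 data URL.
        # Falling back to image/jpeg is an assumption made for this sketch.
        mime_type = mimetypes.guess_type(image_source)[0] or "image/jpeg"
        with open(image_source, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        url = f"data:{mime_type};base64,{encoded}"

    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

Returning one shape for every input type keeps the message-building code free of per-type branches.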

Add DALL-E Image Generation Support

Added DALL-E 2 and 3 support to the OpenAI client for image generation, variation, and editing. Users can now:
  • Generate images from text prompts
  • Create variations of existing images
  • Edit images using masks
  • Get results as URLs or base64

Key Changes

  • Added IMAGE_GENERATION model type
  • Enhanced client with DALL-E API integration
  • Added response parsing for image operations
  • Maintained existing error handling pattern

Example use:

Text only:

client = OpenAIClient()
response = client.call(
    api_kwargs={"input": "Hello", "model": "gpt-3.5-turbo"}
)

Multimodal:

client = OpenAIClient()
response = client.call(
    api_kwargs={
        "input": "Describe this",
        "model": "gpt-4o",
        "images": "path/to/image.jpg"
    }
)

Image generation:

# Assumes Generator, ModelType, and an OpenAIClient instance named `client`
# are already imported/defined in scope.
class ImageGenerator(Generator):
    """Generator subclass for image generation."""
    model_type = ModelType.IMAGE_GENERATION


dalle_gen = ImageGenerator(
    model_client=client,
    model_kwargs={
        "model": "dall-e-3",
        "size": "1024x1024",
        "quality": "standard",
        "n": 1,
    },
)

# For image generation, input_str becomes the prompt
response = dalle_gen({"input_str": "A happy siamese cat playing with a red ball of yarn"})
print("\n=== DALL-E Generation ===")
print(f"Generated Image URL: {response.data}")

TODO:

  • Everything should be a GeneratorOutput type: the Generator can't raise an error but should put the error in the error field
  • Image generation
  • How to raise and catch the error
  • The parsed chat completion has to be a generator output, produced inside the chat completion parser
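
The first TODO item can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: OutputSketch is a hypothetical, simplified stand-in for GeneratorOutput with only data and error fields, and call_without_raising is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class OutputSketch:
    """Hypothetical, simplified stand-in for GeneratorOutput."""

    data: Optional[Any] = None
    error: Optional[str] = None


def call_without_raising(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> OutputSketch:
    """Run fn and report any exception in the error field instead of raising."""
    try:
        return OutputSketch(data=fn(*args, **kwargs))
    except Exception as e:
        # The caller decides whether to continue or stop the pipeline.
        return OutputSketch(error=f"{type(e).__name__}: {e}")
```

A caller then checks the error field before using data, so one failed call does not have to abort the whole pipeline.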

Fixes #<issue_number>

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?


@fm1320 fm1320 marked this pull request as ready for review January 8, 2025 09:45
@liyin2015 (Member) left a comment

(1) image output potentially (2) test generator (3) test using real api by yourself

@fm1320 fm1320 requested a review from liyin2015 January 10, 2025 01:47
# For image generation, input is the prompt
final_model_kwargs["prompt"] = input
# Set defaults for DALL-E 3 if not specified
if "model" not in final_model_kwargs:
Member

"model" must be set. We don't need a default, because a model can be deprecated in the future.

if "size" not in final_model_kwargs:
    final_model_kwargs["size"] = "1024x1024"
if "quality" not in final_model_kwargs:
    final_model_kwargs["quality"] = "standard"
Member

Instead of using so many "if ... not in" checks, use final_model_kwargs["quality"] = final_model_kwargs.get("quality", "standard")
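
One way the two suggestions in this review could look together (no default for "model", and dict.get for the optional defaults). This is a sketch, not the merged code; the function name build_image_kwargs and the error message are assumptions for illustration.

```python
from typing import Any, Dict


def build_image_kwargs(prompt: str, model_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build image-generation kwargs without nested checks."""
    # "model" has no default, since a default model can be deprecated later.
    if "model" not in model_kwargs:
        raise ValueError("model must be specified in model_kwargs")

    # Copy so the caller's dictionary is not mutated.
    final_model_kwargs = dict(model_kwargs)
    final_model_kwargs["prompt"] = prompt
    # dict.get keeps a caller-supplied value and falls back to the default.
    final_model_kwargs["size"] = final_model_kwargs.get("size", "1024x1024")
    final_model_kwargs["quality"] = final_model_kwargs.get("quality", "standard")
    return final_model_kwargs
```

dict.setdefault would do the same in one call per key; either form removes the repeated membership checks.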

final_model_kwargs["response_format"] = "url"

# Handle image edits and variations
if "image" in final_model_kwargs:
Member

This code is kind of ugly; you might want to avoid so many nested ifs.
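
A possible way to flatten the branching is to decide the operation once with early returns and dispatch on the result. The sketch below is hypothetical: the function name, the returned labels, and the rule that a "mask" key selects an edit are assumptions based on the PR description, not the code under review.

```python
from typing import Any, Dict


def select_image_operation(model_kwargs: Dict[str, Any]) -> str:
    """Hypothetical helper: pick the image operation without nested ifs."""
    # No source image: plain text-to-image generation.
    if "image" not in model_kwargs:
        return "generate"
    # Source image plus mask: assumed here to mean an edit.
    if "mask" in model_kwargs:
        return "edit"
    # Source image only: a variation of the existing image.
    return "variation"
```

Each branch returns immediately, so the caller can map the label to the matching API call in a flat lookup instead of nesting conditions.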

@liyin2015 (Member)

Overall it looks great, @fm1320.

@liyin2015 (Member)

@fm1320 I need to see the generator tested with the OpenAI client. Did you test it and add examples to the generator rst and ipynb?

@@ -2043,6 +2043,272 @@
"build_custom_model_client()"
]
},
{
Member

The examples you added fit better in generator.ipynb; please move them there. The model client layer is not really a user-facing layer; it is more for contributors.

Collaborator Author

Testing with the new changes now.

@fm1320 fm1320 requested a review from liyin2015 January 13, 2025 21:13
@@ -106,6 +106,161 @@ In particular, we created :class:`GeneratorOutput<core.types.GeneratorOutput>` t
Whether to do further processing or terminate the pipeline whenever an error occurs is up to the user from here on.


Basic Generator Tutorial
Member

Add this to the generator Colab. They all go together. I can approve now, but you will need that for easy sharing.

@liyin2015 liyin2015 merged commit 98177d9 into main Jan 13, 2025
4 checks passed
2 participants