Add ability to modify DALL·E 3 generated images #723

Open
jeffpaul opened this issue Feb 21, 2024 · 2 comments

@jeffpaul
Member

Is your enhancement related to a problem? Please describe.

With the update in #717, we can now start to take advantage of some of the enhancements in the DALL·E 3 model. Let's extend our Generate images modal (e.g. /wp-admin/upload.php?action=classifai-generate-image) with a Modify image action link that accepts textarea input, which is then sent back to DALL·E 3.

When we receive the originally generated image(s) back from DALL·E 3, we'll want to capture the seed ID for each, since it will be needed for the modification step. We'll capture the desired modifications from the user via a textarea, send them back to DALL·E 3 (e.g. "Modify image [#] with seed [seed-ID]: "), and then render the newly modified image(s) in place of the original image(s).

Designs

(Mockup image: modify-image)

In this rough mockup, I'm recommending we first tweak our primary button and secondary action link so that the first two options on generated images are [Insert to Post] and _Import to Media Library_. A third option, _Modify image_, would, when clicked, present a textarea (with "add / remove / change image by..." placeholder text to guide the user on what to enter) and a [Generate modifications] button. Clicking that button would send the image seed ID and the described modifications back to DALL·E 3 and then render the newly modified image(s) in place of the prior image(s).

Describe alternatives you've considered

There's an open question about what we do with the original image prompt and the modification text (especially if multiple modifications are made). We currently show the original prompt in the initial textarea of the Generate images modal, so perhaps we keep appending the modification text to it? For the alt text added to any imported/inserted generated image, we could include that same prompt+modification(s) text (or, more simply, stick with the original prompt text).

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jeffpaul added this to the 3.1.0 milestone Feb 21, 2024
@jeffpaul moved this from Incoming to To Do in Open Source Practice Feb 21, 2024
@Sidsector9 self-assigned this Mar 19, 2024
@Sidsector9 moved this from To Do to In Progress in Open Source Practice Mar 19, 2024
@Sidsector9 removed their assignment Apr 1, 2024
@Sidsector9 moved this from In Progress to To Do in Open Source Practice Apr 1, 2024
@dkotter modified the milestones: 3.1.0, 3.2.0 Jul 12, 2024
@dkotter modified the milestones: 3.2.0, 3.3.0 Dec 11, 2024
@faisal-alvi
Member

faisal-alvi commented Jan 29, 2025

@jeffpaul @dkotter 👋🏻

After reviewing the DALL·E API documentation, I’d like to share some insights and limitations regarding the requested enhancement:

  1. No Seed ID for Generated Images

    • Currently, the DALL·E API does not provide any seed ID with generated images. Even if seed IDs were available, I couldn’t find any endpoint in the documentation that allows modifications to a generated image using seed IDs.
  2. Existing Image Modification Options

    • The API does provide a way to edit an existing image using inpainting, but it has specific requirements:
      • The image to edit must be a valid PNG file, less than 4MB, and square.
      • The mask must define the transparent areas where changes should be applied (i.e., fully transparent areas in the mask mark the editable parts). The mask must also be a valid PNG, less than 4MB, and have the same dimensions as the image.
      • Along with the image and mask, the user can provide a new prompt specifying changes for the transparent areas.

    This approach works only for partial edits (inpainting) and is relatively limited in terms of freeform modifications.

  3. Image Variations

    • Another available option is generating variations of an image. This takes the original image as input and returns new variations without any user-provided prompt, so it is not suitable for user-driven modifications: it doesn't allow specifying changes.
  4. Iteration Through Prompts

    • The last option is iterating through prompts. However, this isn't a new feature; it's essentially what users can already do by editing the prompt and clicking the existing Generate button.
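The inpainting constraints in point 2 could be checked before anything is uploaded. Below is a minimal Python sketch of such a pre-check, offered only to illustrate the logic (ClassifAI itself is a WordPress plugin). The function names are hypothetical, the code does not call the OpenAI API, it interprets "less than 4MB" as under 4 × 1024 × 1024 bytes (an assumption), and it only reads the PNG signature and IHDR header rather than fully decoding the file.

```python
import struct

# Every PNG file starts with this 8-byte signature, followed by the IHDR chunk.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# "Less than 4MB" is interpreted here as under 4 * 1024 * 1024 bytes (assumption).
MAX_BYTES = 4 * 1024 * 1024


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk.

    Only the signature and IHDR header are inspected; this is not a full PNG validator.
    """
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    # Bytes 8-11 hold the chunk length, 12-15 the type, 16-23 big-endian width and height.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def edit_input_problems(image: bytes, mask: bytes) -> list[str]:
    """List the ways a hypothetical image/mask pair misses the inpainting requirements."""
    problems: list[str] = []
    dims: dict[str, tuple[int, int]] = {}
    for name, data in (("image", image), ("mask", mask)):
        if len(data) >= MAX_BYTES:
            problems.append(f"{name} is not under 4MB")
        try:
            dims[name] = png_dimensions(data)
        except ValueError:
            problems.append(f"{name} is not a valid PNG")
    if "image" in dims:
        width, height = dims["image"]
        if width != height:
            problems.append("image is not square")
        # The mask must match the image exactly, which also makes it square.
        if "mask" in dims and dims["mask"] != dims["image"]:
            problems.append("mask dimensions differ from image")
    return problems
```

An empty list means the pair meets the size, format, and dimension rules; whether the mask's transparent areas are sensible is not checked here.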

Given these limitations, it seems that what this enhancement requests (modifying images directly via seed IDs, without requiring manual uploads or masks) is not currently supported by the DALL·E API.

Would love to hear your thoughts or suggestions on how we could approach this, considering the current API constraints.

@dkotter
Collaborator

dkotter commented Jan 29, 2025

In addition to the points @faisal-alvi makes, looking at the API documentation again, I think the biggest issue here is that DALL·E 3 only supports image generation, not edits or variations (these are only supported by DALL·E 2, which we no longer support as of #717). So until OpenAI changes this, I don't think there is any action we can take here.

That said, I do think there are additional improvements we can make to the image generation process. I've documented those in a separate issue here: #440 (comment) and assigned that to @faisal-alvi

@dkotter modified the milestones: 3.3.0, 3.4.0 Feb 19, 2025
Labels
None yet
Projects
Status: To Do
Development

No branches or pull requests

4 participants