Support img2img and inpaint in lpw-xl #6114

Merged
patrickvonplaten merged 5 commits into huggingface:main from lpwxl-img2img-inpaint
Dec 18, 2023

a-r-r-o-w
Member

@a-r-r-o-w a-r-r-o-w commented Dec 9, 2023

What does this PR do?

Adds support for img2img and inpainting in long prompt weighting XL community pipeline.

Who can review?

@patrickvonplaten @sayakpaul @xhinker

@a-r-r-o-w
Member Author

a-r-r-o-w commented Dec 9, 2023

Long prompt weighting has been amazing to use, but the XL version of the pipeline does not yet support img2img and inpaint, whereas the normal version does (here). I believe LPW is widely used, so this could be a good addition. From my testing so far, it doesn't seem to disrupt any existing functionality. Will add some examples here soon.

@a-r-r-o-w
Member Author

This PR adds two new methods to the LPW SDXL pipeline class, .img2img() and .inpaint(), similar to those in the LPW SD1.5 pipeline linked above.

Code
import torch
from PIL import Image

from diffusers import DiffusionPipeline

model_id = "a-r-r-o-w/dreamshaper-xl-turbo"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16", custom_pipeline="lpw_stable_diffusion_xl").to("cuda")

params = [
    ("an (extremely detailed cinematic digital painting masterpiece) (3/4 body shot) of one giant snowman in the distance absorbing the cool winter air wishing for a Christmas miracle, <lora:ChristmasWintery:0.8> ChristmasWinteryl surrounded by magical Christmas trees in a Christmas forest, bright glowing eyes, (Style-GravityMagic:1.1), (twisting swirls of snow in the background), (glowing aura), (energy), (snowing), (snowflakes:1.2), (windy), (moody color tones), (otherworldly:1.2), (elaborate textures:1.2), (soft lighting:1.2), medium depth of field, (8k UHD, Unreal Engine, highest quality)", 1024, 768),
    ("Photo of a beautiful greek goddess, RAW, ((red hair with extra long wavy), ((portrait)), (Detailed face: 1. 2)), (Detailed facial features)), fair-skinned, slim body, greek temple environment, detailed expressions, reflections, shot with a 50mm lens, f/2. 8, HDR, (8k), (movie lighting) (dramatic lighting) , (sharp focus), complex elements", 1024, 768),
    ("(full body, (dynamic pose)), muscular male android humanoid, flowing dark thick cloth cloak, dark metallic armour Burning Orange , neon parts, advanced civilization, towering structures, skyscrapers, immersed neon lights, fully detailed environment, cyberpunk, (rim lighting, studio lighting, distant moonlight, (night:1.1), bloom), (cinematic, best quality, masterpiece, ultra HD textures, highly detailed, hyper-realistic, intricate detail, 8k, photorealistic, concept art, matte painting, autodesk maya, vray render, ray tracing, hdr), (dslr, full frame, 16mm focal length, f/8 aperture, dynamic perspective, dynamic angle, wide field of view, deep depth of field) techwear, urbansamurai 3d, realistic, cyborg in a cyberhelmet head", 1024, 1024),
]
negative_prompt = '(nsfw:1.5),((3d, cartoon, anime, sketches)), (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), bad anatomy, out of view, cut off, ugly, deformed, mutated, ((young)), EasyNegative, paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans,extra fingers,fewer fingers,, "(ugly eyes, deformed iris, deformed pupils, fused lips and teeth:1.2), (un-detailed skin, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.2), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"'

t2i_images = []
for param in params:
    prompt, height, width = param
    images = pipe.text2img(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=height,
        width=width,
        guidance_scale=3,
        num_inference_steps=8,
        num_images_per_prompt=1,
    ).images
    t2i_images.append(images[0])

from typing import List


def image_grid(images: List[Image.Image], rows: int, cols: int) -> Image.Image:
    """Paste equally sized images into a rows x cols grid, filling row by row."""
    if len(images) > rows * cols:
        raise ValueError(
            f"Number of images ({len(images)}) exceeds grid size ({rows}x{cols})."
        )
    w, h = images[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))
    for i, image in enumerate(images):
        grid.paste(image, box=(i % cols * w, i // cols * h))
    return grid

i2i_images = []
changes = ["(origami:1.5), artistic, paper art", "(nebulae, blackholes, quasars, stars, galatic, multiversal:1.6)", "(oil painting:1.5), artisitc, ((painting made by a 10 year old child))"]
for index in range(3):
    for param, img in zip(params, t2i_images):
        prompt, height, width = param
        # Prepend the style change and keep only the first 15 words of the original prompt.
        prompt = changes[index] + ", " + " ".join(prompt.split()[:15])
        images = pipe.img2img(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=img,
            height=height,
            width=width,
            guidance_scale=4,
            num_inference_steps=12,
            num_images_per_prompt=1,
            strength=0.8,
        ).images
        i2i_images.append(images[0])

grid_i2i = image_grid(i2i_images, 3, 3)

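from diffusers.utils import load_image

# "image.png" and "mask.png" are placeholders for the source image and its inpainting mask.
image = load_image("image.png")
mask = load_image("mask.png")

# Note: height and width below still hold the values from the last entry in params (1024, 1024).
inpaint_images = []
for i in range(6):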
    images = pipe.inpaint(
        prompt="Digital illustration in of a samurai warrior in a duel against a (mystical:1.5) creature, (probably alien)",
        negative_prompt=negative_prompt,
        image=image,
        mask_image=mask,
        height=height,
        width=width,
        guidance_scale=2,
        num_inference_steps=20,
        num_images_per_prompt=1,
        strength=(i + 1) * 0.15,
    ).images
    inpaint_images.append(images[0])

grid_inpaint = image_grid(inpaint_images, 2, 3)
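
The inpainting loop above sweeps strength from 0.15 to 0.90 with num_inference_steps=20. As a rough guide to what that sweep does, the sketch below estimates how many denoising steps actually run for a given strength. It assumes the LPW XL pipeline follows the same convention as the standard diffusers img2img pipelines, where the step count is scaled by strength and truncated; the function name effective_steps is hypothetical and is not part of the pipeline.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps run for a given strength.

    Assumption: the pipeline truncates num_inference_steps * strength,
    as the standard diffusers img2img pipelines do.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Lower strength keeps more of the input image; higher strength redraws more of it.
for strength in (0.25, 0.5, 0.75, 1.0):
    print(f"strength={strength}: {effective_steps(20, strength)} of 20 steps")
```

Under this assumption, a low strength combined with a small step count can leave only a handful of denoising steps, which is worth keeping in mind when pairing img2img or inpaint with few-step turbo models.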
Results

All results generated based on the above code.

Text to Image
[3 images]
Image to Image
[image grid]
Inpaint
Image | Mask
[source image] [mask image]
[image grid]

@a-r-r-o-w a-r-r-o-w marked this pull request as ready for review December 16, 2023 15:59
@a-r-r-o-w
Member Author

@patrickvonplaten @sayakpaul Gentle ping for review. Do let me know if any changes are required, thanks. I'll update the community README to showcase use of img2img and inpaint methods soon.

@xhinkerzhu xhinkerzhu left a comment

Super cool, I am good with the new changes

@xhinker
Contributor

xhinker commented Dec 16, 2023

Tested out your code; so far so good, and the results are stunningly good. Thank you @a-r-r-o-w

@xhinker
Contributor

xhinker commented Dec 16, 2023

Long prompt weighting has been amazing to use, but the XL version of the pipeline does not yet support img2img and inpaint, whereas the normal version does (here). I believe LPW is widely used, so this could be a good addition. From my testing so far, it doesn't seem to disrupt any existing functionality. Will add some examples here soon.

I like your samples; it would help with testing if more sample code were shared.

@a-r-r-o-w
Member Author

a-r-r-o-w commented Dec 16, 2023

I like your samples; it would help with testing if more sample code were shared.

Hey @xhinker, thank you for the review and kind words! I've attached the code I used for generation in the same comment as the results (here). Is this what you mean?

@xhinker
Contributor

xhinker commented Dec 16, 2023

I like your samples; it would help with testing if more sample code were shared.

Hey @xhinker, thank you for the review and kind words! I've attached the code I used for generation in the same comment as the results (here). Is this what you mean?

Those are great samples. I have run them and all is good; I thought you wanted to share more sample code 😊

@a-r-r-o-w
Member Author

a-r-r-o-w commented Dec 16, 2023

I have this ready-to-use Colab notebook. Sorry, but I'm having a little trouble understanding what you mean by sharing more sample code 😅 As far as testing the pipeline is concerned, I've run some exhaustive experiments playing around with different generation parameters.

Some more results based on PartiPrompts augmented with prompt weighting
[11 result images]
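
For readers unfamiliar with the weighting syntax used in the prompts in this thread, such as (snowflakes:1.2) or ((portrait)), the following is a simplified sketch of how such a prompt can be split into weighted chunks. It assumes the convention documented for the LPW community pipelines: (text) multiplies attention by 1.1, [text] divides it by 1.1, and (text:w) multiplies it by w. It is not the pipeline's actual parser; the name parse_weights is hypothetical, and escaped brackets and <lora:...> tags are not handled.

```python
import re
from typing import List, Tuple

# Token kinds: an explicit ":<number>)" closer, a single bracket,
# a run of plain text, or a lone colon.
_TOKEN = re.compile(r":\s*([+-]?\d*\.?\d+)\s*\)|[()\[\]]|[^()\[\]:]+|:")


def parse_weights(prompt: str) -> List[Tuple[str, float]]:
    """Split a weighted prompt into (text, weight) chunks (simplified sketch)."""
    chunks: List[list] = []        # [text, weight] pairs in prompt order
    round_starts: List[int] = []   # chunk index at which each open "(" began
    square_starts: List[int] = []  # chunk index at which each open "[" began

    def scale(start: int, factor: float) -> None:
        # Apply a multiplier to every chunk opened since the matching bracket.
        for chunk in chunks[start:]:
            chunk[1] *= factor

    for match in _TOKEN.finditer(prompt):
        token = match.group(0)
        if token == "(":
            round_starts.append(len(chunks))
        elif token == "[":
            square_starts.append(len(chunks))
        elif match.group(1) is not None and round_starts:
            scale(round_starts.pop(), float(match.group(1)))  # "(text:w)"
        elif token == ")" and round_starts:
            scale(round_starts.pop(), 1.1)                    # "(text)"
        elif token == "]" and square_starts:
            scale(square_starts.pop(), 1 / 1.1)               # "[text]"
        else:
            chunks.append([token, 1.0])                       # plain text

    # Merge neighbouring chunks that ended up with the same weight.
    merged: List[list] = []
    for text, weight in chunks:
        if merged and merged[-1][1] == weight:
            merged[-1][0] += text
        else:
            merged.append([text, weight])
    return [(text, weight) for text, weight in merged]


for text, weight in parse_weights("a (cat:1.5) on a ((mat))"):
    print(repr(text), round(weight, 3))
```

Nested brackets compound: in this sketch ((mat)) ends up with a weight of about 1.21, because each pair of parentheses applies the 1.1 multiplier once.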

@xhinker
Contributor

xhinker commented Dec 16, 2023

Will add some examples here soon.

You already shared the samples after this, so never mind; I thought you were going to share more examples besides these.
With text2img, img2img, and inpaint all in one SDXL pipeline with weighted long prompts, I see huge potential for other work.

I wonder whether IP-Adapter could also be added.

@a-r-r-o-w
Member Author

Updated this comment with more examples 😉 I actually have thousands of generations from your original LPW pipeline alone when trying to run benchmarks comparing against normal SDXL pipeline to understand how exactly prompt weighting affects quality of generations. I will soon upload as a huggingface dataset maybe 👀

Sure, we can work together on adding IP Adapter in another PR maybe?

@xhinker
Contributor

xhinker commented Dec 16, 2023

Updated this comment with more examples 😉 I actually have thousands of generations from your original LPW pipeline alone when trying to run benchmarks comparing against normal SDXL pipeline to understand how exactly prompt weighting affects quality of generations. I will soon upload as a huggingface dataset maybe 👀

Sure, we can work together on adding IP Adapter in another PR maybe?

I am occupied with other things right now, but I'm happy to work with you on the IP Adapter integration in another PR.

@sayakpaul
Member

@a-r-r-o-w sorry for the delay here. From the comments, I see that the results are quite nice. It would be good to mention this in the corresponding section of the README, along with some results.

I think we should then be good to merge.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w
Member Author

@a-r-r-o-w sorry for the delay here. From the comments, I see that the results are quite nice. It would be good to mention this in the corresponding section of the README, along with some results.

I think we should then be good to merge.

Thanks! Not sure how I should link the generated images in the README. From other PRs, I've noticed that we have to open a PR to the HF documentation image hosting repo. Should I do that as well with some of these generations? For now, I've added a link to this PR in the README for someone wanting to see more results.

@patrickvonplaten patrickvonplaten merged commit 67b3d32 into huggingface:main Dec 18, 2023
14 checks passed
@patrickvonplaten
Contributor

Cool, good job!

@a-r-r-o-w a-r-r-o-w deleted the lpwxl-img2img-inpaint branch December 18, 2023 19:04
donhardman pushed a commit to donhardman/diffusers that referenced this pull request Dec 29, 2023
* add img2img and inpaint support to lpw-xl

* update community README

---------

Co-authored-by: Sayak Paul <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
* add img2img and inpaint support to lpw-xl

* update community README

---------

Co-authored-by: Sayak Paul <[email protected]>
6 participants