Integrate stable-diffusion-3 pipeline #14

Merged · 9 commits · Jul 31, 2024

7 changes: 6 additions & 1 deletion .gitignore
@@ -12,6 +12,7 @@ venv.zip
.idea/modules.xml
.idea/jarRepositories.xml
.idea/compiler.xml
.idea/discord.xml
.idea/libraries/
*.iws
*.iml
@@ -40,4 +41,8 @@ build/
.vscode/

### Mac OS ###
.DS_Store
.DS_Store

### Image Outputs ###
/result/generated/**
/result/upscaled/**
5 changes: 1 addition & 4 deletions .idea/misc.xml


38 changes: 25 additions & 13 deletions README.md
@@ -1,7 +1,10 @@
# diffusion-tool
Diffusion Tool is an AI image generator and upscaler created for my third-year Artificial Intelligence university exam, using Java and Python.
<p align="center">
  <img width="180" src="src/main/resources/tool/logo-512.png" alt="diffusion-tool">
<h1 align="center">diffusion-tool</h1>
<p align="center">Image generator and upscaler created for my AI university exam
</p>

# Project Description
## Description
At its core, it's a JavaFX application that embeds the Python interpreter and uses it to run Stable Diffusion pipelines for generative AI and upscaling,
plus BSRGAN's degradation model for upscaling any image.
I initially thought about using the Spring framework to manage user registration, but I wanted everyone to be able to use the program offline, so I opted
@@ -10,7 +13,7 @@ It is structured as follows: from the user side, we have the Login and Sign Up pa
and Upscale pages.
The last two are the essential part of the project and they act as GUI for the Python scripts.

# Prerequisites
## Prerequisites
To compile and run the software, you need the following prerequisites:
- Open Java Development Kit (OpenJDK) 17 or above
- Apache Maven (at least version 3.6.3 is recommended)
@@ -23,7 +26,7 @@ with the packages listed in *requirements*.
pip install -r requirements.txt
```

# System requirements
## System requirements
Only consumer-level hardware is included.
AI-computing capable hardware that has a GPU with enough VRAM should be capable of running this software.
**ATTENTION**: currently, AMD GPUs are not supported as the application relies on CUDA, a technology exclusive to NVIDIA.
@@ -34,7 +37,7 @@ AI-computing capable hardware that has a GPU with enough VRAM should be capable
| `RAM` 16 GBs | 16 GBs |
| `GPU` NVIDIA GeForce GTX 1660 SUPER | NVIDIA GeForce RTX 3060 |

# Building
## Building
Executable packages can be downloaded from [Releases](https://github.com/ShyVortex/diffusion-tool/releases) or manually built instead.
You can do that assuming the above prerequisites have already been installed.
Once you're in the project directory, type the following in a terminal to download the dependencies and compile all the classes:
@@ -47,20 +50,28 @@ Then, if you also want a runnable .jar archive, type:
```
With these commands, a new folder named 'target' is created containing the compiled project as well as the executable file.

# Screenshots
## Unlock Stable Diffusion 3
The newest generative model is currently gated, so you first need to request access [here](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers).
Then generate a [token](https://huggingface.co/settings/tokens) in your account settings and use it to log in:
```shell
huggingface-cli login
```
Enter your credentials first, then paste the token when prompted.

## Screenshots
### Home
![immagine](https://github.com/ShyVortex/diffusion-tool/assets/111277410/53a8ba6f-a189-4376-a8af-0c9996a26d62)
![home-view](https://github.com/user-attachments/assets/50052e5a-c8a4-4eaa-b39f-ae537c81fb9f)
### Image Generation
![immagine](https://github.com/ShyVortex/diffusion-tool/assets/111277410/4a83e1f2-3613-4ae2-a498-cb3f2a8b1479)
![generate-view](https://github.com/user-attachments/assets/dc8239d9-faa7-4a88-bb09-7d808763220c)
### Image Upscaling
![immagine](https://github.com/ShyVortex/diffusion-tool/assets/111277410/e6f6aea6-e9a2-46f4-8b7b-066ae73aa8f4)
![upscale-view](https://github.com/user-attachments/assets/db703513-dc09-4344-96c8-1a6c0ce5d246)

# Upscaling Comparison
## Upscaling Comparison
### Low-res vs. Upscaled
![UpscalingComparison](https://github.com/ShyVortex/diffusion-tool/assets/111277410/0e380dda-36f4-4187-8ff2-9cf287dca06d)
![UpscalingComparison2](https://github.com/ShyVortex/diffusion-tool/assets/111277410/05f0d876-1b9b-4b50-8dba-c558abf815fe)

# Credits
## Credits
As stated before, this project uses BSRGAN's degradation model for upscaling purposes.
BSRGAN is a practical degradation model for Deep Blind Image Super-Resolution, developed by [Kai Zhang](https://cszn.github.io/), Jingyun Liang,
[Luc Van Gool](https://vision.ee.ethz.ch/people-details.OTAyMzM=.TGlzdC8zMjQ4LC0xOTcxNDY1MTc4.html), [Radu Timofte](http://people.ee.ethz.ch/~timofter/),
@@ -71,10 +82,11 @@ I've edited said script to adapt it and make it work on my project, keeping ackn
The project utilizes Stable Diffusion's generative AI pipelines for image generation and upscaling, in particular:
+ [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
+ [stable-diffusion-2-1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base)
+ [stable-diffusion-3-medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers)
+ [sd-x2-latent-upscaler](https://huggingface.co/stabilityai/sd-x2-latent-upscaler)
+ [pixel-art-style](https://huggingface.co/kohbanye/pixel-art-style)
+ [pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl)

# License
## License
- This project is distributed under the [GNU General Public License v3.0](https://github.com/ShyVortex/diffusion-tool/blob/master/LICENSE.md).
- Copyright of [@ShyVortex](https://github.com/ShyVortex), 2024.
8 changes: 4 additions & 4 deletions pom.xml
@@ -6,7 +6,7 @@

<groupId>it.unimol</groupId>
<artifactId>diffusion-tool</artifactId>
<version>1.0.1</version>
<version>1.1.0</version>
<name>diffusion-tool</name>

<properties>
@@ -20,17 +20,17 @@
<dependency>
<groupId>org.openjfx</groupId>
<artifactId>javafx-controls</artifactId>
<version>17.0.6</version>
<version>17.0.12</version>
</dependency>
<dependency>
<groupId>org.openjfx</groupId>
<artifactId>javafx-fxml</artifactId>
<version>17.0.6</version>
<version>17.0.12</version>
</dependency>
<dependency>
<groupId>org.openjfx</groupId>
<artifactId>javafx-web</artifactId>
<version>17.0.6</version>
<version>17.0.12</version>
</dependency>
<dependency>
<groupId>org.controlsfx</groupId>
9 changes: 5 additions & 4 deletions requirements.txt
@@ -1,10 +1,10 @@
accelerate==0.26.1
certifi==2024.7.4
charset-normalizer==3.3.2
diffusers==0.25.0
diffusers==0.29.2
filelock==3.13.1
fsspec==2023.12.2
huggingface-hub==0.20.1
huggingface-hub==0.24.3
idna==3.7
importlib-metadata==7.0.1
Jinja2==3.1.4
@@ -21,10 +21,10 @@ nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
opencv-python==4.10.0.82
packaging==23.2
peft==0.9.0
pillow==10.3.0
@@ -43,3 +43,4 @@ triton==2.1.0
typing_extensions==4.9.0
urllib3==2.2.2
zipp==3.19.1
sentencepiece==0.2.0
@@ -55,7 +55,7 @@ public User getUser() {
}

private void setVersion() {
this.version = "1.0.1";
this.version = "1.1.0";
}

public void setRootNode(Parent rootNode) {
@@ -252,11 +252,12 @@ private void initGenerateView() {
profilePicProperty.set(diffApp.getUser().getProfilePic());
homeUserImage.imageProperty().bind(profilePicProperty);
styleComboBox.getItems().addAll(
"General",
"Stable Diffusion 2.1",
"Stable Diffusion 3",
"Pixel Art"
);
styleComboBox.setPromptText(styleComboBox.getItems().get(0));
styleComboBox.setValue("General");
styleComboBox.setValue("Stable Diffusion 2.1");
}

@FXML
@@ -588,10 +589,10 @@ private void OnProfileDeleteClick() throws Exception {

@FXML
private void OnStyleSelect() {
if (styleComboBox.getValue().equals("General"))
upscaleCheckBox.setVisible(true);
else
upscaleCheckBox.setVisible(false);
// Upscaling checkbox only visible if selected model is SD2.1
upscaleCheckBox.setVisible(
styleComboBox.getValue().equals("Stable Diffusion 2.1")
);
}

@FXML
@@ -1219,9 +1220,11 @@ public File findPyScript() {
String fileName;
switch (pythonCalledBy) {
case 1:
if (styleComboBox.getValue().equals("General"))
// if (includeUpscaling) -> generate_upscale.py, else -> generate.py
fileName = includeUpscaling ? "generate_upscale.py" : "generate.py";
if (styleComboBox.getValue().equals("Stable Diffusion 2.1"))
// if (includeUpscaling) -> generate_upscale.py, else -> generate_sd2-1.py
fileName = includeUpscaling ? "generate_upscale.py" : "generate_sd2-1.py";
else if (styleComboBox.getValue().equals("Stable Diffusion 3"))
fileName = "generate_sd3.py";
else
fileName = "generate_pixart.py";
break;
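The updated switch in `findPyScript()` now routes three model choices to three scripts. The same mapping can be sketched as a standalone Python function — the function name and signature here are illustrative, not part of the project:

```python
def script_for(style: str, include_upscaling: bool) -> str:
    """Mirror the Java dispatch: selected model label -> Python script name."""
    if style == "Stable Diffusion 2.1":
        # SD 2.1 is the only model offering an optional built-in upscaling pass
        return "generate_upscale.py" if include_upscaling else "generate_sd2-1.py"
    if style == "Stable Diffusion 3":
        return "generate_sd3.py"
    # any other label ("Pixel Art") falls through to the PixArt script
    return "generate_pixart.py"

print(script_for("Stable Diffusion 3", False))  # generate_sd3.py
```

This keeps the upscaling variant confined to SD 2.1, matching the checkbox-visibility change in `OnStyleSelect()`.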
4 changes: 2 additions & 2 deletions src/main/python/it/unimol/diffusiontool/generate_pixart.py
@@ -10,7 +10,7 @@
def main():
# Check if the correct number of command-line arguments is provided
if len(sys.argv) != 4:
print("Usage: python generate.py <prompt> <tags> <date>")
print("Usage: python generate_pixelart.py <prompt> <tags> <date>")
sys.exit(1)

# Get the prompt and date from the command-line arguments passed from Java
@@ -27,7 +27,7 @@ def main():
# Process the prompt and set the output path
with torch.cuda.amp.autocast():
image = pipe(prompt=prompt, negative_prompt=tags, num_inference_steps=25).images[0]
output_folder = os.path.abspath("result/generated/general")
output_folder = os.path.abspath("result/generated/pixelart")
output_filename = f"generated_image_{date}.png"
output_filepath = os.path.join(output_folder, output_filename)

@@ -10,7 +10,7 @@
def main():
# Check if the correct number of command-line arguments is provided
if len(sys.argv) != 4:
print("Usage: python generate.py <prompt> <tags> <date>")
print("Usage: python generate_sd2-1.py <prompt> <tags> <date>")
sys.exit(1)

# Get the prompt and date from the command-line arguments passed from Java
@@ -22,12 +22,14 @@
repo_id = "stabilityai/stable-diffusion-2-1"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# offload components to CPU during inference to save memory
pipe.enable_model_cpu_offload()

# Process the prompt and set the output path
with torch.cuda.amp.autocast():
image = pipe(prompt=prompt, negative_prompt=tags, num_inference_steps=25).images[0]
output_folder = os.path.abspath("result/generated/general")
output_folder = os.path.abspath("result/generated/sd2-1")
output_filename = f"generated_image_{date}.png"
output_filepath = os.path.join(output_folder, output_filename)

62 changes: 62 additions & 0 deletions src/main/python/it/unimol/diffusiontool/generate_sd3.py
@@ -0,0 +1,62 @@
import sys
from diffusers import StableDiffusion3Pipeline
from PIL import Image
from io import BytesIO
import torch
import os
import base64


def main():
# Check if the correct number of command-line arguments is provided
if len(sys.argv) != 4:
print("Usage: python generate_sd3.py <prompt> <tags> <date>")
sys.exit(1)

# Get the prompt and date from the command-line arguments passed from Java
prompt = sys.argv[1]
tags = sys.argv[2]
date = sys.argv[3]

# Model initialization and processing
repo_id = "stabilityai/stable-diffusion-3-medium-diffusers"
pipe = StableDiffusion3Pipeline.from_pretrained(
repo_id,

# removes memory-intensive text encoder to decrease memory requirements
text_encoder_3=None,
tokenizer_3=None,

torch_dtype=torch.float16,
)

# offload components to CPU during inference to save memory
pipe.enable_model_cpu_offload()

# Process the prompt and set the output path
with torch.cuda.amp.autocast():
image = pipe(
prompt=prompt,
negative_prompt=tags,
num_inference_steps=25,
guidance_scale=6.5
).images[0]
output_folder = os.path.abspath("result/generated/sd3")
output_filename = f"generated_image_{date}.png"
output_filepath = os.path.join(output_folder, output_filename)

# Check if the output folder exists, and create it if not, then save the image
if not os.path.exists(output_folder):
os.makedirs(output_folder)
image.save(output_filepath)

# Encode the image as a base64 string
with open(output_filepath, "rb") as image_file:
encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

# Print image as string
print(encoded_image)


if __name__ == "__main__":
main()
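The new script hands its result back to the Java caller by printing the PNG as a base64 string on stdout. That hand-off can be sketched in isolation — the byte payload below is a stand-in, not real image data:

```python
import base64

# Stand-in for the PNG bytes the script reads back from disk
png_bytes = b"\x89PNG\r\n\x1a\n" + b"stand-in payload"

# Producer side: encode and print, as generate_sd3.py does
encoded = base64.b64encode(png_bytes).decode("utf-8")
print(encoded)

# Consumer side: the Java process decodes the captured stdout line
decoded = base64.b64decode(encoded)
assert decoded == png_bytes
```

Printing base64 keeps the Java-Python channel text-only, so the image survives the process-output capture without binary corruption.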
6 changes: 4 additions & 2 deletions src/main/python/it/unimol/diffusiontool/generate_upscale.py
@@ -21,7 +21,9 @@ def main():
repo_id = "stabilityai/stable-diffusion-2-1-base" # you can use 2-1 if you have more VRAM
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# offload components to CPU during inference to save memory
pipe.enable_model_cpu_offload()

# Load upscaling model
model_id = "stabilityai/sd-x2-latent-upscaler"
@@ -47,7 +49,7 @@
guidance_scale=0,
generator=torch.manual_seed(33),
).images[0]
output_folder = os.path.abspath("result/generated/general")
output_folder = os.path.abspath("result/generated/sd2-1")
output_filename = f"generated_image_{date}.png"
output_filepath = os.path.join(output_folder, output_filename)

10 changes: 5 additions & 5 deletions src/main/resources/app-generate-view.fxml
@@ -14,8 +14,8 @@
<?import javafx.scene.layout.VBox?>
<?import javafx.scene.text.Font?>

<HBox maxHeight="-Infinity" maxWidth="-Infinity" minHeight="-Infinity" minWidth="-Infinity" prefHeight="800.0" prefWidth="1300.0"
xmlns="http://javafx.com/javafx" xmlns:fx="http://javafx.com/fxml/1"
<HBox maxHeight="-Infinity" maxWidth="-Infinity" minHeight="-Infinity" minWidth="-Infinity" prefHeight="800.0"
prefWidth="1300.0" xmlns="http://javafx.com/javafx/" xmlns:fx="http://javafx.com/fxml/1"
fx:controller="it.unimol.diffusiontool.controller.DiffusionController">

<VBox prefHeight="800.0" prefWidth="300.0">
@@ -75,13 +75,13 @@
</font>
</Label>
<TextArea fx:id="promptArea" layoutX="50.0" layoutY="227.0" prefHeight="100.0" prefWidth="670.0" wrapText="true" />
<Label layoutX="50.0" layoutY="342.0" text="Style:">
<Label layoutX="50.0" layoutY="342.0" text="Model:">
<font>
<Font size="16.0" />
</font>
</Label>
<ComboBox fx:id="styleComboBox" layoutX="107.0" layoutY="340.0" onAction="#OnStyleSelect" prefWidth="150.0" />
<Label layoutX="50.0" layoutY="376.0" text="Enter tags to improve the result:">
<ComboBox fx:id="styleComboBox" layoutX="120.0" layoutY="340.0" onAction="#OnStyleSelect" prefWidth="175.0" />
<Label layoutX="50.0" layoutY="376.0" text="Enter what you don't want to see in the result:">
<font>
<Font size="16.0" />
</font>
Binary file added src/main/resources/tool/logo-512.png