
(SD) Add benchmark option and add a printer. #773

Merged: 5 commits into main, Jul 12, 2024

Conversation

@monorimet (Contributor) commented Jul 12, 2024

usage:
--benchmark=all
--benchmark=unet
--benchmark=clip,vae
--verbose
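The flag accepts either `all` or a comma-separated subset of submodel names. A minimal sketch of how such parsing could work is below; the function name, submodel list, and argparse wiring are illustrative assumptions, not the PR's actual code:

```python
import argparse

# Hypothetical sketch: parse --benchmark ("all" or comma-separated
# submodel names) and --verbose. Names here are illustrative only.
def parse_benchmark_flags(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--benchmark", type=str, default=None)
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    if args.benchmark is None:
        benchmarks = []
    elif args.benchmark == "all":
        # Expand "all" to every benchmarkable submodel.
        benchmarks = ["clip", "unet", "vae"]
    else:
        benchmarks = args.benchmark.split(",")
    return benchmarks, args.verbose
```

For example, `parse_benchmark_flags(["--benchmark=clip,vae", "--verbose"])` would select the clip and vae benchmarks with verbose output enabled.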

@@ -238,6 +243,41 @@ def __call__(self, function_name, inputs: list):
return output


class Printer:
Contributor:

Is there a reason to use this instead of just importing `logging` and using that?
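The reviewer's suggestion would look roughly like this minimal sketch using the standard `logging` module; the logger name and configuration are illustrative, not part of the PR:

```python
import logging

# Sketch of the suggested alternative: a module-level logger instead of
# a custom Printer class. The name "sd_pipeline" is an assumption.
logger = logging.getLogger("sd_pipeline")

def configure_logging(verbose: bool) -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    # Suppress INFO-level messages unless --verbose is set.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)

configure_logging(verbose=True)
logger.info("Loading compiled_vae ...")  # emitted only at INFO level or below
```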

@monorimet (Contributor, Author):

We can set it up as a logger; I used this since we had it set up nicely for tresleches' full_runner.py.
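For context, a sketch of what such a Printer might look like, reconstructed from the `[t=... dt=...]` prefixes in the example run below. This is an illustration of the timing-prefix idea, not the PR's actual implementation:

```python
import time

# Hypothetical Printer: prefixes each message with total elapsed time (t)
# and time since the previous print (dt), e.g. "[t=1.694 dt=0.000] ...".
class Printer:
    def __init__(self, verbose: bool = False):
        self.verbose = verbose
        self.start = time.time()
        self.last = self.start

    def print(self, *args):
        if not self.verbose:
            return  # quiet (default) mode: suppress timing output
        now = time.time()
        print(f"[t={now - self.start:.3f} dt={now - self.last:.3f}]", *args)
        self.last = now
```

With `verbose=False` (the default run mode), calls to `print` become no-ops.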

@@ -456,8 +500,8 @@ def is_prepared(self, vmfbs, weights):
mlir_keywords.remove(kw)
avail_files = os.listdir(pipeline_dir)
candidates = []
# print("MLIR KEYS: ", mlir_keywords)
# print("AVAILABLE FILES: ", avail_files)
# self.printer.print("MLIR KEYS: ", mlir_keywords)
Contributor:

This is commented-out code.

@monorimet requested a review from IanNod on July 12, 2024 at 20:16
@monorimet (Contributor, Author) commented:

example verbose/benchmark run:

:~/SHARK-Turbine$ python models/turbine_models/custom_models/sd_inference/sd_pipeline.py --device=hip://1 --precision=fp16 --iree_target_triple=gfx942 --external_weights=safetensors --hf_model_name=stabilityai/stable-diffusion-xl-base-1.0 --width=1024 --height=1024 --use_i8_punet --batch_size=1 --benchmark=all --verbose
/home/eagarvey/iree/iree.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[t=1.694 dt=1.694] All necessary files found.
[t=1.694 dt=0.000] Loading compiled_clip from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_64_fp16_prompt_encoder_rocm_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_text_encoder_fp16.safetensors.
[t=15.973 dt=14.279] Loading compiled_punet from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_64_1024x1024_i8_punet_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa.
[t=24.617 dt=8.645] Loading compiled_vae from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_1024x1024_fp16_vae_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_vae_fp16.safetensors.
[t=25.173 dt=0.555] Loading compiled_scheduler from ./vmfbs/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16_30_gfx942.vmfb with external weights: None.
[t=25.311 dt=0.138] Latency for compiled_clip['encode_prompts']: 0.02198624610900879sec
[t=25.427 dt=0.116] Latency for compiled_punet['main']: 0.09272503852844238sec
[t=25.513 dt=0.086] Latency for compiled_punet['main']: 0.0851752758026123sec
[t=25.600 dt=0.087] Latency for compiled_punet['main']: 0.08541464805603027sec
[t=25.687 dt=0.087] Latency for compiled_punet['main']: 0.08573579788208008sec
[t=25.773 dt=0.086] Latency for compiled_punet['main']: 0.08547711372375488sec
[t=25.860 dt=0.087] Latency for compiled_punet['main']: 0.08586978912353516sec
[t=25.947 dt=0.087] Latency for compiled_punet['main']: 0.08583927154541016sec
[t=26.033 dt=0.087] Latency for compiled_punet['main']: 0.08553814888000488sec
[t=26.120 dt=0.087] Latency for compiled_punet['main']: 0.08585166931152344sec
[t=26.210 dt=0.090] Latency for compiled_punet['main']: 0.08874630928039551sec
[t=26.300 dt=0.090] Latency for compiled_punet['main']: 0.08906340599060059sec
[t=26.391 dt=0.091] Latency for compiled_punet['main']: 0.0895528793334961sec
[t=26.482 dt=0.091] Latency for compiled_punet['main']: 0.08941054344177246sec
[t=26.573 dt=0.091] Latency for compiled_punet['main']: 0.08950352668762207sec
[t=26.663 dt=0.091] Latency for compiled_punet['main']: 0.08926606178283691sec
[t=26.754 dt=0.090] Latency for compiled_punet['main']: 0.08909058570861816sec
[t=26.844 dt=0.091] Latency for compiled_punet['main']: 0.08918428421020508sec
[t=26.935 dt=0.091] Latency for compiled_punet['main']: 0.08947038650512695sec
[t=27.026 dt=0.091] Latency for compiled_punet['main']: 0.08939838409423828sec
[t=27.117 dt=0.091] Latency for compiled_punet['main']: 0.08928704261779785sec
[t=27.207 dt=0.091] Latency for compiled_punet['main']: 0.0892179012298584sec
[t=27.298 dt=0.091] Latency for compiled_punet['main']: 0.08947348594665527sec
[t=27.389 dt=0.091] Latency for compiled_punet['main']: 0.08913540840148926sec
[t=27.479 dt=0.091] Latency for compiled_punet['main']: 0.08927750587463379sec
[t=27.570 dt=0.090] Latency for compiled_punet['main']: 0.0892488956451416sec
[t=27.661 dt=0.091] Latency for compiled_punet['main']: 0.08957576751708984sec
[t=27.752 dt=0.091] Latency for compiled_punet['main']: 0.08965015411376953sec
[t=27.841 dt=0.090] Latency for compiled_punet['main']: 0.08837294578552246sec
[t=27.929 dt=0.088] Latency for compiled_punet['main']: 0.0865163803100586sec
[t=28.016 dt=0.087] Latency for compiled_punet['main']: 0.08648180961608887sec
[t=28.096 dt=0.079] Latency for compiled_vae['decode']: 0.07868456840515137sec
sdxl_output_2024-07-12_15-19-52_0.png saved
Image generation complete.

example quiet(default) run:

~/SHARK-Turbine$ python models/turbine_models/custom_models/sd_inference/sd_pipeline.py --device=hip://1 --precision=fp16 --iree_target_triple=gfx942 --external_weights=safetensors --hf_model_name=stabilityai/stable-diffusion-xl-base-1.0 --width=1024 --height=1024 --use_i8_punet --batch_size=1
/home/eagarvey/iree/iree.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
30it [00:02, 11.23it/s]
sdxl_output_2024-07-12_15-22-02_0.png saved
Image generation complete.
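The two runs are consistent with each other: the steady-state `compiled_punet['main']` latency of about 0.089 s per step in the verbose run implies roughly 11 iterations per second, which matches the quiet run's progress-bar readout. A quick arithmetic check:

```python
# Cross-check: steady-state punet step latency from the verbose run
# implies the iteration rate shown by the quiet run's progress bar.
step_latency_sec = 0.089           # typical compiled_punet['main'] latency above
iters_per_sec = 1.0 / step_latency_sec
print(f"{iters_per_sec:.2f} it/s")  # roughly consistent with "30it [00:02, 11.23it/s]"
```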

@monorimet merged commit a0e4792 into main on Jul 12, 2024.
1 of 3 checks passed.
@monorimet deleted the print-bench branch on July 12, 2024 at 21:10.