
(SD) Add benchmark option and add a printer. #773

Merged: 5 commits into main, Jul 12, 2024

Conversation

@monorimet (Contributor) commented Jul 12, 2024

usage:
--benchmark=all
--benchmark=unet
--benchmark=clip,vae
--verbose
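The flag accepts either `all` or a comma-separated subset of submodel names. A minimal sketch of how such parsing could work is below; the function name, submodel list, and argparse wiring are illustrative assumptions, not the PR's actual code:

```python
import argparse

# Hypothetical sketch: parse --benchmark ("all" or comma-separated
# submodel names) and --verbose. Names here are illustrative only.
def parse_benchmark_flags(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--benchmark", type=str, default=None)
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    if args.benchmark is None:
        benchmarks = []
    elif args.benchmark == "all":
        # Expand "all" to every benchmarkable submodel.
        benchmarks = ["clip", "unet", "vae"]
    else:
        benchmarks = args.benchmark.split(",")
    return benchmarks, args.verbose
```

For example, `parse_benchmark_flags(["--benchmark=clip,vae", "--verbose"])` would select the clip and vae benchmarks with verbose output enabled.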

@@ -238,6 +243,41 @@ def __call__(self, function_name, inputs: list):
return output


class Printer:
Contributor:

Is there a reason to use this instead of just importing `logging` and using that?
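The reviewer's suggestion would look roughly like this minimal sketch using the standard `logging` module; the logger name and configuration are illustrative, not part of the PR:

```python
import logging

# Sketch of the suggested alternative: a module-level logger instead of
# a custom Printer class. The name "sd_pipeline" is an assumption.
logger = logging.getLogger("sd_pipeline")

def configure_logging(verbose: bool) -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    # Suppress INFO-level messages unless --verbose is set.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)

configure_logging(verbose=True)
logger.info("Loading compiled_vae ...")  # emitted only at INFO level or below
```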

@monorimet (Contributor, Author):

We can set it up as a logger; I used this since we had it set up nicely for tresleches' full_runner.py.
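For context, a sketch of what such a Printer might look like, reconstructed from the `[t=... dt=...]` prefixes in the example run below. This is an illustration of the timing-prefix idea, not the PR's actual implementation:

```python
import time

# Hypothetical Printer: prefixes each message with total elapsed time (t)
# and time since the previous print (dt), e.g. "[t=1.694 dt=0.000] ...".
class Printer:
    def __init__(self, verbose: bool = False):
        self.verbose = verbose
        self.start = time.time()
        self.last = self.start

    def print(self, *args):
        if not self.verbose:
            return  # quiet (default) mode: suppress timing output
        now = time.time()
        print(f"[t={now - self.start:.3f} dt={now - self.last:.3f}]", *args)
        self.last = now
```

With `verbose=False` (the default run mode), calls to `print` become no-ops.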

@@ -456,8 +500,8 @@ def is_prepared(self, vmfbs, weights):
mlir_keywords.remove(kw)
avail_files = os.listdir(pipeline_dir)
candidates = []
# print("MLIR KEYS: ", mlir_keywords)
# print("AVAILABLE FILES: ", avail_files)
# self.printer.print("MLIR KEYS: ", mlir_keywords)
Contributor:

This is commented-out code.

@monorimet requested a review from IanNod on July 12, 2024 at 20:16
@monorimet (Contributor, Author) commented:

example verbose/benchmark run:

:~/SHARK-Turbine$ python models/turbine_models/custom_models/sd_inference/sd_pipeline.py --device=hip://1 --precision=fp16 --iree_target_triple=gfx942 --external_weights=safetensors --hf_model_name=stabilityai/stable-diffusion-xl-base-1.0 --width=1024 --height=1024 --use_i8_punet --batch_size=1 --benchmark=all --verbose
/home/eagarvey/iree/iree.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[t=1.694 dt=1.694] All necessary files found.
[t=1.694 dt=0.000] Loading compiled_clip from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_64_fp16_prompt_encoder_rocm_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_text_encoder_fp16.safetensors.
[t=15.973 dt=14.279] Loading compiled_punet from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_64_1024x1024_i8_punet_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa.
[t=24.617 dt=8.645] Loading compiled_vae from ./vmfbs/stable_diffusion_xl_base_1_0_bs1_1024x1024_fp16_vae_gfx942.vmfb with external weights: ./weights/stable_diffusion_xl_base_1_0_vae_fp16.safetensors.
[t=25.173 dt=0.555] Loading compiled_scheduler from ./vmfbs/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16_30_gfx942.vmfb with external weights: None.
[t=25.311 dt=0.138] Latency for compiled_clip['encode_prompts']: 0.02198624610900879sec
[t=25.427 dt=0.116] Latency for compiled_punet['main']: 0.09272503852844238sec
[t=25.513 dt=0.086] Latency for compiled_punet['main']: 0.0851752758026123sec
[t=25.600 dt=0.087] Latency for compiled_punet['main']: 0.08541464805603027sec
[t=25.687 dt=0.087] Latency for compiled_punet['main']: 0.08573579788208008sec
[t=25.773 dt=0.086] Latency for compiled_punet['main']: 0.08547711372375488sec
[t=25.860 dt=0.087] Latency for compiled_punet['main']: 0.08586978912353516sec
[t=25.947 dt=0.087] Latency for compiled_punet['main']: 0.08583927154541016sec
[t=26.033 dt=0.087] Latency for compiled_punet['main']: 0.08553814888000488sec
[t=26.120 dt=0.087] Latency for compiled_punet['main']: 0.08585166931152344sec
[t=26.210 dt=0.090] Latency for compiled_punet['main']: 0.08874630928039551sec
[t=26.300 dt=0.090] Latency for compiled_punet['main']: 0.08906340599060059sec
[t=26.391 dt=0.091] Latency for compiled_punet['main']: 0.0895528793334961sec
[t=26.482 dt=0.091] Latency for compiled_punet['main']: 0.08941054344177246sec
[t=26.573 dt=0.091] Latency for compiled_punet['main']: 0.08950352668762207sec
[t=26.663 dt=0.091] Latency for compiled_punet['main']: 0.08926606178283691sec
[t=26.754 dt=0.090] Latency for compiled_punet['main']: 0.08909058570861816sec
[t=26.844 dt=0.091] Latency for compiled_punet['main']: 0.08918428421020508sec
[t=26.935 dt=0.091] Latency for compiled_punet['main']: 0.08947038650512695sec
[t=27.026 dt=0.091] Latency for compiled_punet['main']: 0.08939838409423828sec
[t=27.117 dt=0.091] Latency for compiled_punet['main']: 0.08928704261779785sec
[t=27.207 dt=0.091] Latency for compiled_punet['main']: 0.0892179012298584sec
[t=27.298 dt=0.091] Latency for compiled_punet['main']: 0.08947348594665527sec
[t=27.389 dt=0.091] Latency for compiled_punet['main']: 0.08913540840148926sec
[t=27.479 dt=0.091] Latency for compiled_punet['main']: 0.08927750587463379sec
[t=27.570 dt=0.090] Latency for compiled_punet['main']: 0.0892488956451416sec
[t=27.661 dt=0.091] Latency for compiled_punet['main']: 0.08957576751708984sec
[t=27.752 dt=0.091] Latency for compiled_punet['main']: 0.08965015411376953sec
[t=27.841 dt=0.090] Latency for compiled_punet['main']: 0.08837294578552246sec
[t=27.929 dt=0.088] Latency for compiled_punet['main']: 0.0865163803100586sec
[t=28.016 dt=0.087] Latency for compiled_punet['main']: 0.08648180961608887sec
[t=28.096 dt=0.079] Latency for compiled_vae['decode']: 0.07868456840515137sec
sdxl_output_2024-07-12_15-19-52_0.png saved
Image generation complete.

example quiet(default) run:

~/SHARK-Turbine$ python models/turbine_models/custom_models/sd_inference/sd_pipeline.py --device=hip://1 --precision=fp16 --iree_target_triple=gfx942 --external_weights=safetensors --hf_model_name=stabilityai/stable-diffusion-xl-base-1.0 --width=1024 --height=1024 --use_i8_punet --batch_size=1
/home/eagarvey/iree/iree.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
30it [00:02, 11.23it/s]
sdxl_output_2024-07-12_15-22-02_0.png saved
Image generation complete.
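The two runs are consistent with each other: the steady-state `compiled_punet['main']` latency of about 0.089 s per step in the verbose run implies roughly 11 iterations per second, which matches the quiet run's progress-bar readout. A quick arithmetic check:

```python
# Cross-check: steady-state punet step latency from the verbose run
# implies the iteration rate shown by the quiet run's progress bar.
step_latency_sec = 0.089           # typical compiled_punet['main'] latency above
iters_per_sec = 1.0 / step_latency_sec
print(f"{iters_per_sec:.2f} it/s")  # roughly consistent with "30it [00:02, 11.23it/s]"
```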

@monorimet merged commit a0e4792 into main on Jul 12, 2024.
1 of 3 checks passed.
@monorimet deleted the print-bench branch on July 12, 2024 at 21:10.