
Support for sdxl pipeline (benchmarking) #155

Merged: 16 commits merged into main from sdxl-benchmark on Apr 8, 2024

Conversation

@saienduri (Contributor) commented on Apr 4, 2024

This commit adds V0 support for testing all SDXL submodels. This PR is not a long-term solution for benchmarking, as Scott and I discussed here: #152. It is the result of a request from our team to get SDXL benchmarking in ASAP. Because it is high priority to add this to the SDXL testing while our team lands patches in IREE, this PR simply gets the implementation in and working. Scott and I discussed some more intensive and well-structured ways to add benchmarking, which either of us may implement in the future.

Also, this PR depends on #152 landing first (hence the CI failure).

Notes for the future, in case we decide we need a stronger implementation:

  1. Maybe something like "Support external parameters in IREE benchmark suite" (iree-org/iree#16965), which would feed into https://perf.iree.dev/.
  2. These are the benchmarking frameworks we already have: https://github.com/openxla/iree-comparative-benchmark and https://github.com/openxla/iree/tree/main/build_tools/benchmarks
  3. Some questions Scott raised to keep in mind for future implementation:
  • What metrics/artifacts do we want from benchmarking?
    • Each model in isolation? Full pipeline latency? Just dispatch time?
  • What do we want done with benchmark results / artifacts?
    • The in-tree benchmarks in IREE submit results to a dashboard (that should use a queryable database...), upload Tracy files to cloud storage, and comment on pending pull requests with results summaries
  • Where do we want benchmarks to run?
    • Right after tests, on presubmit to IREE?
    • In a separate job, on separate runners?
If we decide benchmarking needs changes, we will address all of these and come up with a more structured, methodical implementation that either creates a new benchmarking flow here or plugs into the IREE benchmarking setup.

@@ -292,6 +295,8 @@ def __init__(self, spec, **kwargs):
self.run_args.extend(self.spec.iree_run_module_flags)
self.run_args.append(f"--flagfile={self.spec.data_flagfile_name}")

self.benchmark_args = ["iree-benchmark-module", "--device=local-task", f"--module={prompt_encoder_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={prompt_encoder_path}/real_weights.irpa", f"--module={scheduled_unet_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={scheduled_unet_path}/real_weights.irpa", f"--module={vae_decode_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={vae_decode_path}/real_weights.irpa", f"--module={vmfb_name}", "--function=tokens_to_image", "--input=1x4x128x128xf16", "--input=1xf16", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64"]
Member

Looking at this with fresh eyes... if all you want for the moment is to benchmark a single program, you can just script that instead of hooking into pytest.

benchmark_sdxl.sh:

# license header

set -xeuo pipefail

iree-compile ...
iree-benchmark-module ...

We should not be hardcoding any paths or arguments into the test runner.

Contributor Author

This is definitely what should be done. Thanks :) Do you know if, when we run a script in CI, the logs show the output of running that script?

Member

set -xeuo pipefail helps with that. I can never remember the individual flags, so
https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html

  • x: print commands as they are run
  • e: stop immediately if any command fails (instead of just continuing... yes, bash is silly)
  • u: treat unset variables as an error (instead of just continuing with them unset...)
  • -o pipefail: a pipeline's exit status is that of the last command to fail, not just the final command in the pipeline (so an early failure in a pipeline isn't silently swallowed)

So you should see a mix of

command
output of command
command
output of command
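For illustration, here is a toy script (not part of this PR; the commands and filenames are made up) showing how those flags play together:

# Toy demo of set -xeuo pipefail, not part of the PR.
set -xeuo pipefail

echo "step 1"          # -x prints the command itself before its output
false | true           # with pipefail this pipeline fails, and -e stops the script right here
echo "never reached"   # without -e/pipefail, the script would keep going and exit 0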

Contributor Author

Ah ok cool, thanks for all the info!

@saienduri saienduri requested a review from ScottTodd April 4, 2024 23:28
@saienduri (Contributor Author)

Updated the PR to use a benchmarking script instead. Should be good to go.

Comment on lines 11 to 15
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd $this_dir/.. && pwd)"
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank"
Member

nit: prefer to wrap variables in {} curly braces so they are unambiguous

Suggested change
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd $this_dir/.. && pwd)"
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank"
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd ${this_dir}/.. && pwd)"
vae_decode_dir="${iree_root}/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="${iree_root}/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="${iree_root}/pytorch/models/sdxl-prompt-encoder-tank"

You can also append a question mark like ${variable?} to check that the variable is defined instead of defaulting to empty string: https://stackoverflow.com/questions/8889302/what-is-the-meaning-of-a-question-mark-in-bash-variable-parameter-expansion-as-i
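As a toy illustration of both points (not from the PR; the variable names here are made up):

# Toy examples of the expansions discussed above, not part of the PR.
model="sdxl"
echo "$model_dir"     # expands a variable literally named 'model_dir' (empty here), not "$model" plus "_dir"
echo "${model}_dir"   # braces make the name boundary explicit: prints "sdxl_dir"

unset IREE_ROOT
echo "${IREE_ROOT?}"  # aborts with "IREE_ROOT: parameter null or not set" instead of silently printing nothing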


echo "Running sdxl benchmark"

iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64
Member

nit: wrap this line so it is easier to view and edit?

Suggested change
iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64
iree-benchmark-module \
--device=local-task \
--module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${prompt_encoder_dir}/real_weights.irpa" \
--module="${scheduled_unet_dir}/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${scheduled_unet_dir}/real_weights.irpa" \
--module="${vae_decode_dir}/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${vae_decode_dir}/real_weights.irpa" \
--module="${this_dir}/sdxl_full_pipeline_fp16_.vmfb" \
--function=tokens_to_image \
--input=1x4x128x128xf16 \
--input=1xf16 \
--input=1x64xi64 \
--input=1x64xi64 \
--input=1x64xi64 \
--input=1x64xi64


set -xeuo pipefail

this_dir="$(cd $(dirname $0) && pwd)"
Member

nit/tip: it's common in some scripts to differentiate between local variables and constants / environment variables. Here you could use ALL_CAPS for these variables defined at the top that are constant throughout the rest of the script

THIS_DIR, IREE_ROOT, VAE_DECODE_DIR, etc.

https://google.github.io/styleguide/shellguide.html#constants-and-environment-variable-names
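Applied to the variables at the top of this script, that convention would look roughly like this (a sketch, not the exact diff):

# Sketch: the same variables renamed as constants per the style guide linked above.
set -xeuo pipefail

THIS_DIR="$(cd "$(dirname "$0")" && pwd)"
IREE_ROOT="$(cd "${THIS_DIR}/.." && pwd)"
VAE_DECODE_DIR="${IREE_ROOT}/pytorch/models/sdxl-vae-decode-tank"
SCHEDULED_UNET_DIR="${IREE_ROOT}/pytorch/models/sdxl-scheduled-unet-3-tank"
PROMPT_ENCODER_DIR="${IREE_ROOT}/pytorch/models/sdxl-prompt-encoder-tank"

Note this sketch also quotes the inner $(dirname "$0") so paths with spaces don't break.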

Member

Shouldn't this be compiled on-demand? If left checked in, this is one extra file that would need to be regenerated manually or by script whenever the compiler changes. A benchmark action that always runs the same code won't provide much value, unless we just want to monitor the machine/environment.

Actually, how do the different files fit together in this pipeline? The model_cpu_* files are compiled from the pipeline stages and use real weights, then this "full_pipeline" connects them together? That should all be documented somewhere so others can understand what code is running and how to update it when needed.


If really keeping the file, could use Git LFS for it (10KB binary file).

Contributor Author

Switched to using the MLIR file and added a compile script, so anyone can understand what's going on.

@saienduri (Contributor Author) commented on Apr 8, 2024

Yeah, basically the IREE testing compiles all the submodels and tests individual model accuracy. Then this pipelining MLIR connects all the pieces and runs the whole pipeline.
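For reference, a hedged sketch of what that compile step could look like; the input .mlir file name is an assumption rather than the exact script added in this PR:

# Hypothetical compile step for the pipeline MLIR; file names are assumptions.
set -xeuo pipefail

THIS_DIR="$(cd "$(dirname "$0")" && pwd)"

iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  "${THIS_DIR}/sdxl_full_pipeline_fp16.mlir" \
  -o "${THIS_DIR}/sdxl_full_pipeline_fp16_.vmfb"

The output name matches the sdxl_full_pipeline_fp16_.vmfb module that the benchmark command above loads.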

Comment on lines 113 to 115
- name: "Running SDXL pipeline benchmark"
run: |
bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh
Member

This may be better suited to a separate workflow file that runs on its own schedule, if the goal is to get regular benchmark numbers.

This file is currently running only when there is pull request activity:

name: IREE Test Suite
on:
  # TODO(scotttodd): run on schedule (nightly), and/or on pushes to `main`
  pull_request:
    paths:
      - "iree_tests/**"
  workflow_dispatch:

Member

Especially since this "cpu" script is running under linux_x86_64_w7900_gpu_models :P

Member

(And how much do we even care to benchmark this model on CPU?)

Contributor Author

Haha true. The team said it would be great to have performance numbers with every PR to make sure we don't regress, but I think they were probably referring to the ROCm backend. Do you think I should just put it in a separate area for now then?

Member

In this repo, we can have a nightly job to run whatever benchmarks that we want. In the IREE repo for presubmit, we can either

  • add this on to the end of the pkgci test jobs
  • add new pkgci benchmark jobs
  • slot into the existing benchmark pipeline (for integration into https://perf.iree.dev/)

For sprints like the work on the SDXL branch, I think we should aim for something easy to set up, e.g. only edits to a single workflow file (or about the size of this PR). Needing to click into the CI logs to see the benchmark results is fine (though not ideal) in that case.

Contributor Author

Ok, in this repo I will create a new scheduled workflow for benchmarking. In the IREE repo, I would vote for adding this to the end of the pkgci test jobs, so that at least for this sprint over the next couple of weeks, devs can easily check accuracy and benchmarking results in the same workflow logs for all SDXL-related stuff. (It would just be adding one step, right? Because we are cloning this repo there too, adding

- name: "Running SDXL pipeline benchmark"
  run: |
    bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh

to the end of the job in that repo's workflow file should work.)

@saienduri (Contributor Author) commented on Apr 8, 2024

Also, we won't be able to run a benchmark if we can't compile the submodels, and we shouldn't run it if accuracy is off. So instead of separating them (which would also mean duplicating all the compiling steps for the submodels here), it's probably best to keep them as a sequential execution.

Member

SGTM!

@saienduri saienduri requested a review from ScottTodd April 8, 2024 21:45
@saienduri (Contributor Author)

@ScottTodd this should be good to go now

@ScottTodd (Member) left a comment

Ah. I was waiting to see if you were going to add a new workflow in this repo, to read through the latest logs there.

@saienduri (Contributor Author)

Ah. I was waiting to see if you were going to add a new workflow in this repo, to read through the latest logs there.

Hmm, so about that: I would have to change the script to do all the compiling for the submodels as well, so that it is independent and can be run nightly. I didn't want to do that because I wanted it to be in line with the IREE repo use case we discussed. I can add another script that does the whole thing for this repo, though. I was just wondering if that's needed when we're eventually just going to be running it on PRs in the IREE repo.

@saienduri saienduri merged commit a721335 into main Apr 8, 2024
2 checks passed
@saienduri saienduri deleted the sdxl-benchmark branch April 8, 2024 22:24
renxida pushed a commit that referenced this pull request Jul 18, 2024