-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for sdxl pipeline (benchmarking) #155
Conversation
iree_tests/conftest.py
Outdated
@@ -292,6 +295,8 @@ def __init__(self, spec, **kwargs): | |||
self.run_args.extend(self.spec.iree_run_module_flags) | |||
self.run_args.append(f"--flagfile={self.spec.data_flagfile_name}") | |||
|
|||
self.benchmark_args = ["iree-benchmark-module", "--device=local-task", f"--module={prompt_encoder_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={prompt_encoder_path}/real_weights.irpa", f"--module={scheduled_unet_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={scheduled_unet_path}/real_weights.irpa", f"--module={vae_decode_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={vae_decode_path}/real_weights.irpa", f"--module={vmfb_name}", "--function=tokens_to_image", "--input=1x4x128x128xf16", "--input=1xf16", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this with fresh eyes... if all you want for the moment is to benchmark a single program, you can just script that instead of hooking into pytest.
benchmark_sdxl.sh:
# license header
set -xeuo pipefail
iree-compile ...
iree-benchmark-module ...
We should not be hardcoding any paths or arguments into the test runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely what should be done. Thanks :) Do you know if we run a script in CI if it shows the output of running that script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set -xeuo pipefail
helps with that. I can never remember the individual flags, so
https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
x
: print commands as they are rune
: stop immediately if any command fails (instead of just continuing... yes, bash is silly)u
: treat unset variables as an error (instead of just continuing with them unset...)-o pipefail
: exit the script with the return code of the last command (instead of silently succeeding if the script fails)
So you should see a mix of
command
output of command
command
output of command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok cool, thanks for all the info!
Updated PR to use benchmarking script instead. Should be good to go |
this_dir="$(cd $(dirname $0) && pwd)" | ||
iree_root="$(cd $this_dir/.. && pwd)" | ||
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank" | ||
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank" | ||
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefer to wrap variables in {}
curly braces so they are unambiguous
this_dir="$(cd $(dirname $0) && pwd)" | |
iree_root="$(cd $this_dir/.. && pwd)" | |
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank" | |
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank" | |
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank" | |
this_dir="$(cd $(dirname $0) && pwd)" | |
iree_root="$(cd ${this_dir}/.. && pwd)" | |
vae_decode_dir="${iree_root}/pytorch/models/sdxl-vae-decode-tank" | |
scheduled_unet_dir="${iree_root}/pytorch/models/sdxl-scheduled-unet-3-tank" | |
prompt_encoder_dir="${iree_root}/pytorch/models/sdxl-prompt-encoder-tank" |
references:
- https://stackoverflow.com/questions/8748831/when-do-we-need-curly-braces-around-shell-variables
- https://google.github.io/styleguide/shellguide.html
You can also append a question mark like ${variable?}
to check that the variable is defined instead of defaulting to empty string: https://stackoverflow.com/questions/8889302/what-is-the-meaning-of-a-question-mark-in-bash-variable-parameter-expansion-as-i
|
||
echo "Running sdxl benchmark" | ||
|
||
iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wrap this line so they it is easier to view and edit?
iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 | |
iree-benchmark-module \ | |
--device=local-task \ | |
--module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" \ | |
--parameters=model="${prompt_encoder_dir}/real_weights.irpa" \ | |
--module="${scheduled_unet_dir}/model_cpu_llvm_task_real_weights.vmfb" \ | |
--parameters=model="${scheduled_unet_dir}/real_weights.irpa" \ | |
--module="${vae_decode_dir}/model_cpu_llvm_task_real_weights.vmfb" \ | |
--parameters=model="${vae_decode_dir}/real_weights.irpa" \ | |
--module="${this_dir}/sdxl_full_pipeline_fp16_.vmfb" \ | |
--function=tokens_to_image \ | |
--input=1x4x128x128xf16 \ | |
--input=1xf16 \ | |
--input=1x64xi64 \ | |
--input=1x64xi64 \ | |
--input=1x64xi64 \ | |
--input=1x64xi64 |
|
||
set -xeuo pipefail | ||
|
||
this_dir="$(cd $(dirname $0) && pwd)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/tip: it's common in some scripts to differentiate between local variables and constants / environment variables. Here you could use ALL_CAPS for these variables defined at the top that are constant throughout the rest of the script
THIS_DIR
, IREE_ROOT
, VAE_DECODE_DIR
, etc.
https://google.github.io/styleguide/shellguide.html#constants-and-environment-variable-names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be compiled on-demand? If left checked in, this is one extra file that would need to be regenerated manually or by script whenever the compiler changes. A benchmark action that always runs the same code won't provide much value, unless we just want to monitor the machine/environment.
Actually, how do the different files fit together in this pipeline? The model_cpu_*
files are compiled from the pipeline stages and use real weights, then this "full_pipeline" connects them together? That should all be documented somewhere so others can understand what code is running and how to update it when needed.
If really keeping the file, could use Git LFS for it (10KB binary file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to using the mlir and adding a compile script, so anyone can understand what's going on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, basically the iree testing compiles all the submodels and tests individual model accuracy. Then, this pipelining mlir connects all the pieces and runs the whole pipeline
.github/workflows/test_iree.yml
Outdated
- name: "Running SDXL pipeline benchmark" | ||
run: | | ||
bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be better suited to a separate workflow file that runs on its own schedule, if the goal is to get regular benchmark numbers.
This file is currently running only when there is pull request activity:
name: IREE Test Suite
on:
# TODO(scotttodd): run on schedule (nightly), and/or on pushes to `main`
pull_request:
paths:
- "iree_tests/**"
workflow_dispatch:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially since this "cpu" script is running under linux_x86_64_w7900_gpu_models
:P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And how much do we even care to benchmark this model on CPU?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha true. The team asked that it would be great to have performance numbers with every PR to make sure we don't regress. But, I think they were referring to rocm backend probably. You think I should just put in seperate area for now then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this repo, we can have a nightly job to run whatever benchmarks that we want. In the IREE repo for presubmit, we can either
- add this on to the end of the pkgci test jobs
- add new pkgci benchmark jobs
- slot into the existing benchmark pipeline (for integration into https://perf.iree.dev/)
For sprints like the work on the SDXL branch, I think we should aim for something easy to set up, e.g. only edits to a single workflow file (or about the size of this PR). Needing to click into the CI logs to see the benchmark results is fine (though not ideal) in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok in this repo, I will create a new scheduled workflow for benchmarking. In IREE repo, I would vote for add this on to the end of the pkgci test jobs, so that at least for this sprint over the next couple of weeks, the dev can easily check accuracy and benchmarking results in the same workflow logs for all sdxl related stuff. (It would just be adding one line right? Because we are cloning this repo there too, just adding
- name: "Running SDXL pipeline benchmark"
run: |
bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh
to the end of the job in that repo workflow file should work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also because, we won't be able to run a benchmark if we can't compile the submodules and shouldn't be running it if accuracy is off. So, instead of separating them (would need to also duplicate all the compiling steps for the submodels done here), probably best to keep them as a sequential execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM!
@ScottTodd this should be good to go now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I was waiting to see if you were going to add a new workflow in this repo, to read through the latest logs there.
Hmm, so about that, I would have to change the script to do all the compiling for the submodels as well, so that it is independent and can be run as a nightly. I didn't want to do that because I wanted it to be in line with our iree repo use case we discussed. I can add another script that does the whole thing for this repo though. Was just wondering if that's needed when we're eventually just gonna be running it on PRs in the iree repo. |
This commit adds the V0 support for testing all sdxl submodels. This PR is not a long term solution for benchmarking as Scott and I discussed here: #152. This is the result of a request to get sdxl benchmarking in ASAP by our team. Due to the high priority for this to be added to the sdxl testing as our team lands patches in IREE, this is simply just to get the implementation in and working. Scott and I discussed some more intensive and well structured ways to add benchmarking, which either of us may implement in the future. Also, this PR depends on this one in terms of landing: #152 (hence the CI failure) Notes for future if we decide that we need a stronger implementation: 1. Maybe something like iree-org/iree#16965 which will feed into https://perf.iree.dev/. 2. This is the benchmarking framework we already have: https://github.com/openxla/iree-comparative-benchmark and https://github.com/openxla/iree/tree/main/build_tools/benchmarks 3. Some questions Scott raised to keep in mind for future implementation: * What metrics/artifacts do we want from benchmarking? * Each model in isolation? Full pipeline latency? Just dispatch time? * What do we want done with benchmark results / artifacts? * The in-tree benchmarks in IREE submit results to a dashboard (that should use a queryable database...), upload Tracy files to cloud storage, and comment on pending pull requests with results summaries * Where do we want benchmarks to run? * Right after tests, on presubmit to IREE? * In a separate job, on separate runners? If we decide benchmarking needs changes, we will address all of these and come up with a more structured, methodical implementation that either creates a new benchmarking flow here or plugs into the iree benchmarking setup.
This commit adds the V0 support for testing all sdxl submodels. This PR is not a long term solution for benchmarking as Scott and I discussed here: #152. This is the result of a request to get sdxl benchmarking in ASAP by our team. Due to the high priority for this to be added to the sdxl testing as our team lands patches in IREE, this is simply just to get the implementation in and working. Scott and I discussed some more intensive and well structured ways to add benchmarking, which either of us may implement in the future.
Also, this PR depends on this one in terms of landing: #152 (hence the CI failure)
Notes for future if we decide that we need a stronger implementation:
If we decide benchmarking needs changes, we will address all of these and come up with a more structured, methodical implementation that either creates a new benchmarking flow here or plugs into the iree benchmarking setup.