
Support for sdxl pipeline (benchmarking) #155

Merged: 16 commits merged into main from sdxl-benchmark on Apr 8, 2024

Conversation

@saienduri (Contributor) commented on Apr 4, 2024

This commit adds V0 support for testing all SDXL submodels. This PR is not a long-term solution for benchmarking, as Scott and I discussed here: #152. It is the result of a request from our team to get SDXL benchmarking in ASAP. Because it is high priority to add this to the SDXL testing while our team lands patches in IREE, this PR simply gets the implementation in and working. Scott and I discussed some more intensive and well-structured ways to add benchmarking, which either of us may implement in the future.

Also, this PR depends on #152 landing first (hence the CI failure).

Notes for the future, in case we decide we need a stronger implementation:

  1. Maybe something like "Support external parameters in IREE benchmark suite" (iree-org/iree#16965), which would feed into https://perf.iree.dev/.
  2. These are the benchmarking frameworks we already have: https://github.com/openxla/iree-comparative-benchmark and https://github.com/openxla/iree/tree/main/build_tools/benchmarks
  3. Some questions Scott raised to keep in mind for future implementation:
  • What metrics/artifacts do we want from benchmarking?
    • Each model in isolation? Full pipeline latency? Just dispatch time?
  • What do we want done with benchmark results / artifacts?
    • The in-tree benchmarks in IREE submit results to a dashboard (that should use a queryable database...), upload Tracy files to cloud storage, and comment on pending pull requests with results summaries
  • Where do we want benchmarks to run?
    • Right after tests, on presubmit to IREE?
    • In a separate job, on separate runners?
If we decide benchmarking needs changes, we will address all of these and come up with a more structured, methodical implementation that either creates a new benchmarking flow here or plugs into the IREE benchmarking setup.

@@ -292,6 +295,8 @@ def __init__(self, spec, **kwargs):
self.run_args.extend(self.spec.iree_run_module_flags)
self.run_args.append(f"--flagfile={self.spec.data_flagfile_name}")

self.benchmark_args = ["iree-benchmark-module", "--device=local-task", f"--module={prompt_encoder_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={prompt_encoder_path}/real_weights.irpa", f"--module={scheduled_unet_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={scheduled_unet_path}/real_weights.irpa", f"--module={vae_decode_path}/model_sdxl_cpu_llvm_task_real_weights.vmfb", f"--parameters=model={vae_decode_path}/real_weights.irpa", f"--module={vmfb_name}", "--function=tokens_to_image", "--input=1x4x128x128xf16", "--input=1xf16", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64", "--input=1x64xi64"]
Member

Looking at this with fresh eyes... if all you want for the moment is to benchmark a single program, you can just script that instead of hooking into pytest.

benchmark_sdxl.sh:

# license header

set -xeuo pipefail

iree-compile ...
iree-benchmark-module ...

We should not be hardcoding any paths or arguments into the test runner.

Contributor Author

This is definitely what should be done. Thanks :) Do you know if, when we run a script in CI, the logs show the output of running that script?

Member

set -xeuo pipefail helps with that. I can never remember the individual flags, so
https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html

  • x: print commands as they are run
  • e: stop immediately if any command fails (instead of just continuing... yes, bash is silly)
  • u: treat unset variables as an error (instead of just continuing with them unset...)
  • -o pipefail: a pipeline's exit status is that of the last command to fail, not just the final command in the pipeline (so an early failure in a pipeline isn't silently swallowed)

So you should see a mix of

command
output of command
command
output of command
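For illustration, here is a toy script (not part of this PR; the commands and filenames are made up) showing how those flags play together:

# Toy demo of set -xeuo pipefail, not part of the PR.
set -xeuo pipefail

echo "step 1"          # -x prints the command itself before its output
false | true           # with pipefail this pipeline fails, and -e stops the script right here
echo "never reached"   # without -e/pipefail, the script would keep going and exit 0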

Contributor Author

Ah ok cool, thanks for all the info!

@saienduri saienduri requested a review from ScottTodd April 4, 2024 23:28
@saienduri (Contributor Author)

Updated the PR to use a benchmarking script instead. Should be good to go.

Comment on lines 11 to 15
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd $this_dir/.. && pwd)"
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank"
Member

nit: prefer to wrap variables in {} curly braces so they are unambiguous

Suggested change
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd $this_dir/.. && pwd)"
vae_decode_dir="$iree_root/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="$iree_root/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="$iree_root/pytorch/models/sdxl-prompt-encoder-tank"
this_dir="$(cd $(dirname $0) && pwd)"
iree_root="$(cd ${this_dir}/.. && pwd)"
vae_decode_dir="${iree_root}/pytorch/models/sdxl-vae-decode-tank"
scheduled_unet_dir="${iree_root}/pytorch/models/sdxl-scheduled-unet-3-tank"
prompt_encoder_dir="${iree_root}/pytorch/models/sdxl-prompt-encoder-tank"

You can also append a question mark like ${variable?} to check that the variable is defined instead of defaulting to empty string: https://stackoverflow.com/questions/8889302/what-is-the-meaning-of-a-question-mark-in-bash-variable-parameter-expansion-as-i
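As a toy illustration of both points (not from the PR; the variable names here are made up):

# Toy examples of the expansions discussed above, not part of the PR.
model="sdxl"
echo "$model_dir"     # expands a variable literally named 'model_dir' (empty here), not "$model" plus "_dir"
echo "${model}_dir"   # braces make the name boundary explicit: prints "sdxl_dir"

unset IREE_ROOT
echo "${IREE_ROOT?}"  # aborts with "IREE_ROOT: parameter null or not set" instead of silently printing nothing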


echo "Running sdxl benchmark"

iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64
Member

nit: wrap this line so it is easier to view and edit?

Suggested change
iree-benchmark-module --device=local-task --module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$prompt_encoder_dir/real_weights.irpa" --module="$scheduled_unet_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$scheduled_unet_dir/real_weights.irpa" --module="$vae_decode_dir/model_cpu_llvm_task_real_weights.vmfb" --parameters=model="$vae_decode_dir/real_weights.irpa" --module="$this_dir/sdxl_full_pipeline_fp16_.vmfb" --function=tokens_to_image --input=1x4x128x128xf16 --input=1xf16 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64 --input=1x64xi64
iree-benchmark-module \
--device=local-task \
--module="$prompt_encoder_dir/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${prompt_encoder_dir}/real_weights.irpa" \
--module="${scheduled_unet_dir}/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${scheduled_unet_dir}/real_weights.irpa" \
--module="${vae_decode_dir}/model_cpu_llvm_task_real_weights.vmfb" \
--parameters=model="${vae_decode_dir}/real_weights.irpa" \
--module="${this_dir}/sdxl_full_pipeline_fp16_.vmfb" \
--function=tokens_to_image \
--input=1x4x128x128xf16 \
--input=1xf16 \
--input=1x64xi64 \
--input=1x64xi64 \
--input=1x64xi64 \
--input=1x64xi64


set -xeuo pipefail

this_dir="$(cd $(dirname $0) && pwd)"
Member

nit/tip: it's common in some scripts to differentiate between local variables and constants / environment variables. Here you could use ALL_CAPS for these variables defined at the top that are constant throughout the rest of the script

THIS_DIR, IREE_ROOT, VAE_DECODE_DIR, etc.

https://google.github.io/styleguide/shellguide.html#constants-and-environment-variable-names
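Applied to the variables at the top of this script, that convention would look roughly like this (a sketch, not the exact diff):

# Sketch: the same variables renamed as constants per the style guide linked above.
set -xeuo pipefail

THIS_DIR="$(cd "$(dirname "$0")" && pwd)"
IREE_ROOT="$(cd "${THIS_DIR}/.." && pwd)"
VAE_DECODE_DIR="${IREE_ROOT}/pytorch/models/sdxl-vae-decode-tank"
SCHEDULED_UNET_DIR="${IREE_ROOT}/pytorch/models/sdxl-scheduled-unet-3-tank"
PROMPT_ENCODER_DIR="${IREE_ROOT}/pytorch/models/sdxl-prompt-encoder-tank"

Note this sketch also quotes the inner $(dirname "$0") so paths with spaces don't break.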

Member

Shouldn't this be compiled on-demand? If left checked in, this is one extra file that would need to be regenerated manually or by script whenever the compiler changes. A benchmark action that always runs the same code won't provide much value, unless we just want to monitor the machine/environment.

Actually, how do the different files fit together in this pipeline? The model_cpu_* files are compiled from the pipeline stages and use real weights, then this "full_pipeline" connects them together? That should all be documented somewhere so others can understand what code is running and how to update it when needed.


If really keeping the file, could use Git LFS for it (10KB binary file).

Contributor Author

Switched to using the MLIR file and added a compile script, so anyone can understand what's going on.

@saienduri (Contributor Author) commented on Apr 8, 2024

Yeah, basically the IREE testing compiles all the submodels and tests individual model accuracy. Then this pipelining MLIR connects all the pieces and runs the whole pipeline.
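For reference, a hedged sketch of what that compile step could look like; the input .mlir file name is an assumption rather than the exact script added in this PR:

# Hypothetical compile step for the pipeline MLIR; file names are assumptions.
set -xeuo pipefail

THIS_DIR="$(cd "$(dirname "$0")" && pwd)"

iree-compile \
  --iree-hal-target-backends=llvm-cpu \
  "${THIS_DIR}/sdxl_full_pipeline_fp16.mlir" \
  -o "${THIS_DIR}/sdxl_full_pipeline_fp16_.vmfb"

The output name matches the sdxl_full_pipeline_fp16_.vmfb module that the benchmark command above loads.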

Comment on lines 113 to 115
- name: "Running SDXL pipeline benchmark"
run: |
bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh
Member

This may be better suited to a separate workflow file that runs on its own schedule, if the goal is to get regular benchmark numbers.

This file is currently running only when there is pull request activity:

name: IREE Test Suite
on:
  # TODO(scotttodd): run on schedule (nightly), and/or on pushes to `main`
  pull_request:
    paths:
      - "iree_tests/**"
  workflow_dispatch:

Member

Especially since this "cpu" script is running under linux_x86_64_w7900_gpu_models :P

Member

(And how much do we even care to benchmark this model on CPU?)

Contributor Author

Haha true. The team said it would be great to have performance numbers with every PR to make sure we don't regress, but I think they were probably referring to the ROCm backend. Do you think I should just put it in a separate area for now then?

Member

In this repo, we can have a nightly job to run whatever benchmarks that we want. In the IREE repo for presubmit, we can either

  • add this on to the end of the pkgci test jobs
  • add new pkgci benchmark jobs
  • slot into the existing benchmark pipeline (for integration into https://perf.iree.dev/)

For sprints like the work on the SDXL branch, I think we should aim for something easy to set up, e.g. only edits to a single workflow file (or about the size of this PR). Needing to click into the CI logs to see the benchmark results is fine (though not ideal) in that case.

Contributor Author

Ok, in this repo I will create a new scheduled workflow for benchmarking. In the IREE repo, I would vote for adding this to the end of the pkgci test jobs, so that at least for this sprint over the next couple of weeks, devs can easily check accuracy and benchmarking results in the same workflow logs for all SDXL-related stuff. (It would just be adding one step, right? Because we are cloning this repo there too, adding

- name: "Running SDXL pipeline benchmark"
  run: |
    bash iree_tests/benchmarks/benchmark_sdxl_cpu.sh

to the end of the job in that repo's workflow file should work.)

@saienduri (Contributor Author) commented on Apr 8, 2024

Also, we won't be able to run a benchmark if we can't compile the submodels, and we shouldn't run it if accuracy is off. So instead of separating them (which would also mean duplicating all the compiling steps for the submodels here), it's probably best to keep them as a sequential execution.

Member

SGTM!

@saienduri saienduri requested a review from ScottTodd April 8, 2024 21:45
@saienduri (Contributor Author)

@ScottTodd this should be good to go now

@ScottTodd (Member) left a comment

Ah. I was waiting to see if you were going to add a new workflow in this repo, to read through the latest logs there.

@saienduri (Contributor Author)

Ah. I was waiting to see if you were going to add a new workflow in this repo, to read through the latest logs there.

Hmm, so about that: I would have to change the script to do all the compiling for the submodels as well, so that it is independent and can be run nightly. I didn't want to do that because I wanted it to be in line with the IREE repo use case we discussed. I can add another script that does the whole thing for this repo, though. I was just wondering if that's needed when we're eventually just going to be running it on PRs in the IREE repo.

@saienduri saienduri merged commit a721335 into main Apr 8, 2024
2 checks passed
@saienduri saienduri deleted the sdxl-benchmark branch April 8, 2024 22:24
renxida pushed a commit that referenced this pull request Jul 18, 2024