
Can't run the stable diffusion example #558

Closed
Ryu1845 opened this issue Nov 4, 2022 · 5 comments

Comments


Ryu1845 commented Nov 4, 2022

Hello, I followed the steps in the README, and it gives me an error at the end.
I'm on Linux; here's my GPU information (GTX 1070):

➜  stable-diffusion git:(main) nvidia-smi                                                           
Fri Nov  4 16:01:27 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0  On |                  N/A |
| 40%   51C    P8    16W / 151W |    515MiB /  8192MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

and the PyTorch version:

➜  stable-diffusion git:(main) python -c 'import torch;print(torch.__version__)' 
1.12.1+cu102

The output is the following:

➜  stable-diffusion git:(main) cargo run --example stable-diffusion --features regex -- "A rusty robot holding a fire torch."
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `/home/ryu/tch-rs/target/debug/examples/stable-diffusion 'A rusty robot holding a fire torch.'`
Cuda available: true
Cudnn available: true
Str: <|startoftext|>a rusty robot holding a fire torch . <|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>
Tokens: Tensor[[1, 77], Ok(Int64)]
Building the Clip transformer.
Text embeddings: Tensor[[2, 77, 768], Ok(Float)]
Building the autoencoder.
Building the unet.
Timestep 0 990 Tensor[[1, 4, 64, 64], Ok(Float)]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Torch("Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_mm)\nException raised from common_device_check_failure at /build/python-pytorch/src/pytorch-1.13.0-cuda/aten/src/ATen/core/adaption.cpp:7 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x92 (0x7f253b126bf2 in /usr/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x8a (0x7f253b0ec7ea in /usr/lib/libc10.so)\nframe #2: c10::impl::common_device_check_failure(c10::optional<c10::Device>&, at::Tensor const&, char const*, char const*) + 0x3ea (0x7f24ed09dcfa in /usr/lib/libtorch_cpu.so)\nframe #3: <unknown function> + 0x394f11d (0x7f24fb14f11d in /usr/lib/libtorch_cuda.so)\nframe #4: <unknown function> + 0x394f208 (0x7f24fb14f208 in /usr/lib/libtorch_cuda.so)\nframe #5: at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) + 0x90 (0x7f24edd9e5d0 in /usr/lib/libtorch_cpu.so)\nframe #6: <unknown function> + 0x4613be5 (0x7f24f0213be5 in /usr/lib/libtorch_cpu.so)\nframe #7: <unknown function> + 0x461436b (0x7f24f021436b in /usr/lib/libtorch_cpu.so)\nframe #8: at::_ops::mm::call(at::Tensor const&, at::Tensor const&) + 0x172 (0x7f24ede050b2 in /usr/lib/libtorch_cpu.so)\nframe #9: <unknown function> + 0x1714356 (0x7f24ed314356 in /usr/lib/libtorch_cpu.so)\nframe #10: at::native::matmul(at::Tensor const&, at::Tensor const&) + 0x68 (0x7f24ed314928 in /usr/lib/libtorch_cpu.so)\nframe #11: <unknown function> + 0x2742f69 (0x7f24ee342f69 in /usr/lib/libtorch_cpu.so)\nframe #12: at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) + 0x172 (0x7f24edf1d092 in /usr/lib/libtorch_cpu.so)\nframe #13: <unknown function> + 0x2b4f7b (0x56422e92cf7b in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #14: <unknown function> + 0x2c0746 (0x56422e938746 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #15: <unknown function> + 0x279c88 (0x56422e8f1c88 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #16: <unknown function> + 0x27cc4f (0x56422e8f4c4f in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #17: <unknown function> + 0x29f036 (0x56422e917036 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #18: <unknown function> + 0x292a71 (0x56422e90aa71 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #19: <unknown function> + 0x80084 (0x56422e6f8084 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #20: <unknown function> + 0x80d5b (0x56422e6f8d5b in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #21: <unknown function> + 0x81c3d (0x56422e6f9c3d in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #22: <unknown function> + 0x8a7df (0x56422e7027df in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #23: <unknown function> + 0x8de41 (0x56422e705e41 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #24: <unknown function> + 0x91531 (0x56422e709531 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #25: <unknown function> + 0x9ba4b (0x56422e713a4b in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #26: <unknown 
function> + 0x99b5e (0x56422e711b5e in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #27: <unknown function> + 0x9b1c1 (0x56422e7131c1 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #28: <unknown function> + 0x2fa4ce (0x56422e9724ce in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #29: <unknown function> + 0x9b190 (0x56422e713190 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #30: <unknown function> + 0x9207c (0x56422e70a07c in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\nframe #31: <unknown function> + 0x23290 (0x7f24eba3c290 in /usr/lib/libc.so.6)\nframe #32: __libc_start_main + 0x8a (0x7f24eba3c34a in /usr/lib/libc.so.6)\nframe #33: <unknown function> + 0x69a25 (0x56422e6e1a25 in /home/ryu/tch-rs/target/debug/examples/stable-diffusion)\n")', src/wrappers/tensor_generated.rs:11282:30
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: core::result::unwrap_failed
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/result.rs:1814:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/result.rs:1107:23
   4: tch::wrappers::tensor_generated::<impl tch::wrappers::tensor::Tensor>::matmul
             at /home/ryu/tch-rs/src/wrappers/tensor_generated.rs:11282:9
   5: <tch::nn::linear::Linear as tch::nn::module::Module>::forward
             at /home/ryu/tch-rs/src/nn/linear.rs:52:13
   6: tch::nn::module::<impl tch::wrappers::tensor::Tensor>::apply
             at /home/ryu/tch-rs/src/nn/module.rs:47:9
   7: stable_diffusion::CrossAttention::forward
             at ./main.rs:754:19
   8: stable_diffusion::BasicTransformerBlock::forward
             at ./main.rs:788:18
   9: stable_diffusion::SpatialTransformer::forward
             at ./main.rs:854:18
  10: stable_diffusion::CrossAttnDownBlock2D::forward
             at ./main.rs:1814:18
  11: stable_diffusion::UNet2DConditionModel::forward
             at ./main.rs:2289:21
  12: stable_diffusion::main
             at ./main.rs:2509:26
  13: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
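
For context, the panic boils down to a device mismatch: somewhere in the cross-attention path one tensor is still on the CPU while the linear layer's weights live on cuda:0. A minimal sketch that reproduces the same libtorch error (assuming a CUDA-enabled build of tch; not taken from the example itself):

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    // One operand lives on the CPU, the other on the GPU...
    let a = Tensor::randn(&[2, 3], (Kind::Float, Device::Cpu));
    let b = Tensor::randn(&[3, 4], (Kind::Float, Device::Cuda(0)));
    // ...so this matmul panics with the same "Expected all tensors to be
    // on the same device" error as in the backtrace above.
    let _c = a.matmul(&b);
}
```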
@LaurentMazare (Owner)

Thanks for reporting this. I've just pushed some changes that should hopefully fix some of the issues (though sadly I cannot really test, as it runs out of memory on my 8GB GPU when loading the model; not sure why it works on yours, maybe some different CUDA version).
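
The usual fix for this class of error is to move whichever tensor was created on the default (CPU) device over to the device the weights live on before the multiplication. A minimal sketch of the pattern, with a hypothetical attention_scores helper (illustrative only, not the actual patch):

```rust
use tch::{Device, Tensor};

fn attention_scores(query: &Tensor, key: &Tensor, device: Device) -> Tensor {
    // Ensure both operands end up on the same device as the model weights
    // before multiplying; to_device is cheap when the tensor is already there.
    let query = query.to_device(device);
    let key = key.to_device(device);
    query.matmul(&key.transpose(-2, -1))
}
```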


Ryu1845 commented Nov 4, 2022

Thank you for looking at my issue! I'll try out the changes.


Ryu1845 commented Nov 4, 2022

I am indeed running into "out of memory" issues, but not into the earlier ones! Thank you very much for the quick fix.

Ryu1845 closed this as completed Nov 4, 2022
@LaurentMazare (Owner)

No problem. Note that you can also run on the CPU if your GPU does not have enough memory (this is a lot slower though); there are some instructions in the README.
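
For reference, falling back to the CPU in tch-rs is just a matter of which Device the tensors and weights are created on. A minimal sketch with a hypothetical pick_device helper (see the README for the example's actual instructions):

```rust
use tch::Device;

fn pick_device(force_cpu: bool) -> Device {
    // Run on the GPU when one is available and not explicitly disabled,
    // otherwise fall back to the (much slower) CPU.
    if force_cpu {
        Device::Cpu
    } else {
        Device::cuda_if_available()
    }
}
```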

@LaurentMazare (Owner)

Just to mention that with a couple of tweaks, I was able to run the code on an 8GB card (an RTX 2080 though); more details in this issue.
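
The specific tweaks are described in the linked issue; as a general note, two common ways to cut inference memory in tch-rs are running under no_grad so autograd state is not kept, and casting activations to half precision. A rough sketch of both, with a hypothetical denoise_step wrapper (assumptions, not necessarily the tweaks used there):

```rust
use tch::{Kind, Tensor};

fn denoise_step(unet_input: &Tensor) -> Tensor {
    // Disable gradient tracking during inference so intermediate
    // activations are not retained for a backward pass.
    tch::no_grad(|| {
        // Optionally run in fp16 to roughly halve activation memory
        // (assumes the model weights were also cast to Kind::Half).
        let x = unet_input.to_kind(Kind::Half);
        // ... the unet forward pass would go here ...
        x
    })
}
```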
