Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tract model in half float #211

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Run tract model in half float #211

wants to merge 2 commits into from

Conversation

Rikorose
Copy link
Owner

@Rikorose Rikorose commented Dec 13, 2022

running

cargo run -p deep_filter --features=tract,bin,wav-utils,transforms --bin deep-filter -- --half-floats assets/noisy_snr0.wav -o out -vv -D

results in the following stack trace:

   Compiling deep_filter v0.4.1-pre (/home/hendrik/projects/DeepFilterNet/libDF)
    Finished dev [optimized + debuginfo] target(s) in 20.20s
     Running `target/debug/deep-filter --half-floats assets/noisy_snr0.wav -o out -vv -D`
[2022-12-13T10:09:59Z INFO  df::tract] Init encoder with delay: 0
[2022-12-13T10:09:59Z INFO  tract_linalg::x86_64_fma] mmm_f32, sigmoid_f32, tanh_f32: x86_64/fma activated
[2022-12-13T10:09:59Z INFO  tract_linalg::x86_64_fma] mmm_i8_i8 and mmm_i8_i32: x86_64/avx2 activated
[2022-12-13T10:09:59Z INFO  df::tract] Init ERB decoder with delay: 0
[2022-12-13T10:09:59Z INFO  df::tract] Init DF decoder with delay: 0
[2022-12-13T10:09:59Z INFO  df::tract] Running with model type deepfilternet3 lookahead 2
Error: Evaluating #3 "/erb_conv0/1/Conv.pad" Pad

Caused by:
    Tensor datum type error: tensor is F32, accessed as F16

Stack backtrace:
   0: tract_data::tensor::Tensor::check_for_access
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:748:13
   1: tract_data::tensor::Tensor::to_scalar
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-data-0.18.5/src/tensor.rs:848:9
   2: tract_core::ops::array::pad::Pad::eval_t
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:34:38
      <tract_core::ops::array::pad::Pad as tract_core::ops::EvalOp>::eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/ops/array/pad.rs:109:18
   3: tract_core::plan::eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:491:17
   4: core::ops::function::FnMut::call_mut
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:164:5
      tract_core::plan::SimpleState<F,O,M,P>::exec_plan_with_eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:264:26
   5: tract_core::plan::SimpleState<F,O,M,P>::run_plan_with_eval
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:199:9
      tract_core::plan::SimpleState<F,O,M,P>::run
             at /home/hendrik/.cargo/registry/src/github.com-1ecc6299db9ec823/tract-core-0.18.5/src/plan.rs:177:9
   6: df::tract::DfTract::process
             at ./libDF/src/tract.rs:401:17
   7: deep_filter::main
             at ./libDF/src/bin/enhance_wav.rs:139:13
   8: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:122:18
   9: std::rt::lang_start::{{closure}}
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:166:18
  10: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:283:13
      std::panicking::try::do_call
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
      std::panicking::try
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
      std::panic::catch_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
      std::rt::lang_start_internal
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:20
  11: main
  12: __libc_start_call_main
  13: __libc_start_main_alias_2
  14: _start

@VariantXYZ
Copy link

Hi @Rikorose ,

I went ahead and tried out your branch out of curiosity on my a55 (on a rockchip sbc which is the only fp16 compatible CPU I have) after seeing the comment in the tract repo, and my guess wasn’t too far off. A large chunk of time (20% according to gprof) was spent in binary16::convert::f32_to_f16_fallback.

f16_to_f32 was lower at 7%, but still a sizeable chunk.

@VariantXYZ
Copy link

VariantXYZ commented Dec 15, 2022

Some additional information (maybe relevant to tract):

The inner_loop generic in core/src/ops/cnn/conv/depth_wise.rs is doing a back and forth conversion like every single line.

Some other ops may be like this too, but this seems to be the main offender.

Edit: tried adding the half crate and specifying std and use-intrinsics manually but it still seems to be doing the conversions… I must be missing something here to force it to use fp16 on a55.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants