Martijn Courteaux edited this page Nov 21, 2023 · 19 revisions

Debugging Halide applications is not like debugging C++. This is because there is an extra level of indirection: instead of debugging an implementation, you are debugging the description which generates that implementation.

This means that just firing up a C++ debugger will not always give you the information you need to solve bugs - it is usually too hard to see how the symbols in the generated loop nest correspond back to the Halide code you wrote.

There are several possible types of bugs, with their own typical causes and preferred debugging techniques.

| Symptom | Typical Root Cause | Typical Debugging Techniques |
|---|---|---|
| Program runs as expected, but output is incorrect. | Algorithm bug, or race-condition allowed in schedule, or buffers not marked as dirty when working with GPU | printing, tracing |
| Program runs with correct output, but the schedule did not have the intended effect. | Schedule bug | tracing, generated program inspection |
| Program crashes in the C++ code that defines the algorithm/schedule. | C++ bug or Halide error | C++ debugging, introspection |
| Program crashes in the "runner program" that calls the Halide-generated program runtime. | C++ bug in runner program | Normal debugging techniques for the language in which the runner is written (typically C/C++, see C/C++ debugging) |
| Program crashes during the runtime of the Halide-generated program. | Various causes (see runtime crash) | See runtime crash |

Note that runner program here refers to the code which calls the kernel implementation after it has been generated by Halide. If using JIT compilation (i.e. realize()), the runner program is part of the same C++ code that defines and schedules the Halide algorithm. In AOT mode, this is a separate program which includes a generated artifact (such as a static library) and calls the function inside.

Debugging Techniques

This is meant to be a complete list of known methods for debugging Halide code. Several of these methods are covered in the tutorials.

Printing

print( Expr, ... ) and print_when( condition, Expr, ... ) can be used to print values of Exprs at runtime. This is useful in particular when trying to find and solve functional bugs in the definition part of Halide code.

Example: suppose you think b is taking on a bad value in the following Func definition:

f(x) = a + b;

You could rewrite this to:

f(x) = a + print(b, " <- b has this value when x is ", x);

This would print messages like 5 <- b has this value when x is 3 at runtime. print returns the first argument, and prints all the arguments as a side-effect. The arguments may be of type Expr, std::string, const char *, int, or float.

You can also conditionally print:

f(x) = a + print_when(b < 0, b, " b was not supposed to be negative! It happened at x = ", x);

This would print messages like -1 b was not supposed to be negative! It happened at x = 5, but only when b is indeed negative. print_when returns the second argument. When the first argument is true, it prints all the other arguments as a side-effect.

The advantage over tracing is that you can print pretty much anything - the downside is that you have to manually inject such print() calls in your Halide code.

Tracing

With tracing, various events that happen during execution can be logged. It can be enabled from inside the Halide code.

Tracing loads and stores

Suppose you want to know all the values of f(x,y) that are computed throughout the program. You can enable store tracing on f as follows:

f.trace_stores();

Upon running the program, tracing messages will appear on the command line. For example:

Store f.0(0, 0) = 5

However, this only works if f is not scheduled inline (for example, if it is scheduled with f.compute_root()): the values of an inlined f are never stored anywhere, so there are no stores to trace. Also note that the order in which these messages appear depends on your schedule, and values may be recomputed, in which case they appear more than once. trace_loads() works in the same way, except that the value is printed when it is loaded rather than when it is stored.
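Because recomputed values show up as repeated store messages, one way to spot recomputation is to count how often each site appears in the captured output. The following standard-library sketch assumes the textual format shown above (one event per line, store lines starting with "Store " and containing " = "); the function name count_stores is hypothetical, and other message layouts, such as vectorized stores, may need extra handling.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts how often each "Func(coordinates)" site appears in textual
// store-trace output. Only lines of the assumed form
// "Store f.0(0, 0) = 5" are considered; all other lines are skipped.
std::map<std::string, int> count_stores(std::istream &log) {
    std::map<std::string, int> counts;
    const std::string prefix = "Store ";
    std::string line;
    while (std::getline(log, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) {
            continue;  // not a store event
        }
        const std::size_t eq = line.find(" = ", prefix.size());
        if (eq == std::string::npos) {
            continue;  // malformed line, ignore it
        }
        // The key is the site being written, e.g. "f.0(0, 0)".
        ++counts[line.substr(prefix.size(), eq - prefix.size())];
    }
    return counts;
}
```

Any site with a count above one was stored more than once, which usually points at redundant computation introduced by the schedule.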

Tracing realizations

Realization trace messages show more coarse-grained events relating to particular functions:

  • When the application starts and finishes;
  • When functions are realized (buffers allocated for them), and what their sizes are at those points in time;
  • When realizations end (buffers are deallocated);
  • When functions go into their production and consumption phases.

For example, if we call g.trace_realizations(); and f.trace_realizations();, we would get messages like:

Begin pipeline h.0()
Begin realization f.0(0, 104)
Produce f.0(0, 104)
End produce f.0(0, 104)
Consume f.0(0, 104)
End consume f.0(0, 104)
End realization f.0(0, 104)
End pipeline h.0()

Global tracing settings

Instead of enabling tracing on particular functions, it is also possible to enable tracing globally for all functions. For that, the target descriptor can be configured.

A typical way to get a default target is by using get_host_target() (e.g. f.realize( {width, height}, get_host_target() ); for a two-dimensional f, where width and height stand for the requested output size). You can add additional "features" to these targets. The following are the available tracing features:

get_host_target().with_feature( Halide::Target::TraceLoads ); //trace loads on all Funcs
get_host_target().with_feature( Halide::Target::TraceStores ); //trace stores on all Funcs
get_host_target().with_feature( Halide::Target::TraceRealizations ); //trace realizations on all Funcs

Trace dump

It is possible to store tracing information in a binary file instead of writing it to the command line. The file is not human-readable, but it takes less space. To enable this, run the tracing-enabled program with the environment variable HL_TRACE_FILE set to the desired output filename.

Halide also has a trace dump tool, which will take the traced memory accesses from a trace file and dump each found Func into its own image/data file.

For example:

HL_TRACE_FILE=trace.bin ./run_app
<Path to Halide>/bin/HalideTraceDump -i trace.bin -t png

This would dump any two-dimensional function with memory access tracing turned on into its own PNG file. Example output:

[INFO] Starting parse of binary trace...
[INFO] First pass...
[INFO] Found Func with tracked accesses: f
[INFO] Found Func with tracked accesses: g
[INFO] Found Func with tracked accesses: h
[INFO] Finished pass 1 after 2136 packets.
[INFO] Finished pass 2 after 2136 packets.

Trace stats:
  Funcs:
    f:
      Type: int32
      Dimensions: 1
      Size: 104
      Minimum stored to in each dim: {0}
      Maximum stored to in each dim: {103}
    g:
      Type: int32
      Dimensions: 1
      Size: 100
      Minimum stored to in each dim: {0}
      Maximum stored to in each dim: {99}
    h:
      Type: int32
      Dimensions: 1
      Size: 100
      Minimum stored to in each dim: {0}
      Maximum stored to in each dim: {99}
[INFO] Dumping func 'f' to file: f.png
[INFO] Dumping func 'g' to file: g.png
[INFO] Dumping func 'h' to file: h.png
Done.

As with all tracing, only the functions which are not scheduled inline can be dumped.

There is also the HalideTraceViz tool for animated visualization of the pipeline. For usage information, run it without arguments:

<Path to Halide>/bin/HalideTraceViz

Generated program inspection

Sometimes the best way to understand the implementation resulting from a Halide description is to inspect the generated code itself. This can be done at several levels of abstraction.

Print loop nest

This command is useful for seeing the overall loop nest structure of the generated program, while omitting any details about the computation or memory access taking place. For example, this can be used to check the nesting order of loop variables. f.print_loop_nest() will print this loop nest summary to the command line during Halide generation time.

Pseudo-code generation

You can use Func::compile_to_lowered_stmt to compile a Halide pipeline to human-readable imperative pseudo-code that includes all the effects of scheduling.

#include <Halide.h>

using namespace Halide;

int main(int argc, char **argv) {
    Func f;
    Var x, y;
    f(x, y) = x+y;
    f.vectorize(x, 4).unroll(x, 2).parallel(y);
    // The second argument lists the pipeline's arguments (none here).
    f.compile_to_lowered_stmt("f.html", {}, HTML);

    return 0;
}

Running this program writes the lowered pseudo-code, rendered as HTML, to f.html.

C/C++ Debugging

Since Halide code is embedded in C++ programs, bugs you encounter may simply be C++ bugs. To debug these, use regular C++ debugging techniques (for example, using a debugger such as gdb).

Introspection

If Halide throws an error based on a bug in your Halide code, you may notice that you don't see any line numbers being reported, and that functions and variables may be reported with non-descriptive names (such as f0 for Funcs and v1 for Vars).

Halide is able to report line numbers and give these objects the same names they were declared with in C++ - but in order for it to work, introspection has to be enabled.

Introspection requires the user program to be compiled with debug information. In other words: your project needs to be configured to build in debug mode. For GCC or Clang, this means adding the -g flag to your compile command.

Adding debug info to your program will not degrade the performance of the kernel implementations that Halide generates. However, it may slow down your C++ control code which calls the generated kernel, if you are running in JIT mode.

Troubleshooting

Crash during Halide runtime

A Halide-generated kernel may crash with a descriptive error message if the user attempts to use it in an incorrect way (e.g. with a badly dimensioned input or output buffer). However, it is unusual for Halide-generated programs to crash in the middle of computation. If it does happen, there can be several causes:

  • A segmentation fault typically means there was something wrong with the buffer descriptors passed to the Halide program (for example, the allocation of a buffer may be smaller than its descriptor suggests). This would be a bug in the control code calling the Halide program.
  • An arithmetic exception may occur if the algorithm you have described performs a division by zero. Using a debugger won't help much, as it would step through the generated code and not give useful information about the root cause. You could however trace realizations to at least know what section of the program causes the fault, then carefully check all divisions to make sure they will not attempt to divide by zero.
  • In rare cases, the issue could be a compiler bug which caused Halide to generate an incorrect program. In such a case, try to make a minimal example and post it in the GitHub issues list.
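For the first cause in the list above, the runner program can check before calling the pipeline that an allocation is large enough for the layout its descriptor describes. The sketch below uses only the standard library. The Dim struct is a hypothetical stand-in for one dimension of a buffer descriptor (minimum coordinate, extent, and stride in elements), and the helper names are assumptions. It also assumes non-negative strides and that the host pointer refers to the element at the minimum coordinate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one dimension of a buffer descriptor.
// Strides are counted in elements, not bytes.
struct Dim {
    int32_t min;
    int32_t extent;
    int32_t stride;
};

// Number of elements the allocation must hold so that every coordinate
// the descriptor admits maps inside it.
std::size_t required_elements(const std::vector<Dim> &dims) {
    std::size_t last_offset = 0;
    for (const Dim &d : dims) {
        if (d.extent <= 0) {
            return 0;  // empty buffer: nothing is addressable
        }
        // Offset of the last element along this dimension.
        last_offset += static_cast<std::size_t>(d.extent - 1) *
                       static_cast<std::size_t>(d.stride);
    }
    return last_offset + 1;
}

// True if an allocation of `allocated_elements` covers the descriptor.
bool allocation_covers(const std::vector<Dim> &dims,
                       std::size_t allocated_elements) {
    return required_elements(dims) <= allocated_elements;
}
```

A dense 4x3 buffer (strides 1 and 4) needs 12 elements, while the same extents with a row stride of 8 need 20. An allocation sized for the dense case would be too small for the padded descriptor.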