Halide 2018/02/15
You probably want halide-linux-64-trunk, halide-mac-64-trunk, or halide-win-distro-64-trunk for Linux, OS X, and Windows respectively. For Linux, pay attention to the various gcc versions and download the build that matches your compiler version; downloading the wrong one may cause linker errors.
Notable changes include:
- Scheduling:
- New scheduling directive: compute_with
- Codegen:
- Better instruction selection for Hexagon
- Less integer math in CUDA kernels
- Support for warp shuffle instructions on CUDA
- Support for MSAN in Clang
- X86 Runtime: various AVX2 improvements
- Fixes:
- Buffer now uses halide_device_crop API from within the Buffer class instead of just discarding any device allocation when a Buffer is cropped
- Auto-scheduler: unbounded function bugs
- halide_print() now defaults to output to stdout rather than stderr
- Various fixes to corner cases of Buffer<> with const types
- API:
- Completely rewrote Python bindings using PyBind11 (not yet complete but much more robust and well-supported)
- Removed long-deprecated variants of gpu_tile()
- Added IRMutator2, deprecated IRMutator
- Apps:
- Replaced apps/hexagon_matmul with apps/nn_ops, which provides fast implementations of common deep learning network operations on all platforms that Halide supports
- Generators:
- Revised LoopLevel to allow deferred-evaluation, making it easier to compose separate pieces of Halide code (e.g. when the compute_at or store_at may not be known yet)
- Removed Generator::ScheduleParam entirely; added support for GeneratorParam instead
- Simplified Stubs to no longer be stateful, but just a single "generate" method
- Build:
- All prebuilt libHalide versions (both static and dynamic) are now built with RTTI enabled (previously they were built with RTTI disabled)
- Much better CMake support, including 'make distrib', 'make install', and better test targets
- Dropped support for LLVM 3.9
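
The deferred-evaluation LoopLevel change listed under Generators may be easier to picture with a small model. The sketch below is plain C++ and does not use Halide's API: `LevelState`, `DeferredLevel`, `Producer`, and `demo` are hypothetical names invented for illustration. It only shows the general idea of a handle whose target can be filled in after other code has already captured a copy of it, which is roughly the situation described when the compute_at or store_at location is not yet known.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical shared state behind a loop-level handle (not a Halide type).
struct LevelState {
    bool defined = false;
    std::string func;  // name of the function whose loop is targeted
    std::string var;   // name of the loop variable
};

// Toy stand-in for a deferred loop level. Copies share one LevelState,
// so a later set() is visible through every copy of the handle.
class DeferredLevel {
public:
    DeferredLevel() : state_(std::make_shared<LevelState>()) {}

    void set(const std::string &func, const std::string &var) {
        state_->func = func;
        state_->var = var;
        state_->defined = true;
    }

    bool defined() const { return state_->defined; }

    // Resolving the level is only legal once it has been set.
    std::string to_string() const {
        if (!state_->defined) {
            throw std::logic_error("loop level used before it was set");
        }
        return state_->func + "." + state_->var;
    }

private:
    std::shared_ptr<LevelState> state_;
};

// Hypothetical pipeline piece that captures a level before it is known.
struct Producer {
    DeferredLevel compute_level;
    explicit Producer(DeferredLevel level) : compute_level(level) {}

    // The level is resolved here, at "schedule time", not at construction.
    std::string schedule() const {
        return "compute_at(" + compute_level.to_string() + ")";
    }
};

// Builds the producer first, decides the level afterwards.
std::string demo() {
    DeferredLevel level;
    Producer producer(level);  // level is still undefined at this point
    level.set("output", "y");  // decided later by the composing code
    return producer.schedule();
}
```

In this model the producer never needs to know where it will be computed when it is written; the code that composes the pieces decides afterwards. Whether Halide's LoopLevel is implemented this way internally is not stated in these notes, so treat the shared-state design as an assumption.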