Releases: NVIDIA/DALI
DALI v1.10.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- New operators:
- Color-based augmentations were extended to support video data (#3580).
- Improved performance of the slice operator (#3584, #3573, and #3568).
- Added an experimental debug (immediate execution) mode (#3586 and #3531).
Fixed Issues
No major issues were fixed in this release.
Improvements
- Adds video support to color based augmentations (#3580)
- Fixed cmake error (#3601)
- Fix debug build failures in benchmark code (#3585)
- Make sanitizers tests fail when it encounters the first issue (#3583)
- Use proper attribute filters for nosetests (#3592)
- Fix wrong parameter name in Laplacian docs (#3593)
- QA script fix: Add an empty negative branch to a conditional to prevent automatic error (#3588)
- Small refactoring in Slice GPU kernel (#3584)
- GetProperty operator CPU+GPU (#3572)
- Add comments about scale argument (#3581)
- Fix coverity issues (#3579)
- Check when using ES source and feed_input (#3574)
- Prototype of the debug mode (#3531)
- Enable tests for dynamically loaded cuda libraries (#3540)
- Add Laplacian operator [CPU] (#3563)
- Add CUDAStreamPool & CUDAStreamLease. (#3569)
- Coalesce stores in Slice for smaller output types (#3568)
- Turn off OpticalFlow test on aarch64 platform for driver r495.x and newer (#3566)
Bug Fixes
- Fixing typos in WDS's source_info (#3602)
- Fix handling of scalar argument in slice operator (#3596)
- Use the same device for debug mode test and baseline (#3594)
- Fix JPEG distortion GPU quality argument handling for sequences (#3590)
- Use current device in _as_gpu (#3586)
- Fix "version_ge: command not found" error in TL0_python-self-test-base-cuda (#3582)
- Disable coalescing values in Slice for CUDA 10 (#3573)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
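The key-frame requirement in the first known issue can be checked before a video is handed to the loader. The sketch below is only an illustration: the function name, the `max_gap` default of 10, and the assumption that the key-frame indices were already extracted with some external tool are all hypothetical and not part of DALI.

```python
def keyframes_dense_enough(keyframe_indices, num_frames, max_gap=10):
    """Return True if no stretch of the stream is longer than max_gap frames
    without a key frame.

    keyframe_indices: frame numbers of the key frames (hypothetical input,
    obtained with an external tool). The start and the end of the stream are
    treated as boundaries.
    """
    points = [0] + sorted(keyframe_indices) + [num_frames]
    return all(b - a <= max_gap for a, b in zip(points, points[1:]))


# A key frame every 10 frames passes; a single 40-frame gap does not.
print(keyframes_dense_enough([0, 10, 20], num_frames=30))
print(keyframes_dense_enough([0, 40], num_frames=60))
```

Raising `max_gap` to 15 corresponds to the looser end of the 10 to 15 frame range quoted above.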
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.10.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.10.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.10.0-3728184-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.10.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.10.0-3728186-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.10.0-3728186-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.10.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.9.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Extended the jpeg_compression_distortion operator to support video inputs (#3482 and #3447).
- Added the file_filter argument to the readers.file operator that allows you to filter files by names (#3459).
- Extended the slice operator to support per-sample axes arguments and negative axis indexing (#3516).
- Extended the pad operator to support per-sample axes, fill_value arguments, and negative axis indexing (#3534).
- Improved the performance of the slice operator for small batch sizes (#3557).
- Added the Laplacian CPU kernel (#3565, #3535, and #3518).
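Negative axis indexing, as added to slice and pad, conventionally counts axes from the end, as in NumPy. The following is a minimal plain-Python sketch of that convention applied to per-sample axes. It does not call DALI, and `normalize_axes` is a hypothetical helper, not a DALI function.

```python
def normalize_axes(axes, ndim):
    """Map possibly negative axis indices onto the range [0, ndim)."""
    normalized = []
    for axis in axes:
        if not -ndim <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
        normalized.append(axis % ndim)
    return normalized


# Per-sample axes: every sample in the batch carries its own list of axes.
batch_axes = [[-1], [0, -2], [1]]
print([normalize_axes(axes, ndim=3) for axes in batch_axes])
# prints [[2], [0, 1], [1]]
```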
Fixed Issues
This DALI release includes the following fixes:
- Fixed a race condition that randomly caused incorrect outputs in the TensorFlow plugin (#3547).
- Fixed synchronization issues in the PaddlePaddle plugin that may have caused incorrect results (#3498 and #3487).
Improvements
- Make Slice kernel tiling adaptive (#3557)
- Add Laplacian CPU kernel (#3518)
- Allows DALI to dlopen dependent CUDA toolkit libraries: NPP, cuFFT and nvJPEG (#3519)
- Fix test code to be compatible with python 3.6 (#3550)
- Fix a typo in warp jupyter notebook. (#3554)
- Add Cast and CoinFlip GPU benchmarks (#3541)
- Fix DALI TL3 test for 21.11 (#3529)
- Pad operator: Add support for per-sample axes and fill_value arguments, and negative axes (#3534)
- Add FlipGPU and GaussianBlurGPU benchmarks (#3538)
- Make bundle-wheel.sh more configurable (#3539)
- Enable DALI test on python 3.9 and add 3.10 support (#3522)
- Add transform parameter to convolution cpu (#3535)
- Remove nvJPEG leak sanitizer workaround in tests (#3532)
- Dependency update Nov 2021 (#3523)
- Add support for per-sample axes and negative axes in Slice (#3516)
- Refactor ArgValue to support empty samples and batch shape expectations (#3528)
- Move to CUDA 11.5 update 1 (#3526)
- Add Copy GPU benchmark (#3517)
- Move to CUDA_CALL for nvJPEG, nvJPEG2k, and NPP (#3521)
- Silence warning in LookupTable (#3508)
- Move unfold_outer_dim to common utilities. (#3486)
- Remove Context from memory resources. (#3485)
- Set minimum python version to 3.7 for TF 2.7 (#3489)
- Allow video inputs to JpegCompressionDistortion (#3482)
- Bump up TensorFlow version to 2.7 in tests (#3475)
- Change the way how NVML wrapper is linked internally (#3481)
- Add support for file_filters in FileReader (#3459)
- Allow video inputs to JpegCompressionDistortion (#3447)
- Move to Ubuntu 20.04 for cuda 10.2 toolkit image (#3477)
- Move to Ubuntu 20.04 for cuda toolkit image (#3476)
- Pin Keras version for TensorFlow 2.6 (#3474)
- Add support for BatchInfo in experimental TF DALI Dataset (#3468)
Bug Fixes
- Replace equality with EqualEpsRel in Laplacian kernel tests (#3565)
- Synchronize CUDA stream once in operator benchmark (#3525)
- Ensure that num_devices and device are stored in correct order. (#3560)
- Fix conda test for CUDA 10.x (#3556)
- Fix race condition when initializing per-device default memory resources (#3555)
- Fix data race when copying outputs in TF plugin (#3547)
- CUDA VM resource bugfixes (#3545)
- Fix build of DALI TensorFlow plugin during installation (#3546)
- Fix issues found during static analysis (#3524)
- Fix lack of proper device id used to obtain relevant cuda stream in paddle plugin (#3498)
- Add type check to last_batch_policy argument (#3490)
- Fix DALI paddle plugin stream synchronization error (#3487)
- Reuse GaussianBlur windows between iterations (#3484)
- Add synchronization when destroying the Executor. Make all destructors noexcept. (#3492)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.9.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.9.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.9.0-3647996-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.9.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.9.0-3647997-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.9.0-3647997-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.9.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.8.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added batch mode support to external_source operator with parallel callback. (#3420 and #3397)
- Extended crop_mirror_normalize operator to support per-sample normalization parameters. (#3455)
- Improved error messages when trying to decode images with unsupported format. (#3445)
- Documentation improvements. (#3448 and #3439)
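Per-sample normalization parameters mean that each sample in a batch can be normalized with its own mean and standard deviation instead of one shared pair. The sketch below illustrates the arithmetic in plain Python on flat lists. It does not use DALI, and `normalize_batch` is a hypothetical helper.

```python
def normalize_batch(batch, means, stds):
    """Normalize each sample with its own mean and standard deviation."""
    if not len(batch) == len(means) == len(stds):
        raise ValueError("batch, means and stds must have the same length")
    return [
        [(value - mean) / std for value in sample]
        for sample, mean, std in zip(batch, means, stds)
    ]


batch = [[0.0, 128.0, 256.0], [10.0, 20.0, 30.0]]
print(normalize_batch(batch, means=[128.0, 20.0], stds=[64.0, 10.0]))
# prints [[-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
```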
Fixed Issues
This DALI release includes the following fixes:
- Fixed unsound interpretation of the aspect ratio parameter in the random_bbox_crop operator, when input shape is provided. (#3425)
- Fixed incorrect output shape in the experimental.readers.video operator. (#3460)
Improvements
- Remove reseeding of numpy in RandomlyShapedDataIterator (#3466)
- Add indexing information to TF external source tests (#3467)
- Extend setup_packages.py to bing package with its dependencies (#3464)
- Update dependency versions (#3457)
- Optionally load plugins global symbols. (#3462)
- Add NVIDIA Video Codec SDK - NVDECODE API (#3458)
- CropMirrorNormalize: Add support for per-sample normalization arguments (#3455)
- Support batch mode in parallel external source (#3397)
- Turn off part of TL0_FW_iterators tests when sanitizers are enabled (#3456)
- Read ArgValue constant arguments only once (#3453)
- Rename InputRef/OutputRef to Input/Output in workspace API (#3451)
- Reduce number of Workspace Input/Output APIs (#3446)
- Fix error reporting in image factory (#3445)
- Update custom op example for newer CMake (#3448)
- Update TF dataset to 2.8 (#3442)
- Fix documentation of CropMirrorNormalize dtype argument (#3439)
- Bump up nvJPEG2k version to 0.4 (#3440)
- Enable CUDA 11.5 builds (#3436)
- Enable sanitizers in regular CI runs (#3422)
- Improve the way how available python version is available (#3438)
- RandomBBoxCrop: Fix interpretation of aspect ratio, when input shape is provided (#3425)
- Change the permute function to infer the output size from the indices. (#3434)
- Move to the upstream deb packages for JetPack compilation (#3432)
- Change C++ standard to c++17 for non-CUDA sources (#3423)
- Add epoch number to SampleInfo and introduce BatchInfo (#3420)
- Separate type setting from data access in Buffer (#3414)
- Make SBSA build compatible with all armv8-a CPUs (#3417)
- Update TF plugin for future API change (#3415)
- Replace pointers with references for ShareData parameter (#3408)
- Code cleanup: remove unused variables, fix buffer overflow (#3410)
- Enable usage of sanitizers in tests (#3377)
Bug Fixes
- Update tensorflow version in conda build (#3471)
- Fix STRING_VEC default arguments presentation in docs (#3470)
- Remove broken class method from DALI Dataset (#3465)
- Fix experimental.readers.video output shape (#3460)
- Fix static analysis detected issues (#3444)
- Silence output from build_per_python_lib cmake utility (#3454)
- Make Workspace::Input return const reference (#3452)
- Update imports from collections to collections.abc where needed (#3429)
- Install boost/preprocessor headers (#3443)
- Fix ShareData for TensorVector with no elements (#3435)
- Update GCC version in conda recipe to 7.5 to workaround GCC bug 82461. (#3431)
- Add a missing state destruction for the NVJPEG HW decoder (#3416)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.8.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.8.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.8.0-3362432-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.8.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.8.0-3362434-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.8.0-3362434-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.8.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.7.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- New operators:
- Performance improvements:
- Added the DALI_DISABLE_NVML and DALI_RESTRICT_PINNED_MEM environment variables. These variables allow you to limit the use of NVML and pinned memory and enable DALI on more platforms (#3404 and #3382).
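One way to set these variables from Python is sketched below. The value "1" is an assumption, since the release notes name the variables but not the values they accept. Setting them before DALI is imported is the conservative choice, because the point at which DALI reads them is not stated here.

```python
import os

# Assumption: "1" switches each option on; check the DALI documentation
# for the values that are actually accepted.
os.environ["DALI_DISABLE_NVML"] = "1"
os.environ["DALI_RESTRICT_PINNED_MEM"] = "1"

# Import DALI only after the environment is prepared, for example:
# import nvidia.dali as dali

print({name: os.environ[name] for name in ("DALI_DISABLE_NVML", "DALI_RESTRICT_PINNED_MEM")})
```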
Fixed Issues
This DALI release includes the following fixes:
- Fixed an issue in the pad operator that caused a crash when the operator was used with a variable batch size (#3354).
- Fixed a race condition that occurred in the readers.video operator (#3355).
- Fixed a bug in the C API that caused invalid memory access in some use cases (#3350).
Improvements
- Add more logging to FramesDecoder (#3412)
- Reduce the TensorList and TensorVector API scope (#3403)
- Add an env variable DALI_DISABLE_NVML to disable NVML usage on demand (#3404)
- Enable BUILD_LDMB by default (#3406)
- Add error message checking into existing python tests (#3401)
- Bump up Nvidia TensorFlow version in tests to 21.09 (#3383)
- Add VideoReaderDecoder (#3391)
- Webdataset automatic index file inference (#3385)
- Add an environment variable that determines whether pinned memory usage should be restricted. (#3382)
- Notebook with an example of webdataset usage (#3372)
- Add frames decoder (#3362)
- Move to libtar fork - https://github.com/tklauser/libtar (#3375)
- Remove possibility of access to contiguous TL buffer (#3373)
- Add error message checks (#3371)
- Update libcudacxx to include fix for build with ASAN. (#3374)
- Specialize warp kernels for common numbers of channels. (#3370)
- Webdataset performance and cosmetic optimizations (#3360)
- Update documentation about enabling sanitizers (#3365)
- general perf changes alongside WDS perf (#3363)
- Update CUTLASS and Google Benchmark (#3361)
- Remove access to contiguous TL buffer from Coco Reader tests (#3351)
- Remove access to contiguous TL buffer from BoxEncoder, Resize, Shapes and Warp (#3339)
- Bump clang version to 12.0.1 in deps image (#3342)
- Use DALIDataType where possible. (#3338)
- Update asserts in python tests (#3336)
- Webdataset reader operator implementation (#3306)
- Work around PyTorch internal fragmentation in L3 SSD test. (#3343)
- Make view converters operate on samples only (#3325)
- Add an ability to avoid class remapping in coco reader (#3333)
- Remove access to underlying contiguous TL buffer from tests (#3319)
Bug Fixes
- Fix the Webdataset documentation formatting (#3395)
- Fix documentation formatting (#3369)
- Fix sharding and shuffling in VideoLoaderDecoder (#3411)
- Fix pool process tracking in parallel ES tests, cleanup batches properly (#3400)
- Fix ownership issues in Share APIs for Tensor, TL and TV (#3407)
- Fix memory leak in async_pool destructor. (#3402)
- Fix off build (#3399)
- Fix HW decoder overwriting growth factor for CPU buffers (#3398)
- Fix libtiff build (#3392)
- Fix the memory kind stored in AllocInfo in nvjpeg memory. (#3393)
- Fix bug in TensorList test (#3388)
- Adjust default eps in video test (#3389)
- Fix FFMPEG conda build (#3386)
- Fix errors in TF YOLO example (#3379)
- Adjust growth and shrink threshold for cpu buffers (#3378)
- Fix error reporting in TL3_EfficientDet_convergence and TL3_YOLO_convergence (#3376)
- Fix problems detected by asan and lsan (#3367)
- Fix Coverity issues (#3366)
- Fix EfficientDet docs link (#3364)
- Fix Video reader race condition (#3355)
- Fix variable batch size handling in pad operator (#3354)
- Fix bugs in C API and refactor tests (#3350)
- Fix and optimize name handling in TypeInfo. (#3349)
- Fix sequence rearrange python test (#3353)
- Handle SIGV situation when trying to load prebuild DALI TF Plugin (#3347)
- Fix DeviceBuffer copy - use proper copy function. (#3344)
- Skip Keras TF tests in versions with broken exception handling (#3341)
- Fix squeeze operator test on Python3.7 and earlier (#3337)
- Use memory resources in DeviceBuffer and TestTensorList. (#3334)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.7.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.7.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.7.0-3161365-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.7.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.7.0-3161358-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.7.0-3161358-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.7.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.6.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Added support for lambdas and local functions as callback in parallel external_source operator (#3270, #3269).
- Added the following tutorials:
- Added DALI preprocessing to the EfficientDet example (#3118).
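Support for lambdas and local functions matters because a parallel external_source callback has to reach worker processes, and Python's standard pickle module cannot serialize such functions. The sketch below demonstrates only that standard-library limitation. How DALI serializes these callbacks internally is not described here, and `can_pickle` is a hypothetical helper.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# An importable function is pickled by reference to its module and name.
print(can_pickle(math.sqrt))
# A lambda has no importable name, so standard pickle rejects it.
print(can_pickle(lambda index: index * 2))
```

The first call prints True and the second prints False.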
Fixed issues
This DALI release includes the following fixes:
- Fixed a crash that happened in the gaussian_blur operator for inputs where one of the dimensions equals 1 (#3291).
- Fixed random Python crashes on the process teardown when the external_source operator was used (#3245).
- Fixed readers.video hanging on some HEVC samples (#3247).
Improvements
- Add error message checking in python tests (#3324)
- Optimize bundling wheel by using multiprocessing in build_helper.sh (#3323)
- Changed "accross" to "across" in README.rst (#3329)
- Move to CUDA 11.4 update 2 (#3322)
- Fix FFmpeg vulnerabilities (CVE-2020-22037, CVE-2021-38171, CVE-2021-38291) (#3315)
- Rework displacement filter to sample-based approach (#3311)
- Remove kernels/alloc.h (#3309)
- Adjust usage of raises and assert_raises in tests (#3318)
- Move static UserStream variable to the Get function inside the class (#3242)
- Adjust usage of raise and assert_raises (#3316)
- Update README with third parties dependencies (#3320)
- Add input type validation to feed_ndarray in MXNet and PyTorch (#3308)
- Add parameters checks when deserializing a pipeline (#3253)
- Extend BlockSetup with 1-dim specialization (#3304)
- Move back to upstream libtar from conda (#3301)
- Rework LUT to batch processing and remove access to TL buffer (#3298)
- Add checking a message of the expected exception against a pattern in nose tests (#3302)
- Use libcu++ interfaces. (#3297)
- Update third party dependencies (#3300)
- Pin nvJPEG2000 and GPU Direct dependencies (#3299)
- Bump up nvidia tensorflow version to 21.08 in tests (#3296)
- Implement InputDatasets for DALIDataset (#3292)
- Remove access to underlying contiguous TL buffer in bb_flip op (#3283)
- Make memory kind a tag type instead of an enum value. (#3290)
- Add examples on serialization to parallel external source notebook (#3270)
- Support lambdas and local functions as callbacks in parallel ExternalSource (#3269)
- TarArchive::TellArchie implementation + renaming (#3286)
- Remove access to underlying contiguous TL buffer in Flip op (#3280)
- Remove access to underlying contiguous TL buffer in Normalize op (#3281)
- Use default resources for allocating tensors (#2948)
- Remove access to underlying contiguous TL buffer in Constant op (#3276)
- TarArchive additional features (#3273)
- Add ScatterGatherCPU and rework Copy op to batch processing (#3266)
- Change the way how start and end timestamps are converted to frames (#3252)
- Update RMM to an up-to-date & version with interface rework applied. (#3254)
- Test fused decoder out-of-bounds error (#3175)
- Bump supported tested TensorFlow versions (#3250)
- Update supported CUDA version in docker/build.sh (#3248)
- Adjust capitalization in tutorials (#3246)
- Remove not applicable aclaratory note from PyTorch and Paddle iterators (#3235)
- Add tutorial about TF DALI Dataset input handling (#3212)
- Add tutorials for Parallel External Source (#3199)
- Add DALI to EfficientDet example (#3118)
- Use fn.random module in tests and examples (#3174)
Bug Fixes
- Improve tests for expected errors + fix PythonFunction (#3332)
- Fix incorrect use of a global variable in the test of operator Shapes. (#3310)
- Rework Cast to batch processing (#3278)
- Fix HEVC video handling (#3247)
- Fix infinite loop for convolution with extent equal 1 (#3291)
- Add yaml as a Webdataset test dependency, adjust to new WDS API (#3295)
- Fix missing condition variable include (#3289)
- Remove the inclusion of scatter_gather.h from types.h (#3288)
- Fix cast warning in ScatterGather (#3284)
- Clear to_dealloc and notify under a lock. (#3282)
- Fix notification method in deferred deallocation. (#3279)
- Fix race condition when initializing plain host memory resource. (#3268)
- Fix alignment constraints in CUDA VM resource. (#3274)
- Fix missing sizeof in Tensor Test (#3267)
- Fix hw decoder tests disabled on old drivers (#3257)
- Don't increase alignment to upstream alignment when retrying to allocate (#3264)
- Avoid creating primary context for synchronization. (#3263)
- Avoid upstream allocation stampede by retrying to allocate from free after gaining the upstream lock. (#3258)
- Remove excessive synchronization in AsyncPool. (#3256)
- Ensure keeping py_pool alive until pipeline is garbage collected (#3245)
- Fix running Python core tests (#3249)
- Fix an assignment of py::none() to py::dict in backend_impl.cc (#3244)
- Fix interoperation between DALI and PyTorch lightning due to buffering (#3239)
- Reduce number of iterations in L0 tests (#3173)
- Fix memory leak in backend_impl.cc caused by PyObject_GetAttr (#3233)
- Fix FFmpeg CVE-2021-38114 (#3231)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.6.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.6.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.6.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.6.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.6.0-2993095-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.6.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.6.0-2993096-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.6.0-2993096-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.6.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.5.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Extended decoders.image to support WebP decoding (#3206)
- Added indexing (NumPy-like) API for tensor slicing (#3200 and #3195)
- Extended external_source to support source argument in TensorFlow DALI Dataset (#3215, #3193, #3177 and #3176)
- Added examples:
Fixed issues
This DALI release includes the following fixes:
- Fixed include paths that prevented including some parts of DALI in other C/C++ projects (#3210)
- Fixed a crash when only anchors and no shapes were provided in multi_paste (#3166)
- In the spectrogram operator, extracted windows are now correctly centered before FFT calculation when the nfft argument is bigger than the length of the window (#3180)
- Fixed a minor memory leak in decoders.image (#3148)
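The spectrogram fix above concerns where a window shorter than nfft is placed in the FFT input buffer. The following is a minimal sketch of centering by zero-padding, written in plain Python rather than with DALI. The helper name center_window and the handling of an odd remainder (the extra zero goes on the right) are assumptions for illustration and are not taken from the DALI implementation.

```python
def center_window(window, nfft):
    """Zero-pad `window` so that it sits in the middle of an nfft-long buffer."""
    if nfft < len(window):
        raise ValueError("nfft must be at least the window length")
    pad_total = nfft - len(window)
    pad_left = pad_total // 2          # assumption: an odd leftover goes to the right
    pad_right = pad_total - pad_left
    return [0.0] * pad_left + list(window) + [0.0] * pad_right


# A 3-sample window placed in a 7-sample FFT buffer.
print(center_window([1.0, 2.0, 3.0], 7))
```

Without centering, the same window would start at index 0 of the buffer, which shifts the phase of every frame relative to the centered layout.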
Improvements
- Add documentation for indexing. (#3200)
- Move to CUDA 11.4U1 (#3213)
- Add WebP support to image decoder (#3206)
- libtar API implementation (#3198)
- Tensor indexing (#3195)
- Make TF graph-mode tests faster (#3204)
- Add support for ES source in TF DALI Dataset (#3177)
- Add tensorflow YOLOv4 example (#2883)
- Refactor Python External Source code (#3176)
- Update third party dependencies to latest release versions (#3184)
- Add deferred deallocation to cuda_vm_resource. (#3154)
- Adjust test scripts and section header for webdataset notebook (#3162)
- Add Webdataset-ExternalSource Jupyter notebook (#3153)
- Update PR template (#3150)
- Update PR template (#3129)
Bug Fixes
- Fix failing TarArchive tests (#3226)
- Build custom libtar in conda (#3223)
- Improve validation in DALIDataset (#3215)
- Update DALI_DEPS_VERSION to include NVIDIA/DALI_deps#19 (#3224)
- Fix identity check in _is_generator_function. Add test. (#3216)
- Fix unused imports in test_utils.py (#3214)
- Remove the usage of ManagedMemory from the OpticalFlow tests (#3211)
- Suppress test using unified memory when it is not supported (#3209)
- Remove include prefix from include paths (#3210)
- Fix CVE-2021-3246 in libsnd (#3208)
- Fix pytorch-lightning test (#3196)
- Fix coverity issues + skip tests involving managed memory when not supported. (#3190)
- Disable NVJPEG HW decoder for driver < 455 due to performance reason (#3189)
- Fix compilation with newer GCC (#3188)
- Disallow some types of sources for parallel ES explicitly (#3193)
- Center windows when extracting windows to a bigger output window (#3180)
- Add a compute cap value before running the GDS test (#3185)
- MultiPaste to adjust the region shape to cover up to the end of the input shape (#3166)
- Fix wording in docs (#3165)
- Fix image decode (#3148)
- Fix LastBatchPolicy doc and update Parallel ES wording (#3152)
- Fix some errors (#3147)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
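The key-frame requirement in the first known issue can be checked before a video is handed to the loader. The sketch below is plain Python and does not use DALI. It assumes that the key-frame positions have already been extracted with some other tool, and it uses the lower bound of the stated range (10 frames) as a conservative default. The function names are hypothetical.

```python
def max_keyframe_gap(keyframe_indices, total_frames):
    """Return the longest distance, in frames, between consecutive key frames.

    The start and the end of the stream are treated as boundaries, so frames
    before the first key frame and after the last one are counted as well.
    """
    points = sorted(set(keyframe_indices) | {0, total_frames})
    return max((b - a for a, b in zip(points, points[1:])), default=0)


def keyframes_dense_enough(keyframe_indices, total_frames, limit=10):
    """True if a key frame occurs at least every `limit` frames (assumed threshold)."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= limit


print(keyframes_dense_enough([0, 10, 20], total_frames=30))
print(keyframes_dense_enough([0, 30], total_frames=60, limit=15))
```

Streams that fail this check are the ones for which the returned frames may be out of sync.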
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.5.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.5.0
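The driver requirement stated above (450.80 or later) can be verified before installing the CUDA 11.0 build. This is a minimal sketch that only compares version strings. How the driver version string is obtained is outside its scope (querying nvidia-smi is one option), and the function names are hypothetical.

```python
MIN_DRIVER = (450, 80)  # minimum driver version quoted in these notes


def parse_version(text):
    """Turn a dotted version string such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda110_build(driver_version):
    """True if the driver version is at least 450.80."""
    return parse_version(driver_version) >= MIN_DRIVER


for candidate in ("450.51.06", "450.80.02", "470.57.02"):
    print(candidate, driver_supports_cuda110_build(candidate))
```

Tuples are compared element by element, so "450.80.02" passes while "450.51.06" does not.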
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.5.0-2725759-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.5.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.5.0-2725760-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.5.0-2725760-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.5.0.tar.gz
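The wheel links above follow a regular naming pattern. The sketch below reproduces that pattern for scripting purposes. It is inferred from the URLs listed in these notes, not from an official specification, and the build number differs per release and per CUDA variant, so it has to be copied from the notes. The function name is hypothetical.

```python
BASE_URL = "https://developer.download.nvidia.com/compute/redist"


def dali_wheel_url(version, build_id, cuda="110", arch="x86_64"):
    """Build a direct wheel URL following the pattern of the links listed above."""
    package_dir = f"nvidia-dali-cuda{cuda}"
    wheel = (
        f"nvidia_dali_cuda{cuda}-{version}-{build_id}"
        f"-py3-none-manylinux2014_{arch}.whl"
    )
    return f"{BASE_URL}/{package_dir}/{wheel}"


print(dali_wheel_url("1.5.0", "2725760"))
print(dali_wheel_url("1.5.0", "2725760", arch="aarch64"))
```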
FFmpeg source code:
Libsndfile source code:
DALI v1.4.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- readers.numpy improvements:
- DALI Dataset improvements:
- Video reader improvements:
- Added CPU parallelization to the Slice and SliceFlipNormalizePermutePad kernels (#3062, #3068, and #3080).
- Added an option to readers.nemo_asr to return indices of the entries in the manifest (#3085).
- Improved the performance of the GPU image decoder by optimizing the memory allocations (#3067).
Fixed issues
This DALI release includes the following fixes:
- Fixed a crash that happened when a functools.partial result was passed as a source to external_source (#3143).
- Fixed the hardware image decoder to fall back to the hybrid implementation for unsupported file formats instead of throwing an error (#3086).
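For context on the first fix, a functools.partial result is callable but is not a plain Python function, which is one plausible way code that inspects its callback can trip over it. The notes do not describe the actual cause fixed in #3143. The sketch below uses only the standard library. The callback make_batch and its arguments are hypothetical, and the resulting object is the kind of value that would be passed as the source argument.

```python
import functools
import inspect


def make_batch(iteration, batch_size, fill_value):
    """Hypothetical callback: return `batch_size` one-element samples."""
    return [[fill_value + iteration] for _ in range(batch_size)]


# Bind the extra arguments so that the callable only needs the iteration index.
source = functools.partial(make_batch, batch_size=4, fill_value=100)

print(callable(source))            # a partial object can be called like a function
print(inspect.isfunction(source))  # but it is not a plain function object
print(source(0))
```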
Improvements
- Add NumpyReader tutorial to the rendered documentation page (#3139)
- Update docs analytics tracking (#3135)
- VM async_pool - refactoring & tests (#3117)
- Extend the video loader error message for vfr videos on how to disable the check in case of false positives (#3125)
- Integer literal suffixes (#3122)
- SliceCPU kernel to run plain memcpy when applicable (#3110)
- CUDA VM memory resource (#3114)
- Add Numpy Reader Tutorial (#3095)
- Bump TensorFlow version in tests (#3107)
- Efficient det code drop (#3115)
- Move to CUDA 11.4 build (#3109)
- Add batch support to DALI Dataset (#3089)
- Update third party dependencies (#3093)
- Add bitmask::append. (#3101)
- Free list API cleanup. (#3100)
- NemoAsrReader to optionally return indices of the entries in the manifest. (#3085)
- Parallelize reading in NumpyReader CPU (#3077)
- Bit mask utility (#3083)
- Add ExecutionEngine to SliceFlipNormalizePermutePad CPU kernel, to allow parallel execution (#3080)
- Add an ability to pad missing frames in the Video reader sequence (#3002)
- Rework the TF DALIDataset input API (#3063)
- Add ExecutionEngine to Slice CPU kernel, to allow parallel execution (#3068)
- Use HW NVJPEG decoder memory pool even if size hint is not set (#3067)
- CUDA Virtual Memory API wrappers. (#3064)
- Add information about installing CUDA 10.2 DALI version (#3066)
- Add image decoder memory hints for nvJPEG in DALI examples (#3029)
- Add split shape utility (#3062)
- Add ROI support to NumpyReader GPU (#3034)
- Enable no_copy mode handling in TF DALI Dataset (#3058)
- Add support for VP8 and MJPEG videos (#3045)
- Make pytorch lightning example work with multiple GPUs (#3037)
- Add override flags for no_copy option of External Source (#3041)
- Add NumpyFileWrapper to numpy loader (#3054)
- Add a mention of CPU-only arguments inputs in docs (#3039)
- Minor changes in Slice GPU kernels, before reusing them in NumpyReader GPU (#3040)
Bug fixes
- Fix hint handling: (#3145)
- Add support for functools.partial in ExternalSource. (#3143)
- Install libcufile (for GDS) as a part of the cuda base build step (#3142)
- Add check of strerror_r return value in CUFile HandleIOError (#3141)
- Disable VMAsyncPool CrossStream test on incompatible platforms. (#3140)
- Fix the lack of execution of variable batch size test (#3134)
- Throw std::bad_alloc when ordinary host memory runs out + tests for xxx_malloc resources. (#3131)
- Fix allocation hint handling in CUDA VM resource (#3128)
- Revert change from python to Python_EXECUTABLE (#3126)
- Coverity issue fixes - bulk drop, July 2021 (#3124)
- Make nvJPEG detect corrupted stream before offloading to HW decoder (#3113)
- Add --no-index option to TL1_tensorflow-dali_test test (#3112)
- Minor fixes (#3119)
- DALI TF install tool: Copy files for import check, rather than symlink (#3116)
- minor fixes (#3108)
- Dali TF installation: check import before completing the installation (#3104)
- Remove no longer applicable sed command from RN50 MXNet test (#3103)
- Use DALI_extra instead of example_audio_file in the spectrogram example (#3106)
- Unify apt-get invocations (#3094)
- Make DALI extra download optional in tests (#3102)
- Remove pre CUDA 10.0 support in TL1_tensorflow-dali_test (#3099)
- Bug fixes (#3096)
- MMUtilFixes: (#3098)
- Fix override no copy flags for External Source C API (#3097)
- Fix HW decoder fallback to the hybrid decoder (#3086)
- Fix DALI installation for python 3.9 version (#3092)
- Fix python test on aarch64 platform (#3091)
- Move pycocotools to regular pip packages in SSD test (#3090)
- Use PEP 503 compatible extra url index to install PyTorch (#3079)
- Remove compiler name subdirectory in DALI TF prebuilt directory (#3078)
- Disable MNIST dataset download for DALI pipelines (#3075)
- Fix known FFmpeg n4.4 vulnerabilities (#3071)
- Fix DALI TF Plugin build in TF 2.6 (#3074)
- Fix error handling in Executor (#3069)
- Fix typo inout -> input (#3070)
- Fix error message when creating a TensorShape from iterators with more elements than expected (#3060)
- Add warning about not using external_inputs in proto (#3057)
- Fix usage of removed _ExternalSource in test (#3059)
- Make the Python test utilities have local random state (#3055)
- Fix batch size handling in PermuteBatch. (#3026)
- Update FFmpeg to address CVE-2021-33815 (#3053)
- Remove duplicated ExternalSource implementation (#3033)
- Build the latest clang from source (#3025)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
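The two Docker options named in the last known issue can be assembled programmatically when containers are launched from a script. This is a minimal sketch that only builds the argument list and does not start a container. The image name is a placeholder and the function name is hypothetical.

```python
def docker_run_args(image, privileged=True):
    """Build a `docker run` argument list using one of the two options above."""
    args = ["docker", "run", "--rm"]
    if privileged:
        args.append("--privileged")
    else:
        # Narrower alternative: only lift the seccomp profile.
        args += ["--security-opt", "seccomp=unconfined"]
    args.append(image)
    return args


# "my-dali-image:latest" is a placeholder image name.
print(" ".join(docker_run_args("my-dali-image:latest")))
print(" ".join(docker_run_args("my-dali-image:latest", privileged=False)))
```

The resulting list can be passed to subprocess.run once the image name and any further options are filled in.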
Binary builds
Note: Starting from version 1.4.0, DALI will be providing CUDA 10.2 builds instead of CUDA 10.0
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.4.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.4.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.4.0-2575284-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.4.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.4.0-2575285-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.4.0-2575285-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.4.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.3.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- New operator:
- Added experimental support for inputs via external_source in TensorFlow DALIDataset (#2949, #2993, and #2997).
- Numpy reader improvements:
- Improved CPU color_space_conversion operator performance (#2987).
- Improved brightness and contrast operators performance (#2981).
- Added a C API call to check backend of an operator (#3031 and #3050).
- Documentation improvements (#2936, #2960, #2979, #2972, #3013, and #3035).
Fixed issues
This DALI release includes the following fixes:
- Fixed an issue in readers.nemo_asr that caused a system error due to keeping too many open files (#3003).
- Fixed a bug that caused out of bound memory access in mel_filter_bank (#2986).
- Fixed a cudaErrorLaunchOutOfResources error that appeared in transpose operator on some GPUs (#2971).
- Fixed handling of non-existing entries in readers.tfrecord (#2952).
Improvements
- Rework numpy reader tests (#3036)
- Extend HW decoder bench tool (#3043)
- Remove space from file name (#3038)
- Add experimental input support to TF DALIDataset (#2997)
- Use BrightnessContrast as implementation of Brightness and Contrast ops (#2981)
- Add C API call to check backend of an operator (#3031)
- Fix Video reader documentation (#3035)
- Enable DALI to build for CUDA 10.2 (#3007)
- NumpyReader: Add support for ROI (#3016)
- Add git hooks (#3023)
- Update third party (#3009)
- Add channel count checking in Dump Image (#3020)
- Add parallel chunking support in GPU variant of the numpy reader operator (#3010)
- NumpyReader to use HostWorkspace (#3011)
- Update documentation of random.uniform to reflect data type conversion behavior (#3013)
- Adjust tf code for experimental Dataset with inputs (#2993)
- Add best-fit free tree. (#2996)
- Refine torch audio pipeline tests: adding frame splicing, fix sequence length calculation, reflect pad start/end of the signal (#2992)
- Rename free_tree to coalescing_free_tree. (#2995)
- Use thread_pool in ColorSpaceConversion (#2987)
- Move to CUDA 11.3 update 1 (#2990)
- pool_resource: upstream lock & refactoring (#2988)
- Add tests to cover OGG Vorbis, and FLAC audio formats (#2980)
- Add synchronization and deferred deallocation to pool_resource (#2983)
- Update FFmpeg, fix video container tests (#2918)
- Add Preemphasis border policy (#2984)
- Numba function operator, docs update (#2972)
- Add a link to the DALI roadmap in the main readme and the documentation (#2979)
- Add BOOL_SWITCH (#2974)
- Add libopus to the binaries distributed with the wheel (#2969)
- Add SaltAndPepper GPU operator (#2956)
- Update documentation about supported TensorFlow versions by DALI (#2960)
- Guard changes to default resources with a mutex. (#2955)
- Add Salt and Pepper noise CPU operator (#2889)
- Core allocation functions - improve alignment handling (#2947)
- Add portable FP16 type & tests. (#2941)
- RNGBase: Separate noise generation and application steps (#2934)
- Add information about Open-CE effort that provides DALI (#2936)
Bug fixes
- Remove mixed image decoder from GetBackendTest (#3050)
- Fix pip download folder usage (#3028)
- Avoid pre-commit hook for merge commits (#3032)
- Coverity issue fixes. (#3021)
- Add more connection attempts in setup_packages.py and increase the timeout to 100s (#3024)
- Add 60s timeout for URL request in setup_packages.py (#3018)
- Check CUDA API return values in device-side test helper. (#3017)
- Run baseline pipelines on separate devices (#3012)
- Multi paste refactor & fix (#3008)
- Remove outdated warning about not supported ROI HW decoding (#2998)
- NemoAsrLoader: Close file handles after reading metadata (#3003)
- Improve Element Extract Op (#3004)
- Temporarily disable test due to incompatible free list. (#3001)
- Work around large alignas bug - align manually. (#3000)
- Lifts the sm limitation that is tested in the numpy reader test (#2994)
- MultiPaste: Fix in_ids argument type in the schema (#2965)
- Fix a buffer overrun when the trailing dimension is collapsed. (#2986)
- Add missing #include (#2985)
- Enable SaltAndPepper GPU variable batch size tests (#2976)
- Add missing tests to test_dali_variable_batch_size.py (#2982)
- Change all reference to the master branch in the documentation (#2977)
- Add missing tests to test_dali_cpu_only.py (#2964)
- Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOutOfResources (#2971)
- Fix deps docker with custom DALI_deps SHA (#2970)
- Add coverage test for CPU only and variable batch size test (#2962)
- Enable variable batch size tests (#2957)
- Fix returning memory to upstream from pool resource (#2961)
- Fix handling of non_existing entries in TFRecord reader (#2952)
- Enable pool to return memory to the upstream upon Out-of-Memory. (#2951)
- Fix mixed indent in tf.py (#2949)
- Fix bug in default constructed curand_uniform_dist (#2946)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.3.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.3.0
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-1.3.0-2471498-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-1.3.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.3.0-2471497-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.3.0-2471497-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.3.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.2.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- New operators:
- New mathematical operations (#2853):
  - Square and cubic root (sqrt, rsqrt, and cbrt)
  - Logarithms of different bases (log2 and log10)
  - Power (** operator and pow function)
  - Absolute value (abs and fabs)
  - Roundings (ceil and floor)
  - Trigonometric functions (sin, cos, and tan)
  - Inverse trigonometric functions (asin, acos, atan, and atan2)
  - Hyperbolic functions (sinh, cosh, and tanh)
  - Inverse hyperbolic functions (asinh, acosh, and atanh)
- Added a Python wrapper for the fn.experimental.numba_function (#2886, #2835, #2903, #2893, and #2887)
- Image decoder improvements:
- Updated the CUDA version to 11.3 (#2870).
- Improved the documentation (#2915, #2911, #2927, #2862, and #2858).
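Two of the mathematical operations listed above have no direct counterpart in older versions of the Python standard library: rsqrt (the reciprocal of the square root) and cbrt (the cube root). The sketch below shows their scalar meaning in plain Python. It does not call DALI, where these operations are applied to pipeline data, and the treatment of negative inputs in cbrt is an assumption made for illustration.

```python
import math


def rsqrt(x):
    """Reciprocal square root: 1 / sqrt(x)."""
    return 1.0 / math.sqrt(x)


def cbrt(x):
    """Cube root that keeps the sign of its input (assumed behavior)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


for value in (1.0, 4.0, 27.0):
    print(value, rsqrt(value), cbrt(value))
```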
Fixed issues
This DALI release includes the following fixes:
- Fixed the readers.numpy cache issue (#2932).
- Fixed an error in readers.nemo_asr (#2928).
- Fixed a bug that caused the video reader to hang (#2916).
Improvements
- Improve Tensors docs (#2915)
- DALI core allocation functions (#2930)
- Update FFmpeg build guide and update DALI_deps version (#2911)
- Default memory resources (#2890)
- Better error message when insufficient data in cache (#2924)
- Add a link to the TensorFlow ResNet50 training script in the Readme (#2927)
- Numba func notebook (#2886)
- Enable HW decoder ROI support (#2734)
- Use a custom color space conversion kernel for all conversions (#2907)
- Update packages used for DALI tests (#2906)
- Refactor TF Dataset code and lint it (#2909)
- Add ShotNoise CPU and GPU operators (#2861)
- Remove workaround for the problem with patchelf changing TLS alignment for CUDA < 10.2 and > 11.1 (#2879)
- Add dali_data_type_vec (#2887)
- Composite resource + renaming. (#2891)
- Update deps in third_party and conda (#2878)
- Python wrapper for numba (#2835)
- Image Decoder: Unified behavior across backends, Alpha channel support in PNG and JP2, YCbCr support in JP2 (#2867)
- Better error handling in pipeline.py (#2864)
- Update DALI deps (#2876)
- Enable CUDA 11.3 based builds (#2870)
- Updates MXNet plugin documentation regarding last_batch_policy (#2862)
- README update with GTC2021 materials (#2860)
- RNGBase to be used as base for noise augmentations + Add GaussianNoise operator (as an example) (#2846)
- Pinned async resource (#2858)
- Add more mathematical operations (#2853)
- Add JpegCompressionDistortion CPU and GPU operators (#2823)
- Split Python tests into smaller chunks (#2847)
- Asynchronous pool memory resource (#2814)
Bug fixes
- Add missing opencv-python dependency to TL2_FW_iterators_perf test (#2939)
- Fix numpy reader header cache (#2932)
- NemoAsrReader: Call Reset() on tensor vector holding the batch, to clear any previous shared data pointer. (#2928)
- Fix DALI compilation for CUDA 11 pre 11.3 version (#2925)
- Make dynlink_xxx use statically linked functions to load symbols. (#2931)
- Fix test_detection_pipeline.py (#2929)
- Add a missing av_bsf_flush call to a VideoReader seek function (#2916)
- Run Optical Flow on stream 0 when running driver > 460. (#2914)
- Fix nvcc warning about unused arguments in ResampleDepth_Channels (#2913)
- Fix CUDA 10.0 compilation (#2917)
- Use stream 0 in VideoDecoder when running driver >460 / CUDA >= 11.3. (#2902)
- Fix docs and rename numba_func to numba_function (#2903)
- Allow to specify optional args of Python-only types (#2898)
- DALI TF install tool: Verify that a compatible prebuilt plugin is available for the required TF version before proceeding to attempt installation (#2882)
- Fix coverity issues by adding lacking CUDA_CALL (#2888)
- Fix failing test for Numba Func (#2893)
- Fix double accumulation in horizontal resampling. Add test. (#2871)
- Add epsilon to math function tests and adjust epsilon for rsqrt. (#2865)
- Do not schedule any pipeline run when the iterator has prepare_first_batch=False (#2859)
- Adjust the filenames of decoder test files and update licenses (#2844)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.2.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.2.0
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-1.2.0-2353277-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-1.2.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.2.0-2356513-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.2.0-2356513-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.2.0.tar.gz
FFmpeg source code:
Libsndfile source code:
DALI v1.1.0
Key Features and Enhancements
This DALI release includes the following key features and enhancements.
- Documentation improvements (#2834, #2824, #2831, #2758, #2820, and #2822).
- The following operators were added:
- The following kernels were added:
- Enabled CUDA kernels compression to decrease the DALI binaries size (#2833).
- Added the src_dims argument to the reshape operator (#2788).
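The effect of src_dims on the output shape can be sketched in plain Python. The sketch assumes that src_dims lists the indices of the source dimensions in the order they should appear in the output, that the special index -1 inserts a new dimension of extent 1, and that the total number of elements must stay the same. It is not the operator implementation and the function name is hypothetical.

```python
import math


def shape_from_src_dims(shape, src_dims):
    """Compute an output shape by picking source dimensions by index.

    An index of -1 inserts a new dimension of extent 1 (assumed semantics).
    """
    out = []
    for index in src_dims:
        if index == -1:
            out.append(1)
        elif 0 <= index < len(shape):
            out.append(shape[index])
        else:
            raise IndexError(f"src_dims index {index} is out of range")
    # Assumption: a reshape may only reinterpret the data, never drop elements.
    if math.prod(out) != math.prod(shape):
        raise ValueError("src_dims must preserve the total number of elements")
    return out


print(shape_from_src_dims([300, 200, 1], [-1, 1, 0]))
```

Under these assumptions, dropping a dimension is only possible when its extent is 1.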
Fixed issues
This DALI release includes the following fixes:
- Fixed a race condition in readers.nemo_asr when pad_last_batch is set to True (#2828).
- Fixed the optical flow initialization issue (#2816).
- Fixed a race condition in the data loader (#2773).
Improvements
- Remove 0 default value from mean/std arguments of normalize. (#2834)
- Add JpegCompressionDistortionGPU kernel (#2830)
- Updates the pipeline docs page (#2824)
- Enable CUDA kernels compression in the final binary (#2833)
- Updates build documentation (#2831)
- Update key visual (#2822)
- Add NumbaFunc operator (#2804)
- Add JPEG distortion kernel (#2801)
- Add AddArg overloads for enum types (#2819)
- Update third party dependencies to latest release versions (#2811)
- Add an ability to provide a custom DALI_extra sha via env variable (#2810)
- Move all deps into subrepos (#2756)
- Reshape, Reinterpret, Squeeze and ExpandDims tutorial. (#2791)
- Separate creation of dependency creation and CUDA installation (#2786)
- Remove intermediate stage from CUDA toolkit dockerfile (#2803)
- Add Expand dims operator (#2800)
- Update TensorFlow ResNet50 example to the latest horovod 21.03 (#2793)
- Add squeeze operator (#2792)
- Add JPEG color conversion and chroma subsampling kernel (#2771)
- Add src_dims to reshape operator (#2788)
- GPU MultiPaste (#2681)
- Add --upgrade to pip install commands in documentation (#2758)
- Use flattened view of the array for copying to shared memory. (#2783)
Bug fixes
- Fix JPEG distortion kernel quality parameter handling (#2839)
- Fix typo "funcions" -> "functions" in math doc (#2820)
- Update DALI_deps to include FLAC security patch (#2826)
- Fix coverity issues (#2812)
- Fix optical flow parameter initialization. (#2816)
- Add host fallback when nvjpegDecodeJpegDevice and nvjpegDecodeJpegHost fail (#2805)
- ExternalSource - discard data from all callbacks when one raises StopIteration (#2784)
- Exclude PyTorch-lightning test with MNIST (#2785)
- Fix iteration number tracking with pipeline.reset (#2777)
- Fix a race when the loader starts reading even though the metadata is not ready yet (#2773)
- Fix race condition in NemoAsrReader when pad_last_batch is set to True (#2828)
Breaking API changes
There are no breaking changes in this DALI release.
Deprecated features
There are no deprecated features in this DALI release.
Known issues:
- The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.1.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.1.0
Or use direct download links (CUDA 10.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda100/nvidia_dali_cuda100-1.1.0-2159051-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda100/nvidia-dali-tf-plugin-cuda100-1.1.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.1.0-2159930-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.1.0-2159930-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.1.0.tar.gz
FFmpeg source code:
Libsndfile source code: