Improve Tensors docs (#2915)
* Expose the CUDA and NumPy Array Interface
* Add links to the above
* Add a paragraph about the memory being wrapped
* Mention that the memory is invalidated in subsequent iteration
* Add better cross-links
* Turn on plugin for section labels (easy links in one .rst)
* Download the latest dali.png from repo.

Signed-off-by: Krzysztof Lecki <[email protected]>
klecki authored May 10, 2021
1 parent 6939669 commit b00f2f8
Showing 5 changed files with 47 additions and 19 deletions.
8 changes: 4 additions & 4 deletions README.rst
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ for built in data loaders and data iterators in popular deep learning frameworks

Deep learning applications require complex, multi-stage data processing pipelines
that include loading, decoding, cropping, resizing, and many other augmentations.
These data processing pipelines, which are currently executed on the CPU, have become a
bottleneck, limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the
@@ -23,11 +23,11 @@ are handled transparently for the user.
In addition, the deep learning frameworks have multiple data pre-processing implementations,
resulting in challenges such as portability of training and inference workflows, and code
maintainability. Data processing pipelines implemented using DALI are portable because they
can easily be retargeted to TensorFlow, PyTorch, MXNet and PaddlePaddle.

.. image:: /dali.png
:width: 800
:align: center
:alt: DALI Diagram

Highlights
@@ -64,7 +64,7 @@ To install the latest DALI release for the latest CUDA version (11.x)::

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110

DALI comes preinstalled in the TensorFlow, PyTorch, and MXNet containers on `NVIDIA GPU Cloud <https://ngc.nvidia.com>`_
(versions 18.07 and later).

For other installation paths (TensorFlow plugin, older CUDA version, nightly and weekly builds, etc),
31 changes: 22 additions & 9 deletions dali/python/backend_impl.cc
@@ -301,7 +301,7 @@ void ExposeTensor(py::module &m) {
Python object to be checked
)code");

-py::class_<Tensor<CPUBackend>>(m, "TensorCPU", py::buffer_protocol())
+auto tensor_cpu_binding = py::class_<Tensor<CPUBackend>>(m, "TensorCPU", py::buffer_protocol())
.def(py::init([](py::capsule &capsule, string layout = "") {
auto t = std::make_unique<Tensor<CPUBackend>>();
FillTensorFromDlPack(capsule, t.get(), layout);
@@ -310,7 +310,7 @@ void ExposeTensor(py::module &m) {
"object"_a,
"layout"_a = "",
R"code(
-DLPack of Tensor residing in the CPU memory.
+Wrap a DLPack Tensor residing in the CPU memory.
object : DLPack object
Python DLPack object
@@ -366,7 +366,7 @@ void ExposeTensor(py::module &m) {
"layout"_a = "",
"is_pinned"_a = false,
R"code(
-Tensor residing in the CPU memory.
+Wrap a Tensor residing in the CPU memory.
b : object
the buffer to wrap into the TensorListCPU object
@@ -424,10 +424,17 @@ void ExposeTensor(py::module &m) {
)code")
.def_property("__array_interface__", &ArrayInterfaceRepr<CPUBackend>, nullptr,
R"code(
-Returns array interface representation of TensorCPU.
+Returns Array Interface representation of TensorCPU.
)code");
tensor_cpu_binding.doc() = R"code(
Class representing a Tensor residing in host memory. It can be used to access individual
samples of a :class:`TensorListCPU` or used to wrap CPU memory that is intended
to be passed as an input to DALI.

It is compatible with `Python Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_
and `NumPy Array Interface <https://numpy.org/doc/stable/reference/arrays.interface.html>`_.)code";

-py::class_<Tensor<GPUBackend>>(m, "TensorGPU")
+auto tensor_gpu_binding = py::class_<Tensor<GPUBackend>>(m, "TensorGPU")
.def(py::init([](py::capsule &capsule, string layout = "") {
auto t = std::make_unique<Tensor<GPUBackend>>();
FillTensorFromDlPack(capsule, t.get(), layout);
@@ -436,7 +443,7 @@ void ExposeTensor(py::module &m) {
"object"_a,
"layout"_a = "",
R"code(
-DLPack of Tensor residing in the GPU memory.
+Wrap a DLPack Tensor residing in the GPU memory.
object : DLPack object
Python DLPack object
@@ -452,10 +459,10 @@ void ExposeTensor(py::module &m) {
"layout"_a = "",
"device_id"_a = -1,
R"code(
-Tensor residing in the GPU memory.
+Wrap a Tensor residing in the GPU memory that implements CUDA Array Interface.
object : object
-Python object that implement CUDA Array Interface
+Python object that implements CUDA Array Interface
layout : str
Layout of the data
device_id: int
@@ -540,8 +547,14 @@ void ExposeTensor(py::module &m) {
)code")
.def_property("__cuda_array_interface__", &ArrayInterfaceRepr<GPUBackend>, nullptr,
R"code(
-Returns cuda array interface representation of TensorGPU.
+Returns CUDA Array Interface (Version 2) representation of TensorGPU.
)code");
tensor_gpu_binding.doc() = R"code(
Class representing a Tensor residing in GPU memory. It can be used to access individual
samples of a :class:`TensorListGPU` or used to wrap GPU memory that is intended
to be passed as an input to DALI.

It is compatible with `CUDA Array Interface <https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html>`_.)code";
}

template <typename Backend>
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -102,6 +102,7 @@
'IPython.sphinxext.ipython_console_highlighting',
'nbsphinx',
'sphinx.ext.intersphinx',
'sphinx.ext.autosectionlabel',
]

# Add any paths that contain templates here, relative to this directory.
@@ -171,7 +172,7 @@
subprocess.call(["wget", "-O", favicon_rel_path, "https://docs.nvidia.com/images/nvidia.ico"])
html_favicon = favicon_rel_path

-subprocess.call(["wget", "-O", "dali.png", "https://developer.nvidia.com/sites/default/files/akamai/dali.png"])
+subprocess.call(["wget", "-O", "dali.png", "https://raw.githubusercontent.com/NVIDIA/DALI/master/dali.png"])

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
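The `sphinx.ext.autosectionlabel` extension enabled in conf.py above makes every section title a cross-reference target, so sections can be linked without hand-written labels. A minimal illustration (hypothetical section name, not from this change):

```rst
Types
=====

Any other paragraph in the same document can now write :ref:`Types`
to link to this section, without an explicit ``.. _Types:`` target
above the heading.
```

Note that by default the label is just the title text, so identical titles in different documents will produce duplicate-label warnings unless `autosectionlabel_prefix_document = True` is set.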
21 changes: 16 additions & 5 deletions docs/data_types.rst
@@ -5,10 +5,21 @@ Types

TensorList
----------
-.. currentmodule:: nvidia.dali.pipeline
+.. currentmodule:: nvidia.dali

-TensorList represents a batch of tensors. TensorLists are the return values of `Pipeline.run`
-or `Pipeline.share_outputs`
+TensorList represents a batch of tensors. TensorLists are the return values of :meth:`Pipeline.run`,
+:meth:`Pipeline.outputs` or :meth:`Pipeline.share_outputs`.

Subsequent invocations of the mentioned functions (or :meth:`Pipeline.release_outputs`) invalidate
the TensorList (as well as any DALI :ref:`Tensors<Tensor>` obtained from it) and indicate to DALI
that the memory can be used for something else.

TensorList wraps the outputs of the current iteration and is valid only for the duration of that
iteration. Using the TensorList after moving to the next iteration is not allowed.
If you wish to retain the data, copy it before indicating to DALI that you released it.

For typical use cases, for example when DALI is used through :ref:`DL Framework Plugins`,
no additional memory bookkeeping is necessary.
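The lifetime rule above can be illustrated with a stdlib-only sketch: `memoryview` stands in for a TensorList (a zero-copy view), and a reused `bytearray` stands in for DALI recycling its output buffers. This is an analogy, not DALI's actual implementation:

```python
# A producer that reuses one buffer across "iterations", the way a
# pipeline recycles output memory once the outputs are released.
buf = bytearray(b"iteration-0")

view = memoryview(buf)   # zero-copy view into the buffer (the "TensorList")
snapshot = bytes(view)   # explicit copy, made while the view is still valid

# The producer moves to the next iteration and overwrites the memory.
buf[len(b"iteration-")] = ord("1")

# The zero-copy view now shows the new data; only the copy kept the old one.
assert bytes(view) == b"iteration-1"
assert snapshot == b"iteration-0"
```

The same pattern applies to DALI outputs: anything obtained zero-copy from a TensorList must be deep-copied before the next iteration if it is to be kept.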

.. currentmodule:: nvidia.dali.backend

Expand All @@ -33,14 +44,14 @@ TensorCPU
.. autoclass:: TensorCPU
:members:
:undoc-members:
-:special-members: __init__
+:special-members: __init__, __array_interface__
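The `__array_interface__` property documented here follows NumPy's generic producer protocol, which can be sketched without DALI. `FakeCpuTensor` below is a hypothetical stand-in for a CPU tensor, and the sketch assumes NumPy is installed:

```python
import ctypes
import numpy as np

class FakeCpuTensor:
    """Minimal producer of the NumPy Array Interface (version 3)."""

    def __init__(self):
        # Keep a reference to the buffer so the memory stays alive
        # for as long as this object does.
        self._data = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)

    @property
    def __array_interface__(self):
        return {
            "shape": (4,),
            "typestr": "<f4",  # little-endian float32
            "data": (ctypes.addressof(self._data), False),  # (ptr, read_only)
            "version": 3,
        }

t = FakeCpuTensor()
a = np.asarray(t)  # NumPy wraps the memory via __array_interface__
assert a.dtype == np.float32
assert a.tolist() == [1.0, 2.0, 3.0, 4.0]
```

Because the wrap is zero-copy, such an array is only valid while the producer's memory is — the same invalidation caveat described for TensorList above.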

TensorGPU
^^^^^^^^^
.. autoclass:: TensorGPU
:members:
:undoc-members:
-:special-members: __init__
+:special-members: __init__, __cuda_array_interface__
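For reference, the CUDA Array Interface that `__cuda_array_interface__` returns is likewise a plain dictionary. The sketch below only illustrates the required fields of version 2 of the protocol; the null `data` pointer is a placeholder, not a usable device allocation:

```python
# Illustrative fields of CUDA Array Interface version 2. A real producer
# (such as TensorGPU) would put a valid device pointer in "data".
cuda_iface = {
    "shape": (2, 3),
    "typestr": "<f4",    # little-endian float32
    "data": (0, False),  # (device pointer, read_only) -- placeholder here
    "version": 2,
    "strides": None,     # None means C-contiguous
}

required = {"shape", "typestr", "data", "version"}
assert required <= set(cuda_iface.keys())
assert cuda_iface["version"] == 2
```

Consumers such as CuPy, Numba, and PyTorch read this attribute to wrap the device memory without a copy, which is what makes zero-copy exchange with DALI's TensorGPU possible.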


.. _layout_str_doc:
3 changes: 3 additions & 0 deletions docs/framework_plugins.rst
@@ -1,3 +1,6 @@

.. _DL Framework Plugins:

DL Framework Plugins
====================

