More improvements to the library guide (#1835)

* Align terminology and improve wording related to execution policies
* Add information about -fsycl-pstl-offload and improve the known issue note
* Add the limitation on the use of pointer-to-member objects
* Clarify the need for policies and data storage to match
* Formatting and stylistic improvements
uxlfoundation · Nov 15, 2024 · 69214e2
1 parent 7fdb5e5

Showing 8 changed files with 158 additions and 130 deletions.
diff --git a/documentation/library_guide/api_for_sycl_kernels/tested_standard_cpp_api.rst b/documentation/library_guide/api_for_sycl_kernels/tested_standard_cpp_api.rst
@@ -463,9 +463,9 @@ C++ Standard API                      libstdc++  libc++     MSVC
 These tests were done for the following versions of the standard C++ library:
 
 ============================================= =============================================
-libstdc++(GNU)                                Provided with GCC*-7.5.0, GCC*-9.3.0
+libstdc++ (GNU)                               Provided with GCC*-7.5.0, GCC*-9.3.0
 --------------------------------------------- ---------------------------------------------
-libc++(LLVM)                                  Provided with Clang*-11.0
+libc++ (LLVM)                                 Provided with Clang*-11.0
 --------------------------------------------- ---------------------------------------------
 Microsoft Visual C++* (MSVC) Standard Library Provided with Microsoft Visual Studio* 2017;
                                               Microsoft Visual Studio 2019; and Microsoft 

diff --git a/documentation/library_guide/common_cross_document_links.txt b/documentation/library_guide/common_cross_document_links.txt
@@ -1,26 +1,32 @@
-.. |onedpl_library_guide| replace:: Intel® oneAPI DPC++ Library Guide
-.. _onedpl_library_guide: https://www.intel.com/content/www/us/en/docs/onedpl/developer-guide/2022-7/overview.html
-
-.. |dpcpp_gsg| replace:: Get Started with the Intel® oneAPI DPC++/C++ Compiler
-.. _dpcpp_gsg: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/get-started-guide/2024-2/overview.html
-
-.. |dpcpp_cpp_with_gsg_link| replace:: Intel® oneAPI DPC++/C++ Compiler
-.. _dpcpp_cpp_with_gsg_link: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/get-started-guide/2024-2/overview.html
-
-.. |dpcpp_cmake_support| replace:: CMake support documentation for the Intel® oneAPI DPC++/C++ Compiler
-.. _dpcpp_cmake_support: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/use-cmake-with-the-compiler.html
-
-.. |vector_pragma| replace:: pragma vector documentation in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
-.. _vector_pragma: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/vector.html
-
-.. |unroll_pragma| replace:: unroll Pragma
-.. _unroll_pragma: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/unroll-pragma.html
-
-.. |loop_analysis| replace:: Loop Analysis
-.. _loop_analysis: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/loop-analysis.html
-
-.. |fpga_handbook| replace:: Intel® oneAPI FPGA Handbook
-.. _fpga_handbook: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/intel-oneapi-fpga-handbook.html
-
-.. |yocto_layers| replace:: Layers for Yocto* Project
-.. _yocto_layers: https://www.intel.com/content/www/us/en/docs/oneapi-iot-toolkit/get-started-guide-linux/2024-0/adding-oneapi-components-to-yocto-project-builds.html
+.. |onedpl_library_guide| replace:: Intel® oneAPI DPC++ Library Guide
+.. _onedpl_library_guide: https://www.intel.com/content/www/us/en/docs/onedpl/developer-guide/2022-7/overview.html
+
+.. |dpcpp_gsg| replace:: Get Started with the Intel® oneAPI DPC++/C++ Compiler
+.. _dpcpp_gsg: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/get-started-guide/2024-2/overview.html
+
+.. |dpcpp_cpp_with_gsg_link| replace:: Intel® oneAPI DPC++/C++ Compiler
+.. _dpcpp_cpp_with_gsg_link: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/get-started-guide/2024-2/overview.html
+
+.. |dpcpp_cmake_support| replace:: CMake support documentation for the Intel® oneAPI DPC++/C++ Compiler
+.. _dpcpp_cmake_support: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/use-cmake-with-the-compiler.html
+
+.. |vector_pragma| replace:: pragma vector documentation in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
+.. _vector_pragma: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/vector.html
+
+.. |pstl_offload_option| replace:: ``-fsycl-pstl-offload`` option
+.. _pstl_offload_option: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/fsycl-pstl-offload.html
+
+.. |esimd_sycl_extension| replace:: Explicit SIMD SYCL extension
+.. _esimd_sycl_extension: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-2/explicit-simd-sycl-extension.html
+
+.. |unroll_pragma| replace:: unroll Pragma
+.. _unroll_pragma: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/unroll-pragma.html
+
+.. |loop_analysis| replace:: Loop Analysis
+.. _loop_analysis: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/loop-analysis.html
+
+.. |fpga_handbook| replace:: Intel® oneAPI FPGA Handbook
+.. _fpga_handbook: https://www.intel.com/content/www/us/en/docs/oneapi-fpga-add-on/developer-guide/2024-2/intel-oneapi-fpga-handbook.html
+
+.. |yocto_layers| replace:: Layers for Yocto* Project
+.. _yocto_layers: https://www.intel.com/content/www/us/en/docs/oneapi-iot-toolkit/get-started-guide-linux/2024-0/adding-oneapi-components-to-yocto-project-builds.html
diff --git a/documentation/library_guide/introduction.rst b/documentation/library_guide/introduction.rst
@@ -22,7 +22,7 @@ page for:
 * Known Issues and Limitations
 * Previous Release Notes
 
-Install the `Intel® oneAPI Base Toolkit (Base Kit) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html#gs.xaontv>`_
+Install the `Intel® oneAPI Base Toolkit (Base Kit) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html>`_
 to use |onedpl_short|.
 
 System Requirements
@@ -35,7 +35,7 @@ C++17 is the minimal supported version of the C++ standard.
 That means, any use of |onedpl_short| may require a C++17 compiler.
 While some APIs of the library may accidentally work with earlier versions of the C++ standard, it is no more guaranteed.
 
-To call Parallel API with the C++ standard policies, you need to install the following software:
+To call Parallel API with the C++ standard aligned policies, you need to install the following software:
 
 * A C++ compiler with support for OpenMP* 4.0 (or higher) SIMD constructs
 * Depending on what parallel backend you want to use, install either:
@@ -70,10 +70,21 @@ Follow the steps below to build your code with |onedpl_short|:
 Below is an example of a command line used to compile code that contains |onedpl_short| parallel algorithms
 on Linux* (depending on the code, parameters within [] could be unnecessary):
 
-.. code:: cpp
+.. code::
 
   icpx [-fsycl] [-fiopenmp] program.cpp [-ltbb] -o program
 
+You may also use the |pstl_offload_option|_ of |dpcpp_cpp| powered by |onedpl_short|
+to build the standard C++ code for execution on a SYCL device:
+
+.. code::
+
+  icpx -fsycl -fsycl-pstl-offload=gpu program.cpp -o program
+
+This option redirects C++ parallel algorithms invoked with the ``std::execution::par_unseq`` policy
+to |onedpl_short| algorithms. It does not change the behavior of the |onedpl_short| execution policies and algorithms
+that are directly used in the code.
+
 Useful Information
 ==================
 
@@ -85,23 +96,23 @@ Difference with Standard C++ Parallel Algorithms
 * Function objects passed in to algorithms executed with device policies must provide ``const``-qualified ``operator()``.
   `The SYCL specification <https://registry.khronos.org/SYCL/>`_ states that writing to such an object during a SYCL
   kernel is undefined behavior.
-* For the following algorithms, par_unseq and unseq policies do not result in vectorized execution:
+* For the following algorithms, ``par_unseq`` and ``unseq`` policies do not result in SIMD execution:
   ``includes``, ``inplace_merge``, ``merge``, ``set_difference``, ``set_intersection``,
   ``set_symmetric_difference``, ``set_union``, ``stable_partition``, ``unique``.
 * The following algorithms require additional O(n) memory space for parallel execution:
   ``copy_if``, ``inplace_merge``, ``partial_sort``, ``partial_sort_copy``, ``partition_copy``,
   ``remove``, ``remove_if``, ``rotate``, ``sort``, ``stable_sort``, ``unique``, ``unique_copy``.
 
-
 Restrictions
 ************
 
-When called with |dpcpp_short| execution policies, |onedpl_short| algorithms apply the same restrictions as
-|dpcpp_short| does (see the |dpcpp_short| specification and the SYCL specification for details), such as:
+When called with device execution policies, |onedpl_short| algorithms apply the same restrictions as
+|dpcpp_short| does (see the |dpcpp_cpp| documentation and the SYCL specification for details), such as:
 
 * Adding buffers to a lambda capture list is not allowed for lambdas passed to an algorithm.
 * Passing data types, which are not trivially copyable, is only allowed via USM,
   but not via buffers or host-allocated containers.
+* Objects of pointer-to-member types cannot be passed to an algorithm.
 * The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros
   that makes it different for the host and the device. Otherwise, the behavior is undefined.
 * When used within SYCL kernels or transferred to/from a device, a container class can only hold objects
@@ -111,10 +122,9 @@ When called with |dpcpp_short| execution policies, |onedpl_short| algorithms app
 Known Limitations
 *****************
 
-* When compiled with ``-fsycl-pstl-offload`` option of Intel oneAPI DPC++/C++ compiler and with
-  ``libstdc++`` version 8 or ``libc++``, ``oneapi::dpl::execution::par_unseq`` offloads
-  standard parallel algorithms to the SYCL device similarly to ``std::execution::par_unseq``
-  in accordance with the ``-fsycl-pstl-offload`` option value.
+* The ``oneapi::dpl::execution::par_unseq`` policy is affected by ``-fsycl-pstl-offload`` option of |dpcpp_cpp|
+  when |onedpl_short| substitutes this policy for the ``std::execution::par_unseq`` policy
+  missing in a standard C++ library, particularly in libstdc++ version 8 and in libc++.
 * For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
   used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
   it is required that the provided input and destination iterators are equality comparable.
@@ -124,9 +134,9 @@ Known Limitations
   convertible to the type of the initial value if one is provided, otherwise it is convertible to the type of values
   in the processed data sequence: ``std::iterator_traits<IteratorType>::value_type``.
 * ``exclusive_scan`` and ``transform_exclusive_scan`` algorithms may provide wrong results with
-  vector execution policies when building a program with GCC 10 and using ``-O0`` option.
-* Compiling ``reduce`` and ``transform_reduce`` algorithms with the Intel DPC++ Compiler, versions 2021 and older,
-  may result in a runtime error. To fix this issue, use an Intel DPC++ Compiler version 2022 or newer.
+  unsequenced execution policies when building a program with GCC 10 and using ``-O0`` option.
+* Compiling ``reduce`` and ``transform_reduce`` algorithms with |dpcpp_cpp| versions 2021 and older
+  may result in a runtime error. To fix this issue, use |dpcpp_cpp| version 2022 or newer.
 * When compiling on Windows, add the option ``/EHsc`` to the compilation command to avoid errors with oneDPL's experimental
   ranges API that uses exceptions.
 * The ``using namespace oneapi;`` directive in a |onedpl_short| program code may result in compilation errors
@@ -139,12 +149,12 @@ Known Limitations
   for double precision.
 * ``exclusive_scan``, ``inclusive_scan``, ``exclusive_scan_by_segment``,
   ``inclusive_scan_by_segment``, ``transform_exclusive_scan``, ``transform_inclusive_scan``,
-  when used with C++ standard policies, impose limitations on the initial value type if an
+  when used with C++ standard aligned policies, impose limitations on the initial value type if an
   initial value is provided, and on the value type of the input iterator if an initial value is
   not provided.
   Firstly, it must satisfy the ``DefaultConstructible`` requirements.
   Secondly, a default-constructed instance of that type should act as the identity element for the binary scan function.
-* ``reduce_by_segment``, when used with C++ standard policies, imposes limitations on the value type.
+* ``reduce_by_segment``, when used with C++ standard aligned policies, imposes limitations on the value type.
   Firstly, it must satisfy the ``DefaultConstructible`` requirements.
   Secondly, a default-constructed instance of that type should act as the identity element for the binary reduction function.
 * The initial value type for ``exclusive_scan``, ``inclusive_scan``, ``exclusive_scan_by_segment``,
@@ -154,10 +164,3 @@ Known Limitations
   the dereferenced value type of the provided iterators should satisfy the ``DefaultConstructible`` requirements.
 * For ``remove``, ``remove_if``, ``unique`` the dereferenced value type of the provided
   iterators should be ``MoveConstructible``.
-* The algorithms that process uninitialized storage: ``uninitialized_copy``, ``uninitialized_copy_n``, ``uninitialized_fill``, ``uninitialized_fill_n``, ``uninitialized_fill_n``, ``uninitialized_move``,
-  ``uninitialized_move_n``, ``uninitialized_default_construct``, ``uninitialized_default_construct_n``, ``uninitialized_value_construct``, ``uninitialized_value_construct_n``
-  should be called with a device policy when using device data and should be called with a host policy when using host data. Otherwise, the result is undefined.
-* The algorithms that destroy data: ``destroy`` and ``destroy_n`` should be called with a host policy when using host data that was initialized on the host, and should be called with a device policy when using device data that was initialized on the device. Otherwise, the result is undefined.
-
-
-.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
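Note on the identity-element limitation kept in ``introduction.rst`` above: the requirement that a default-constructed value "act as the identity element" for the binary scan or reduction function can be checked on sample values. The following is a minimal sketch in plain standard C++ (C++14 or newer). ``default_is_identity`` is a hypothetical helper written for this illustration; it is not part of the oneDPL API and says nothing about how the library is implemented.

```cpp
#include <functional>
#include <initializer_list>

// Hypothetical helper (not a oneDPL API): returns true if a default-constructed T
// leaves every sample value unchanged when combined with it by op, on either side.
// It only tests the given samples, so it can disprove the property but not prove it.
template <typename T, typename BinaryOp>
constexpr bool default_is_identity(BinaryOp op, std::initializer_list<T> samples)
{
    const T id{}; // requires T to be DefaultConstructible
    for (const T& x : samples)
    {
        if (!(op(id, x) == x) || !(op(x, id) == x))
            return false;
    }
    return true;
}
```

With ``int`` values, ``std::plus<int>`` passes this check because ``int{}`` is ``0``, while ``std::multiplies<int>`` fails it because ``0 * x`` is not ``x`` for non-zero ``x``. Under the limitation as worded above, the second combination would not meet the requirement when these algorithms are used with C++ standard aligned policies.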
diff --git a/documentation/library_guide/kernel_templates/esimd_main.rst b/documentation/library_guide/kernel_templates/esimd_main.rst
@@ -1,9 +1,7 @@
 ESIMD-Based Kernel Templates
 ############################
 
-The ESIMD kernel templates are based on `Explicit SIMD SYCL extension
-<https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2024-0/explicit-simd-sycl-extension.html>`_
-of Intel® oneAPI DPC++/C++ Compiler.
+The ESIMD kernel templates are based on |esimd_sycl_extension|_ of |dpcpp_cpp|.
 This technology only supports Intel GPU devices.
 
 These templates are available in the ``oneapi::dpl::experimental::kt::gpu::esimd`` namespace. The following are implemented: