Commit

Update from github actions
github-actions committed Sep 30, 2024
0 parents commit e6740ba
Showing 3,206 changed files with 1,583,187 additions and 0 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
Empty file added .nojekyll
Empty file.
9 changes: 9 additions & 0 deletions index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="7; url='https://uxlfoundation.github.io/oneAPI-spec/spec'" />
</head>
<body>
<p>Please follow <a href="https://uxlfoundation.github.io/oneAPI-spec/spec">this link</a>.</p>
</body>
</html>
905 changes: 905 additions & 0 deletions spec/404.html

Large diffs are not rendered by default.

Binary file added spec/_images/bf16_programming.png
Binary file added spec/_images/critical_path_in_graph.png
Binary file added spec/_images/data_analytics_stages.png
Binary file added spec/_images/data_management_flow.png
Binary file added spec/_images/dataset.png
Binary file added spec/_images/dep_graph.jpg
Binary file added spec/_images/e2eframeworks.png
Binary file added spec/_images/error_functions_plot.jpg
Binary file added spec/_images/extbrc_async.png
Binary file added spec/_images/frame_cmplx.png
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
@@ -0,0 +1,2 @@
<map id="%3" name="%3">
</map>
Binary file added spec/_images/half_edges.png
Binary file added spec/_images/img_bf16_diagram.png
Binary file added spec/_images/img_execution_model.png
Binary file added spec/_images/img_programming_model.png
Binary file added spec/_images/int8_programming.png
Binary file added spec/_images/inverse_error_functions_plot.jpg
Binary file added spec/_images/message_flow_graph.jpg
Binary file added spec/_images/oneapi-architecture.png
Binary file added spec/_images/programming_concepts.png
Binary file added spec/_images/quad_uv.png
Binary file added spec/_images/rng-leapfrog.png
Binary file added spec/_images/rng-skip-ahead.png
Binary file added spec/_images/sdk_function_naming_convention.png
Binary file added spec/_images/structured_spherical_coords.png
Binary file added spec/_images/table_accessor_usage_example.png
Binary file added spec/_images/triangle_uv.png
Binary file added spec/_images/unrolled_stack_rnn.jpg
Binary file added spec/_images/vdb_structure.png
Binary file added spec/_images/vpp_region_of_interest_operation.png
14 changes: 14 additions & 0 deletions spec/_sources/404.rst
@@ -0,0 +1,14 @@
.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
..
.. SPDX-License-Identifier: CC-BY-4.0

==============
Page Not Found
==============

We cannot find the page. Please try:

- Starting the navigation from `spec.oneapi.io <https://spec.oneapi.io>`__
- Clearing your `browser cache <https://clear-my-cache.com/>`__ and
  starting the navigation from `spec.oneapi.io <https://spec.oneapi.io>`__
- Filing an issue on `GitHub <https://github.com/uxlfoundation/oneapi-spec/issues>`__
15 changes: 15 additions & 0 deletions spec/_sources/404.rst.txt
@@ -0,0 +1,15 @@
.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
..
.. SPDX-License-Identifier: CC-BY-4.0

==============
Page Not Found
==============

We cannot find the page. Please try:

- Starting the navigation from `spec.oneapi.com <https://spec.oneapi.com>`__
- Clearing your `browser cache <https://clear-my-cache.com/>`__ and
  starting the navigation from `spec.oneapi.com <https://spec.oneapi.com>`__
- Filing an issue on `GitHub <https://github.com/oneapi-src/oneapi-spec/issues>`__
- Emailing: `[email protected] <mailto:[email protected]>`__
154 changes: 154 additions & 0 deletions spec/_sources/architecture.rst
@@ -0,0 +1,154 @@
.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
..
.. SPDX-License-Identifier: CC-BY-4.0

Software Architecture
=====================

oneAPI provides a common developer interface across a range of data
parallel accelerators (see the figure below). Programmers use SYCL
for both API programming and direct programming. The capabilities of
a oneAPI platform are determined by the Level Zero interface, which
provides system software with a common abstraction for a oneAPI device.

.. image:: oneapi-architecture.png

oneAPI Platform
---------------

A oneAPI platform consists of a *host* and a collection of
*devices*. The host is typically a multi-core CPU, and the devices
are one or more GPUs, FPGAs, and other accelerators. The processor
serving as the host can also be targeted as a device by the software.

Each device has an associated command *queue*. An application that
employs oneAPI runs on the host, following standard C++ execution
semantics. To run a *function object* on a device, the application
submits a *command group* containing the function object to the
device’s queue. A function object contains a function definition
together with associated variables. A function object submitted to a
queue is also referred to as a *data parallel kernel* or simply a
*kernel*.
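The relationship between a function object and a queue can be sketched in plain C++. This is only a toy model under stated assumptions: ``ToyQueue`` and ``scale_on_toy_queue`` are hypothetical names, not part of SYCL or oneAPI, and the "device" is simply the host running the stored function objects in submission order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a device queue (hypothetical, not the SYCL queue class):
// it records submitted function objects and runs them in submission order.
class ToyQueue {
public:
    // Submit a function object; it is only recorded here, not executed.
    void submit(std::function<void()> kernel) {
        pending_.push_back(std::move(kernel));
    }

    // Run every pending function object in order; returns how many ran.
    std::size_t wait() {
        std::size_t ran = 0;
        for (auto &kernel : pending_) {
            kernel();
            ++ran;
        }
        pending_.clear();
        return ran;
    }

private:
    std::vector<std::function<void()>> pending_;
};

// A function object is a function definition plus its associated variables:
// the lambda below captures `data` by reference and `factor` by value.
inline std::vector<int> scale_on_toy_queue(std::vector<int> data, int factor) {
    ToyQueue q;
    q.submit([&data, factor] {
        for (int &x : data) x *= factor;
    });
    const std::size_t ran = q.wait();  // the submitted function object runs here
    assert(ran == 1);
    (void)ran;
    return data;
}
```

In this sketch, nothing happens at ``submit``; the work runs only when the queue is drained, which mirrors the separation between submitting a command group and its later execution on a device.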

The application running on the host and the functions running on the
devices communicate through *memory*. oneAPI defines several
mechanisms for sharing memory across the platform, depending on the
capabilities of the devices:


========================= ===========
Memory Sharing Mechanism  Description
========================= ===========
Buffer objects            | An application can create *buffer objects*
                          | to pass data to devices. A buffer is an
                          | array of data. A command group will define
                          | *accessor objects* to identify which
                          | buffers are accessed in this call to the
                          | device. The oneAPI runtime will ensure the
                          | data in the buffer is accessible to the
                          | function running on the device. The
                          | buffer-accessor mechanism is available on
                          | all oneAPI platforms.
Unified addressing        | Unified addressing guarantees that the host and
                          | all devices will share a unified address space.
                          | Pointer values in the unified address space will
                          | always refer to the same location in memory.
Unified shared memory     | Unified shared memory enables data to be shared
                          | through pointers without using buffers and
                          | accessors. There are several levels of support
                          | for this feature, depending on the capabilities
                          | of the underlying device.
========================= ===========


The *scheduler* determines when a command group is run on a
device. The following mechanisms are used to determine when a command
group is ready to run.

- If the buffer-accessor method is used, the command group is ready
  when the buffers are defined and copied to the device as
  necessary.

- If an ordered queue is used for a device, the command group is
  ready as soon as the prior command groups in the queue are
  finished.

- If unified shared memory is used, you must specify a set of event
  objects which the command group depends on, and the command group
  is ready when all of the events are completed.
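The event-based readiness rule in the last item can be sketched as follows. This is a simplified model, not the oneAPI scheduler: ``ToyCommandGroup`` and ``schedule`` are hypothetical names, each command group is assumed to produce one completion event, and dependencies are given as indices of other command groups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each command group lists the indices of the command
// groups whose completion events it depends on.
struct ToyCommandGroup {
    std::vector<std::size_t> depends_on;
};

// Returns one valid execution order. A command group becomes ready only
// when every event it depends on has completed.
inline std::vector<std::size_t> schedule(const std::vector<ToyCommandGroup> &groups) {
    std::vector<bool> done(groups.size(), false);
    std::vector<std::size_t> order;
    bool progress = true;
    while (order.size() < groups.size() && progress) {
        progress = false;
        for (std::size_t i = 0; i < groups.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t dep : groups[i].depends_on) {
                assert(dep < groups.size());
                if (!done[dep]) {
                    ready = false;
                    break;
                }
            }
            if (ready) {
                done[i] = true;  // the event for command group i completes
                order.push_back(i);
                progress = true;
            }
        }
    }
    // The result is shorter than groups.size() if the dependencies form a cycle.
    return order;
}
```

The sketch runs command groups one at a time; a real scheduler may run independent command groups concurrently, but the readiness condition is the same.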

The application on the host and the functions on the devices can
*synchronize* through *events*, which are objects that can coordinate
execution. If the buffer-accessor mechanism
is used, the application and device can also synchronize through a
*host accessor*, through the destruction of a buffer object, or
through other more advanced mechanisms.
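Synchronization through buffer destruction can be illustrated with a small RAII model in plain C++. This is a sketch under assumptions, not the SYCL ``buffer`` class: ``ToyBuffer`` is a hypothetical type that always works on a private copy, whereas a real runtime decides when and whether data is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer bound to a host array. It works on a
// private copy and writes the copy back to the host array when destroyed.
class ToyBuffer {
public:
    ToyBuffer(double *host, std::size_t count)
        : host_(host), copy_(host, host + count) {
        assert(host != nullptr || count == 0);
    }
    ToyBuffer(const ToyBuffer &) = delete;
    ToyBuffer &operator=(const ToyBuffer &) = delete;

    // Destruction is the synchronization point: results reach the host here.
    ~ToyBuffer() {
        for (std::size_t i = 0; i < copy_.size(); ++i) host_[i] = copy_[i];
    }

    double &operator[](std::size_t i) { return copy_[i]; }
    std::size_t size() const { return copy_.size(); }

private:
    double *host_;
    std::vector<double> copy_;
};

// Doubles every element through a ToyBuffer and reports whether the host
// array was still unchanged while the buffer was alive.
inline bool host_unchanged_until_scope_exit(std::vector<double> &host) {
    const std::vector<double> before = host;
    bool unchanged_inside = false;
    {
        ToyBuffer buf(host.data(), host.size());
        for (std::size_t i = 0; i < buf.size(); ++i) buf[i] *= 2.0;
        unchanged_inside = (host == before);
    }  // the buffer destructor writes the doubled values back here
    return unchanged_inside;
}
```

The point of the model is the ordering guarantee: the host should read results only after the buffer has been destroyed (or, in SYCL, through a host accessor).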

API Programming Example
-----------------------

API programming requires the programmer to specify the target device and the
memory communication strategy. In the following example, we call the
oneMKL matrix multiply routine, GEMM. We are writing in SYCL and
omitting irrelevant details.

We create a queue initialized with a *gpu_selector* to specify that we
want the computation performed on a GPU, and we define buffers to hold the
arrays allocated on the host. Compared to a standard C++ GEMM call,
we add a parameter to specify the queue, and we replace the references
to the arrays with references to the buffers that contain the arrays.
Otherwise this is the standard GEMM C++ interface.

.. code:: cpp

   using namespace cl::sycl;

   // Declare host arrays.
   double *A = new double[M*N];
   double *B = new double[N*P];
   double *C = new double[M*P];

   {
       // Initialize the device queue with a gpu_selector.
       queue q{gpu_selector()};

       // Create 1D buffers for the matrices, bound to the host arrays.
       buffer<double, 1> a{A, range<1>{M*N}};
       buffer<double, 1> b{B, range<1>{N*P}};
       buffer<double, 1> c{C, range<1>{M*P}};

       mkl::transpose nT = mkl::transpose::nontrans;
       // Syntax
       //   void gemm(queue &exec_queue, transpose transa, transpose transb,
       //             int64_t m, int64_t n, int64_t k, T alpha,
       //             buffer<T,1> &a, int64_t lda,
       //             buffer<T,1> &b, int64_t ldb, T beta,
       //             buffer<T,1> &c, int64_t ldc);
       // Call gemm.
       mkl::blas::gemm(q, nT, nT, M, P, N, 1.0, a, M, b, N, 0.0, c, M);
   }
   // When we exit the block, the buffer destructors write the result back to C.

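To make the meaning of the ``m``, ``n``, ``k`` and leading-dimension arguments concrete, here is a plain C++ reference loop for the non-transposed case. It assumes column-major storage, which is what the leading dimensions in the call above (``lda = M``, ``ldb = N``, ``ldc = M``) suggest. ``reference_gemm`` is a hypothetical helper for illustration only, not a oneMKL function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference C = alpha * A * B + beta * C for non-transposed, column-major
// matrices: A is m x k, B is k x n, C is m x n. Element (i, j) of a matrix
// with leading dimension ld is stored at index i + j * ld.
// `reference_gemm` is a hypothetical helper, not part of oneMKL.
inline void reference_gemm(std::int64_t m, std::int64_t n, std::int64_t k,
                           double alpha,
                           const std::vector<double> &a, std::int64_t lda,
                           const std::vector<double> &b, std::int64_t ldb,
                           double beta,
                           std::vector<double> &c, std::int64_t ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (std::int64_t p = 0; p < k; ++p) {
                sum += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```

With the names used in this section, the equivalent call would be ``reference_gemm(M, P, N, 1.0, a, M, b, N, 0.0, c, M)``: ``A`` is ``M x N``, ``B`` is ``N x P``, and the result ``C`` is ``M x P``.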

Direct Programming Example
--------------------------

With direct programming, we specify the target device and the memory
communication strategy, as we do for API programming. In addition, we
must define and submit a command group to perform the computation.
In the following example, we write a simple data parallel matrix
multiply. We are writing in SYCL and omitting irrelevant
details.

We create a queue initialized with a *gpu_selector* to specify that the
command group should run on the GPU, and we define buffers to hold the
arrays allocated on the host. We then submit the command group to the
queue to perform the computation. The command group defines accessors
to specify we are reading arrays A and B and writing to C. We then
write a C++ lambda to create a function object that computes one
element of the resulting matrix multiply. We specify this function
object as a parameter to a :code:`parallel_for` which maps the
function across the arrays :code:`A` and :code:`B` in parallel. When
we leave the scope, the destructor for the buffer object holding
:code:`C` writes the data back to the host array.

.. literalinclude:: example.cpp
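The decomposition described above, one function object invocation per element of the result, can also be sketched in plain C++ without a device. The contents of ``example.cpp`` are not reproduced here, so this is an independent illustration under assumptions: ``toy_parallel_for_2d`` is a hypothetical serial stand-in for :code:`parallel_for`, and row-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a 2D parallel_for: it calls `kernel`
// once per (row, col) index. A real device may run these calls concurrently,
// so the kernel must not depend on the iteration order.
template <typename Kernel>
void toy_parallel_for_2d(std::size_t rows, std::size_t cols, Kernel kernel) {
    for (std::size_t row = 0; row < rows; ++row) {
        for (std::size_t col = 0; col < cols; ++col) {
            kernel(row, col);
        }
    }
}

// Row-major matrix multiply: A is m x n, B is n x p, the result is m x p.
inline std::vector<double> matmul(const std::vector<double> &a,
                                  const std::vector<double> &b,
                                  std::size_t m, std::size_t n, std::size_t p) {
    assert(a.size() == m * n && b.size() == n * p);
    std::vector<double> c(m * p, 0.0);
    // The lambda is the function object: it computes one element of C.
    toy_parallel_for_2d(m, p, [&](std::size_t row, std::size_t col) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            sum += a[row * n + i] * b[i * p + col];
        }
        c[row * p + col] = sum;
    });
    return c;
}
```

Each invocation writes a distinct element of ``c`` and only reads ``a`` and ``b``, which is what makes the loop safe to run in parallel on a device.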
154 changes: 154 additions & 0 deletions spec/_sources/architecture.rst.txt
16 changes: 16 additions & 0 deletions spec/_sources/elements/element_list.rst
@@ -0,0 +1,16 @@
.. SPDX-FileCopyrightText: 2019-2020 Intel Corporation
..
.. SPDX-License-Identifier: CC-BY-4.0

- :ref:`oneDPL-section`: A companion to the DPC++ Compiler for
  programming oneAPI devices with APIs from C++ standard library,
  Parallel STL, and extensions.
- :ref:`oneDNN-section`: High performance implementations of
  primitives for deep learning frameworks
- :ref:`oneCCL-section`: Communication primitives for scaling deep
  learning frameworks across multiple devices
- :ref:`oneDAL-section`: Algorithms for accelerated data science
- :ref:`oneTBB-section`: Library for adding thread-based parallelism
  to complex applications on multiprocessors
- :ref:`oneMKL-section`: High performance math routines for science,
  engineering, and financial applications