
DOC: Clarify requirements for running on GPU and supported configurations #2286

Status: Open. Wants to merge 33 commits into base: main.

Commits (33)
d37b104
update conda install instructions
david-cortes-intel Jan 27, 2025
ffe7ea3
simplify tables
david-cortes-intel Jan 28, 2025
202df88
further simplify
david-cortes-intel Jan 28, 2025
0bb6d30
clarify requirements for running on GPU
david-cortes-intel Jan 29, 2025
f9a8124
typos
david-cortes-intel Jan 29, 2025
91ba107
add note about support for FPGAs and SPMD
david-cortes-intel Jan 29, 2025
1bbc302
clarify optional aspect of dpctl
david-cortes-intel Jan 29, 2025
657a4ae
solve merge conflicts
david-cortes-intel Jan 29, 2025
2d4c9da
clarify requirements for SPMD, clarify conda-forge status
david-cortes-intel Jan 29, 2025
c0d6ba5
more clarifications about requirements
david-cortes-intel Jan 29, 2025
f82ccba
small details
david-cortes-intel Jan 29, 2025
327a6f8
wording
david-cortes-intel Jan 29, 2025
7c06604
order
david-cortes-intel Jan 29, 2025
425484d
link to dpctl docs programmatically
david-cortes-intel Jan 29, 2025
66e47a2
don't use automated section naming
david-cortes-intel Jan 29, 2025
e05d3f6
use unnamed reference for mpi4py
david-cortes-intel Jan 29, 2025
352688c
more corrections
david-cortes-intel Jan 29, 2025
701e65f
clarify sycl device support
david-cortes-intel Jan 30, 2025
e72a5a7
clarify dpctl not required for spmd
david-cortes-intel Jan 30, 2025
2318cc2
clarify integrated chipsets not supported
david-cortes-intel Jan 30, 2025
17da146
typo
david-cortes-intel Jan 30, 2025
bccfbdb
single link occurrence for dpctl
david-cortes-intel Jan 30, 2025
092d4b0
clarify mpi4py requirements
david-cortes-intel Jan 30, 2025
e096dc7
more spmd details
david-cortes-intel Jan 30, 2025
30cd3ae
wording
david-cortes-intel Jan 30, 2025
7eb6f80
mention detail about queues
david-cortes-intel Jan 30, 2025
2d181df
correction oneMath->oneMKL
david-cortes-intel Jan 30, 2025
aa9fb20
more corrections
david-cortes-intel Jan 30, 2025
9141240
copyright header
david-cortes-intel Jan 30, 2025
58cd64c
update main page too
david-cortes-intel Jan 31, 2025
0a96e77
more improvements
david-cortes-intel Jan 31, 2025
cd3a183
spacing
david-cortes-intel Jan 31, 2025
ded1d29
mention multi-gpu setups in readme
david-cortes-intel Feb 4, 2025
10 changes: 6 additions & 4 deletions README.md
@@ -45,8 +45,8 @@ The software acceleration is achieved with vector instructions, AI hardware-spec

With Intel(R) Extension for Scikit-learn, you can:

* Speed up training and inference by up to 100x with the equivalent mathematical accuracy
* Benefit from performance improvements across different Intel(R) hardware configurations
* Speed up training and inference by up to 100x with equivalent mathematical accuracy
* Benefit from performance improvements across different Intel(R) hardware configurations, including GPUs and multi-GPU configurations
* Integrate the extension into your existing Scikit-learn applications without code modifications
* Continue to use the open-source scikit-learn API
* Enable and disable the extension with a couple of lines of code or at the command line
@@ -71,12 +71,14 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https:
from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
```

- **Enable Intel(R) GPU optimizations**

_Note: executing on GPU has [additional system software requirements](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html) - see [details](https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html)._

```py
import numpy as np
import dpctl
@@ -86,7 +88,7 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https:
from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
```
5 changes: 4 additions & 1 deletion doc/sources/algorithms.rst
@@ -12,13 +12,14 @@
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. include:: substitutions.rst
.. _sklearn_algorithms:

####################
Supported Algorithms
####################

Applying |intelex| impacts the following scikit-learn algorithms:
Applying |intelex| impacts the following |sklearn| estimators:

on CPU
------
@@ -380,6 +381,8 @@ Other Tasks
- All parameters are supported
- Only dense data is supported

.. _spmd-support:

SPMD Support
------------

1 change: 1 addition & 0 deletions doc/sources/conf.py
@@ -73,6 +73,7 @@

intersphinx_mapping = {
"sklearn": ("https://scikit-learn.org/stable/", None),
"dpctl": ("https://intelpython.github.io/dpctl/latest", None),
# from scikit-learn, in case some object in sklearnex points to them:
# https://github.com/scikit-learn/scikit-learn/blob/main/doc/conf.py
"python": ("https://docs.python.org/{.major}".format(sys.version_info), None),
82 changes: 64 additions & 18 deletions doc/sources/distributed-mode.rst
@@ -12,39 +12,85 @@
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. include:: substitutions.rst

.. _distributed:

Distributed Mode
================
Distributed Mode (SPMD)
=======================

|intelex| offers Single Program, Multiple Data (SPMD) supported interfaces for distributed computing.
Several `GPU-supported algorithms <https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html#>`_
also provide distributed, multi-GPU computing capabilities via integration with ``mpi4py``. The prerequisites
Several :doc:`GPU-supported algorithms <oneapi-gpu>`
also provide distributed, multi-GPU computing capabilities via integration with |mpi4py|. The prerequisites
match those of GPU computing, along with an MPI backend of your choice (`Intel MPI recommended
<https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.dcan6r>`_, available
via ``impi-devel`` python package) and the ``mpi4py`` python package. If using |intelex|
via ``impi_rt`` python package) and the |mpi4py| python package. If using |intelex|
`installed from sources <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/INSTALL.md#build-from-sources>`_,
ensure that the spmd_backend is built.

Note that |intelex| now supports GPU offloading to speed up MPI operations. This is supported automatically with
some MPI backends, but in order to use GPU offloading with Intel MPI, set the following environment variable (providing
.. important::
SPMD mode requires the |mpi4py| package used at runtime to be compiled with the same MPI backend as the |intelex|. The PyPI and conda distributions of |intelex| both use Intel's MPI as their backend, and hence require an |mpi4py| also built with Intel's MPI. It can be easily installed from Intel's conda channel as follows::

conda install -c https://software.repos.intel.com/python/conda/ mpi4py

It also requires the MPI runtime executable (``mpiexec`` / ``mpirun``) to be from the same library that was used to compile the |intelex| - Intel's MPI runtime library is offered as a Python package ``impi_rt`` and will be installed together with the ``mpi4py`` package if executing the command above, but otherwise, it can be installed separately from different distribution channels:

- Intel's conda channel (recommended)::

conda install -c https://software.repos.intel.com/python/conda/ impi_rt

- Conda-Forge::

conda install -c conda-forge impi_rt

- PyPI (not recommended, might require setting additional environment variables)::

pip install impi_rt

Using other MPI backends (e.g. OpenMPI) requires building |intelex| from source with that backend.

Note that |intelex| supports GPU offloading to speed up MPI operations. This is supported automatically with
some MPI backends, but in order to use GPU offloading with Intel MPI, it is required to set the environment variable ``I_MPI_OFFLOAD`` to ``1`` (providing
data on device without this may lead to a runtime error):

::
- On Linux*::

export I_MPI_OFFLOAD=1

- On Windows*::

set I_MPI_OFFLOAD=1
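Alternatively, the variable can be set from inside the Python script itself, as long as it is assigned before the MPI library initializes (a minimal sketch; whether this timing suffices depends on the MPI backend and on when ``mpi4py`` is first imported):

```python
import os

# Must be assigned before MPI initializes, i.e. before the first
# `from mpi4py import MPI` in the program, so that the Intel MPI
# runtime reads the setting from the environment.
os.environ["I_MPI_OFFLOAD"] = "1"
```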

SPMD-aware versions of estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
desired, and should be transferred to a |dpctl| or `dpnp <https://github.com/IntelPython/dpnp>`__ array before being passed to the estimator.

Note that SPMD estimators accept an additional ``queue`` argument in their ``.fit`` / ``.predict`` methods, which takes :obj:`dpctl.SyclQueue` objects. For example, while the signature for :obj:`sklearn.linear_model.LinearRegression.predict` would be

.. code-block:: python

def predict(self, X): ...

the signature of the corresponding method, ``sklearnex.spmd.linear_model.LinearRegression.predict``, is:

.. code-block:: python

def predict(self, X, queue=None): ...

Examples of SPMD usage can be found in the GitHub repository for the |intelex| under `examples/sklearnex <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex>`__.

export I_MPI_OFFLOAD=1
To run in SPMD mode, first create a Python file using SPMD estimators from ``sklearnex.spmd``, such as `linear_regression_spmd.py <https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex/linear_regression_spmd.py>`__.

Estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as
desired, and should be transfered to a dpctl or dpnp array before being passed to the estimator. View a full
example of this process in the |intelex| repository, where many examples of our SPMD-supported estimators are
available: https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex/. To run:
Then, execute the file through MPI under multiple ranks - for example:

::
- On Linux*::

mpirun -n 4 python linear_regression_spmd.py

mpirun -n 4 python linear_regression_spmd.py
- On Windows*::

mpiexec -n 4 python linear_regression_spmd.py

Note that additional mpirun arguments can be added as desired. SPMD-supported estimators are listed in the
`algorithms support documentation <https://uxlfoundation.github.io/scikit-learn-intelex/latest/algorithms.html#spmd-support>`_.
Note that additional ``mpirun`` arguments can be added as desired. SPMD-supported estimators are listed in the :ref:`spmd-support` section.

Additionally, daal4py offers some distributed functionality, see
Additionally, ``daal4py`` (previously a separate package, now an importable module within ``scikit-learn-intelex``) offers some distributed functionality; see
`documentation <https://intelpython.github.io/daal4py/scaling.html>`_ for further details.
24 changes: 13 additions & 11 deletions doc/sources/index.rst
@@ -12,29 +12,28 @@
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. |intelex_repo| replace:: |intelex| repository
.. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex
.. include:: substitutions.rst

.. _index:

#########
|intelex|
#########

Intel(R) Extension for Scikit-learn is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing scikit-learn code.
|intelex| is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing |sklearn| code.
The software acceleration is achieved with vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel(R) platforms at launch time.

.. rubric:: Designed for Data Scientists and Framework Designers


Use Intel(R) Extension for Scikit-learn, to:
Use |intelex|, to:

* Speed up training and inference by up to 100x with the equivalent mathematical accuracy
* Benefit from performance improvements across different x86-compatible CPUs or Intel(R) GPUs
* Integrate the extension into your existing Scikit-learn applications without code modifications
* Speed up training and inference by up to 100x with equivalent mathematical accuracy
* Benefit from performance improvements across different x86-64 CPUs and Intel(R) GPUs
* Integrate the extension into your existing |sklearn| applications without code modifications
* Enable and disable the extension with a couple of lines of code or at the command line

Intel(R) Extension for Scikit-learn is also a part of `Intel(R) AI Tools <https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html>`_.
|intelex| is also a part of `Intel(R) AI Tools <https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html>`_.


.. image:: _static/scikit-learn-acceleration.PNG
@@ -65,11 +64,14 @@ Enable Intel(R) CPU Optimizations
from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

Enable Intel(R) GPU optimizations
*********************************

Note: executing on GPU has `additional system software requirements <https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html>`__ - see :doc:`oneapi-gpu`.

::

import numpy as np
@@ -80,7 +82,7 @@ Enable Intel(R) GPU optimizations
from sklearn.cluster import DBSCAN

X = np.array([[1., 2.], [2., 2.], [2., 3.],
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
[8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
with config_context(target_offload="gpu:0"):
clustering = DBSCAN(eps=3, min_samples=2).fit(X)

@@ -101,7 +103,7 @@ Enable Intel(R) GPU optimizations
:maxdepth: 2

algorithms.rst
oneAPI and GPU support <oneapi-gpu.rst>
oneapi-gpu.rst
distributed-mode.rst
non-scikit-algorithms.rst
input-types.rst