diff --git a/README.md b/README.md index 6a045cdaaf..21322a8cb9 100755 --- a/README.md +++ b/README.md @@ -45,8 +45,8 @@ The software acceleration is achieved with vector instructions, AI hardware-spec With Intel(R) Extension for Scikit-learn, you can: -* Speed up training and inference by up to 100x with the equivalent mathematical accuracy -* Benefit from performance improvements across different Intel(R) hardware configurations +* Speed up training and inference by up to 100x with equivalent mathematical accuracy +* Benefit from performance improvements across different Intel(R) hardware configurations, including GPUs and multi-GPU configurations * Integrate the extension into your existing Scikit-learn applications without code modifications * Continue to use the open-source scikit-learn API * Enable and disable the extension with a couple of lines of code or at the command line @@ -71,12 +71,14 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https: from sklearn.cluster import DBSCAN X = np.array([[1., 2.], [2., 2.], [2., 3.], - [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) + [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) clustering = DBSCAN(eps=3, min_samples=2).fit(X) ``` - **Enable Intel(R) GPU optimizations** + _Note: executing on GPU has [additional system software requirements](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-dpcpp-system-requirements.html) - see [details](https://uxlfoundation.github.io/scikit-learn-intelex/latest/oneapi-gpu.html)._ + ```py import numpy as np import dpctl @@ -86,7 +88,7 @@ Intel(R) Extension for Scikit-learn is also a part of [Intel(R) AI Tools](https: from sklearn.cluster import DBSCAN X = np.array([[1., 2.], [2., 2.], [2., 3.], - [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) + [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) with config_context(target_offload="gpu:0"): clustering = DBSCAN(eps=3, min_samples=2).fit(X) ``` diff --git a/doc/sources/algorithms.rst b/doc/sources/algorithms.rst index 611e157156..8a5173de30 100755 --- a/doc/sources/algorithms.rst +++ b/doc/sources/algorithms.rst @@ -12,13 +12,14 @@ .. See the License for the specific language governing permissions and .. limitations under the License. +.. include:: substitutions.rst .. _sklearn_algorithms: #################### Supported Algorithms #################### -Applying |intelex| impacts the following scikit-learn algorithms: +Applying |intelex| impacts the following |sklearn| estimators: on CPU ------ @@ -380,6 +381,8 @@ Other Tasks - All parameters are supported - Only dense data is supported +.. _spmd-support: + SPMD Support ------------ diff --git a/doc/sources/conf.py b/doc/sources/conf.py index 4e8101d03d..c2f5ddb7ef 100755 --- a/doc/sources/conf.py +++ b/doc/sources/conf.py @@ -73,6 +73,7 @@ intersphinx_mapping = { "sklearn": ("https://scikit-learn.org/stable/", None), + "dpctl": ("https://intelpython.github.io/dpctl/latest", None), # from scikit-learn, in case some object in sklearnex points to them: # https://github.com/scikit-learn/scikit-learn/blob/main/doc/conf.py "python": ("https://docs.python.org/{.major}".format(sys.version_info), None), diff --git a/doc/sources/distributed-mode.rst b/doc/sources/distributed-mode.rst index c78a50d9e0..166cc0f876 100644 --- a/doc/sources/distributed-mode.rst +++ b/doc/sources/distributed-mode.rst @@ -12,39 +12,85 @@ .. See the License for the specific language governing permissions and .. limitations under the License. +.. 
include:: substitutions.rst + .. _distributed: -Distributed Mode -================ +Distributed Mode (SPMD) +======================= |intelex| offers Single Program, Multiple Data (SPMD) supported interfaces for distributed computing. -Several `GPU-supported algorithms `_ -also provide distributed, multi-GPU computing capabilities via integration with ``mpi4py``. The prerequisites +Several :doc:`GPU-supported algorithms ` +also provide distributed, multi-GPU computing capabilities via integration with |mpi4py|. The prerequisites match those of GPU computing, along with an MPI backend of your choice (`Intel MPI recommended `_, available -via ``impi-devel`` python package) and the ``mpi4py`` python package. If using |intelex| +via ``impi_rt`` python package) and the |mpi4py| python package. If using |intelex| `installed from sources `_, ensure that the spmd_backend is built. -Note that |intelex| now supports GPU offloading to speed up MPI operations. This is supported automatically with -some MPI backends, but in order to use GPU offloading with Intel MPI, set the following environment variable (providing +.. important:: + SPMD mode requires the |mpi4py| package used at runtime to be compiled with the same MPI backend as the |intelex|. The PyPI and Conda distributions of |intelex| both use Intel's MPI as backend, and hence require an |mpi4py| also built with Intel's MPI - it can be easily installed from Intel's conda channel as follows:: + + conda install -c https://software.repos.intel.com/python/conda/ mpi4py + + It also requires the MPI runtime executable (``mpiexec`` / ``mpirun``) to be from the same library that was used to compile the |intelex| - Intel's MPI runtime library is offered as a Python package ``impi_rt`` and will be installed together with the ``mpi4py`` package if executing the command above, but otherwise, it can be installed separately from different distribution channels: + + - Intel's conda channel (recommended):: + + conda install -c https://software.repos.intel.com/python/conda/ impi_rt + + - Conda-Forge:: + + conda install -c conda-forge impi_rt + + - PyPI (not recommended, might require setting additional environment variables):: + + pip install impi_rt + + Using other MPI backends (e.g. OpenMPI) requires building |intelex| from source with that backend. + +Note that |intelex| supports GPU offloading to speed up MPI operations. This is supported automatically with +some MPI backends, but in order to use GPU offloading with Intel MPI, it is required to set the environment variable ``I_MPI_OFFLOAD`` to ``1`` (providing data on device without this may lead to a runtime error): -:: +- On Linux*:: + + export I_MPI_OFFLOAD=1 + +- On Windows*:: + + set I_MPI_OFFLOAD=1 + +SPMD-aware versions of estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as +desired, and should be transferred to a |dpctl| or `dpnp `__ array before being passed to the estimator. + +Note that SPMD estimators allow an additional argument ``queue`` in their ``.fit`` / ``.predict`` methods, which accepts :obj:`dpctl.SyclQueue` objects. For example, while the signature for :obj:`sklearn.linear_model.LinearRegression.predict` would be + +.. code-block:: python + + def predict(self, X): ... + +the signature for the corresponding method in ``sklearnex.spmd.linear_model.LinearRegression`` is: + +.. code-block:: python + + def predict(self, X, queue=None): ...
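+
+As a quick illustration, below is a minimal sketch of what such an SPMD script can look like (the data generation is made up for the example; it assumes a machine with a supported GPU and the prerequisites above - a complete, runnable version is linked right after):
+
+.. code-block:: python
+
+    import dpnp
+    import numpy as np
+    from mpi4py import MPI
+    from sklearnex.spmd.linear_model import LinearRegression
+
+    # Each MPI rank generates (in practice: loads) its own shard of the data
+    rank = MPI.COMM_WORLD.Get_rank()
+    rng = np.random.default_rng(seed=rank)
+    X = rng.uniform(-1, 1, size=(100, 3))
+    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, size=100)
+
+    # Transfer the local shard to GPU memory before passing it to the estimator
+    X_gpu = dpnp.asarray(X, device="gpu")
+    y_gpu = dpnp.asarray(y, device="gpu")
+
+    # fit/predict are collective operations involving all ranks
+    model = LinearRegression().fit(X_gpu, y_gpu)
+    predictions = model.predict(X_gpu)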
+ +Examples of SPMD usage can be found in the GitHub repository for the |intelex| under `examples/sklearnex `__. - export I_MPI_OFFLOAD=1 +To run in SPMD mode, first create a Python file using SPMD estimators from ``sklearnex.spmd``, such as `linear_regression_spmd.py `__. -Estimators can be imported from the ``sklearnex.spmd`` module. Data should be distributed across multiple nodes as -desired, and should be transfered to a dpctl or dpnp array before being passed to the estimator. View a full -example of this process in the |intelex| repository, where many examples of our SPMD-supported estimators are -available: https://github.com/uxlfoundation/scikit-learn-intelex/blob/main/examples/sklearnex/. To run: +Then, execute the file through MPI under multiple ranks - for example: -:: +- On Linux*:: + + mpirun -n 4 python linear_regression_spmd.py - mpirun -n 4 python linear_regression_spmd.py +- On Windows*:: + + mpiexec -n 4 python linear_regression_spmd.py -Note that additional mpirun arguments can be added as desired. SPMD-supported estimators are listed in the -`algorithms support documentation `_. +Note that additional ``mpirun`` arguments can be added as desired. SPMD-supported estimators are listed in the :ref:`spmd-support` section. -Additionally, daal4py offers some distributed functionality, see +Additionally, ``daal4py`` (previously a separate package, now an importable module within ``scikit-learn-intelex``) offers some distributed functionality, see `documentation `_ for further details. diff --git a/doc/sources/index.rst b/doc/sources/index.rst index 627692b118..e2f624478e 100755 --- a/doc/sources/index.rst +++ b/doc/sources/index.rst @@ -12,8 +12,7 @@ .. See the License for the specific language governing permissions and .. limitations under the License. -.. |intelex_repo| replace:: |intelex| repository -.. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex +.. include:: substitutions.rst .. _index: @@ -21,20 +20,20 @@ |intelex| ######### -Intel(R) Extension for Scikit-learn is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing scikit-learn code. +|intelex| is a **free software AI accelerator** designed to deliver up to **100X** faster performance for your existing |sklearn| code. The software acceleration is achieved with vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel(R) platforms at launch time. .. rubric:: Designed for Data Scientists and Framework Designers -Use Intel(R) Extension for Scikit-learn, to: +Use |intelex| to: -* Speed up training and inference by up to 100x with the equivalent mathematical accuracy -* Benefit from performance improvements across different x86-compatible CPUs or Intel(R) GPUs -* Integrate the extension into your existing Scikit-learn applications without code modifications +* Speed up training and inference by up to 100x with equivalent mathematical accuracy +* Benefit from performance improvements across different x86-64 CPUs and Intel(R) GPUs +* Integrate the extension into your existing |sklearn| applications without code modifications * Enable and disable the extension with a couple of lines of code or at the command line -Intel(R) Extension for Scikit-learn is also a part of `Intel(R) AI Tools `_. +|intelex| is also a part of `Intel(R) AI Tools `_. ..
image:: _static/scikit-learn-acceleration.PNG @@ -65,11 +64,14 @@ Enable Intel(R) CPU Optimizations from sklearn.cluster import DBSCAN X = np.array([[1., 2.], [2., 2.], [2., 3.], - [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) + [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) clustering = DBSCAN(eps=3, min_samples=2).fit(X) Enable Intel(R) GPU optimizations ********************************* + +Note: executing on GPU has `additional system software requirements `__ - see :doc:`oneapi-gpu`. + :: import numpy as np import dpctl @@ -80,7 +82,7 @@ Enable Intel(R) GPU optimizations from sklearn.cluster import DBSCAN X = np.array([[1., 2.], [2., 2.], [2., 3.], - [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) + [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) with config_context(target_offload="gpu:0"): clustering = DBSCAN(eps=3, min_samples=2).fit(X) @@ -101,7 +103,7 @@ Enable Intel(R) GPU optimizations :maxdepth: 2 algorithms.rst - oneAPI and GPU support + oneapi-gpu.rst distributed-mode.rst non-scikit-algorithms.rst input-types.rst diff --git a/doc/sources/oneapi-gpu.rst b/doc/sources/oneapi-gpu.rst index f9808f97e4..3993b3d93c 100644 --- a/doc/sources/oneapi-gpu.rst +++ b/doc/sources/oneapi-gpu.rst @@ -12,65 +12,76 @@ .. See the License for the specific language governing permissions and .. limitations under the License. +.. include:: substitutions.rst .. _oneapi_gpu: ############################################################## oneAPI and GPU support in |intelex| ############################################################## -|intelex| supports oneAPI concepts, which -means that algorithms can be executed on different devices: CPUs and GPUs. -This is done via integration with -`dpctl `_ package that -implements core oneAPI concepts like queues and devices. +|intelex| can execute computations on different devices (CPUs, GPUs) through the SYCL framework in oneAPI. + +The device used for computations can be easily controlled through the target offloading functionality (e.g. through ``sklearnex.config_context(target_offload="gpu")`` - see the rest of this page for more details), but for finer-grained control (e.g. operating on arrays that are already in a given device's memory), it can also interact with objects from the |dpctl| package, which offers a Python interface over SYCL concepts such as devices, queues, and USM (unified shared memory) arrays. + +While not strictly required, the |dpctl| package is recommended for a better experience on GPUs. + +.. important:: Be aware that GPU usage requires non-Python dependencies on your system, such as the `Intel(R) GPGPU Drivers `_. Prerequisites ------------- -For execution on GPU, DPC++ compiler runtime and driver are required. Refer to `DPC++ system -requirements `_ for details. +For execution on GPUs, DPC++ runtime and GPGPU drivers are required. -DPC++ compiler runtime can be installed either from PyPI or Anaconda: +DPC++ compiler runtime can be installed either from PyPI or Conda: - Install from PyPI:: pip install dpcpp-cpp-rt -- Install using Conda via the Intel repository:: +- Install using Conda from Intel's repository:: - conda install dpcpp_cpp_rt -c https://software.repos.intel.com/python/conda/ + conda install -c https://software.repos.intel.com/python/conda/ dpcpp_cpp_rt -Device offloading ------------------ +- Install using Conda from the conda-forge channel:: -|intelex| offers two options for running an algorithm on a -specific device with the help of dpctl: + conda install -c conda-forge dpcpp_cpp_rt -- Pass input data as `dpctl.tensor.usm_ndarray `_ to the algorithm. +For GPGPU driver installation instructions, see the general `DPC++ system requirements `_ sections corresponding to your operating system.
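+
+Once the prerequisites are in place, a quick way to verify that a GPU is visible to the SYCL runtime is to list the available devices through |dpctl| (a short check, assuming |dpctl| is installed):
+
+.. code-block:: python
+
+    import dpctl
+
+    # Print every SYCL device visible to the runtime; GPUs should appear here
+    for device in dpctl.get_devices():
+        print(device)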
- The computation will run on the device where the input data is - located, and the result will be returned as :code:`usm_ndarray` to the same - device. +Device offloading +----------------- - .. note:: - All the input data for an algorithm must reside on the same device. +|intelex| offers two options for running an algorithm on a specified device: - .. warning:: - The :code:`usm_ndarray` can only be consumed by the base methods - like :code:`fit`, :code:`predict`, and :code:`transform`. - Note that only the algorithms in |intelex| support - :code:`usm_ndarray`. The algorithms from the stock version of scikit-learn - do not support this feature. - Use global configurations of |intelex|\*: - 1. The :code:`target_offload` option can be used to set the device primarily - used to perform computations. Accepted data types are :code:`str` and - :code:`dpctl.SyclQueue`. If you pass a string to :code:`target_offload`, - it should either be ``"auto"``, which means that the execution - context is deduced from the location of input data, or a string - with SYCL* filter selector. The default value is ``"auto"``. - - 2. The :code:`allow_fallback_to_host` option + 1. The :code:`target_offload` argument (in ``config_context`` and in ``set_config`` / ``get_config``) + can be used to set the device primarily used to perform computations. Accepted data types are + :code:`str` and :obj:`dpctl.SyclQueue`. Strings must match device names recognized by + the SYCL* device filter selector - for example, ``"gpu"``. If passing ``"auto"``, + the device will be deduced from the location of the input data. Examples: + + .. code-block:: python + + from sklearnex import config_context + from sklearnex.linear_model import LinearRegression + + with config_context(target_offload="gpu"): + model = LinearRegression().fit(X, y) + + .. code-block:: python + + from sklearnex import set_config + from sklearnex.linear_model import LinearRegression + + set_config(target_offload="gpu") + model = LinearRegression().fit(X, y) + + 2. The :code:`allow_fallback_to_host` argument in those same configuration functions is a Boolean flag. If set to :code:`True`, the computation is allowed to fallback to the host device when a particular estimator does not support the selected device. The default value is :code:`False`. @@ -83,16 +94,27 @@ call :code:`sklearnex.get_config()`. Functions :code:`set_config`, :code:`get_config` and :code:`config_context` are always patched after the :code:`sklearnex.patch_sklearn()` call. -.. rubric:: Compatibility considerations +- Pass input data as :obj:`dpctl.tensor.usm_ndarray` to the algorithm. + + The computation will run on the device where the input data is + located, and the result will be returned as :code:`usm_ndarray` to the same + device. + + .. note:: + All the input data for an algorithm must reside on the same device. + + .. warning:: + The :code:`usm_ndarray` can only be consumed by the base methods + like :code:`fit`, :code:`predict`, and :code:`transform`. + Note that only the algorithms in |intelex| support + :code:`usm_ndarray`. The algorithms from the stock version of |sklearn| + do not support this feature.
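+
+  As an illustration of this option, below is a minimal sketch of passing a USM array to a patched estimator (it assumes a machine with a supported GPU):
+
+  .. code-block:: python
+
+      import numpy as np
+      import dpctl.tensor as dpt
+      from sklearnex import patch_sklearn
+      patch_sklearn()
+
+      from sklearn.cluster import DBSCAN
+
+      X = np.array([[1., 2.], [2., 2.], [2., 3.],
+                    [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
+
+      # Create a USM array in GPU memory; the computation then runs on that
+      # device, and the fitted results reside on the same device
+      X_device = dpt.asarray(X, usm_type="device", device="gpu")
+
+      clustering = DBSCAN(eps=3, min_samples=2).fit(X_device)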
-For compatibility reasons, algorithms in |intelex| may be offloaded to the device using -:code:`daal4py.oneapi.sycl_context`. However, it is recommended to use one of the options -described above for device offloading instead of using :code:`sycl_context`. Example ------- -An example on how to patch your code with Intel CPU/GPU optimizations: +A full example of how to patch your code with Intel CPU/GPU optimizations: .. code-block:: python @@ -102,11 +124,11 @@ An example on how to patch your code with Intel CPU/GPU optimizations: from sklearn.cluster import DBSCAN X = np.array([[1., 2.], [2., 2.], [2., 3.], - [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) + [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32) with config_context(target_offload="gpu:0"): clustering = DBSCAN(eps=3, min_samples=2).fit(X) -.. note:: Current offloading behavior restricts fitting and inference of any models to be - in the same context or absence of context. For example, a model trained in the GPU context with - target_offload="gpu:0" throws an error if the inference is made outside the same GPU context. +.. note:: Current offloading behavior restricts fitting and prediction (a.k.a. inference) of any model to + happen in the same context, or in the absence of any context. For example, a model whose ``.fit()`` method was called in a GPU context with + ``target_offload="gpu:0"`` will throw an error if a ``.predict()`` call is then made outside the same GPU context. diff --git a/doc/sources/quick-start.rst b/doc/sources/quick-start.rst index 084039f321..2ec236424f 100644 --- a/doc/sources/quick-start.rst +++ b/doc/sources/quick-start.rst @@ -12,19 +12,18 @@ .. See the License for the specific language governing permissions and .. limitations under the License. -.. |intelex_repo| replace:: |intelex| repository -.. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex +.. include:: substitutions.rst #################### Quick Start #################### -Get ready to elevate your scikit-learn code with |intelex| and experience the benefits of accelerated performance in just a few simple steps. +Get ready to elevate your |sklearn| code with |intelex| and experience the benefits of accelerated performance in just a few simple steps. Compatibility with Scikit-learn* --------------------------------- -Intel(R) Extension for Scikit-learn is compatible with the last four versions of scikit-learn. +|intelex| is compatible with the latest stable releases of |sklearn| - see :ref:`software-requirements` for more details. Integrate |intelex| -------------------- Patching ********************** -Once you install Intel*(R) Extension for Scikit-learn*, you replace algorithms that exist in the scikit-learn package with their optimized versions from the extension. -This action is called ``patching``. 
This is not a permanent change so you can always undo the patching if necessary. +Once you install the |intelex|, you can replace estimator classes (algorithms) that exist in the ``sklearn`` module with their optimized versions from the extension. +This action is called `patching`. This is not a permanent change so you can always undo the patching if necessary. -To patch Intel® Extension for Scikit-learn, use one of these methods: +To patch |sklearn| with the |intelex|, the following methods can be used: .. list-table:: :header-rows: 1 @@ -71,7 +70,7 @@ They support different enabling scenarios while producing the same result. **Example** -This example shows how to patch Intel(R) extension for Scikit-Learn by modifing your script. To make sure that patching is registered by the scikit-learn estimators, always import scikit-learn after these lines. +This example shows how to patch |sklearn| by modifying your script. To make sure that patching is registered by the scikit-learn estimators, always import the ``sklearn`` module after these lines. .. code-block:: python :caption: Example: Drop-In Patching import numpy as np from sklearnex import patch_sklearn patch_sklearn() # The use of the original Scikit-learn is not changed X = np.array([[1, 2], [1, 4], [1, 0], - [10, 2], [10, 4], [10, 0]]) + [10, 2], [10, 4], [10, 0]]) kmeans = KMeans(n_clusters=2, random_state=0).fit(X) print(f"kmeans.labels_ = {kmeans.labels_}") @@ -93,7 +92,7 @@ Global Patching ********************** -You can also use global patching to patch all your scikit-learn applications without any additional actions. +You can also use global patching to patch all your |sklearn| applications without any additional actions. Before you begin, make sure that you have read and write permissions for Scikit-learn files. @@ -160,10 +159,10 @@ With global patching, you can: Unpatching ********************** -To undo the patch (also called `unpatching`) is to return scikit-learn to original implementation and -replace patched algorithms with the stock scikit-learn algorithms. +To undo the patch (also called `unpatching`) is to return the ``sklearn`` module to the original implementation and +replace patched estimators with the stock |sklearn| estimators. -To unpatch successfully, you must reimport the scikit-learn package:: +To unpatch successfully, you must reimport the ``sklearn`` module(s):: sklearnex.unpatch_sklearn() # Re-import scikit-learn algorithms after the unpatch @@ -191,62 +190,29 @@ To install |intelex|, run: **Supported Configurations** .. list-table:: - :header-rows: 1 :align: left - * - OS / Python version - - Python 3.9 - - Python 3.10 - - Python 3.11 - - Python 3.12 - * - Linux* OS - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - * - Windows* OS - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] + * - Operating systems + - Windows*, Linux* + * - Python versions + - 3.9, 3.10, 3.11, 3.12, 3.13 + * - Devices + - CPU, GPU + * - Modes + - Single, SPMD +.. tip:: Running on GPU involves additional dependencies, see :doc:`oneapi-gpu`. SPMD mode has additional requirements on top of GPU ones, see :doc:`distributed-mode` for details. +.. note:: Wheels are only available for the x86-64 architecture. Install from Anaconda* Cloud ******************************************** To prevent version conflicts, we recommend installing `scikit-learn-intelex` into a new conda environment. -.. tabs:: - - ..
tab:: Conda-Forge channel - - Recommended by default. - - To install, run:: - - conda install scikit-learn-intelex -c conda-forge - - .. list-table:: **Supported Configurations** - :header-rows: 1 - :align: left - - * - OS / Python version - - Python 3.9 - - Python 3.10 - - Python 3.11 - - Python 3.12 - * - Linux* OS - - [CPU] - - [CPU] - - [CPU] - - [CPU] - * - Windows* OS - - [CPU] - - [CPU] - - [CPU] - - [CPU] +*Note: the main Anaconda channel also provides distributions of scikit-learn-intelex, but it does not provide the latest versions, nor does it provide GPU-enabled builds. It is highly recommended to install it from either Intel's channel or conda-forge instead.* +.. tabs:: .. tab:: Intel channel Recommended by default. To install, run:: - conda install scikit-learn-intelex -c https://software.repos.intel.com/python/conda/ + conda install -c https://software.repos.intel.com/python/conda/ scikit-learn-intelex .. list-table:: **Supported Configurations** - :header-rows: 1 :align: left - * - OS / Python version - - Python 3.9 - - Python 3.10 - - Python 3.11 - - Python 3.12 - * - Linux* OS - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - * - Windows* OS - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] - - [CPU, GPU] + * - Operating systems + - Windows*, Linux* + * - Python versions + - 3.9, 3.10, 3.11, 3.12, 3.13 + * - Devices + - CPU, GPU + * - Modes + - Single, SPMD - - .. tab:: Main channel + .. tab:: Conda-Forge channel To install, run:: - conda install scikit-learn-intelex + conda install -c conda-forge scikit-learn-intelex .. list-table:: **Supported Configurations** - :header-rows: 1 :align: left - * - OS / Python version - - Python 3.9 - - Python 3.10 - - Python 3.11 - - Python 3.12 - * - Linux* OS - - [CPU] - - [CPU] - - [CPU] - - [CPU] - * - Windows* OS - - [CPU] - - [CPU] - - [CPU] - - [CPU] + * - Operating systems + - Windows*, Linux* + * - Python versions + - 3.9, 3.10, 3.11, 3.12, 3.13 + * - Devices + - CPU, GPU + * - Modes + - Single, SPMD +.. tip:: Running on GPU involves additional dependencies, see :doc:`oneapi-gpu`. SPMD mode has additional requirements on top of GPU ones, see :doc:`distributed-mode` for details. +.. note:: Packages are only available for the x86-64 architecture. + +.. _build-from-sources: Build from Sources ********************** -See `Installation instructions `_ to build |intelex| from the sources. +See `Installation instructions `_ to build |intelex| from the sources. Install Intel*(R) AI Tools **************************** Download the Intel AI Tools `here `_ Release Notes ============== -See the `Release Notes `_ for each version of Intel® Extension for Scikit-learn*. +See the `Release Notes `_ for each version of |intelex|. System Requirements -------------------- @@ -331,23 +284,24 @@ Hardware Requirements ********************** .. tab:: CPU - All processors with ``x86`` architecture with at least one of the following instruction sets: + Any processor with ``x86-64`` architecture with at least one of the following instruction sets: - SSE2 - SSE4.2 - AVX2 - AVX512 - .. note:: ARM* architecture is not supported. + .. note:: + Pre-built packages are not provided for other CPU architectures. See :ref:`build-from-sources` for ARM. .. tab:: GPU - - All Intel® integrated and discrete GPUs - - Intel® GPU drivers + - Any Intel® GPU supported by both `DPC++ `_ and `oneMKL `_ .. tip:: Intel(R) processors provide better performance than other CPUs. Read more about hardware comparison in our :ref:`blogs `. +..
_software-requirements: Software Requirements ********************** @@ -362,25 +316,28 @@ Software Requirements .. tab:: GPU - - Linux* OS: Ubuntu* 18.04 or newer - - Windows* OS 10 or newer - - Windows* Server 2019 or newer + - A Linux* or Windows* version supported by DPC++ and oneMKL + - Intel® GPGPU drivers + - DPC++ runtime libraries .. important:: - If you use accelerators, refer to `oneAPI DPC++/C++ Compiler System Requirements `_. + If you use accelerators (e.g. GPUs), refer to `oneAPI DPC++/C++ Compiler System Requirements `_. -Intel(R) Extension for Scikit-learn is compatible with the last four versions of scikit-learn: +|intelex| is compatible with the following releases of |sklearn|: * 1.0.X * 1.1.X * 1.2.X * 1.3.X +* 1.4.X +* 1.5.X +* 1.6.X Memory Requirements ********************** By default, algorithms in |intelex| run in the multi-thread mode. This mode uses all available threads. -Optimized scikit-learn algorithms can consume more RAM than their corresponding unoptimized versions. +Optimized scikit-learn estimators can consume more RAM than their corresponding unoptimized versions. .. list-table:: :header-rows: 1 @@ -390,7 +347,7 @@ Optimized scikit-learn algorithms can consume more RAM than their corresponding - Single-thread mode - Multi-thread mode * - SVM - - Both Scikit-learn and |intelex| consume approximately the same amount of RAM. + - Both |sklearn| and |intelex| consume approximately the same amount of RAM. - In |intelex|, an algorithm with ``N`` threads consumes ``N`` times more RAM. In all |intelex| algorithms with GPU support, computations run on device memory. diff --git a/doc/sources/substitutions.rst b/doc/sources/substitutions.rst new file mode 100644 index 0000000000..e7cdef0832 --- /dev/null +++ b/doc/sources/substitutions.rst @@ -0,0 +1,19 @@ +.. Copyright contributors to the oneDAL project +.. +.. Licensed under the Apache License, Version 2.0 (the "License"); +.. you may not use this file except in compliance with the License. +.. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. |dpctl| replace:: :external+dpctl:doc:`dpctl ` +.. |sklearn| replace:: :external+sklearn:doc:`scikit-learn ` +.. |intelex_repo| replace:: |intelex| repository +.. _intelex_repo: https://github.com/uxlfoundation/scikit-learn-intelex +.. |mpi4py| replace:: `mpi4py `__