Contributing a SYCL version #127

Open: wants to merge 1 commit into base: `main`
28 changes: 28 additions & 0 deletions src_sycl/LICENSE.md
@@ -0,0 +1,28 @@
Modifications Copyright (C) 2023 Intel Corporation

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


SPDX-License-Identifier: BSD-3-Clause
81 changes: 81 additions & 0 deletions src_sycl/README.md
@@ -0,0 +1,81 @@
# tsne

tsne implements the [FIt-SNE algorithm](https://github.com/KlugerLab/FIt-SNE) for various GPU architectures. The original CUDA source code comes from [tsne-cuda](https://github.com/CannyLab/tsne-cuda).

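FIt-SNE speeds up the t-SNE gradient, whose repulsive part costs O(N²) when computed directly, by interpolating onto an equispaced grid and evaluating the interactions with FFT-based convolution. As a reference point for what the force kernels in the source list (`attr_forces`, `rep_forces`, `apply_forces`) compute or approximate, here is a minimal exact O(N²) sketch of the standard t-SNE gradient for a 2-D embedding: dC/dy_i = 4 Σ_j (p_ij − q_ij) w_ij (y_i − y_j), with w_ij = 1 / (1 + ||y_i − y_j||²) and q_ij = w_ij / Z. This is not code from this repository. The `Point2` type, the dense row-major affinity matrix `P`, and the function name are hypothetical; real implementations use a sparse `P` built from nearest neighbors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2-D embedding point used only for this sketch.
struct Point2 { float x, y; };

// Exact O(N^2) t-SNE gradient for a 2-D embedding (illustration only).
// P: dense, symmetric N x N affinities (row-major), zero diagonal, summing to 1.
// Returns dC/dy_i for every point i.
std::vector<Point2> exact_tsne_gradient(const std::vector<Point2>& Y,
                                        const std::vector<float>& P) {
    const std::size_t n = Y.size();
    // Student-t kernel w_ij = 1 / (1 + ||y_i - y_j||^2) and normalizer Z = sum over i != j.
    std::vector<float> W(n * n, 0.0f);
    double Z = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const float dx = Y[i].x - Y[j].x;
            const float dy = Y[i].y - Y[j].y;
            W[i * n + j] = 1.0f / (1.0f + dx * dx + dy * dy);
            Z += W[i * n + j];
        }
    }
    std::vector<Point2> grad(n, Point2{0.0f, 0.0f});
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const float w = W[i * n + j];
            // Attractive part p_ij * w_ij minus repulsive part q_ij * w_ij = w_ij^2 / Z.
            const float coeff = 4.0f * (P[i * n + j] * w - static_cast<float>(w * w / Z));
            grad[i].x += coeff * (Y[i].x - Y[j].x);
            grad[i].y += coeff * (Y[i].y - Y[j].y);
        }
    }
    return grad;
}
```

Because the pairwise coefficients are symmetric, the gradients summed over all points cancel (up to rounding), which makes a cheap sanity check when comparing GPU kernels against a reference like this.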
## SYCL version

- The CUDA code was converted to SYCL using Intel's DPC++ Compatibility Tool (DPCT), available [here](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html).
- The same SYCL code runs on Intel GPUs and CPUs as well as on NVIDIA (tested on A100 and H100) and AMD (tested on MI100 and MI250) GPUs. See the build instructions below for details.
- NOTE #1: This version bypasses FAISS. The input images are run through an offline Python version of FAISS, and its output is used as the input to this SYCL version. This makes the version better suited to hardware and framework (SYCL, CUDA, HIP) benchmarking.
- NOTE #2: This version also does not use the FFT from MKL; it uses a manually implemented FFT instead. For an apples-to-apples comparison, a corresponding (modified) CUDA version is available [here](https://github.com/oneapi-src/Velocity-Bench/tree/main/tsne) in [Velocity-Bench](https://github.com/oneapi-src/Velocity-Bench). I am happy to add that CUDA version here if it would be useful.

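NOTE #2 replaces MKL's FFT with a hand-written transform. Judging by the source list in `CMakeLists.txt`, the FFT-based code appears to live in `src/kernels/nbodyfft.dp.cpp`, which is not shown in this diff. As a hedged illustration of the kind of transform involved, not the repository's implementation, a textbook recursive radix-2 Cooley-Tukey FFT looks like this. It assumes the input length is a power of two; `fft_radix2` and `cplx` are hypothetical names.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Forward DFT via recursive radix-2 Cooley-Tukey, computed in place.
// Assumes a.size() is a power of two; illustrative, not tuned for GPUs.
void fft_radix2(std::vector<cplx>& a) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    // Split into even- and odd-indexed samples and transform each half.
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft_radix2(even);
    fft_radix2(odd);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        // Twiddle factor e^{-2*pi*i*k/n} applied to the odd half (butterfly step).
        const cplx t = std::polar(1.0, -2.0 * pi * static_cast<double>(k) /
                                           static_cast<double>(n)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}
```

The recursive form with per-level temporaries keeps the sketch short; a GPU implementation would more likely use an iterative, in-place formulation.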
# Current Version
- Initial release of the workload

# Build Instructions
Notes:
- The icpx compiler used below is included in the oneAPI Base Toolkit, available [here](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
- The clang++ compiler used below is the open-source DPC++ compiler from the intel/llvm project; see its Getting Started Guide [here](https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md).


For Intel GPU -
First, set up the icpx compiler environment (e.g. by sourcing oneAPI's `setvars.sh`). Then run:

```
cd src_sycl/SYCL
mkdir build
cd build
CXX=icpx cmake -DGPU_AOT=pvc ..
make -sj
```
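Note:
- To enable AOT compilation for PVC (Intel Data Center GPU Max Series), use the flag `-DGPU_AOT=pvc`. Any other value of `GPU_AOT` is appended verbatim to the compiler flags, so it must be a complete set of AOT compilation options.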

For AMD GPU -
First, set up the clang++ compiler environment. Then run:
```
cd src_sycl/SYCL
mkdir build
cd build
CXX=clang++ cmake -DUSE_AMDHIP_BACKEND=gfx90a ..
make -sj
```
Note:
- We use the flag `-DUSE_AMDHIP_BACKEND=gfx90a` for MI250 (`gfx908` for MI100). Use the value that matches your GPU.

For NVIDIA GPU -
First, set up the clang++ compiler environment. Then run:
```
cd src_sycl/SYCL
mkdir build
cd build
CXX=clang++ cmake -DUSE_NVIDIA_BACKEND=YES -DUSE_SM=80 ..
make -sj
```
Note:
- Use the flag `-DUSE_SM=80` for A100 or `-DUSE_SM=90` for H100.

# Run instructions

After building, run the workload from the `SYCL/build` directory:

```
# PVC 1 tile:
ONEAPI_DEVICE_SELECTOR=level_zero:0.0 ./tsne
```
```
# PVC 2 tiles:
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./tsne
```
```
# AMD GPU:
ONEAPI_DEVICE_SELECTOR=hip:0 ./tsne
```
```
# NVIDIA GPU:
ONEAPI_DEVICE_SELECTOR=cuda:0 ./tsne
```

# Output

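The program prints the total time for the workload in seconds, as `tsne - total time for whole calculation: <t> s`. In `main.dp.cpp`, this value is the elapsed time of the whole run minus the value returned by `tsnecuda::RunTsne`.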
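`main.dp.cpp` measures this time with `std::chrono::steady_clock` through its `TIMER_START`/`TIMER_END`/`TIMER_PRINT` macros, accumulating milliseconds and dividing by 1e3 before printing. A macro-free sketch of the same measurement pattern follows; `time_seconds` is a hypothetical helper, not part of this code base, and it converts directly to seconds with `std::chrono::duration<double>`.

```cpp
#include <chrono>

// Hypothetical helper: runs f() once and returns the elapsed wall-clock time
// in seconds, measured with the monotonic steady_clock (as main.dp.cpp does).
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    // duration<double> has a period of one second, so count() is in seconds.
    return std::chrono::duration<double>(end - start).count();
}
```

`steady_clock` is monotonic, so the measurement is not affected by system clock adjustments during a long run.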
131 changes: 131 additions & 0 deletions src_sycl/SYCL/CMakeLists.txt
@@ -0,0 +1,131 @@
# Modifications Copyright (C) 2023 Intel Corporation
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
# BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
# OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
# OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
# OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#
# SPDX-License-Identifier: BSD-3-Clause
#

cmake_minimum_required(VERSION 3.10)

project(tsne LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

option(ENABLE_KERNEL_PROFILING "Build using kernel profiling" OFF)
option(GPU_AOT "Build AOT for Intel GPU" OFF)
option(USE_NVIDIA_BACKEND "Build for NVIDIA backend" OFF)
option(USE_AMDHIP_BACKEND "Build for AMD HIP backend" OFF)

if(ENABLE_KERNEL_PROFILING)
message("-- Enabling kernel profiling")
add_compile_options(-DENABLE_KERNEL_PROFILING)
endif()

set(INTEL_GPU_CXX_FLAGS " -O2 -std=c++17 -fsycl -ffast-math -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-unknown-pragmas -Wno-unused-local-typedef ")
set(NVIDIA_GPU_CXX_FLAGS " -O3 -std=c++17 -fsycl -ffast-math -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-unknown-pragmas -Wno-unused-local-typedef ")
set(AMD_GPU_CXX_FLAGS " -O3 -std=c++17 -fsycl -ffast-math -Wall -Wextra -Wno-unused-parameter -Wno-sign-compare -Wno-unknown-pragmas -Wno-unused-local-typedef ")

set(USE_DEFAULT_FLAGS ON)
if("${CMAKE_CXX_FLAGS}" STREQUAL "")
message(STATUS "Using DEFAULT compilation flags")
else()
message(STATUS "OVERRIDING DEFAULT compilation flags")
set(USE_DEFAULT_FLAGS OFF)
endif()

# Backend selection: Intel GPU (AOT), NVIDIA, or AMD
if(GPU_AOT)
message(STATUS "Enabling INTEL backend")
if(USE_DEFAULT_FLAGS)
set(CMAKE_CXX_FLAGS "${INTEL_GPU_CXX_FLAGS}") # Default flags for Intel backend
endif()
if( (${GPU_AOT} STREQUAL "pvc") OR (${GPU_AOT} STREQUAL "PVC") )
message(STATUS "Enabling Intel GPU AOT compilation for ${GPU_AOT}")
string(APPEND CMAKE_CXX_FLAGS " -fsycl-targets=spir64_gen -Xs \"-device 0x0bd5 -revision_id 0x2f\" ")
else()
message(STATUS "Using custom AOT compilation flag ${GPU_AOT}")
string(APPEND CMAKE_CXX_FLAGS " ${GPU_AOT} ") # User should be aware of advanced AOT compilation flags
endif()
elseif(USE_NVIDIA_BACKEND)
message(STATUS "Enabling NVIDIA backend")
if(USE_DEFAULT_FLAGS)
set(CMAKE_CXX_FLAGS "${NVIDIA_GPU_CXX_FLAGS}") # Default flags for NV backend
endif()
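string(APPEND CMAKE_CXX_FLAGS " -fsycl-targets=nvptx64-nvidia-cuda ")
if(USE_SM)
# Target a specific compute capability, e.g. USE_SM=80 (A100) or USE_SM=90 (H100)
string(APPEND CMAKE_CXX_FLAGS " -Xsycl-target-backend --cuda-gpu-arch=sm_${USE_SM} ")
endif()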
elseif(USE_AMDHIP_BACKEND)
message(STATUS "Enabling AMD HIP backend for ${USE_AMDHIP_BACKEND} AMD architecture")
if(USE_DEFAULT_FLAGS)
set(CMAKE_CXX_FLAGS "${AMD_GPU_CXX_FLAGS}") # Default flags for AMD backend (gfx908 for MI100)
endif()
string(APPEND CMAKE_CXX_FLAGS " -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=${USE_AMDHIP_BACKEND} ")
endif()

if(GPU_AOT)
set(MKL_LINK static)
set(MKL_THREADING sequential)
find_package(MKL CONFIG REQUIRED HINTS "$ENV{MKLROOT}/lib/cmake/mkl")
endif()

# Project Setup
#-------------------------------------------------------------------------------
set(SOURCES
# # Utils
${CMAKE_SOURCE_DIR}/src/utils/debug_utils.dp.cpp
${CMAKE_SOURCE_DIR}/src/utils/cuda_utils.dp.cpp
${CMAKE_SOURCE_DIR}/src/utils/distance_utils.dp.cpp
${CMAKE_SOURCE_DIR}/src/utils/math_utils.dp.cpp
${CMAKE_SOURCE_DIR}/src/utils/matrix_broadcast_utils.dp.cpp
# ${CMAKE_SOURCE_DIR}/src/utils/reduce_utils.dp.cpp

# # Kernels
${CMAKE_SOURCE_DIR}/src/kernels/apply_forces.dp.cpp
${CMAKE_SOURCE_DIR}/src/kernels/attr_forces.dp.cpp
${CMAKE_SOURCE_DIR}/src/kernels/rep_forces.dp.cpp
${CMAKE_SOURCE_DIR}/src/kernels/perplexity_search.dp.cpp
${CMAKE_SOURCE_DIR}/src/kernels/nbodyfft.dp.cpp

# Method files
${CMAKE_SOURCE_DIR}/src/fit_tsne.dp.cpp

${CMAKE_SOURCE_DIR}/src/exe/main.dp.cpp
)

include_directories(
${CMAKE_SOURCE_DIR}/src/
${CMAKE_SOURCE_DIR}/src/include
# NOTE: developer-specific paths; point these at your local oneDPL and oneTBB installations
/nfs/pdx/home/mgrabban/oneDPL/include
/nfs/pdx/home/mgrabban/oneTBB/include
)

add_executable(tsne ${SOURCES})

if(GPU_AOT)
target_compile_options(tsne PUBLIC $<TARGET_PROPERTY:MKL::MKL_DPCPP,INTERFACE_COMPILE_OPTIONS>)
target_include_directories(tsne PUBLIC $<TARGET_PROPERTY:MKL::MKL_DPCPP,INTERFACE_INCLUDE_DIRECTORIES>)
target_link_libraries(tsne PUBLIC $<LINK_ONLY:MKL::MKL_DPCPP>)
endif()
139 changes: 139 additions & 0 deletions src_sycl/SYCL/src/exe/main.dp.cpp
@@ -0,0 +1,139 @@
/* Modifications Copyright (C) 2023 Intel Corporation
*
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* 3. Neither the name of the copyright holder nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
* OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
* OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
* EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
*
* SPDX-License-Identifier: BSD-3-Clause
*/

// This file exposes a main file which does most of the testing with command line
// args, so we don't have to re-build to change options.

// Detailed includes
#include <sycl/sycl.hpp>
#include <chrono>
#include <cstdio>
#include <iostream>
#include <time.h>
#include <string>
#include "include/fit_tsne.h"
#include "include/options.h"

// Option parser
#include "include/cxxopts.hpp"

#define TIMER_START() time_start = std::chrono::steady_clock::now();
#define TIMER_END() \
time_end = std::chrono::steady_clock::now(); \
time_total = std::chrono::duration<double, std::milli>(time_end - time_start).count();
#define TIMER_PRINT(name) std::cout << name <<": " << (time_total - time_total_) / 1e3 << " s\n";

// #ifndef DEBUG_TIME
// #define DEBUG_TIME
// #endif

#define STRINGIFY(X) #X

#define FOPT(x) result[STRINGIFY(x)].as<float>()
#define SOPT(x) result[STRINGIFY(x)].as<std::string>()
#define IOPT(x) result[STRINGIFY(x)].as<int>()
#define BOPT(x) result[STRINGIFY(x)].as<bool>()

int main(int argc, char** argv)
{
std::chrono::steady_clock::time_point time_start;
std::chrono::steady_clock::time_point time_end;
double time_total = 0.0;
double time_total_ = 0.0;

TIMER_START()

try {
// Setup command line options
cxxopts::Options options("TSNE-CUDA","Perform T-SNE in an optimized manner.");
options.add_options()
("l,learning-rate", "Learning Rate", cxxopts::value<float>()->default_value("200"))
("p,perplexity", "Perplexity", cxxopts::value<float>()->default_value("50.0"))
("e,early-ex", "Early Exaggeration Factor", cxxopts::value<float>()->default_value("12.0"))
("s,data", "Which program to run on <cifar10,cifar100,mnist,sim>", cxxopts::value<std::string>()->default_value("sim"))
("k,num-points", "How many simulated points to use", cxxopts::value<int>()->default_value("60000"))
("u,nearest-neighbors", "How many nearest neighbors should we use", cxxopts::value<int>()->default_value("32"))
("n,num-steps", "How many steps to take", cxxopts::value<int>()->default_value("1000"))
("i,viz", "Use interactive visualization", cxxopts::value<bool>()->default_value("false"))
("d,dump", "Dump the output points", cxxopts::value<bool>()->default_value("false"))
("m,magnitude-factor", "Magnitude factor for KNN", cxxopts::value<float>()->default_value("5.0"))
("t,init", "What kind of initialization to use <unif,gauss>", cxxopts::value<std::string>()->default_value("gauss"))
("f,fname", "File name for loaded data...", cxxopts::value<std::string>()->default_value("../train-images.idx3-ubyte"))
("c,connection", "Address for connection to vis server", cxxopts::value<std::string>()->default_value("tcp://localhost:5556"))
("q,dim", "Point Dimensions", cxxopts::value<int>()->default_value("50"))
("j,device", "Device to run on", cxxopts::value<int>()->default_value("0"))
("h,help", "Print help");

// Parse command line options
auto result = options.parse(argc, argv);

if (result.count("help"))
{
std::cout << options.help({""}) << std::endl;
exit(0);
}

tsnecuda::TSNE_INIT init_type = tsnecuda::TSNE_INIT::UNIFORM;
if (SOPT(init).compare("unif") == 0) {
init_type = tsnecuda::TSNE_INIT::UNIFORM;
} else {
init_type = tsnecuda::TSNE_INIT::GAUSSIAN;
}

// Do the T-SNE
printf("Starting TSNE calculation with %d points.\n", IOPT(num-points));

// Construct the options
tsnecuda::Options opt(nullptr, IOPT(num-points), IOPT(dim), nullptr);
opt.perplexity = FOPT(perplexity);
opt.learning_rate = FOPT(learning-rate);
opt.early_exaggeration = FOPT(early-ex);
opt.iterations = IOPT(num-steps);
opt.iterations_no_progress = IOPT(num-steps);
opt.magnitude_factor = FOPT(magnitude-factor);
opt.num_neighbors = IOPT(nearest-neighbors);
opt.initialization = init_type;

if (BOPT(dump)) {
opt.enable_dump("dump_ys.txt", 1);
}
if (BOPT(viz)) {
opt.enable_viz(SOPT(connection));
}

// Do the t-SNE
time_total_ = tsnecuda::RunTsne(opt);
std::cout << "\nDone!\n";
} catch (std::exception const& e) {
std::cout << "Exception: " << e.what() << "\n";
}

TIMER_END()
TIMER_PRINT("tsne - total time for whole calculation")

return 0;
}