We thoroughly investigated the t-SNE algorithm and the open-source implementations using the Barnes-Hut approximation from https://github.com/lvdmaaten/bhtsne/ and https://github.com/DmitryUlyanov/Multicore-TSNE. We converted the entire codebase from `double` to `float` to enable better vectorization, since 32-bit floats double the number of SIMD lanes per vector register.
We created a testbench using Python and NumPy. The testbench includes five test fixtures of various sizes, created from MNIST (https://www.kaggle.com/datasets/oddrationale/mnist-in-csv) and CLIP (https://github.com/openai/CLIP) model embeddings, so we can quickly test our implementation and benchmark its performance.
| Test case | Source | Number of points | Point dimension |
|---|---|---|---|
| simple_200x64 | scikit-learn digits data | 200 | 64 |
| easy_1797x64 | scikit-learn digits data | 1797 | 64 |
| medium_10000x768 | MNIST (https://www.kaggle.com/datasets/oddrationale/mnist-in-csv) | 10000 | 768 |
| hard_60000x768 | MNIST (https://www.kaggle.com/datasets/oddrationale/mnist-in-csv) | 60000 | 768 |
| hard_62656x512 | CLIP (https://github.com/openai/CLIP) image embeddings of a custom dataset | 62656 | 512 |
The testbench also defines a single-precision floating-point matrix serialization format, with readers and writers implemented in both C++ and Python, so we can easily create new test cases and load computed t-SNE embeddings into Python for visualization and verification.
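For illustration, here is a minimal sketch of what the C++ side of such a reader/writer can look like, assuming a simple layout of two little-endian `int32` dimensions followed by row-major `float32` data (the actual on-disk format we use may differ in details):

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed layout: [int32 rows][int32 cols][rows*cols float32, row-major].
void write_matrix(const std::string& path, const std::vector<float>& data,
                  int32_t rows, int32_t cols) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(reinterpret_cast<const char*>(&rows), sizeof(rows));
    out.write(reinterpret_cast<const char*>(&cols), sizeof(cols));
    out.write(reinterpret_cast<const char*>(data.data()),
              sizeof(float) * data.size());
}

std::vector<float> read_matrix(const std::string& path,
                               int32_t& rows, int32_t& cols) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.read(reinterpret_cast<char*>(&rows), sizeof(rows));
    in.read(reinterpret_cast<char*>(&cols), sizeof(cols));
    std::vector<float> data(static_cast<size_t>(rows) * cols);
    in.read(reinterpret_cast<char*>(data.data()), sizeof(float) * data.size());
    return data;
}
```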
We reviewed the codebase and profiled the sequential version of the program to identify parallelizable parts. Using `perf`, we identified two computation-heavy loops suitable for parallelization: (1) computing pairwise distances for the k-nearest-neighbor search on a ball tree, and (2) the gradient computation. By parallelizing these two tasks using OpenMP and ISPC, we achieved significant speedup; preliminary results are discussed in a later section.
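As a sketch of the pattern we applied (variable and function names here are illustrative, not the actual bhtsne identifiers): the distances from one query point to all candidate points are independent, so the loop parallelizes with a single OpenMP pragma, and the gradient loop is parallelized the same way across points.

```cpp
#include <cstddef>
#include <vector>

// Squared Euclidean distances from one query point to all n candidates.
// Each iteration is independent, so a single pragma distributes the
// outer loop across threads; the gradient loop follows the same pattern.
std::vector<float> squared_distances(const std::vector<float>& points, // n x d, row-major
                                     const std::vector<float>& query,  // length d
                                     std::size_t n, std::size_t d) {
    std::vector<float> dist(n);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(n); ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < d; ++j) {
            float diff = points[static_cast<std::size_t>(i) * d + j] - query[j];
            acc += diff * diff;
        }
        dist[i] = acc;
    }
    return dist;
}
```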
We are following our proposed schedule so far: we have finished most of the OpenMP and ISPC implementation work. The next step is to produce benchmark results and start working on CUDA.
Date | Plan | Assigned to |
---|---|---|
Apr 9 - Apr 13 | Benchmark OpenMP and ISPC on PSC machine | Yue |
Apr 13 - Apr 15 | Identify computation for CUDA acceleration | Zhe, Yue |
Apr 16 - Apr 22 | Implement CUDA acceleration | Zhe, Yue |
Apr 23 - Apr 25 | Benchmark CUDA on PSC | Yue |
Apr 25 - Apr 28 | Evaluation; try ISPC+OpenMP+CUDA together | Zhe |
Apr 29 - May 5 | Report and presentation | Zhe, Yue |
We plan to have the following content at the poster session: problem description, parallelization approaches (OpenMP, ISPC, and CUDA), and results and evaluation.
Demo: we are not sure if there will be a demo at this stage. We might include a video demo of the embeddings during gradient descent.
We will include figures showing the results of our parallelization approaches.
OpenMP speedup

| | 2 threads | 4 threads | 8 threads |
|---|---|---|---|
| Step 1 (k-nearest-neighbor search on ball tree), 47.9 s with 1 thread | 1.99x | 4x | 6.6x |
| Step 2 (gradient descent), 45.6 s with 1 thread | 1.53x | 2.25x | 2.8x |
| Total, 93.7 s with 1 thread | 1.74x | 2.8x | 4x |
ISPC-vectorized Euclidean distance computation

| | AVX1 (8-wide) | AVX2 (8-wide) | AVX2 (16-wide) |
|---|---|---|---|
| Step 1 (k-nearest-neighbor search on ball tree), 47.9 s with 1 thread and no vectorization | 3.30x | 3.80x | 4.02x |
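To illustrate what is being vectorized (this is not our actual ISPC kernel, which lives in a separate `.ispc` file compiled for the targets above), the hot computation is a reduction over the point dimension; the same idea expressed in C++ with `#pragma omp simd` looks like this:

```cpp
#include <cstddef>

// C++ stand-in for the vectorized distance kernel: the reduction over
// the point dimension is mapped onto SIMD lanes. Our real kernel is
// written in ISPC rather than with this pragma.
float squared_distance(const float* a, const float* b, std::size_t d) {
    float acc = 0.0f;
    #pragma omp simd reduction(+:acc)
    for (std::size_t j = 0; j < d; ++j) {
        float diff = a[j] - b[j];
        acc += diff * diff;
    }
    return acc;
}
```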
We have two major concerns:
- Our general direction so far has been to locate potentially parallelizable code in the existing implementation and pursue speedup using the available parallelization tools. Do we need to research and implement more parallel-friendly algorithms to replace existing code, such as a better kNN search?
- The CUDA version of this program could be challenging to implement. For example, we understand that tree structures do not map well onto CUDA's execution model. We wonder if there are any good resources for us to learn more about CUDA.