ML Performance Optimization Labs

Overview

This repository contains the code and lab work from a semester-long project focused on performance analysis and optimization techniques for machine learning workloads, all implemented in C and run on CPUs.

The project builds a foundational machine learning library and then optimizes it step by step with tiling and blocking, sparse matrix multiplication, and multithreading, culminating in a GPT-2 implementation for CPU.

Project Highlights

Initial Implementation
- Developed a foundational machine learning library in C for the forward pass of a small CNN.
Performance Analysis
- Conducted bottleneck analysis and profiled code performance.
Optimizations Applied
- Tiling and Blocking
- Sparse Matrix Multiplication
- Multithreading using pthreads and OpenMP
Final Model
- Implemented GPT-2 optimized to run efficiently on CPU.
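
As an illustration of the sparse matrix multiplication step listed above, here is a minimal sketch that multiplies a sparse matrix stored in compressed sparse row (CSR) form by a dense matrix. The `csr_matrix` struct and the `csr_matmul` function are hypothetical names chosen for this example; the storage format actually used in the `sparse-matrix-mul` branch is not described in this README, so treat CSR as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR container (an assumption, not the repository's type). */
typedef struct {
    size_t rows, cols;
    const size_t *row_ptr;  /* rows + 1 entries; row i spans [row_ptr[i], row_ptr[i + 1]) */
    const size_t *col_idx;  /* column index of each stored value */
    const float *values;    /* nonzero values, stored row by row */
} csr_matrix;

/* C (rows x n) = A (CSR, rows x cols) * B (dense, cols x n), row-major. */
void csr_matmul(const csr_matrix *a, const float *b, size_t n, float *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < a->rows; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0f;
        /* Only the stored nonzeros of row i contribute to output row i. */
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++) {
            float v = a->values[k];
            const float *b_row = b + a->col_idx[k] * n;
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += v * b_row[j];
        }
    }
}
```

The work done is proportional to the number of stored nonzeros times `n`, which is where the savings over a dense multiply come from when most weights are zero.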

Repository Structure

Branches
Each branch corresponds to a specific stage of the project:
- base: Initial CNN forward-pass implementation
- tiling-blocking: Optimized with tiling and blocking
- sparse-matrix-mul: Sparse matrix multiplication
- multithreading: Multithreaded implementation
- gpt2-cpu: GPT-2 final optimized implementation

Usage Instructions

Cloning the Repo

git clone https://github.com/rachnaumesh/ML-Perf-Labs.git
cd ML-Perf-Labs

Switching Branches

To check out a specific stage:

git checkout <branch-name>

Building and Running

Each branch includes a Makefile.
To build:

make

To run:

./<executable-name>

Tools and Techniques

Profiling Tools

- gprof
- perf
- toplev from pmu-tools

Programming Techniques

- Blocking
- Tiling
- Multithreading
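
To make the blocking and tiling idea concrete, here is a minimal sketch of a tiled matrix multiply. The function name `matmul_tiled`, its signature, and the `tile` parameter are assumptions made for this example and are not taken from the repository's kernels.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* C (m x n) = A (m x k) * B (k x n), all row-major, computed tile by tile. */
void matmul_tiled(const float *a, const float *b, float *c,
                  size_t m, size_t k, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t i = 0; i < m * n; i++)
        c[i] = 0.0f;

    /* The three outer loops walk over tiles; the inner loops stay inside
       one tile so the touched parts of A, B, and C are reused from cache. */
    for (size_t i0 = 0; i0 < m; i0 += tile) {
        size_t i_end = min_size(i0 + tile, m);
        for (size_t p0 = 0; p0 < k; p0 += tile) {
            size_t p_end = min_size(p0 + tile, k);
            for (size_t j0 = 0; j0 < n; j0 += tile) {
                size_t j_end = min_size(j0 + tile, n);
                for (size_t i = i0; i < i_end; i++) {
                    for (size_t p = p0; p < p_end; p++) {
                        float a_ip = a[i * k + p];
                        for (size_t j = j0; j < j_end; j++)
                            c[i * n + j] += a_ip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

A reasonable tile size depends on the cache sizes of the target CPU, so it is usually chosen by measurement with the profiling tools listed above rather than fixed in advance.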

Libraries/Frameworks

- pthreads
- OpenMP
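
As a rough sketch of how OpenMP can parallelize a layer, the example below splits the output rows of a dense (linear) layer across threads. The name `linear_forward_omp` and its parameters are hypothetical and chosen for this example; the repository's own multithreaded code may be organized differently.

```c
#include <assert.h>

/* y = W x + bias for a dense layer; W is (out x in), row-major.
   Hypothetical function, not the repository's API. */
void linear_forward_omp(const float *w, const float *x, const float *bias,
                        float *y, int out, int in)
{
    assert(out >= 0 && in >= 0);
    /* Each output row is independent, so rows can be split across threads.
       Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < out; i++) {
        float acc = bias[i];
        for (int j = 0; j < in; j++)
            acc += w[i * in + j] * x[j];
        y[i] = acc;
    }
}
```

With GCC or Clang, OpenMP is enabled by compiling with `-fopenmp`. Because each thread writes a distinct `y[i]`, no locking is needed.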

