[Dev] Improve benchmark scripts (#99)
* Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability

* Refactor import statements for improved readability and maintainability

* Refactor import statements for improved readability and maintainability

* disable failure email for ci

* remove email notifications.

* move relax pass from testing to mlc_llm

* Refactor scripts with the check_eual_ref_scripts_with_emitter function

* Lint Fix

* Refactor scripts with the check_eual_ref_scripts_with_emitter function

* bug fix in test

* lint fix.

* test cuda i4 kernel

* Refactor copyright notice in i4matmul.hpp

* Refactor BitBLASLinear test module for improved readability and maintainability

* Refactor test, as Python versions below 3.9 cannot handle int32 overflow.

* format lint for test

* Refactor test_int4b_fp16_convert.py for improved readability and maintainability

* remove unused design file

* move tile device from package to base

* dummy impl for codegen

* Refactor file structure for ladder_permutate module

* Refactor backend class and fix typos in comments

* Deep refactor of Lib-related code.

* remove ci pull.

* LintFix

* refactor builder for whl build

* Refactor TIRWrapper.wrap() method to include an assertion for the optimized module

* Refactor lib_generator to set library and source paths

* lint fix

* BitNet vllm integration

* chore: update codespell to version 2.3.0

* Lintfix

* Bump version to 0.0.1.dev13

* lint fix

* Disable fast decoding for [u]int4xint8 by default.

* optimize from dict design in Hint

* Implement SplitK

* bitnet benchmark generation.

* Add benchmark script for BitNet integration

* AtomicAdd Support

* LintFix

* CI fix for the case when 3rdparty TVM is initialized.

* bug fix for setup

* fix a bug in block reduce

* typo fix

* BUG Fix for block reduce.

* Lint fix

* Refactor block reduce schedule template

* transform branch from bitblas to bitblas_tl

* Fix subproject commit reference in 3rdparty/tvm

* chore: update submodule branch from bitblas to bitblas_tl

* force update config.cmake

* Bug fix

* Fix subproject commit reference in 3rdparty/cutlass

* chore: Add submodule for cutlass library

* update tl cutlass path

* Refactor BitBLASLinear test module for improved readability and maintainability

* format fix

* Copy CUTLASS to the package directory

* Refactor setup.py to include additional TVM header files

* lint fix

* bug fix

* Refactor BitBLASLinear test module for improved readability and maintainability

* Implement Matmul Benchmark Design

* chore: Update BitBLAS Matmul benchmark script

* lint fix

* Refactor BitBLASMatmulOpsBenchmark for improved readability and maintainability

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run

* lint fix

* Benchmark bot test

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run

* Refactor BitBLASMatmulOpsBenchmark to disable tuning during benchmark run
LeiWang1999 authored Jul 23, 2024
1 parent a2d3bb0 commit 75ce23e
Showing 4 changed files with 213 additions and 25 deletions.
64 changes: 60 additions & 4 deletions .github/workflows/benchmark.yml
@@ -5,7 +5,7 @@ on:
types: [created]

jobs:
benchmark_main:
benchmark:
if: github.event.issue.pull_request != '' && contains(github.event.comment.body, '/run-benchmark')
runs-on: self-hosted

@@ -14,21 +14,77 @@ jobs:
uses: actions/checkout@v2
with:
ref: main

- name: Get base branch commit ID
id: get_base_commit
run: echo "BASE_COMMIT_ID=$(git rev-parse HEAD)" >> $GITHUB_ENV

- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.9'

- name: Activate virtual environment and install dependencies
run: |
source bitblas_benchmark/bin/activate
python -m pip install --upgrade pip
if [ -f requirements-dev.txt ]; then python -m pip install -r requirements-dev.txt; fi
- name: Install project in wheel mode
run: |
source bitblas_benchmark/bin/activate
python -m pip install .
- name: Matmul Benchmark
run: |
source bitblas_benchmark/bin/activate
cd benchmark/operators
python ./benchmark_ops_matmul.py
- name: Get PR branch commit ID
id: get_pr_commit
run: echo "PR_COMMIT_ID=$(git rev-parse HEAD)" >> $GITHUB_ENV

- name: Create virtual environment
run: python -m venv bitblas_benchmark

- name: Activate virtual environment and install dependencies
run: |
source bitblas_ci/bin/activate
source bitblas_benchmark/bin/activate
python -m pip install --upgrade pip
if [ -f requirements-dev.txt ]; then python -m pip install -r requirements-dev.txt; fi
- name: Install project in wheel mode
run: |
source bitblas_ci/bin/activate
python -m pip install .
source bitblas_benchmark/bin/activate
python -m pip install .
- name: Matmul Benchmark
run: |
source bitblas_benchmark/bin/activate
cd benchmark/operators
python ./benchmark_ops_matmul.py
- name: Compare benchmark results
run: |
source bitblas_benchmark/bin/activate
cd benchmark/operators
python ./compare_benchmark.py --base ${{ env.BASE_COMMIT_ID }} --head ${{ env.PR_COMMIT_ID }} 2>&1 | tee compare_results.txt
- name: Install GitHub CLI
run: |
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key C99B11DEB97541F0
sudo apt-add-repository https://cli.github.com/packages
sudo apt update
sudo apt install gh
- name: Authenticate GitHub CLI
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh auth login --with-token <<< $GITHUB_TOKEN
- name: Post benchmark results
run: |
cat compare_results.txt
gh pr comment ${{ github.event.issue.number }} --body "$(cat compare_results.txt)"
55 changes: 41 additions & 14 deletions benchmark/operators/benchmark_ops_matmul.py
@@ -8,7 +8,7 @@
from tabulate import tabulate
import json
from os import path, makedirs
from typing import Tuple
from typing import Tuple, Dict, List, Union

set_log_level("DEBUG")

@@ -87,9 +87,16 @@ def serialize_results(self) -> None:
)

# Save benchmark shapes into JSON
shapes = [(config.M, config.N, config.K)
for name, results in self.benchmark_results.items() for i, _ in enumerate(results)
for config in [self.benchmark_sets[name][i][1]]]
shapes: Dict[str, List[List[Union[int, Tuple, Dict, None]]]] = {}

# Iterate through the benchmark results to extract the shapes
for name, results in self.benchmark_results.items():
shapes[name] = []
for i, _ in enumerate(results):
config = self.benchmark_sets[name][i][1]
dyn_prof_shape = self.benchmark_sets[name][i][2]
shapes[name].append([config.M, config.N, config.K, dyn_prof_shape])

self._save_json(shapes, path.join(log_commit_path, self.BENCHMARK_SHAPES_FILE))

# Save device info into JSON
@@ -103,20 +110,40 @@ def _save_json(self, data, file_path):
with open(file_path, "w") as f:
json.dump(data, f)

def deserialize_results(self, log_path: str) -> None:
@classmethod
def deserialize_from_logs(cls, commit_id: str) -> None:
"""Deserialize benchmark results from JSON files."""
self.benchmark_results = self._load_json(path.join(log_path, self.BENCHMARK_RESULTS_FILE))
benchmark = cls()
commit_id_path = f"CommitID_{commit_id}"
log_commit_path = path.join(benchmark.log_path, commit_id_path)

shapes_file = path.join(log_path, self.BENCHMARK_SHAPES_FILE)
with open(shapes_file, "r") as f:
shapes = json.load(f)
# TODO: Reconstruction of benchmark_sets from shapes
del shapes
benchmark.benchmark_results = cls._load_json(
path.join(log_commit_path, cls.BENCHMARK_RESULTS_FILE))

self.benchmark_target = self._load_json(path.join(log_path,
self.BENCHMARK_DEVICE_FILE))["device"]
shapes_file = path.join(log_commit_path, cls.BENCHMARK_SHAPES_FILE)

def _load_json(self, file_path):
with open(shapes_file, "r") as f:
shapes = json.load(f)
for name, shape_list in shapes.items():
for shape in shape_list:
M, N, K, dyn_prof_shape = shape
benchmark.add_benchmark_set(
name,
[
benchmark.generate_op_unit(
benchmark.generate_operator_config(name, M, N, K),
dynamic_profiling_shape=dyn_prof_shape,
)
],
)

benchmark.benchmark_target = cls._load_json(
path.join(log_commit_path, cls.BENCHMARK_DEVICE_FILE))["device"]

return benchmark

@staticmethod
def _load_json(file_path):
"""Helper function to load JSON data from a file."""
with open(file_path, "r") as f:
return json.load(f)
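
For context, a minimal usage sketch of the new deserialize_from_logs() classmethod (the commit id below is illustrative; the import path mirrors the one used in compare_benchmark.py, and the logs for that commit are assumed to exist under the default log path):

from benchmark_ops_matmul import BitblasMatmulOpsBenchmark

# Rebuild a benchmark object from the JSON files that serialize_results()
# wrote under <log_path>/CommitID_<commit_id>/.
restored = BitblasMatmulOpsBenchmark.deserialize_from_logs("75ce23e")

print(restored.benchmark_target)  # device string read from BENCHMARK_DEVICE_FILE
for name, results in restored.benchmark_results.items():
    # Each entry is a (latency in ms, tuning time in s) pair per benchmarked shape.
    print(name, results)
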
102 changes: 102 additions & 0 deletions benchmark/operators/compare_benchmark.py
@@ -0,0 +1,102 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import argparse
from benchmark_ops_matmul import BitblasMatmulOpsBenchmark, HELPER_MESSAGE
from tabulate import tabulate
from typing import Tuple


def compare(base: BitblasMatmulOpsBenchmark, head: BitblasMatmulOpsBenchmark):
"""Generate and print a report of the benchmark results."""
for name, results in head.benchmark_results.items():
table_data = [
["TAG:", name, "Device:", head.benchmark_target],
[
"Shape (M-N-K / N-K_M)",
"Time (ms)",
"Throughput (TFLOPS)",
"Tune Time (s)",
],
]

def get_suffix(base, head):
if base is None or head is None:
return ""
symbol = "↑" if head > base else "↓" if head < base else "="
ratio = f"{((head - base) / base) * 100:.2f}%"
return f"{symbol}({ratio})"

def legalize_shape(M, N, K, dyn_prof_shape):
"""Generate a string representation of the operator shape.
Args:
M: The M dimension (can be an int or a tuple).
N: The N dimension (must be an int).
K: The K dimension (must be an int).
dyn_prof_shape: The dynamic profiling shape (dict with 'M' key if M is dynamic).
Returns:
A string representing the shape in either 'M-N-K' or 'N-K_M' format.
"""
if isinstance(M, int):
return f"{M}-{N}-{K}"
elif dyn_prof_shape and "M" in dyn_prof_shape:
return f"{N}-{K}_{dyn_prof_shape['M']}"
else:
# Calculate the average of tuple M
opt_m = sum(M) / len(M)
return f"{N}-{K}_{opt_m}"

for i, (latency, tuning_time) in enumerate(results):
op_config = head.benchmark_sets[name][i][1]
dyn_prof_shape = head.benchmark_sets[name][i][2]
shape = legalize_shape(op_config.M, op_config.N, op_config.K, dyn_prof_shape)

benchmark_M = (
sum(op_config.M) /
len(op_config.M) if isinstance(op_config.M, Tuple) else op_config.M)

base_latency = base.benchmark_results[name][i][0]
if latency is not None and base_latency is not None:
throughput = (2 * benchmark_M * op_config.N * op_config.K / (latency * 1e-3) / 1e12)
base_throughput = (2 * benchmark_M * op_config.N * op_config.K /
(base_latency * 1e-3) / 1e12)
throughput = f"{throughput:.3f}{get_suffix(base_throughput, throughput)}"
else:
throughput = "N/A"

if latency is not None:
latency_str = f"{latency:.3f}{get_suffix(base_latency, latency)}"
else:
latency_str = "N/A"

base_tuning_time = base.benchmark_results[name][i][1]
if tuning_time is not None:
tuning_time_str = f"{tuning_time:.3f}{get_suffix(base_tuning_time, tuning_time)}"
else:
tuning_time_str = "N/A"

table_data.append([shape, latency_str, throughput, tuning_time_str])

print(tabulate(table_data, headers="firstrow", tablefmt="fancy_grid"))
print(HELPER_MESSAGE)


if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
"--base",
type=str,
help="the base commit id",
)
parser.add_argument(
"--head",
type=str,
help="the head commit id",
)
args = parser.parse_args()

base_benchmark = BitblasMatmulOpsBenchmark.deserialize_from_logs(args.base)

head_benchmark = BitblasMatmulOpsBenchmark.deserialize_from_logs(args.head)

compare(base_benchmark, head_benchmark)
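
As a quick sanity check of the throughput and suffix math used above (all numbers below are illustrative, not taken from a real run):

# Illustrative numbers only: a 4096x4096x4096 GEMM measured at 1.0 ms.
M, N, K = 4096, 4096, 4096
latency_ms = 1.0

flops = 2 * M * N * K  # each multiply-accumulate counted as 2 FLOPs
tflops = flops / (latency_ms * 1e-3) / 1e12
print(f"{tflops:.3f} TFLOPS")  # 137.439 TFLOPS

# The per-cell suffix compares head against base, e.g. 120 -> 137.439 TFLOPS:
base, head = 120.0, tflops
print(f"{head:.3f}↑({(head - base) / base * 100:.2f}%)")  # 137.439↑(14.53%)
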
17 changes: 10 additions & 7 deletions bitblas/benchmark/operator/__init__.py
@@ -26,7 +26,7 @@ class BitblasOperatorBenchmarkBase(ABC):
enable_hardware_aware_tuning: bool = False

# Log path
log_path: Optional[str] = None
log_path: Optional[str] = path.join(get_default_cache_path(), "benchmark")

@abstractmethod
def prepare_benchmark_sets(self):
@@ -53,7 +53,6 @@ def add_benchmark_set(

def run(self, report=True, serialize=True, enable_tuning: bool = False):
"""Run the benchmark process."""
self.log_path = path.join(get_default_cache_path(), "benchmark")

if not path.exists(self.log_path):
makedirs(self.log_path)
@@ -135,11 +134,6 @@ def serialize_results(self) -> None:
"""Serialize the benchmark results."""
pass

@abstractmethod
def deserialize_results(self) -> None:
"""Deserialize the benchmark results."""
pass

def enable_tuning(self):
"""Enable hardware-aware tuning."""
self.enable_hardware_aware_tuning = True
@@ -151,3 +145,12 @@ def disable_tuning(self):
def set_log_path(self, log_path: str):
"""Set the log path."""
self.log_path = log_path

def set_benchmark_target(self, target: str):
"""Set the benchmark target."""
self.benchmark_target = target

def set_benchmark_results(self, results: Dict[str, List[Tuple[Optional[float],
Optional[float]]]]):
"""Set the benchmark results."""
self.benchmark_results = results
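
For reference, a small sketch of the payload the new setters expect (values are illustrative; benchmark stands for any concrete subclass instance):

from typing import Dict, List, Optional, Tuple

# Illustrative data matching the annotation above: one (latency in ms,
# tuning time in s) pair per benchmarked shape; None marks a failed case.
results: Dict[str, List[Tuple[Optional[float], Optional[float]]]] = {
    "matmul_fp16": [
        (0.123, 45.6),
        (None, None),
    ],
}
# benchmark.set_benchmark_results(results)
# benchmark.set_benchmark_target("NVIDIA A100")  # device string, as saved in the JSON logs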
