Gen Backend with CI for Fork and Upstream Tests

This is a combination of 58 commits. Gen Backend with CI Gen Backend that works with GEN folder Gen Backend that works with GEN folder This is a combination of 8 commits. Gen Backend that works This is a combination of 3 commits. Gen Backend that works This is a combination of 170 commits. Buildable backend This is a combination of 73 commits. Gen HipBackend This is a combination of 48 commits. gen TritonGPUROCMDialect This is a combination of 2 commits. Gen TritonGPUROCMDialect This is a combination of 33 commits. add scripts remove hip dir check and remove hip folder save ref code chmod 777 copied files traverse asts for files process everything print preproc directives edit def and ifdef but second edit corrupt good edits edit defines add clang parse add lib ref rewrite includes add rewrite namespace function reparse code after rewrite rewrite use_rocm rework analysis folder fix types.cpp error add debug flag fix bug in rewrite_namespace. seperate out types rewrite just replace gpu:: to gpu_rocm for now add cmake parsing select file to process add cmake rewrites add tablegen translate TritonGPUDialect rewrite tablen include, class and parent classes change #ifdefs in tablegen code strings map functions and types in cpp split up tablegen rewrites search tablegen files for MAP entries split translation map by language and type of translation clean rewrite cpp define follows same template new TritonGPUROCM IR This is a combination of 3 commits. 
add main conversion pass and dialect transform remove conversions new TritonGPUROCM IR rewrite specific file fix dir rewrites use AMD instead of ROCM fix cmake bug back to GPUROCM for now throw exceptsion on failed rewrite rewrite namespace_alias_definition fixes improve tablegen string rewrite new IR/Transforms headers Debug Lists, Flags, Fix Namespace Issue, Pretty Print fix utility.h bug good transform files can use paths from root now add lang to cpp rewrite funcs rewrite lambdas black and add_edit function simplify namespace renames rewrite error nodes more transforms transforms done good cmake for transform add Conversion Ref good conversion cmake TritonToTrtionGPU stuff 1 tritontotpgu header triton to tritongpu conversion some fixes clean up namespace good ttir to ttgir cpp ttgir conversion pass done bring TritonGPUROCMToLLVM TritonGPUToLLVM headers working work TritonGPUToLLVM cpp files save good cpp add second to last cpp cpp namespace function runs at the end fix double rewrites finish conversion passes add AnalysisROCM from triton-mlir good Analysis good analyisi rewrite field declarations save debug target runs on debug update includes save before refactor refactor apply edits stage restructre code remove duplication fix utiltly.h put in everything back remove chmod and minor issues remove everything bring back triton-mlir gen_backend remove translate map stage in cpp. 
USe in specific rewrite functions format script headers are good remove double rewrites in HSACO run all make script use gitpython and also run relative to current dir save cmake chmod rewritten file strict check on for cmake args chmod python/triton/third_party/hip/CMakeLists.txt define TRITONGPU_DEFAULT_WARPSIZE chmod works on file add argparse create src -> dst map smart dst generation works gen abs paths fix issue with path rewrite fix git clone issue map ASMBuilder and MLIRGPUops cmake modules rewrite mlir::triton::gpu also just make ifu level upstream commit ignore TMAMetadataTy rewrite base class add path arg replace TritonGPU in using declrations narrow TMAMetadataTy ignore fix rewrite issues with Fields get original source path throw error if path gen fails rewrite return types rewrite function params general class rewrite check every rewrite more subtle rewrite check that use rewrite tuple simplify namespace rewrite smart namespace rewrite simplify type rewrites create traverse_node_and_exec_fn better valid namespace check rewrite functions simplified rewrite namespaces in fields fix bug with function args rewrite rewrite namespace defintions rewrite calls in functions fix bug with nested namespaces nice unify range checks traverse only qualified_identifier in namespace rewrite rewrite namespace in function_declarators deal with namespace_alias_definition rewrite qualified function names namespace rewrite in macro args use triton:: for any tritongpu usage simpler is_in_range check and add special rule for bad file Allocation.cpp field_initializer can start with template method remove print ignore TMAInfo rename addExternalLibs to addExternalLibsROCM rewrite the args simplify rewrite_cmake_arguments New Name for Docs try again pass rewrite map as arg make sure source is utf-8 change triton.cc encoding add filter function rename check if it builds declar function use rewrite map from json, gen rocm_backend_for_triton.cc merge language and file rewrite rules 
run everything format script save triton_rocm.cc fix path bugs disable llvmir import nice print save remove isrocm with out removing comma remove comma ignore createTritonGPURewriteTensorPointerPass add archinfo rewrite define gen SysROCM copy hsa headers remove print gen LLVMROCMIR improve LLVMROCMIR gen and also unify defines fix include bug add include for l comment out createConvertNVGPUToLLVMPass add TritonLLVMROCMIR in build step comment out code that is not needed try renaming update comment out init_triton_runtime comment out stuff fix import create hip_bindings.py gen hip_bindings.py save compiler.py change rewrite python bindings change import fix relative import save remove bad imports remove ref to runtime remove translate_llvmir_to_ptx rewrite relative imports copy gpu_matrix_core_version get new context remove arith dialect remove everything except TritonGPUROCMDialect add LLVM dialect update comment out registeration comment out regisreation empty context gen TritonROCM rewrite headers use TritonGPU fix build bug use TTROCM_ fix tablegen fix tablegen include bug fix more bugs handle path bug rewrite tablegen values fix no ascii chars fix allocation.cpp asci issue fix TritonOps ascii char bug fix TritonDialect bug tablegen deps are TritonROCMDialect fix asci bug in TritonGPUops use triton_rocm namespace more triton_rocm rewrite tablegen dialect strictly seperate out comment out gen PointerType correctly change cpp namespaces remove triton:: in td files skip nvgpu save mp code rename trait functions fix in tablegen try again rewrite friends add MLIRTypes use MLIRTypesROCM rewrite preproc fund def more bug fixes rewrite in for range nvgpu enable again back rewrite identifer in qualifed id add base class fix fix tablegen string bug user triton for nv_gpu rewrite on pointer exps fix bug add TritonNvidiaGPU dialect start tritonnv stuff add nvgpu just gen everything gen NVGPUROCM fix TritonNvidiaGPUROCMTransforms include save bug fix save save fix more 
rewrite TritonNVGPU funcs tqdm create gen expand more qua_identifer rewrites rewrite preproc args fix more qi issue fix cmake module bugs rewrite_map is all files save cpp progress add filter arg working td good td better td clean up tablgen string rewrites add PTX aswell! gen ptxrocm gen all fix bug fix symbol issue just pass everything gen code_generator_rocm gen more python try gen python folders rewrite dirs with dest in map test it out add more files uncomment out all functions remove inits add everything add inits back commend out get_backend imports rewrite imports add print add dialects move add_plan_cta_pass to cuda gen debug output create new context! remove print rename types and imports for py get src path from dest path better import rewrites parallel flag minor bugs working py imports? rename py strings loss chmod 777 everything fix namespace reg issue. Just ttir to file and then parse again comment out prints triton_gpu_rocm. to triton_gpu. remove ttir stuff use temp files move shift code to backend add shift ops to the right file update gened code min diff update gen convert clusterinfo get attrs update gened files just use _triton remove gened stuff save use gen folder rewrite filenames rewrite cmake files correctly fix relative import fix more import errors regen backend regen after IFU rm -rf tree_sitter repo gen include, lib, python chmod 660 fix namespace bug fix python issues fix bug skip test_bin_op_constexpr tests fix bug remove prints remove prints clean up scripts clean up HSACOTranslation.h update gen add upstream test fix bug cancel in progress run on rocm restore offline tests skip pre_commit for now and add precommit script add script to diff upstream update tests fix bugs just one test minor bugs deal with empty path minor bug clone submodules on upstream test bug with diff Check out Shared Commit minor bug use my fork of upstream minor bug keep name fo diffs try again show commit and dir pwd no upstream test install torch to detect 
ROCM run everything query warp size have default for queried warp size skip test_gemm failure skip test_print bug match jenkins ci just test_core_amd.py clean everything before test don't clear /tmp print archinfo Update hip_backend to use libhsa-runtime for arch info, brings in path changes for pytorch triton wheels print failure remove try block print getArchInfo call minor bug fail if arch_info is none docker action disable fork test run on pr try hi leave it for now gracefully checkout fix minor bug wrapper for get_arch_info run all of test_core_amd add test_subprocess add build upstream script update gen update build_upstream script check for TRITON_CODEGEN_AMD_HIP_BACKEND clean up build upstream script update upstream build step fix cmake if bug print build upstream step skip test_cas fetch before backend checkout just run upstream test fetch all checkout branch name use github env variable try head ref add set branch name show git log use patch use absolute path for script path git status try with quotes install on system run and test pick python remove workerss try personal branch fix bug don't checkout just run pytest try what they do
ROCm · Dec 21, 2023 · 3f09070
1 parent 1e2fd0d
commit 3f09070

Showing 308 changed files with 52,482 additions and 8,830 deletions.
diff --git a/.github/workflows/amd-offline-tests.yml b/.github/workflows/amd-offline-tests.yml
@@ -1,4 +1,4 @@
-name: AMD Offline Tests
+name: AMD Tests
 
 on:
   workflow_dispatch:
@@ -7,13 +7,34 @@ on:
       - main
       - triton-mlir
 
+concurrency:
+  group: ${{ github.ref }}
+  cancel-in-progress: true
+
 jobs:
-  Integration-Tests:
-    runs-on: "ubuntu-latest"
+  Runner-Preparation:
+    runs-on: ubuntu-latest
+    outputs:
+      matrix-required: ${{ steps.set-matrix.outputs.matrix-required }}
+    steps:
+      - name: Prepare runner matrix
+        id: set-matrix
+        run: |
+          if [ x"${{ github.repository }}" == x"ROCmSoftwarePlatform/triton" ]; then
+            echo '::set-output name=matrix-required::[["self-hosted", "rocm"]]'
+          else
+            echo '::set-output name=matrix-required::["ubuntu-latest"]'
+          fi
+
+  Fork-Tests:
+    if: false
+    needs: Runner-Preparation
 
-    container:
-      image: ubuntu:22.04
-      options: --user root
+    runs-on: ${{ matrix.runner }}
+
+    strategy:
+      matrix:
+        runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-required)}}
 
     steps:
       - name: Checkout
@@ -22,36 +43,39 @@ jobs:
       - name: Clear cache
         run: |
           rm -rf ~/.triton
+          rm -rf python/triton.egg-info
+          rm -rf python/.pytest_cache
+          rm -rf python/tests/__pycache__
+          rm -rf python/build
+          # rm -rf /tmp/* # NOTE: no permission to do this in CI
 
       - name: Update PATH
         run: |
           echo "PATH=${HOME}/.local/bin:${PATH}" >> "${GITHUB_ENV}"
 
-      - name: Prerequisite
-        run: |
-          apt update
-          apt install -y libpython3-dev python3-pip git wget
-          # get rocm related utilities
-          wget https://repo.radeon.com/amdgpu-install/5.5/ubuntu/jammy/amdgpu-install_5.5.50500-1_all.deb
-          apt-get install -y ./amdgpu-install_5.5.50500-1_all.deb
-          amdgpu-install -y --usecase=rocm --no-dkms
-          # remove unused packages to free some disk space
-          cp /opt/rocm/llvm/bin/ld.lld .
-          apt purge -y rocblas rocm-llvm
-          mkdir -p /opt/rocm/llvm/bin
-          mv ld.lld /opt/rocm/llvm/bin/ld.lld
-          # install pytorch
-          pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/rocm5.4.2
-          # uninstall system triton to eliminate conflicts with testing version
-          pip3 uninstall -y pytorch-triton-rocm
+      - name: Clean runner
+        run: |
+          pip3 uninstall -y triton
+  
+      - name: Install Dependencies
+        run: |
+          python3 -m pip install --upgrade pip
+          python3 -m pip install ninja
+          python3 -m pip install cmake==3.24
+          python3 -m pip install torch==1.13.1 --index-url https://download.pytorch.org/whl/rocm5.2 # important for detecting ROCM!
 
       - name: Install Triton
         run: |
           cd python
-          pip3 install ninja
           # Install in system, because need to override system triton. Otherwise lit tests will use wrong version
           DEBUG=TRUE TRITON_USE_ROCM=TRUE TRITON_USE_ASSERT_ENABLED_LLVM=TRUE python3 -m pip install --no-build-isolation -vvv -e .
-
+      
+      - name: Run Unit Tests
+        run: |
+          pytest -n 32 --capture=tee-sys -rfs --verbose \
+            python/test/unit/language/test_core_amd.py \
+            python/test/unit/language/test_subprocess.py
+    
       - name: Run lit tests
         run: |
           python3 -m pip install lit
@@ -67,3 +91,115 @@ jobs:
           cd python
           cd "build/$(ls build | grep -i cmake)"
           ctest
+
+  Upstream-Tests:
+    if: true
+    needs: Runner-Preparation
+
+    runs-on: ${{ matrix.runner }}
+
+    strategy:
+      matrix:
+        runner: ${{fromJson(needs.Runner-Preparation.outputs.matrix-required)}}
+
+    steps:
+      # - name: Checkout
+      #   uses: actions/checkout@v3
+
+      - name: Clear cache
+        run: |
+          rm -rf ~/.triton
+          rm -rf python/triton.egg-info
+          rm -rf python/.pytest_cache
+          rm -rf python/tests/__pycache__
+          rm -rf python/build
+          rm -rf /tmp/tmp*
+          rm -rf /tmp/triton_upstream
+          rm -rf .pytest_cache
+
+      - name: Update PATH
+        run: |
+          echo "PATH=${HOME}/.local/bin:${PATH}" >> "${GITHUB_ENV}"
+
+      - name: Check pre-commit
+        # TODO: enable pre-commit tests
+        if: false
+        run: |
+          python3 -m pip install --upgrade pre-commit
+          python3 -m pre_commit run --all-files --verbose
+
+      - name: Clean runner
+        run: |
+          pip3 uninstall -y triton
+  
+      - name: Install Dependencies
+        run: |
+          python3 -m pip install --upgrade pip
+          python3 -m pip install ninja
+          python3 -m pip install cmake==3.24
+          python3 -m pip install torch==1.13.1 --index-url https://download.pytorch.org/whl/rocm5.2 # important for detecting ROCM!
+      
+      - name: Set branch name
+        run: |
+          if [ "${{ github.event_name }}" = "pull_request" ]; then
+            echo "BRANCH_NAME=${{ github.head_ref }}" >> $GITHUB_ENV
+          else
+            echo "BRANCH_NAME=$(git rev-parse --abbrev-ref HEAD)" >> $GITHUB_ENV
+          fi
+      
+      - name: Build Upstream with Branch as Backend
+        run: |
+          set -x
+          SCRIPT_PATH=`realpath "./scripts/amd"`
+
+          # get current branch
+          BRANCH_NAME=${{ env.BRANCH_NAME }}
+          echo $BRANCH_NAME
+          
+          # get the last commit that AMD's fork shares with upstream triton
+          pip install gitpython
+          output=$(python3 $SCRIPT_PATH/diff_upstream.py --upstream https://github.com/openai/triton --fork https://github.com/ROCmSoftwarePlatform/triton)
+          upstreamcommit_hash_line=$(echo "$output" | grep 'SHARED_COMMIT_HASH=')
+          UPSTREAM_COMMIT_HASH=$(echo "$upstreamcommit_hash_line" | cut -d '=' -f2)
+          echo $UPSTREAM_COMMIT_HASH
+          
+          # check out the upstream at shared last commit
+          UPSTREAM_REPO_DIR=/tmp/triton_upstream
+          rm -rf $UPSTREAM_REPO_DIR
+          # git clone --recurse-submodules https://github.com/openai/triton $UPSTREAM_REPO_DIR #TODO: use when backend is upstream
+          git clone --recurse-submodules --branch update_backend_current_ifu https://github.com/micmelesse/triton $UPSTREAM_REPO_DIR #TODO: remove when backend is upstream
+          cd $UPSTREAM_REPO_DIR
+          # git checkout $UPSTREAM_COMMIT_HASH #TODO: use when backend is upstream
+          git log -1 --pretty=format:"%H, %an, %ad, %s"
+          # git apply $SCRIPT_PATH/new_backend.patch # apply patch until upstreamed
+          git status
+          
+          # checkout backend in upstream
+          cd $UPSTREAM_REPO_DIR/third_party/amd_hip_backend
+          echo `pwd`
+          git fetch --all
+          git checkout $BRANCH_NAME
+          git log -1 --pretty=format:"%H, %an, %ad, %s"
+          
+          # build upstream with this branch as a backend
+          cd $UPSTREAM_REPO_DIR
+          cd $UPSTREAM_REPO_DIR/python
+          pip uninstall -y triton
+          # pip install -U matplotlib pandas filelock tabulate
+          
+          echo `pwd`
+          export TRITON_CODEGEN_AMD_HIP_BACKEND=1
+          python3 -m pip install --no-build-isolation -vvv -e .
+
+          cd $UPSTREAM_REPO_DIR
+          cd python/test/unit/language
+          python3 -m pytest --capture=tee-sys -rfs --verbose "test_core.py"
+
+      # - name: Test Upstream
+      #   run: |
+      #     set -x
+      #     cd /tmp/triton_upstream
+      #     git status
+      #     pytest -n 32 --capture=tee-sys -rfs --verbose "python/test/unit/language/test_core.py::test_empty_kernel[float32]"
+
+
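The "Build Upstream with Branch as Backend" step above recovers the shared commit by grepping the script output for a `SHARED_COMMIT_HASH=` line and cutting on `=`. A minimal Python sketch of roughly the same extraction follows, assuming the script prints one such `KEY=value` line. The helper name `parse_shared_commit` and the sample output are hypothetical; `diff_upstream.py` itself is not shown in this diff.

```python
def parse_shared_commit(output: str, key: str = "SHARED_COMMIT_HASH") -> str:
    """Return the value of the first `key=value` line found in `output`.

    Roughly mirrors `grep 'SHARED_COMMIT_HASH=' | cut -d '=' -f2` from the
    workflow, but raises instead of silently producing an empty string.
    """
    prefix = key + "="
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            if value:
                return value
    raise ValueError(f"no non-empty {key}= line in script output")


if __name__ == "__main__":
    # Hypothetical script output, for illustration only.
    sample = "fetching remotes\nSHARED_COMMIT_HASH=1e2fd0d\ndone"
    print(parse_shared_commit(sample))
```

Failing loudly here may be preferable to the shell pipeline, where a missing line leaves `UPSTREAM_COMMIT_HASH` empty and the step simply continues.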
diff --git a/.gitignore b/.gitignore
@@ -44,3 +44,5 @@ docs/getting-started/tutorials
 /compile_commands.json
 .vscode
 .vs
+log*
+*.diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,14 +1,23 @@
 cmake_minimum_required(VERSION 3.18)
 
-# build third_party backend for upstream and exit
-# TODO: check if this is working in CI
-if(HIP_BACKEND_MODE)
+# ///////////////////////////////////////////////////// Triton AMD backend Build ///////////////////////////////////////////////////////////////////
+# print all env variables
+execute_process(COMMAND "${CMAKE_COMMAND}" -E environment OUTPUT_VARIABLE ENV_VARS)
+message(STATUS "Environment Variables:\n${ENV_VARS}")
+
+# set a CMake variable to the value of the environment variable
+set(TRITON_CODEGEN_AMD_HIP_BACKEND $ENV{TRITON_CODEGEN_AMD_HIP_BACKEND})
+
+# check if the CMake variable is set and equal to "1"
+if(TRITON_CODEGEN_AMD_HIP_BACKEND AND TRITON_CODEGEN_AMD_HIP_BACKEND STREQUAL "1")
     message(STATUS "ROCM Third Party Backend Mode is ON")
+    set(ENV{TRITONGPU_DEFAULT_WARPSIZE} "64")
     add_subdirectory(python/triton/third_party/hip)
     return()
 endif()
 
-# stand alone build
+
+# ///////////////////////////////////////////////////// Triton AMD Standalone Build ///////////////////////////////////////////////////////////////////
 if(POLICY CMP0116)
 # Introduced in cmake 3.20
 # https://cmake.org/cmake/help/latest/policy/CMP0116.html

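The CMake gate above turns on the third-party backend mode only when `TRITON_CODEGEN_AMD_HIP_BACKEND` is set to exactly `1`. A minimal Python sketch of the same check follows, for instance for a build wrapper that wants to know the mode before invoking CMake. The function name `hip_backend_mode_enabled` is hypothetical and is not part of this commit.

```python
import os
from typing import Mapping, Optional


def hip_backend_mode_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the variable is set and equal to the string "1".

    Matches the CMake condition
    `if(TRITON_CODEGEN_AMD_HIP_BACKEND AND TRITON_CODEGEN_AMD_HIP_BACKEND STREQUAL "1")`.
    """
    env = os.environ if env is None else env
    return env.get("TRITON_CODEGEN_AMD_HIP_BACKEND") == "1"


if __name__ == "__main__":
    print(hip_backend_mode_enabled())
```

Values such as `TRUE` or `ON` do not enable the mode under this check, which is consistent with the workflow exporting the variable as `1`.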
diff --git a/include/triton/Dialect/Triton/IR/TritonOps.td b/include/triton/Dialect/Triton/IR/TritonOps.td
@@ -691,7 +691,7 @@ def FuncOp : TT_Op<"func", [AffineScope, AutomaticAllocationScope, CallableOpInt
     attribute like SymbolRefAttr). An external function declaration (used when
     referring to a function declared in some other module) has no body. While
     the MLIR textual form provides a nice inline syntax for function arguments,
-    they are internally represented as “block arguments” to the first block in
+    they are internally represented as "block arguments" to the first block in
     the region.
 
     Only dialect attribute names may be specified in the attribute dictionaries

diff --git a/include/triton/Dialect/TritonGPU/IR/TritonGPUOps.td b/include/triton/Dialect/TritonGPU/IR/TritonGPUOps.td
@@ -99,7 +99,7 @@ def TTG_InsertSliceOp : TTG_Op<"insert_slice",
   let summary = "insert slice";
 
   let description = [{
-      This operation inserts a tensor `$src` into another tensor `$dst` as specified by the operation’s
+      This operation inserts a tensor `$src` into another tensor `$dst` as specified by the operation's
       `$index` argument and `$axis` attribute.
 
       It returns a copy of `$dst` with the proper slice updated with the value of `$src`.
@@ -255,7 +255,7 @@ def TTG_InsertSliceAsyncOp : TTG_Op<"insert_slice_async",
   let summary = "insert slice async";
 
   let description = [{
-      This operation inserts a tensor `$src` into another tensor `$dst` as specified by the operation’s
+      This operation inserts a tensor `$src` into another tensor `$dst` as specified by the operation's
       `$index` argument and `$axis` attribute.
 
       It returns a copy of `$dst` with the proper slice updated asynchronously with the value of `$src`.

diff --git a/include/triton/Target/HSACO/HSACOTranslation.h b/include/triton/Target/HSACO/HSACOTranslation.h
@@ -39,37 +39,10 @@ class LLVMContext;
 
 namespace mlir {
 namespace triton {
-
-// add external libs to modules
-void addExternalLibsROCM(mlir::ModuleOp &module,
-                     const std::vector<std::string> &names,
-                     const std::vector<std::string> &paths);
-
-// Translate Triton dialect to TritonGPU, return null if failed.
-void translateTritonToTritonGPUROCM(mlir::ModuleOp &module, int computeCapability,
-                                int numWarps, int numStages);
-
-// Translate Triton GPU to mlir LLVM dialect, return null if failed.
-void translateTritonGPUROCMToLLVMDialect(mlir::ModuleOp &module,
-                                     int computeCapability, bool isROCM);
-
-// Translate mlir LLVM dialect to LLVMIR, return null if failed.
-std::unique_ptr<llvm::Module>
-translateLLVMDialectToLLVMIR(llvm::LLVMContext *llvmContext,
-                             mlir::ModuleOp module, bool isROCM);
-
 // Translate LLVMIR to HSACO code.
 std::tuple<std::string, std::string>
 translateLLVMIRToHSACO(llvm::Module &module, std::string gfx_arch,
                        std::string gfx_triple, std::string gfx_features);
-
-std::tuple<std::string, std::string>
-translateTritonIRToHSACO(mlir::ModuleOp module, std::string gfx_arch,
-                         std::string gfx_triple, std::string gfx_features,
-                         int numWarps, int numStages,
-                         const std::vector<std::string> &names,
-                         const std::vector<std::string> &paths);
-
 } // namespace triton
 } // namespace mlir
 

diff --git a/lib/Analysis/Allocation.cpp b/lib/Analysis/Allocation.cpp
@@ -265,7 +265,7 @@ class AllocationAnalysis {
         // benzh@maybe alignment should be passed in.
         // Software swizzling calculates phase based on offset, while hardware
         // swizzling do that based on physical address. Thus only by setting the
-        // alignment to 1024 can ensure the correctness. 
+        // alignment to 1024 can ensure the correctness.
         if (bytes > 256)
           kAlignment = 1024;
         allocation->addBuffer<BufferT::BufferKind::Explicit>(result, bytes,