[Dev] Fix bugs for ROCm with our default TileLang Backend #282
This pull request includes significant changes to enhance the target detection functionality and streamline the codebase. The most important changes involve the introduction of a new `auto_detect_target` function, the removal of redundant code, and the consolidation of target detection logic.

Enhancements to target detection:
- `bitblas/utils/target_detector.py`: Introduced the `auto_detect_target` function to detect the computing target (CUDA or ROCm) based on the environment, replacing the previous `auto_detect_nvidia_target` function in various parts of the codebase.

Codebase simplification:
- `bitblas/__init__.py`: Replaced `auto_detect_nvidia_target` with `auto_detect_target` in the import statement.
- `bitblas/base/arch/__init__.py`: Removed the temporary `auto_infer_current_arch` function and added the new `auto_detect_target` function.
- `bitblas/benchmark/operator/__init__.py`: Updated the `BitblasOperatorBenchmarkBase` class to use `auto_detect_target` instead of `auto_detect_nvidia_target`.
- `bitblas/cache/operator.py`: Updated the `load_global_ops_cache` function to use `auto_detect_target` instead of `auto_detect_nvidia_target`.

Note that only consistent precision is currently supported; the dequantize op is coming soon.
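
For readers who want a feel for what environment-based target detection can look like, here is a minimal sketch. It is not the implementation in `bitblas/utils/target_detector.py`: the function name `detect_target_sketch`, the probing of `nvidia-smi` and `rocm-smi` on `PATH`, and the returned labels `"cuda"` and `"rocm"` are all assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional


def detect_target_sketch(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Hypothetical stand-in for a CUDA/ROCm detector (not the BitBLAS code).

    `which` is injectable so the lookup can be exercised without a GPU.
    """
    # Assumption: a vendor management tool on PATH implies that backend.
    if which("nvidia-smi") is not None:
        return "cuda"
    if which("rocm-smi") is not None:
        return "rocm"
    raise RuntimeError("No CUDA or ROCm tooling found on PATH")
```

Checking CUDA first is an arbitrary choice in this sketch; a real detector may prefer a different order or query the driver directly instead of looking for command-line tools.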