[GPU/OpenCL] Check fp16(half) support #2608

Merged 1 commit on Jun 1, 2024.
25 changes: 25 additions & 0 deletions nntrainer/opencl/opencl_context_manager.cpp
@@ -155,6 +155,31 @@ bool ContextManager::CreateDefaultGPUDevice() {
  device_id_ = devices[0];
  platform_id_ = platform_id_;

#ifdef ENABLE_FP16
  // check for fp16 (half) support available on device
Contributor:

Which half-precision type does it support: _Float16, __fp16, or half?

Contributor (author):

Ideally, _Float16, __fp16, and half can all be used interchangeably inside kernels when the cl_khr_fp16 extension is enabled. I have tested with __fp16 and half myself; they worked exactly the same. I did not test with _Float16, but it is supported by OpenCL 2.1 or later.

Contributor (author):

half is specifically designed for use in OpenCL kernels. Some of the alternatives are architecture specific (e.g., __fp16 for ARM), so half will be more portable and can benefit from further optimizations.

Contributor:

Thanks for the clarification! If using the half type has advantages, should it be added as a tensor data type? For reference, the current _FP16 type selection:

#ifdef ENABLE_FP16
#ifdef USE__FP16
#define _FP16 __fp16
#else
#define _FP16 _Float16
#endif
#endif

Contributor (author):

half will only work inside OpenCL kernels, so I think adding it as a tensor data type won't be useful in this scenario.

  // getting extensions
  size_t extension_size;
  status =
    clGetDeviceInfo(device_id_, CL_DEVICE_EXTENSIONS, 0, NULL, &extension_size);
  if (status != CL_SUCCESS) {
    ml_loge("clGetDeviceInfo returned %d", status);
    return false;
  }

  char extensions[extension_size];
  status = clGetDeviceInfo(device_id_, CL_DEVICE_EXTENSIONS, extension_size,
                           extensions, NULL);
  if (status != CL_SUCCESS) {
    ml_loge("clGetDeviceInfo returned %d", status);
    return false;
  }

  if (std::string(extensions).find("cl_khr_fp16") == std::string::npos) {
    ml_loge("fp16 (half) is not supported by device");
    return false;
  }
#endif

  return true;
}
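Following up on the discussion above about half being usable only inside OpenCL kernels: a minimal illustrative kernel, not part of this PR, assuming the device has passed the cl_khr_fp16 check added here (kernel and argument names are made up for the example):

// Illustrative OpenCL C kernel; valid only when the device reports cl_khr_fp16.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void scale_fp16(__global const half *in, __global half *out,
                         const float alpha) {
  const int i = get_global_id(0);
  // Arithmetic happens in half precision; alpha is narrowed explicitly.
  out[i] = in[i] * (half)alpha;
}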

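A side note on the buffer used above: char extensions[extension_size] is a variable-length array, which is a compiler extension in C++. A minimal sketch of the same query using a std::vector buffer instead, assuming <vector> and <string> are included and device_id_ comes from the surrounding class; this is not what was merged:

// Sketch only: same clGetDeviceInfo extension query, std::vector instead of a VLA.
size_t extension_size = 0;
cl_int status =
  clGetDeviceInfo(device_id_, CL_DEVICE_EXTENSIONS, 0, NULL, &extension_size);
if (status != CL_SUCCESS)
  return false;

std::vector<char> extensions(extension_size);
status = clGetDeviceInfo(device_id_, CL_DEVICE_EXTENSIONS, extension_size,
                         extensions.data(), NULL);
if (status != CL_SUCCESS)
  return false;

if (std::string(extensions.data()).find("cl_khr_fp16") == std::string::npos)
  return false;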
2 changes: 2 additions & 0 deletions nntrainer/opencl/opencl_loader.cpp
@@ -67,6 +67,7 @@ bool LoadOpenCL() {
void LoadOpenCLFunctions(void *libopencl) {
  LoadFunction(clGetPlatformIDs);
  LoadFunction(clGetDeviceIDs);
  LoadFunction(clGetDeviceInfo);
  LoadFunction(clCreateContext);
  LoadFunction(clCreateCommandQueue);
  LoadFunction(clCreateBuffer);
@@ -91,6 +92,7 @@ void LoadOpenCLFunctions(void *libopencl) {

PFN_clGetPlatformIDs clGetPlatformIDs;
PFN_clGetDeviceIDs clGetDeviceIDs;
PFN_clGetDeviceInfo clGetDeviceInfo;
PFN_clCreateContext clCreateContext;
PFN_clCreateCommandQueue clCreateCommandQueue;
PFN_clCreateBuffer clCreateBuffer;
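For context, the LoadFunction calls above resolve each entry point from the dynamically opened OpenCL library into these PFN_* pointers. A plausible sketch of how such a macro can work, assuming dlsym-based loading; the actual definition in nntrainer/opencl/opencl_loader.cpp may differ:

// Plausible sketch of a dlsym-based loader macro; not taken from this PR.
#include <dlfcn.h>

#define LoadFunction(function) \
  function = reinterpret_cast<PFN_##function>(dlsym(libopencl, #function));

// Usage inside LoadOpenCLFunctions(void *libopencl):
//   LoadFunction(clGetDeviceInfo);
// expands to:
//   clGetDeviceInfo = reinterpret_cast<PFN_clGetDeviceInfo>(
//     dlsym(libopencl, "clGetDeviceInfo"));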
6 changes: 6 additions & 0 deletions nntrainer/opencl/opencl_loader.h
@@ -38,6 +38,11 @@ typedef cl_int(CL_API_CALL *PFN_clGetDeviceIDs)(
  cl_uint /**< num_entries */, cl_device_id * /**< devices */,
  cl_uint * /**< num_devices */);

typedef cl_int(CL_API_CALL *PFN_clGetDeviceInfo)(
  cl_device_id /**< device */, cl_device_info /**< param_name */,
  size_t /**< param_value_size */, void * /**< param_value */,
  size_t * /**< param_value_size_ret */);

typedef cl_context(CL_API_CALL *PFN_clCreateContext)(
  const cl_context_properties * /**< properties */, cl_uint /**< num_devices */,
  const cl_device_id * /**< devices */,
@@ -133,6 +138,7 @@ typedef cl_int(CL_API_CALL *PFN_clReleaseMemObject)(cl_mem /**< memobj */);

extern PFN_clGetPlatformIDs clGetPlatformIDs;
extern PFN_clGetDeviceIDs clGetDeviceIDs;
extern PFN_clGetDeviceInfo clGetDeviceInfo;
extern PFN_clCreateContext clCreateContext;
extern PFN_clCreateCommandQueue clCreateCommandQueue;
extern PFN_clCreateBuffer clCreateBuffer;