-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMD GPU support for conv, gemm, rnn for fp32. #89
Conversation
Add GEMM and CNNs
Thanks for creating this PR! We're excited to have support for AMD in DeepBench. I'll review the code later this week. A couple of questions/thoughts:
|
remove half.hpp dependency
add instruction for compiling and running AMD benchmark
Done. |
@sharannarang Ping. |
start = std::chrono::steady_clock::now(); | ||
|
||
for (int i = 0; i < num_repeats; ++i) { | ||
// Backward pass wrt weights |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The comment is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean weights
should be params
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function below is backward_inputs
. So, the comment should be "Backward pass wrt inputs".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why, but git is not playing nice here. If you open the source-file, the comment is correct. So there is nothing to be done here.
|
||
int main(int argc, char **argv) { | ||
|
||
int num_repeats = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to keep the default for num_repeats
at 1000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently different num_repeats
are used for different platforms. Example here the value is 50
. Can you please point out where the default
info is mentioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, It would be good to clean this up on our part. Will do that as a separate PR. Can you confirm that the results have at least num_repeats
set to 1000?
|
||
## Compiling | ||
|
||
The `Makefile` in `code/amd` is for an AMD `gfx900` GPU. To benchmark other generations, please modify the `Makefile` accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the user does need to specify paths to HIPCC
and MIOpen
. Would be good to specify the environment variables that need to be passed with the make
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
trainspace_size_byte_) ); | ||
} | ||
|
||
void backward_data( Tensor<T> y, Tensor<T> dy, Tensor<T> dhy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you pointed out in #87 , Nvidia implementations are missing backprop with respect to weights. We will fix that. For the AMD release, you can fix as a part of this PR or create a separate one later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us create a separate PR.
} | ||
|
||
|
||
int main(int argc, char **argv) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add inference support similar to the RNN benchmark? It would be good the same functionality in both benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to enable training support. Inference support including convolutions is planned as part of an upcoming PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, sounds good. I was bringing this up since the convolution benchmark had an inference flag whereas the RNN benchmark didn't. So, thought it might be good to be consistent. But I'll you decide how to manage that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo the minor requested changes.
This PR adds the support for AMD GPUs for convolutions, gemm, and RNN benchmarks.