
AMD GPU support for conv, gemm, rnn for fp32. #89

Merged 33 commits into baidu-research:master on May 2, 2018

Conversation

dagamayank (Contributor)

This PR adds support for AMD GPUs for the convolution, GEMM, and RNN benchmarks.

@sharannarang (Contributor)

Thanks for creating this PR! We're excited to have support for AMD in DeepBench.

I'll review the code later this week. A couple of questions/thoughts:

  1. Are you planning to add results for AMD GPUs as well?
  2. Could you also update the README with instructions on how to build the code? Something similar to Nvidia Benchmarks would be great.

@dagamayank (Contributor, Author)

@sharannarang

A couple of questions/thoughts:

  • Are you planning to add results for AMD GPUs as well?
  • Could you also update the README with instructions on how to build the code? Something similar to Nvidia Benchmarks would be great.

Done.

@dagamayank (Contributor, Author)

@sharannarang Ping.

start = std::chrono::steady_clock::now();

for (int i = 0; i < num_repeats; ++i) {
// Backward pass wrt weights
Contributor

Nit: The comment is incorrect.

Contributor Author

You mean weights should be params?

Contributor

The function below is backward_inputs. So, the comment should be "Backward pass wrt inputs".

Contributor Author

I am not sure why, but git is not playing nice here. If you open the source file, the comment is correct, so there is nothing to be done here.


int main(int argc, char **argv) {

int num_repeats = 100;
Contributor

It might be good to keep the default for num_repeats at 1000.

Contributor Author

Currently, different num_repeats values are used for different platforms; for example, here the value is 50. Can you please point out where the default is documented?

Contributor

Yeah, it would be good to clean this up on our end; we'll do that in a separate PR. Can you confirm that the results were gathered with num_repeats set to at least 1000?


## Compiling

The `Makefile` in `code/amd` is for an AMD `gfx900` GPU. To benchmark other generations, please modify the `Makefile` accordingly.
Contributor

Seems like the user does need to specify paths to HIPCC and MIOpen. Would be good to specify the environment variables that need to be passed with the make command.

Contributor

Could you please address this?

Contributor Author

Done

trainspace_size_byte_) );
}

void backward_data( Tensor<T> y, Tensor<T> dy, Tensor<T> dhy,
Contributor

As you pointed out in #87, the Nvidia implementations are missing backprop with respect to weights. We will fix that. For the AMD release, you can fix it as part of this PR or create a separate one later.

Contributor Author

Let us create a separate PR.

}


int main(int argc, char **argv) {
Contributor

Could you add inference support similar to the RNN benchmark? It would be good to have the same functionality in both benchmarks.

Contributor Author

The goal of this PR is to enable training support. Inference support including convolutions is planned as part of an upcoming PR.

Contributor

OK, sounds good. I was bringing this up because the convolution benchmark had an inference flag whereas the RNN benchmark didn't, so I thought it might be good to be consistent. But I'll let you decide how to manage that.

@sharannarang (Contributor) left a comment

LGTM, modulo the minor requested changes.

@sharannarang sharannarang merged commit 57a0562 into baidu-research:master May 2, 2018
3 participants