
AMD GPU support for conv, gemm, rnn for fp32. #89

Merged 33 commits into baidu-research:master on May 2, 2018

Conversation

dagamayank (Contributor)

This PR adds support for AMD GPUs for the convolution, GEMM, and RNN benchmarks.

@sharannarang (Contributor)

Thanks for creating this PR! We're excited to have support for AMD in DeepBench.

I'll review the code later this week. A couple of questions/thoughts:

  1. Are you planning to add results for AMD GPUs as well?
  2. Could you also update the README with instructions on how to build the code? Something similar to Nvidia Benchmarks would be great.

@dagamayank (Contributor, Author)

@sharannarang

A couple of questions/thoughts:

  • Are you planning to add results for AMD GPUs as well?
  • Could you also update the README with instructions on how to build the code? Something similar to Nvidia Benchmarks would be great.

Done.

@dagamayank (Contributor, Author)

@sharannarang Ping.

start = std::chrono::steady_clock::now();

for (int i = 0; i < num_repeats; ++i) {
// Backward pass wrt weights
Contributor

Nit: The comment is incorrect.

Contributor Author

You mean weights should be params?

Contributor

The function below is backward_inputs. So, the comment should be "Backward pass wrt inputs".

Contributor Author

I am not sure why, but git is not playing nice here. If you open the source file, the comment is correct, so there is nothing to be done here.


int main(int argc, char **argv) {

int num_repeats = 100;
Contributor

It might be good to keep the default for num_repeats at 1000.

Contributor Author

Currently, different num_repeats values are used for different platforms; for example, here the value is 50. Can you please point out where the default is documented?

Contributor

Yeah, it would be good to clean this up on our end; we'll do that in a separate PR. Can you confirm that the results were gathered with num_repeats set to at least 1000?


## Compiling

The `Makefile` in `code/amd` is for an AMD `gfx900` GPU. To benchmark other generations, please modify the `Makefile` accordingly.
Contributor

Seems like the user does need to specify paths to HIPCC and MIOpen. Would be good to specify the environment variables that need to be passed with the make command.

Contributor

Could you please address this?

Contributor Author

Done

trainspace_size_byte_) );
}

void backward_data( Tensor<T> y, Tensor<T> dy, Tensor<T> dhy,
Contributor

As you pointed out in #87, the Nvidia implementations are missing backprop with respect to weights. We will fix that. For the AMD release, you can fix it as part of this PR or create a separate one later.

Contributor Author

Let us create a separate PR.

}


int main(int argc, char **argv) {
Contributor

Could you add inference support similar to the RNN benchmark? It would be good to have the same functionality in both benchmarks.

Contributor Author

The goal of this PR is to enable training support. Inference support including convolutions is planned as part of an upcoming PR.

Contributor

OK, sounds good. I was bringing this up because the convolution benchmark had an inference flag whereas the RNN benchmark didn't, so I thought it might be good to be consistent. But I'll let you decide how to manage that.

@sharannarang (Contributor) left a comment

LGTM, modulo the minor requested changes.

@sharannarang sharannarang merged commit 57a0562 into baidu-research:master May 2, 2018
3 participants