
Why does rnn_bench use training routines for the inference pass? #114

Closed
mattsinc opened this issue Sep 19, 2019 · 3 comments

Comments

@mattsinc
Contributor

mattsinc commented Sep 19, 2019

Hi everyone,

I'm attempting to understand the code in the RNN benchmark. Looking at both the NVIDIA and AMD implementations, I see that both use routines documented as training passes for what I believe to be the inference pass.

For example, in the AMD implementation I see that miopenRNNForwardTraining and miopenRNNBackwardData are used. I thought that miopenRNNForwardTraining was being used for the inference half of the benchmark, and miopenRNNBackwardData for the training half – purely based on context clues from the benchmark (e.g., the !inference check at https://github.com/ROCmSoftwarePlatform/DeepBench/blob/master/code/amd/rnn_bench_rocm.cpp#L239 means we're also doing training, and miopenRNNBackwardData gets called via that code).
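Roughly, the structure I'm describing looks like this (a simplified sketch of the benchmark loop, not the actual code – descriptor/buffer setup, timing, and error checking omitted, and the variable names are mine):

```cpp
// Forward pass: currently used for BOTH the inference and training benchmarks.
// Note the trailing reserve-space arguments, which only exist so that a
// backward pass can reuse the saved intermediate activations.
miopenRNNForwardTraining(handle, rnn_desc, seq_len,
                         x_descs, x, hx_desc, hx, cx_desc, cx,
                         w_desc, w, y_descs, y, hy_desc, hy, cy_desc, cy,
                         workspace, workspace_size,
                         reserve, reserve_size);

// Backward pass: only reached when benchmarking training.
if (!inference) {
    miopenRNNBackwardData(/* ...gradient and reserve-space arguments... */);
}
```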

However, according to the AMD documentation, both miopenRNNForwardTraining (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnforwardtraining) and miopenRNNBackwardData (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnbackwarddata) are passes to use when doing training. I also noticed that the NVIDIA implementation appears to do exactly the same thing: https://github.com/baidu-research/DeepBench/blob/master/code/nvidia/rnn_bench.cu#L196, https://github.com/baidu-research/DeepBench/blob/master/code/nvidia/rnn_bench.cu#L221.

So, I was wondering why an inference-only pass wouldn't use a routine intended specifically for inference, e.g., miopenRNNForwardInference (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnforwardinference) or the equivalent cuDNN call. Does DeepBench have a requirement for the backward path that necessitates this approach? @dagamayank, not sure if you know who the right person to ask here is (or if you know the answer)?
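For comparison, based on the MIOpen docs linked above, I'd expect the inference-only call to look like this – the same argument list as miopenRNNForwardTraining, minus the trailing reserve-space pair (placeholder names again):

```cpp
// Inference-only forward pass: no reserveSpace/reserveSpaceNumBytes,
// since no intermediate activations need to be kept for a backward pass.
miopenRNNForwardInference(handle, rnn_desc, seq_len,
                          x_descs, x, hx_desc, hx, cx_desc, cx,
                          w_desc, w, y_descs, y, hy_desc, hy, cy_desc, cy,
                          workspace, workspace_size);
```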

Looking through the open issues, I believe this is distinct from #87 .

Thanks,
Matt

@mattsinc
Contributor Author

@sharannarang : wanted to ping you on this too, in case you know why.

@sharannarang
Contributor

@mattsinc, I think you are correct. We should be using cudnnRNNForwardInference instead of cudnnRNNForwardTraining for the inference benchmark. For the NVIDIA benchmarks, I think I just used the training function without realizing that there might be a performance difference compared to the inference function.

From the cuDNN docs for cudnnRNNForwardInference:

This routine executes the recurrent neural network described by rnnDesc with inputs x, hx, and cx, weights w and outputs y, hy, and cy. workspace is required for intermediate storage. This function does not store intermediate data required for training; cudnnRNNForwardTraining() should be used for that purpose.

So, there will be some overhead in using the training function instead of the inference function.
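To make the difference concrete, here is a rough sketch of the swap on the cuDNN side (placeholder names, error checking omitted) – cudnnRNNForwardInference takes the same arguments as cudnnRNNForwardTraining except for the trailing reserve-space pair:

```cpp
// What the benchmark currently calls: saves intermediate activations into
// reserveSpace so that a backward pass can follow.
cudnnRNNForwardTraining(handle, rnn_desc, seq_len,
                        x_descs, x, hx_desc, hx, cx_desc, cx,
                        w_desc, w, y_descs, y, hy_desc, hy, cy_desc, cy,
                        workspace, workspace_size,
                        reserve, reserve_size);

// What the inference benchmark should call: identical argument list minus
// reserveSpace, so the training-only bookkeeping is skipped.
cudnnRNNForwardInference(handle, rnn_desc, seq_len,
                         x_descs, x, hx_desc, hx, cx_desc, cx,
                         w_desc, w, y_descs, y, hy_desc, hy, cy_desc, cy,
                         workspace, workspace_size);
```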

@mattsinc
Contributor Author

mattsinc commented Nov 16, 2019

Thanks @sharannarang! This is exactly what I was seeing/thinking as well. I have tested this change and pushed a pull request for both NVIDIA and AMD (#117).

Matt
