I found the MLPerf benchmark for DLRM-DCN, which uses DMP and the training FBGEMM GPU ops. If I just want to deploy an inference pipeline, is DMP a must? Using DMP seems to call the SplitTableBatchedEmbeddingBagsCodegen FBGEMM backend, and that includes an optimizer and other training-only modules.
Can anyone figure this out for me?
Generally, as you alluded to, for inference you don't want the user-facing code (which is backed by nn.Embedding / nn.EmbeddingBag); you want the FBGEMM-optimized TableBatchedEmbeddings (the INTX versions). This still means you need to apply some module swapping (which can be done with the main DMP API, but that's probably overkill).
It's probably helpful to refer to the testing infra to see this pattern:
Basically, you want to swap out the respective TorchRec module for the 'quantized' or 'quantized sharded' version. For most users, I would think quantized is sufficient. These kernels are different from the training versions: they are optimized to work on INTX-quantized embedding tables, which are only appropriate for inference (due to the low precision).
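To make the swap pattern concrete, here's a minimal sketch of the idea. Note this uses plain-Python stand-in classes, not the real TorchRec API; the class names, `from_float` constructor, and `swap_for_inference` helper are all illustrative assumptions. In real TorchRec code you would swap the trainable EmbeddingBagCollection for its quantized counterpart (from the torchrec quantization modules) in much the same way.

```python
# Illustrative stand-ins only -- not the real TorchRec classes.

class EmbeddingBagCollection:
    """Stands in for the trainable (float) embedding module."""
    def __init__(self, num_tables):
        self.num_tables = num_tables

class QuantEmbeddingBagCollection:
    """Stands in for the INTX quantized, inference-only version."""
    def __init__(self, num_tables):
        self.num_tables = num_tables

    @classmethod
    def from_float(cls, float_module):
        # Real code would quantize the table weights here (e.g. to INT8/INT4).
        return cls(float_module.num_tables)

class Model:
    def __init__(self):
        self.sparse = EmbeddingBagCollection(num_tables=4)
        self.dense = "mlp"  # non-embedding parts are left untouched

def swap_for_inference(model):
    """Walk the model's children and replace each trainable embedding
    module with its quantized twin; everything else is kept as-is."""
    for name, child in list(vars(model).items()):
        if isinstance(child, EmbeddingBagCollection):
            setattr(model, name, QuantEmbeddingBagCollection.from_float(child))
    return model

model = swap_for_inference(Model())
print(type(model.sparse).__name__)  # -> QuantEmbeddingBagCollection
print(model.dense)                  # -> mlp
```

The point is that only the embedding modules change; the rest of the model (dense layers, interaction arch) is served unmodified, which is why a full DMP wrap is overkill for inference-only deployment.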