Is DMP required for inference? #2673

Ostring24 · 2025-01-09T06:55:17Z

I found that the MLPerf DLRM-DCN benchmark uses DMP (DistributedModelParallel) together with the FBGEMM GPU training ops. If I just want to deploy an inference pipeline, is DMP required? Using DMP appears to call the SplitTableBatchedEmbeddingBagsCodegen FBGEMM backend, which includes an optimizer and other training-only modules.

Can anyone clarify this for me?

dstaay-fb · 2025-02-10T23:15:15Z

Interesting, do you have a code reference?

Generally, as you alluded to, for inference you don't want the user-facing modules (which are backed by nn.Embedding / nn.EmbeddingBag); you want the FBGEMM-optimized table-batched embedding kernels (the INTX versions) instead. That still means you need to apply some module swapping, which can be done with the main DMP API, but that's probably overkill.

It's probably helpful to refer to the testing infra to see this pattern:

torchrec/torchrec/distributed/test_utils/infer_utils.py, line 897 in 1afbf08:

sharded_model = _shard_modules(

Basically, you want to swap out the respective TorchRec modules with their 'quantized' or 'quantized sharded' versions. For most users, I would think quantized is sufficient. These kernels are different from the training versions: they are optimized to work on INTX-quantized embedding tables, which are only appropriate for inference (due to their low precision).
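To make the swap idea concrete, here is a minimal pure-Python sketch of the pattern described above. It is not TorchRec or FBGEMM code: `FloatTable`, `Int8Table`, and `swap_tables` are hypothetical names made up for illustration, and the row-wise 8-bit scheme (one scale and offset per row) is only an assumed stand-in for what an INTX kernel stores. It shows a float table being replaced by a quantized counterpart, and the small lookup error that makes such tables suitable for inference but not for training updates.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class FloatTable:
    """Hypothetical stand-in for a float embedding table (not a TorchRec class)."""
    rows: List[List[float]]

    def pooled_lookup(self, ids: List[int]) -> List[float]:
        # Sum-pool the selected rows, similar to an EmbeddingBag in "sum" mode.
        dim = len(self.rows[0])
        return [sum(self.rows[i][d] for i in ids) for d in range(dim)]


@dataclass
class Int8Table:
    """Hypothetical row-wise 8-bit table: integer codes plus a scale and offset per row."""
    codes: List[List[int]]
    scales: List[float]
    offsets: List[float]

    @classmethod
    def from_float(cls, table: FloatTable) -> "Int8Table":
        codes, scales, offsets = [], [], []
        for row in table.rows:
            lo, hi = min(row), max(row)
            # Map [lo, hi] onto 0..255; fall back to 1.0 for constant rows.
            scale = (hi - lo) / 255.0 or 1.0
            codes.append([round((v - lo) / scale) for v in row])
            scales.append(scale)
            offsets.append(lo)
        return cls(codes, scales, offsets)

    def pooled_lookup(self, ids: List[int]) -> List[float]:
        # Dequantize each selected row on the fly, then sum-pool.
        dim = len(self.codes[0])
        return [
            sum(self.codes[i][d] * self.scales[i] + self.offsets[i] for i in ids)
            for d in range(dim)
        ]


Table = Union[FloatTable, Int8Table]


def swap_tables(model: Dict[str, Table]) -> Dict[str, Table]:
    """Replace every FloatTable with its quantized counterpart; keep other entries."""
    return {
        name: Int8Table.from_float(m) if isinstance(m, FloatTable) else m
        for name, m in model.items()
    }


# A toy "model" with a single table (values are arbitrary illustration data).
model: Dict[str, Table] = {
    "user_id": FloatTable([[0.10, -0.20, 0.30], [0.05, 0.00, -0.15], [1.00, 0.50, -1.00]])
}
infer_model = swap_tables(model)

ids = [0, 2]
exact = model["user_id"].pooled_lookup(ids)
approx = infer_model["user_id"].pooled_lookup(ids)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print("max abs error after quantization:", max_err)
```

The quantized table answers lookups with a small rounding error and has no optimizer state attached, which is the trade-off the reply points at: fine for serving, too lossy for gradient updates.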
