Is DMP required for inference? #2673

Open
Ostring24 opened this issue Jan 9, 2025 · 1 comment

@Ostring24

I found the MLPerf benchmark for DLRM-DCN, which uses DMP and the FBGEMM GPU training op. If I just want to deploy an inference pipeline, is DMP a must? Using DMP seems to call the SplitTableBatchedEmbeddingBagsCodegen FBGEMM backend, which includes an optimizer and some other training modules.

Can anyone clarify this for me?

@dstaay-fb
Contributor

Interesting, do you have a code reference?

Generally, as you alluded to, you don't want user-facing code (which is backed by nn.Embedding / nn.EmbeddingBag) over the FBGEMM-optimized TableBatchEmbeddings (INTX versions). This still means you need to apply some module swapping, which can be done with the main DMP API, but that's probably overkill.

It's probably helpful to refer to the testing infra to see this pattern:

sharded_model = _shard_modules(
Basically, you want to 'swap' out the respective TorchRec modules for their 'quantized' or 'quantized sharded' versions. For most users, I would think quantized is sufficient. These kernels are different from the training versions: they are optimized to work on INTX-quantized embedding tables, which are only appropriate for inference (due to their low precision).
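To make the swapping idea concrete, here is a minimal, library-free sketch of the pattern. It does not use TorchRec or FBGEMM: `FloatEmbeddingBag`, `Int8EmbeddingBag`, `Model`, and `swap_for_inference` are hypothetical stand-ins invented for illustration, and the row-wise 8-bit scheme is only an assumption about what an "INTX" table might look like, not the actual kernel layout.

```python
from typing import Dict, List


class FloatEmbeddingBag:
    """Hypothetical stand-in for a training-time module: float weights, sum pooling."""

    def __init__(self, weights: List[List[float]]):
        self.weights = weights

    def forward(self, ids: List[int]) -> List[float]:
        out = [0.0] * len(self.weights[0])
        for i in ids:
            for d, value in enumerate(self.weights[i]):
                out[d] += value
        return out


class Int8EmbeddingBag:
    """Hypothetical inference-only stand-in: each row is stored as 8-bit codes
    plus a per-row scale and offset. It has no optimizer state and no gradients."""

    def __init__(self, float_bag: FloatEmbeddingBag):
        self.rows = []
        for row in float_bag.weights:
            lo, hi = min(row), max(row)
            scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant rows
            codes = [round((v - lo) / scale) for v in row]  # integers in 0..255
            self.rows.append((codes, scale, lo))

    def forward(self, ids: List[int]) -> List[float]:
        out = [0.0] * len(self.rows[0][0])
        for i in ids:
            codes, scale, lo = self.rows[i]
            for d, code in enumerate(codes):
                out[d] += code * scale + lo  # dequantize, then sum-pool
        return out


class Model:
    """Hypothetical model holding one embedding module per feature name."""

    def __init__(self, tables: Dict[str, object]):
        self.tables = tables

    def forward(self, features: Dict[str, List[int]]) -> Dict[str, List[float]]:
        return {name: self.tables[name].forward(ids) for name, ids in features.items()}


def swap_for_inference(model: Model) -> Model:
    """Replace every float module with its quantized counterpart, in place."""
    for name, module in list(model.tables.items()):
        if isinstance(module, FloatEmbeddingBag):
            model.tables[name] = Int8EmbeddingBag(module)
    return model


# Usage: compare the float output with the output after swapping.
model = Model({"user": FloatEmbeddingBag(
    [[0.10, -0.20, 0.30], [0.05, 0.40, -0.15], [-0.25, 0.00, 0.20]])})
features = {"user": [0, 2]}
reference = model.forward(features)
approx = swap_for_inference(model).forward(features)
```

The point of the sketch is that the swapped module keeps the same `forward` interface while storing only low-precision weights, so the calling code does not change and nothing training-related remains in the model. The quantized output only approximates the float output, which is why such tables suit inference but not training.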
