| description |
| --- |
| #deep_learning_recommender_system #embedding_lookup #recommendation_inference #cache |
EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems
Presented at ASPLOS 2023.
Authors: Daniar H. Kurniawan, Ruipu Wang, Kahfi S. Zulkifli, Fandi A. Wiranata, John Bent, Ymir Vigfusson, Haryadi S. Gunawi.
Code: https://github.com/ucare-uchicago/ev-store-dlrm
Each recommendation inference requires multiple embedding-vector (EV) table lookups, and the request cannot complete until all of them return; if even one lookup is slow, the whole inference request is slow.
Open-source DLRMs such as Facebook's DLRM store the full embedding tables in DRAM and lack support for serving lookups from backend storage when memory is exhausted.
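To make the group-lookup bottleneck concrete, here is a toy Python simulation (my own illustration, not code from the paper or repo; the latency constants are assumptions): each request issues one lookup per EV table and finishes only when the slowest returns, so a single storage-level miss inflates end-to-end latency.

```python
import random

DRAM_LATENCY_US = 1        # assumed in-memory hit cost (illustrative)
STORAGE_LATENCY_US = 1000  # assumed backend-storage miss cost (illustrative)

def lookup_latency_us(hit_rate: float) -> float:
    """Latency of a single EV-table lookup under a given DRAM hit rate."""
    return DRAM_LATENCY_US if random.random() < hit_rate else STORAGE_LATENCY_US

def inference_latency_us(num_tables: int = 26, hit_rate: float = 0.95) -> float:
    # One lookup per table (26 for Criteo); the request completes only
    # when the slowest lookup returns, so a single miss dominates latency.
    return max(lookup_latency_us(hit_rate) for _ in range(num_tables))

random.seed(0)
samples = [inference_latency_us() for _ in range(10_000)]
stalled = sum(s >= STORAGE_LATENCY_US for s in samples) / len(samples)
print(f"share of requests stalled by at least one miss: {stalled:.1%}")
# With 26 tables at a 95% per-lookup hit rate, ~1 - 0.95**26 ≈ 74% of
# requests still pay the storage cost at least once.
```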
- Proposes EVStore, which adds a caching layer within the DLRS and optimizes it for embedding access patterns.
- Three layers (how they compose on a read path is sketched in the code after this list):
    - EVCache (L1)
        - Extends various cache replacement algorithms to exploit DLRS lookup patterns.
    - EVMix (L2)
        - Stores lower-precision embeddings (e.g., fp8), so more entries fit in the same memory budget.
    - EVProx (L3)
        - A key-to-key caching layer that maps a missed key to a surrogate key with a similar embedding value.
        - The key mapping is built offline in a preprocessing pass, using Euclidean and cosine distances between embeddings to measure similarity (see the mapping sketch below).
        - In general, the remapping should be redone when the L3 hit rate drops significantly.
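A minimal sketch of how the three layers might compose on a read path, assuming a toy uint8 quantization for L2 and a precomputed surrogate map for L3. Class and attribute names are hypothetical, not EVStore's actual API; the real implementation is in the linked repo.

```python
import numpy as np

class EVStoreSketch:
    """Toy three-layer read path: exact cache -> low-precision cache ->
    surrogate-key cache -> backend storage."""

    def __init__(self, surrogate_map, storage):
        self.l1 = {}              # EVCache: key -> fp32 embedding
        self.l2 = {}              # EVMix: key -> (uint8 vector, scale)
        self.l3 = surrogate_map   # EVProx: key -> surrogate key (built offline)
        self.storage = storage    # slow backend: key -> fp32 embedding

    def get(self, key):
        if key in self.l1:                      # L1 hit: exact value
            return self.l1[key]
        if key in self.l2:                      # L2 hit: dequantize low precision
            q, scale = self.l2[key]
            return q.astype(np.float32) * scale
        surrogate = self.l3.get(key)            # L3: reuse a similar key's value
        if surrogate is not None and surrogate in self.l1:
            return self.l1[surrogate]           # approximate answer
        emb = self.storage[key]                 # full miss: pay the storage cost
        self.l1[key] = emb                      # naive admission; real EVCache
        return emb                              # extends replacement policies

# Toy usage: storage is any mapping from key -> np.ndarray.
store = EVStoreSketch(surrogate_map={7: 3},
                      storage={k: np.ones(36, np.float32) for k in range(10)})
print(store.get(7))  # surrogate 3 not cached yet, so this falls to storage
```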
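The offline EVProx preprocessing can be pictured as a nearest-neighbor search over the trained embeddings. Below is a sketch using cosine similarity (the paper also considers Euclidean distance); the hot/cold key split and function name are my assumptions for illustration.

```python
import numpy as np

def build_surrogate_map(embeddings: np.ndarray, hot_keys: list[int],
                        cold_keys: list[int]) -> dict[int, int]:
    """Map each cold key to the hot key whose embedding is most similar
    by cosine similarity (assumed hot/cold split; illustrative only)."""
    hot = embeddings[hot_keys]
    hot_norm = hot / np.linalg.norm(hot, axis=1, keepdims=True)
    mapping = {}
    for k in cold_keys:
        v = embeddings[k]
        v = v / np.linalg.norm(v)
        sims = hot_norm @ v                 # cosine similarity to every hot key
        mapping[k] = hot_keys[int(np.argmax(sims))]
    return mapping
```

On an L1/L2 miss, the read path consults this mapping, which is why it should be rebuilt when the L3 hit rate degrades (e.g., after embeddings are retrained).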
- Evaluation: EVStore is integrated within Facebook's open-source DLRM.
    - Uses the Criteo 1TB Click Logs dataset:
        - 13 dense integer features and 26 sparse categorical features (26 EV tables).
        - All EV tables use the same embedding dimension of 36.
        - 156 billion total feature values and over 800 million unique attribute values (see the footprint estimate below).
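As a back-of-the-envelope check of why these tables strain DRAM (my arithmetic from the numbers above, not a figure reported in the paper):

```python
unique_values = 800_000_000   # unique attribute values across the 26 EV tables
dim = 36                      # embedding dimension
fp32_bytes, fp8_bytes = 4, 1

full_fp32 = unique_values * dim * fp32_bytes  # raw fp32 embedding storage
full_fp8 = unique_values * dim * fp8_bytes    # same entries at fp8 (EVMix idea)
print(f"fp32 tables: {full_fp32 / 1e9:.0f} GB, fp8 tables: {full_fp8 / 1e9:.0f} GB")
# fp32 tables: 115 GB, fp8 tables: 29 GB -> lower precision stretches the
# same DRAM budget roughly 4x, which is what makes an L2 layer attractive.
```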