diff --git a/_includes/footer.html b/_includes/footer.html
index 8e6b08f..2418115 100755
--- a/_includes/footer.html
+++ b/_includes/footer.html
@@ -21,5 +21,5 @@
-
+
diff --git a/_pages/cv.md b/_pages/cv.md
index 6cd2ffe..a37adee 100755
--- a/_pages/cv.md
+++ b/_pages/cv.md
@@ -15,4 +15,4 @@
 
 For a PDF Version [CV](https://YifanYuan3.github.io/files/cv_yifan_yuan.pdf)
 
-Last update: Jun. 2024
+Last update: Jul. 2024
diff --git a/_publications/cxl2.md b/_publications/cxl2.md
new file mode 100755
index 0000000..e958548
--- /dev/null
+++ b/_publications/cxl2.md
@@ -0,0 +1,16 @@
+---
+title: "Demystifying a CXL Type2 Device: A Heterogeneous Cooperative Computing Perspective"
+collection: publications
+permalink: /publication/cxl2
+excerpt: 'This paper is the first-ever characterization study of real commodity CXL Type-2 devices. We also introduce a real-world use case of a Type-2 device as a cache-coherent accelerator for Linux kernel function offloading. [paper]() [slides]()'
+date: '2024.11.2'
+venue: 'MICRO'
+
+---
+
+CXL is the latest interconnect technology built on PCIe, providing three protocols to facilitate three distinct types of devices, each with unique capabilities. Among these devices, a CXL Type-2 device has become commercially available, followed by CXL Type-3 devices. Therefore, it is timely to understand capabilities and characteristics of the CXL Type-2 device, as well as explore suitable applications. In this work, first, we delve into three key features of a CXL Type-2 device: cache-coherent (1) device-accelerator to host-memory, (2) device-accelerator to device-memory, and (3) host-CPU to device-memory accesses. Second, using micro-benchmarks, we comprehensively characterize the latency and bandwidth of these memory accesses with a CXL Type-2 device, and then compare them with those of equivalent memory accesses with comparable devices, such as emulated CXL Type-2, CXL Type-3, and PCIe devices. Lastly, exploiting unique capabilities of a CXL Type-2 device, we propose two CXL-based Linux memory optimization features: compressed RAM cache for swap (zswap) and memory deduplication (ksm), as applications and macro-benchmarks. Our evaluation shows that Redis, when running with traditional CPU-based zswap and ksm, experiences a 4.5–10.3× tail latency increase, compared to Redis running alone. While PCIe-based zswap and ksm still experience a tail latency increase of up to 8.1×, CXL-based zswap and ksm practically eliminate the tail latency increase with faster and more efficient host-device communications than PCIe-based zswap and ksm.
+
+
+[paper]()
+
+[slides]()
\ No newline at end of file
diff --git a/files/cv_yifan_yuan.pdf b/files/cv_yifan_yuan.pdf
index 998d01c..61d0c90 100644
Binary files a/files/cv_yifan_yuan.pdf and b/files/cv_yifan_yuan.pdf differ
diff --git a/index.md b/index.md
index fb38d76..f2fe9af 100644
--- a/index.md
+++ b/index.md
@@ -1,5 +1,7 @@
 Welcome to Yifan Yuan's website!
 
+**NEWS [Mar. 2024]**: our paper [Demystifying a CXL Type2 Device: A Heterogeneous Cooperative Computing Perspective](https://yifanyuan3.github.io/publication/cxl2), the first-ever paper based on a real CXL Type-2 device, has been accepted by MICRO'24.
+
 **NEWS [Mar. 2024]**: our paper [Nomad: Non-Exclusive Memory Tiering via Transactional Page Migration](https://yifanyuan3.github.io/publication/nomad) has been accepted by OSDI'24.
 
 **NEWS [Mar. 2024]**: our paper [Intel Accelerator Ecosystem: An SoC-Oriented Perspective](https://yifanyuan3.github.io/publication/isca2024), describing Intel's years of effort in building hardware-software ecosystems for data-intensive accelerators, has been accepted by ISCA'24 Industry Track.