
pond's Introduction


 ______                _
(_____ \              | |
 _____) )__  ____   __| |
|  ____/ _ \|  _ \ / _  |
| |   | |_| | | | ( (_| |
|_|    \___/|_| |_|\____|  - Compute Express Link (CXL) based Memory Pooling Systems




README for CXL Emulation and Experiments HowTo

Cite our Pond paper (ASPLOS '23):

Pond: CXL-Based Memory Pooling Systems for Cloud Platforms

The preprint of the paper can be found here.

@InProceedings{pond.asplos23,
  author = {Huaicheng Li and Daniel S. Berger and Lisa Hsu and Daniel Ernst and Pantea Zardoshti and Stanko Novakovic and Monish Shah and Samir Rajadnya and Scott Lee and Ishwar Agarwal and Mark D. Hill and Marcus Fontoura and Ricardo Bianchini},
  title = "{Pond: CXL-Based Memory Pooling Systems for Cloud Platforms}",
  booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  address = {Vancouver, BC, Canada},
  month = {March},
  year = {2023}
}

What is this repo about?

This repo open-sources our approach for CXL emulation and evaluation (mainly some scripts). It is not a full-fledged open-source version of the Pond design.

CXL Emulation on regular 2-socket (2S) server systems

We mainly emulate the following two characteristics of Compute Express Link (CXL) attached DRAM:

  • Latency: ~150ns
  • No local CPU can directly access it, i.e., the CXL memory is treated as a "computeless"/CPU-less node

The CXL latency is similar to the latency of one NUMA hop on modern 2S systems. We therefore simulate CXL memory using the memory on the remote NUMA node, and disable all cores on that node to reproduce the "computeless" behavior.
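The node-offlining step above can be sketched as follows. This is a minimal illustration, not the repo's actual cxl-global.sh; the node number and workload name are assumptions.

```shell
# Treat NUMA node 1 as the emulated "CXL" node: offline all of its CPUs
# so it becomes a compute-less memory node (requires root).
CXL_NODE=1

for cpu in /sys/devices/system/node/node${CXL_NODE}/cpu[0-9]*; do
    id=${cpu##*cpu}                     # e.g., .../cpu12 -> 12
    ctl=/sys/devices/system/cpu/cpu${id}/online
    [ -w "$ctl" ] && echo 0 > "$ctl"    # silently skipped without root
done

# A workload can then run with its threads on node 0 and all of its
# memory bound to the CXL node, i.e., a 100% "CXL memory" run:
#   numactl --cpunodebind=0 --membind=$CXL_NODE ./workload
```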

In this repo, the CXL simulation is mainly done via a few scripts; see cxl-global.sh and the run-xxx.sh scripts under each workload folder (e.g., cpu2017, gapbs, etc.).

These scripts dynamically adjust the system configuration to simulate a given percentage of CXL memory in the system, and then run workloads under those configurations.

Configuring Local/CXL-DRAM Splits

One major experiment is to adjust the percentage of CXL memory used to run a given workload and observe the performance impact, compared to the pure local-DRAM "ideal" case. For example, common ratios included in the scripts are "100/0" (100% local DRAM, no CXL), "95/5", "90/10" (90% local DRAM + 10% CXL), "85/15", "80/20", "75/25", "50/50", "25/75", etc.
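The arithmetic behind a split ratio is simple: given a workload's peak memory footprint, a "90/10" split provisions 90% of that footprint on local DRAM and the remainder on the emulated CXL node. A hypothetical helper (not taken from the repo's scripts) might look like:

```shell
# split_mem PEAK_MB LOCAL_PCT
# Prints "<local_mb> <cxl_mb>" for the given local-DRAM percentage.
split_mem() {
    peak_mb=$1      # profiled peak memory of the workload, in MB
    local_pct=$2    # e.g., 90 for a 90/10 local/CXL split
    local_mb=$(( peak_mb * local_pct / 100 ))
    cxl_mb=$(( peak_mb - local_mb ))
    echo "$local_mb $cxl_mb"
}

split_mem 16000 90   # -> 14400 1600
```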

To provision the correct amount of local/CXL memory for the above split ratios, we first need to profile the peak memory usage of the target workload. This is usually done with monitoring tools such as pidstat (the RSS field in its memory-usage report).
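Extracting the peak RSS from pidstat output can be done with a small awk filter. This is a sketch, not the repo's profiling code; it assumes RSS is the 7th whitespace-separated field of `pidstat -r` output (sysstat's default `Time UID PID minflt/s majflt/s VSZ RSS %MEM Command` layout), so adjust the index for your sysstat version.

```shell
# Keep the maximum RSS value (in KB) seen across pidstat samples.
peak_rss() {
    awk '$0 ~ /^[0-9]/ { if ($7 + 0 > peak) peak = $7 + 0 } END { print peak }'
}

# Live usage would look like:
#   pidstat -r -p <workload-pid> 1 | peak_rss

# Demonstration on two captured sample lines:
printf '10:00:01 1000 42 0.0 0.0 100000 5000 0.1 w\n10:00:02 1000 42 0.0 0.0 100000 8000 0.1 w\n' | peak_rss   # -> 8000
```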

A Simple HowTo using SPEC CPU 2017

Under the cpu2017 folder, run-cpu2017.sh is the main entry point for running a series of experiments under various split configurations. The script reads profiling results from a workload input file (e.g., wi.txt), where the first column is the workload name and the second column is its peak memory consumption. Based on these, run-cpu2017.sh iterates over a series of predefined split ratios and runs the experiments one by one. The script writes logs and outputs to the rst folder.
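Based on that description, a wi.txt might look like the following. The benchmark names are real SPEC CPU 2017 workloads, but the peak-memory values and the MB unit are illustrative assumptions, not taken from the repo:

```
# workload      peak-mem (MB)
605.mcf_s       16384
619.lbm_s       3072
657.xz_s        16384
```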

One can co-run profiling utilities such as emon or Intel VTune with the workload to collect architecture-level metrics for performance analysis. Make sure Intel VTune is installed before running the script.

pond's People

Contributors

huaicheng

pond's Issues

Question about pinning the CPU and VM setup for QEMU VMs

In the VM definition, each VM has 8 vCPUs. Which version of QEMU do these experiments run on? According to https://www.qemu.org/docs/master/system/qemu-manpage.html, there are different interpretations of what -smp 8 means when QEMU creates the CPU topology: "For example, the following option (-smp 2) defines a machine board with 2 sockets of 1 core before 6.2 and 1 socket of 2 cores after 6.2." It would be great to specify the exact CPU topology in the arguments.

https://github.com/vtess/Pond/blob/e9ae753669f98497f36c9ba52525e062515c5bf1/bigdata/vm/pin.sh#L9-L14
On line 10, it pins the vCPU threads to cores 0 to 9, while there are only 8 vCPUs.

On line 12, the comment says it pins the vCPUs to the rest of the pCPUs, but on line 14 it pins the QEMU process to cores 1 to 9, which overlaps the range above. Should it be a different range or exactly the same range? And why 1-9 rather than 0-9?

Regarding qemu vm images

Hello, I am trying to run qemu/run-pond-vm.sh, but this repo doesn't include the pondcxl.qcow2 image.
Could you explain how to build the VM image?

Regarding Pond's Latency Insensitivity Model and Untouched-Memory Model

We have downloaded your source code, but it appears to contain only the components related to big data, CPU2017, various scheduling mechanisms, and the two-socket simulation. We could not locate the two models or the corresponding scheduling code. Could you point us to where these components can be found?

Your assistance in this matter would be greatly appreciated.

Sincerely
