This repository contains the evaluation code and experiments, written in Kotlin, for the research work investigating the performance impact of various datacenter scheduler programming abstractions.
- An x86 machine with at least 16GB RAM.
- Any desktop Linux distribution with build tools installed (for example, the `build-essential` package on Ubuntu).
- Java 17 or greater.
- Python 3.11 (we recommend conda, virtualenv, or a similar environment).
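As a quick sanity check of these prerequisites, the following commands should succeed before you continue. This is a minimal sketch; the build-tool check assumes a Debian/Ubuntu-style setup and package names may differ on other distributions:

```bash
# Check that Java 17+ and Python 3.11 are on the PATH (version strings vary by vendor/distribution).
java -version
python3 --version

# Check that basic build tools are installed (Debian/Ubuntu example; other distributions differ).
gcc --version && make --version
```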
- Run `./download-traces.sh`. This will download all required traces from Zenodo and copy them to the relevant directories for the experiments.
- Set up a Python 3.11 virtual environment and install the dependencies using `pip install -r plot/script/requirements.txt`.
- Use `./run-experiments.sh` to run the simulations and plot the results.
- Figure 7 in the paper is `plot/output/migrations-results-packing-azure.pdf`, Figure 8 is `plot/output/migrations-results-totaltime-azure.pdf`, and Figure 10 is `plot/output/metadata-results-ibm.pdf`.
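Put together, the quick-start workflow looks roughly like this. This is a minimal sketch assuming a virtualenv-based environment; the `python3.11` interpreter name and the `.venv` directory are placeholders, and conda works just as well:

```bash
# Download the required traces from Zenodo into the experiment directories.
./download-traces.sh

# Create and activate a Python 3.11 virtual environment, then install the plotting dependencies.
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r plot/script/requirements.txt

# Run the simulations and generate the plots (the figures end up under plot/output/).
./run-experiments.sh
```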
Detailed instructions for building and modifying individual experiments start here. The code for all the specific experiments can be found in the `opendc-experiments/studying-apis` directory of the repository. To run the experiments, make sure to place the relevant traces in the corresponding experiment folder within this directory. For reference, an example trace, `bitbrains-small`, is provided in the `traces` directory.
- Clone the repository to your local machine.
- Navigate to the `opendc-experiments/studying-apis` folder.
- Place the relevant traces for the experiment you want to run in the corresponding experiment folder.
Here are the steps for placing the traces correctly:
- Download the necessary traces from the following link: https://zenodo.org/record/7996316.
- Store the downloaded traces in the directory structure `opendc-experiments/studying-apis/<experiment>/src/main/resources/trace`.
- Create a folder within the `trace` directory with a name corresponding to the source of the trace (e.g., `google`, `azure`, `bitbrains`).
- Place the trace Parquet file inside the respective source folder.
- Make sure to rename the trace file to `meta.parquet` when copying it (see the sketch below for an example).
These steps ensure that the traces are properly organized and available for the evaluation experiments using the OpenDC platform.
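As an illustration, here is a minimal shell sketch of the layout described above. The experiment name, trace source, and downloaded file name are placeholders; substitute your own:

```bash
# Placeholders: adjust the experiment, the trace source, and the downloaded file to your setup.
EXPERIMENT=migrations                          # e.g. migrations, reservations, metadata
SOURCE=azure                                   # e.g. google, azure, bitbrains
TRACE_FILE=~/Downloads/azure-trace.parquet     # the Parquet trace downloaded from Zenodo

# Create the expected directory structure inside the experiment's resources folder.
TRACE_DIR=opendc-experiments/studying-apis/$EXPERIMENT/src/main/resources/trace/$SOURCE
mkdir -p "$TRACE_DIR"

# Copy the trace and rename it to meta.parquet, as the experiments expect.
cp "$TRACE_FILE" "$TRACE_DIR/meta.parquet"
```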
- Open the terminal and navigate to the `opendc` project directory.
- Run the command `./gradlew :opendc-experiments:studying-apis:<experiment>:experiment` to compile and run the desired experiment.
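For instance, to run the migrations experiment (substitute any other experiment name from the directory):

```bash
# From the repository root: compile and run the migrations experiment via the Gradle wrapper.
./gradlew :opendc-experiments:studying-apis:migrations:experiment
```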
To plot the experiment results, please follow these steps:
- Install the required Python packages by running the following command: `pip install -r plot/script/requirements.txt`
- Ensure that all the necessary files for plotting are located under the `./plot` folder in the repository.
- Set up the trace data by copying the corresponding trace files to the `./plot/trace/<dataset>` directory and renaming the trace file to `meta.parquet`. For example, if you are working with the Azure dataset, the path would be `./plot/trace/azure/meta.parquet`.
- Prepare the input folder for the experiments you have run by copying the experiment output to `./plot/input/<experiment>/<dataset>`. The output of the experiments can be found under `opendc-experiments/studying-apis/<experiment>/output/`; each experiment and dataset combination has its own folder under the output directory. For instance, if you ran the migrations experiment using the Azure dataset, the path would be `./plot/input/migrations/azure` (see the sketch after this list).
. -
Choose one of the available scripts located in the
./plot/script
directory based on the experiment type. The available scripts are:migrations_plot_results.py
reservations_plot_results.py
metadata_plot_results.py
metadata_plot_preview.py
- Run the selected script with the desired dataset parameter. For example, to plot the results for the Azure dataset using the reservations experiment, run the following command: `python3 ./plot/script/reservations_plot_results.py azure`
- The generated plots and visualizations will be available in the `./plot/output/<experiment>/<dataset>` directory.
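As a worked example of the steps above, the following sketch prepares the plotting inputs for the migrations experiment on the Azure dataset. The experiment and dataset names are placeholders for whichever combination you actually ran, and the exact layout of the experiment output may differ, so adjust the paths accordingly:

```bash
# Assumed placeholders for this sketch: the migrations experiment and the Azure dataset.
EXPERIMENT=migrations
DATASET=azure

# 1. Copy the trace used by the experiment into the plot folder and name it meta.parquet.
mkdir -p plot/trace/$DATASET
cp opendc-experiments/studying-apis/$EXPERIMENT/src/main/resources/trace/$DATASET/meta.parquet \
   plot/trace/$DATASET/meta.parquet

# 2. Copy the experiment output into the expected input folder.
#    Adjust the source path if your output is nested in a dataset-specific subfolder.
mkdir -p plot/input/$EXPERIMENT/$DATASET
cp -r opendc-experiments/studying-apis/$EXPERIMENT/output/. plot/input/$EXPERIMENT/$DATASET/

# 3. Run the plotting script for this experiment with the dataset as a parameter.
python3 ./plot/script/migrations_plot_results.py $DATASET

# The resulting figures land in plot/output/$EXPERIMENT/$DATASET/.
```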
Please note that the dataset parameter is mandatory when running the plotting script, as it determines which dataset the script will process.
Feel free to explore the generated plots and visualizations to analyze and interpret the experiment results.
That's it! You should now be able to plot your experiment results using the provided scripts and instructions.
The evaluation experiments are performed using OpenDC, an open-source data center simulation platform. OpenDC allows for accurate and realistic simulations of data center environments, enabling thorough evaluation and analysis of various scheduling APIs and their impact on performance.
OpenDC is a free and open-source platform for datacenter simulation aimed at both research and education.
Users can construct datacenters and define portfolios of scenarios (experiments) to see how these datacenters perform under different workloads and schedulers.
The simulator is accessible both as a ready-to-use website hosted by us at opendc.org and as source code that users can run locally on their own machine through Docker.
To learn more about OpenDC, have a look at our paper OpenDC 2.0 or at our vision.
🛠 OpenDC is a project by the @Large Research Group.
🐟 OpenDC comes bundled with Capelin, the capacity planning tool for cloud datacenters based on portfolios of what-if scenarios. More information on how to use and extend Capelin coming soon!
The documentation is located in the `docs/` directory.
Questions, suggestions and contributions are welcome and appreciated! Please refer to the contributing guidelines for more details.
This work is distributed under the MIT license. See LICENSE.txt.