Overview

  • Paper: Mourad Khayati, Ines Arous, Zakhar Tymchenko and Philippe Cudré-Mauroux: ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams. PVLDB 2021.
  • Algorithms: The benchmark evaluates all the algorithms mentioned in the paper: ORBITS, SPIRIT, SAGE, OGDImpute, pcaMME, TKCM and M-RNN*. To enable/disable any algorithm, please refer to the Algorithms customization section below.
  • Datasets: The benchmark evaluates all the datasets used in the paper: gas (drift10), motion, bafu and soccer*. To enable/disable any dataset, please refer to the Datasets customization section below.
  • Scenarios: The benchmark will execute the full set of 11 recovery scenarios and report the error using RMSE, MSE and MAE. A detailed description of the recovery scenarios can be found here.
  • Reproducibility: We provide a dedicated repo for reproducing all the results reported in the paper.

*disabled by default as it takes a couple of days to run.

Prerequisites | Build | Execution | Benchmark Customization | Citation


Prerequisites


Build

  • Build the Testing Framework using the installation script located in the root folder (takes a few minutes):
    $ sh install_linux.sh

Execution

    $ cd TestingFramework/bin/Debug/
    $ mono TestingFramework.exe 

The test suite with the default setup will take ~20 hours to finish.

  • Results: All results are written to the Results folder. For each scenario and dataset, the accuracy results of all algorithms are appended sequentially to Results/.../.../.../error/, the runtime results to Results/.../.../.../runtime/, and the plots of the recovered blocks to Results/.../.../.../plots/.

  • Scenarios creation: To compare your own technique against the benchmark results outside the framework, we provide a command that exports the missing-value scenarios/patterns for a given dataset:

    $ cd TestingFramework/bin/Debug/
    $ mono TestingFramework.exe export dataset_name

This command will produce contaminated data (where missing values are designated as NaN) in the Export/ folder for each streaming scenario in the benchmark.
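
Once an external technique has filled in an exported scenario, its output can be scored with the same metrics the benchmark reports (RMSE, MSE and MAE). The following is a minimal sketch, not the benchmark's own code. It assumes that the exported files are space-separated matrices like the input datasets, and that the error is measured only over the cells marked NaN in the contaminated file. The function names `load_matrix` and `recovery_errors` are hypothetical.

```python
import math


def load_matrix(path):
    """Read a space-separated matrix; 'NaN' tokens become float('nan')."""
    with open(path) as f:
        return [[float(tok) for tok in line.split()] for line in f if line.strip()]


def recovery_errors(reference, contaminated, recovered):
    """Return (rmse, mse, mae) over the cells that are NaN in `contaminated`.

    Assumption: all three matrices have the same shape, and only the
    positions that were removed in the scenario contribute to the error.
    """
    diffs = [
        ref - rec
        for ref_row, con_row, rec_row in zip(reference, contaminated, recovered)
        for ref, con, rec in zip(ref_row, con_row, rec_row)
        if math.isnan(con)
    ]
    if not diffs:
        raise ValueError("no missing values found in the contaminated matrix")
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return math.sqrt(mse), mse, mae
```

To use it, load the original dataset, the exported contaminated file and your recovered output with `load_matrix`, then pass the three matrices to `recovery_errors`. The actual file names inside Export/ are not specified here, so substitute the paths produced on your machine.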


Benchmark Customization

Algorithms customization

To enable an additional algorithm

  • open the file TestingFramework/config.cfg
  • add the name of the algorithm to the line EnabledAlgorithms =

Datasets customization

  • All the datasets used in this paper can be found in: TestingFramework/bin/Debug/data/

  • To enable an additional dataset

    • open the file TestingFramework/config.cfg
    • add the name of the dataset to the line Datasets =
  • To add a new dataset to the benchmark

    • copy the file to TestingFramework/bin/Debug/data/{name}/{name}_normal.txt (where name is the name of your dataset).
    • Requirements: rows >= 1000; columns >= 10; column separator = space; row separator = newline
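
Before adding a new dataset, it can help to check the file against the requirements listed above. The sketch below is an illustrative pre-check and not part of the benchmark. The function name `check_dataset` is hypothetical, and the strict handling of separators (exactly one space between values, no trailing space) is an assumption that may be stricter than what the framework itself accepts.

```python
def check_dataset(path, min_rows=1000, min_cols=10):
    """Return a list of problems found in a candidate dataset file (empty if none)."""
    problems = []
    widths = set()  # distinct column counts seen across valid rows
    n_rows = 0
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            tokens = line.rstrip("\r\n").split(" ")
            try:
                for tok in tokens:
                    float(tok)  # every value must parse as a number
            except ValueError:
                problems.append(f"line {line_no}: not a space-separated list of numbers")
                continue
            n_rows += 1
            widths.add(len(tokens))
    if n_rows < min_rows:
        problems.append(f"only {n_rows} valid rows, need at least {min_rows}")
    if len(widths) > 1:
        problems.append(f"inconsistent column counts: {sorted(widths)}")
    if widths and min(widths) < min_cols:
        problems.append(f"only {min(widths)} columns, need at least {min_cols}")
    return problems
```

Calling `check_dataset` on the `{name}_normal.txt` file returns an empty list when the row count, column count and separators match the requirements, and a list of human-readable messages otherwise.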

Scenario customization

To enable an additional recovery scenario

  • open the file TestingFramework/config.cfg
  • add the name of the scenario to the line Scenarios =

Citation

@inproceedings{orbits2021vldb,
 author    = {Mourad Khayati and Ines Arous and Zakhar Tymchenko and Philippe Cudr{\'{e}}{-}Mauroux},
 title     = {ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams},
 booktitle = {Proceedings of the VLDB Endowment},
 volume    = {14},
 number    = {3},
 year      = {2021}
}