This project provides tools to simulate access strategies and cache-content advertisement schemes for distributed caching. The simulator considers a user who is equipped by several caches, and receives from them periodical advertisements about the cached content. These advertisements use space-efficient approximated data structures (aka indicators) that are not totally accurate. The user selects which caches to access, to obtain the requested datum in the lowest price and maximum certainty possible. Each cache independently runs a content-advertisement algorithm, to decide when to advertise a fresh indicator. For details about the problem and the algorithms used, please refer to the following papers:
[1] I. Cohen, Gil Einziger, R. Friedman, and G. Scalosub, Access Strategies for Network Caching, IEEE Transactions on Networking, Vol. 29(2), 2021, pp.609-622.
[2] I. Cohen, Gil Einziger, and G. Scalosub, False Negative Awareness in Indicator-based Caching Systems, IEEE Transactions on Networking, 2022, pp. 46-56.
[3] I. Cohen, Bandwidth Efficient Cache Selection and Content Advertisement, pre-print, 2024.
The source files are described below. More detailed documentation is found within the source files.
All source files are written in Python, and found in ./src. The result files are written to ./res. The traces should be found in the directory ../traces/.
Runs a simulation, looping over all requested values of parameters (miss penalty, cache sizes, number of caches etc.).
A simulator that accepts system parameters (trace, number and size of caches, algorithm to run etc.); runs a simulation; and outputs the results to file.
Implements the client-side cache-selection algorithms.
The class for a DataStore (cache). The cache stores items using the LRU policy. It also implements the cache-side algorithm for estimating FPR (false-positive ratio) and FNR (false-negative ratio) and the exclusion probabilities. The cache itself is implemented in the file mod_pylru.py.
Implementation of an LRU cache. Source code is taken from: Copyright (C) Jay Hutchinson https://github.com/jlhutch/pylru
A Simple and a Counting Bloom filter. To study more about Bloom filters, consider the following links: 1, 2, 3, 4.
Parse traces and output a .csv file, which includes a row with hashes of the keys in the trace.
generates the OVH network which exemplifies a commercial CDN (Content Delivery Network), and saves it on a .csv file.
Generates a histogram showing the cache access costs for the OVH network
This file contains several accessory functions, used throughout the project, e.g. for generating the string describing the settings of the simulation, and generate the list of requests from the trace.
An accessory function for the PGM access strategy algorithm, described in [1]. The function merges 2 input nodes. Inputs: 2 nodes to merge, H and L, and r = log(beta) Output: res_node - the result node, res_node.
A candidate sol' for the DSS (Data-Store-Selection) prob' (aka the "Cache-Selection" problem), consists of a list of DSs, theirs aggregate (mult' of) miss rate, and aggregate (sum of) accs cost
An accessory function for format-printing to a file.
A script to compare the fpr, fnr, calculated by either the paper [2], or by an an older paper, which uses different model: [3] Y. Zhu and H. Jiang, “False rate analysis of bloom filter replicas in distributed systems”, in ICPP, 2006, pp. 255–262.
Receives files that are the results of simulations, and generates plots.
An accessory file for run-time measurements.
Checks the hit rate in a given cache trace.
Accessory file containing global parameters and accessory functions.
Analyze a given trace. The analysis include:
- mean/avg. time between subsequent appearance of the same item.
- Number of items that appear only once along the trace.
- Expected hit ratio of an optimal cache selection alg' given the caching capacity.