Storing results and memory errors #304

Open
quaquel opened this issue Oct 31, 2023 · 2 comments

Comments

@quaquel
Owner

quaquel commented Oct 31, 2023

With #299 we get even better support for running on HPC. However, the existing way in which results are stored does not scale well once you go to a very large number of experiments or create high-dimensional data. Presently, the results are stored as a collection of CSVs wrapped in a tarball. The main advantage of this is that the results are easy to extract and open with any text editor or even Excel. It is also a very convenient way of storing results in a cross-platform, cross-language way. However, it breaks down with large outputs because you will run into memory errors.
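To make the memory problem concrete, here is a minimal sketch of building a tarball of CSVs entirely in memory. This is not the actual save_results implementation, and the mapping-of-CSV-texts signature is made up for illustration; it only shows why the approach is memory-hungry: every serialized CSV plus the whole archive live in RAM at once.

```python
import io
import tarfile


def save_results_in_memory(csv_texts, path):
    """Illustrative sketch: csv_texts is a hypothetical mapping of
    filename -> CSV text; the real save_results signature differs."""
    buffer = io.BytesIO()  # the whole gzipped archive accumulates here, in RAM
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in csv_texts.items():
            data = text.encode()  # full CSV held in memory as well
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    with open(path, "wb") as f:
        f.write(buffer.getvalue())  # flushed to disk only at the very end
```

With many experiments or high-dimensional outcomes, the in-memory buffer grows with the full size of the results, which is where the memory errors come from.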

A short-term solution is to change save_results. It currently builds the entire tarball in memory before flushing it to disk. A slightly more memory-efficient approach is to create a directory on disk, write each CSV file to it, and then turn the entire directory into a tarball. Some memory profiling is needed to establish how much of a difference this actually makes.
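The short-term fix could look something like the following sketch (again with a hypothetical mapping-of-CSV-texts signature, not the real save_results): each CSV is written to a temporary directory first, so only one file's contents is in memory at a time, and the tarball is then streamed from files already on disk.

```python
import os
import tarfile
import tempfile


def save_results_via_directory(csv_texts, path):
    """Illustrative sketch: write CSVs to a temp directory, then tar the
    directory. csv_texts is a hypothetical mapping of filename -> CSV text."""
    with tempfile.TemporaryDirectory() as tmpdir:
        for name, text in csv_texts.items():
            # only one CSV is held in memory at a time
            with open(os.path.join(tmpdir, name), "w") as f:
                f.write(text)
        # tarfile streams each file from disk, avoiding a big in-memory buffer
        with tarfile.open(path, "w:gz") as tar:
            for name in sorted(os.listdir(tmpdir)):
                tar.add(os.path.join(tmpdir, name), arcname=name)
```

The peak memory then scales with the largest single CSV rather than with the total size of all results.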

A longer-term solution is to add other storage backends where results are flushed to disk as they come in. This avoids having to build up the very large results dataset in memory. The basic machinery for this is in place because of the callback keyword argument that is passed to perform_experiments. However, it probably requires a minor rethink of how to handle the serialization of all classes of outcomes (i.e., to_disk and from_disk). Depending on the chosen storage solution, a slightly different serialization will be required.
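As an illustration of what such a streaming storage backend might look like, here is a hypothetical callback sketch. The actual callback interface of perform_experiments and the to_disk/from_disk serialization hooks differ; this only shows the general idea of flushing each experiment's outcomes to disk as they arrive, so nothing accumulates in memory.

```python
import csv
import os


class FileFlushingCallback:
    """Hypothetical callback sketch, not the workbench's real API:
    appends each experiment's outcomes to a CSV on disk as they arrive."""

    def __init__(self, directory, fieldnames):
        os.makedirs(directory, exist_ok=True)
        self.path = os.path.join(directory, "outcomes.csv")
        self.fieldnames = fieldnames
        with open(self.path, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=fieldnames).writeheader()

    def __call__(self, experiment_id, outcomes):
        # flush one experiment's results immediately; nothing is retained
        row = {"experiment_id": experiment_id, **outcomes}
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)
```

A design like this would also make partially completed runs recoverable, since everything written so far is already on disk.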

@steipatr
Contributor

steipatr commented Nov 6, 2023

A benefit of the longer-term solution would also be that an error in the experiments (due to an edge case, a divide-by-zero, etc.) doesn't mean you have to redo all experiments.

@quaquel
Owner Author

quaquel commented Nov 10, 2023

I ran a quick test using memray. In my test case, peak memory usage went from 2.6 GB to 1.9 GB, a reduction of roughly 27%. Creating a directory, writing all results to it, and then turning that directory into a tarball thus seems to be an easy way to get a substantial reduction in memory usage.
