codes io workloads
We currently have initial support for extrapolating (lossy) Darshan logs (https://www.mcs.anl.gov/research/projects/darshan/), a simple synthetic IO kernel language, and IO Recorder (https://github.com/babakbehzad/Recorder) traces.
The synthetic IO language is a simple, interpreted set of IO and basic arithmetic commands meant to simplify the specification and running of application workloads.
The input for the workload generator consists of an IO kernel metadata file and a number of IO kernel files. The former specifies a set of kernel files to run and logical client IDs to participate in the workload, while the latter describes the IO to be performed.
The format of the metadata file is a set of lines containing: <group ID> <start ID> <end ID> <kernel file> where:
- <group ID> is the ID of this group (see restrictions)
- <start ID> and <end ID> form the range of logical client IDs that will perform the given workload. Note that the end ID is inclusive, so a start, end pair of 0, 3 will include IDs 0, 1, 2, and 3. An <end ID> of -1 means the range covers the remaining clients, as specified by the user.
- <kernel file> is the path to the IO kernel workload. It may be either an absolute or a relative path.
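The inclusive-range rule can be illustrated with a minimal Python sketch. This is not the CODES parser: the function name `parse_metadata`, the kernel file names, the whitespace-separated field order (group ID, start ID, end ID, kernel file), and the reading of -1 as "up to the last client" are all assumptions made for illustration.

```python
def parse_metadata(text, num_clients):
    """Map each logical client ID to (group_id, kernel_path).

    Hypothetical helper: assumes one whitespace-separated entry per line,
    in the order group ID, start ID, end ID, kernel file.
    """
    assignments = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group_id, start, end, kernel = line.split(maxsplit=3)
        start, end = int(start), int(end)
        if end == -1:
            # Assumption: -1 extends the range to the last client.
            end = num_clients - 1
        # The end ID is inclusive, hence end + 1.
        for client in range(start, end + 1):
            assignments[client] = (int(group_id), kernel)
    return assignments


# Hypothetical metadata: clients 0-3 run one kernel, the rest run another.
example = """\
0 0 3 shuffle.codes
1 4 -1 /abs/path/other.codes
"""
for client, (group, kernel) in sorted(parse_metadata(example, 8).items()):
    print(client, group, kernel)
```

With 8 clients in total, the first line covers IDs 0 through 3 and the second covers IDs 4 through 7.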
The IO kernel file contains a set of commands performed on a per-client basis. Like the workload generator interface, files are represented by integer IDs, and the standard set of "POSIX-ish" operations can be applied (e.g., open, close, sync, write, read) and have a similar argument list (file ID, [length], [offset] where applicable). pread/pwrite equivalents are given by readat/writeat.
More detailed documentation on the language is ongoing, but for now a general example can be seen at doc/workload, which shows a simple out-of-core data shuffle. Braver souls may wish to visit the implementation at src/iokernellang and src/workload/codes-iolang-wrkld.c.
The following restrictions currently apply to the IO language:
- all user-defined variables must be a single, lower-case letter (the symbol table from the code we inherited is an array of 26 chars)
- the implementation of "groups" is currently broken. We have gotten around this by hard-coding in the group size and client ID into the parser when a kernel file is loaded (parsing currently occurs on a per-client basis). Hence, getgroupid should be completely ignored and getgrouprank and getgroupsize ignore the group ID parameter passed in.
The IO language is frozen and no future development will be happening with it, so keep the following limitations in mind when using it.
- There is currently no way to specify a "create" flag to open.
- Variables are expected to be a single lowercase character.
The mock IO workload generator creates a sequential workload of N requests of size M. The generated file ID is either an optional input or 0. There is also an option to add the (simulated) process's rank to the file ID, in effect giving each rank its own file. Relevant configuration parameters are:
- mock_num_requests - the number of requests
- mock_request_size - the size of each request
- mock_request_type - the type of request ("read" or "write")
- mock_file_id (optional) - the file ID to use, default 0
- mock_use_unique_file_ids (optional) - if non-zero, add the workload processor's rank to the file ID. Default is 0.
- mock_rank_table_size (optional) - the size of the hash table used to store the ranks. For minimal collisions, choose a value larger than the expected number of ranks in the workload.
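How these parameters combine can be sketched in a few lines of Python. This is an illustration, not the CODES implementation: the function `mock_requests` and the tuple layout are hypothetical, and the sketch assumes that "sequential" means contiguous offsets (request i starts at i * request size).

```python
def mock_requests(rank, num_requests, request_size, request_type,
                  file_id=0, use_unique_file_ids=0):
    """Yield (type, file_id, offset, size) tuples for one simulated rank.

    Argument names mirror the mock_* configuration parameters; the
    contiguous-offset layout is an assumption of this sketch.
    """
    if request_type not in ("read", "write"):
        raise ValueError("request_type must be 'read' or 'write'")
    # A non-zero mock_use_unique_file_ids adds the rank to the file ID,
    # giving each rank its own file.
    fid = file_id + rank if use_unique_file_ids else file_id
    for i in range(num_requests):
        yield (request_type, fid, i * request_size, request_size)


# Rank 2 issuing three 1 KiB writes with per-rank file IDs.
for req in mock_requests(rank=2, num_requests=3, request_size=1024,
                         request_type="write", file_id=10,
                         use_unique_file_ids=1):
    print(req)
```

With mock_file_id set to 10 and unique file IDs enabled, rank 2 operates on file ID 12. With the option left at 0, every rank would share file ID 10.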
Recorder has both a static and a dynamic library that may be linked to a given application (preloaded at runtime in the case of the dynamic library). Whenever an MPI process calls an I/O function that is instrumented at a specific layer of the I/O stack by Recorder, the timestamp, function name, arguments, return value, and the duration of the function are stored into a per-process trace file.
For more details, see "Techniques for Modeling Large-Scale HPC I/O Workloads" by S. Snyder et al., in the PMBS workshop held in conjunction with SC'15.
The checkpoint I/O workload is based on the optimum checkpointing interval for applications to minimize both the time spent writing checkpoints and the time recomputing lost work due to failure, given the system’s expected mean time to failure (MTTF), the amount of data to be checkpointed, and the available storage bandwidth. The amount of data to be checkpointed, storage/network bandwidth and MTTF can be configured by the user in the config files. See codes-storage-server repo (README) for more details on how to use the checkpoint IO workload.
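To make the trade-off concrete, the sketch below uses Young's first-order approximation, interval = sqrt(2 * checkpoint write time * MTTF), where the write time is the checkpoint size divided by the storage bandwidth. This formula is chosen here for illustration only. The text above does not say which formula the checkpoint workload actually uses, and the function name and input values are hypothetical.

```python
import math


def young_interval(checkpoint_bytes, bandwidth_bytes_per_s, mttf_s):
    """Approximate optimum compute time between checkpoints, in seconds.

    Uses Young's first-order estimate; this is an assumption, not
    necessarily the formula implemented by the checkpoint workload.
    """
    # Time to write one checkpoint at the available bandwidth.
    write_time_s = checkpoint_bytes / bandwidth_bytes_per_s
    return math.sqrt(2.0 * write_time_s * mttf_s)


# Hypothetical inputs: 1 TB checkpoint, 10 GB/s bandwidth, 24-hour MTTF.
interval = young_interval(1e12, 1e10, 24 * 3600)
print(f"checkpoint roughly every {interval:.0f} s")  # roughly 4157 seconds
```

A larger checkpoint or lower bandwidth lengthens the write time and so pushes the interval up, while a shorter MTTF pulls it down.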