Caskbench's Design
==================
Caskbench strives to be flexible, with abstracted core routines and lots
of command line options to customize behavior. Tests are very simple to
add. This abstraction also makes it feasible to extend caskbench with
more rendering backends or even add new drawing libraries to test.
== The Main Loop ==
The main() routine in caskbench.cpp does all the setup, dispatching of
tests, and results creation. It runs through all the selected tests (or
all tests) and alternately runs the cairo and skia versions of each
test. For each test, one rendering context is created for the given
drawing library, the test's setup routine is called, and the test is
given a free practice run (to warm caches or whatever). Then the clock
is started and the test is run for the given number of iterations. In
each iteration the screen is cleared, the test is run, and the context
is updated. After the given test has finished all its iterations, we do
some bookkeeping, clean everything up, and then move on to the next
test.
Essentially, this is the core logic:
    foreach test c {
        perf_tests[c].context_setup(&context, config);   // calls your setup routine
        ...
        foreach iteration i {
            perf_tests[c].context_clear(&context);
            start timer
            perf_tests[c].test_case(&context);           // run the test
            perf_tests[c].context_update(&context);      // calls your update routine
            stop timer
        }
        ...
        perf_tests[c].context_destroy(&context);         // calls your destroy routine
    }
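For the timing itself, a monotonic clock around the iteration body is
the natural approach. Here is a minimal sketch using std::chrono (an
illustration only; caskbench's actual timing code may differ):

    #include <chrono>

    // Sketch only: time one iteration of a test with a monotonic clock.
    // The body passed in corresponds to the test_case() and
    // context_update() calls in the loop above.
    static double time_one_iteration(void (*body)())
    {
        auto t0 = std::chrono::steady_clock::now();
        body();                                      // the work being measured
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }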
The list of available tests, perf_tests[], is generated by CMake when it
runs the `generate_tests` script to create the tests.cpp file. Each
member of the array is a table of function pointers to do the various
renderer-backend-specific and drawing-library-specific
creation/destruction/update operations.
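As a rough illustration, each entry might look something like the
following (field names and signatures here are hypothetical; the real
definitions are in the generated tests.cpp):

    struct context_t;   // per-backend rendering state (hypothetical name)
    struct config_t;    // parsed command line options (hypothetical name)

    // Hypothetical sketch of a perf_tests[] entry.
    struct perf_test_t {
        const char *name;                              // test identifier
        int  (*context_setup)(context_t*, config_t*);  // per-test setup
        void (*context_clear)(context_t*);             // clear screen each iteration
        int  (*test_case)(context_t*);                 // the drawing under test
        void (*context_update)(context_t*);            // flush/swap after drawing
        void (*context_destroy)(context_t*);           // teardown
    };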
== Test Design Theory ==
The central purpose of caskbench is to help improve Skia performance on
the Tizen platform (phones in particular).
As a tester, you generally want to avoid having to hack and recompile
code. Indeed, there may not even be a compiler on the device itself, and
having to build new packages for a minor tweak can add a lot of hassle.
So a design objective for caskbench is to build everything together
and provide a rich array of command line options, enabling the analyst
to undertake widely differing test plans without needing to recompile.
I envision at least four distinct use cases for caskbench that I think
we should try to support:
A. Quick check. All test cases are run with randomized settings. The
objective is full functional coverage in a short period of time, to
identify bugs in the code. We use pre-defined seed value(s) to ensure
reproducibility. This is mainly used during development or porting, or
for quick tire kicking.

B. Detailed survey. Several or all test cases are run at a series of
different setting values, over multiple iterations, using a specified
seed. These runs are intended to generate graphable data for analysis,
and to cover corner cases that may not have an immediate relation to
real-world use, but that help the analyst gain insight into
performance limits.

C. Individual study. One test is run with specific settings and a
pre-defined seed to study a particular performance issue. This might
be used by a developer while optimizing the code.

D. Benchmark. Test cases and settings are selected that more closely
resemble real-world use cases. Corner cases are avoided. The tests are
iterated many times to get good performance averages, and the seed is
randomized. The objective is to compare different rendering
technologies, backends, or algorithms for their impact on the end user
experience, or to test the performance of different types of hardware
or drivers.
What all this implies for individual test cases is:
Each test should have a defined core "mission". When run with all
parameters set to specific values, it should effectively let us
stress-test that core function. For instance, drawing a red circle at
a specific location repeatedly, as fast as possible.

Each test should allow specifying randomization. E.g., random fill
colors, random font selections, or random shape dimensions (see the
sketch after this list).

Each test should allow widening or narrowing the scope of particular
drawing parameters. E.g., increasing or decreasing the variety of
shapes to draw, the number of gradient stops in a radial gradient, or
the min and max widths for line stroking.

Given a specific seed to the RNG at the start of a test, both the Skia
and Cairo implementations of the given test should produce visually
identical output (they needn't be pixel-perfect, but should be darn
close).
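To make this concrete, here is a loose sketch of how a test might
combine these properties: parameter ranges come from the config,
drawing is driven by a seeded RNG, and because both backends consume
the same random sequence, their output matches. All names below are
hypothetical illustrations, not caskbench's actual API:

    #include <random>

    // Hypothetical sketch: stroke widths bounded by config-supplied
    // min/max values, positions driven by a seeded RNG.  Re-running
    // with the same seed reproduces the same geometry, so the cairo
    // and skia variants of the test can be compared visually.
    struct stroke_config {
        unsigned seed;
        double   min_width, max_width;
    };

    static void stroke_lines_test(const stroke_config &cfg, int num_lines)
    {
        std::mt19937 rng(cfg.seed);         // same seed => same sequence
        std::uniform_real_distribution<double> width(cfg.min_width, cfg.max_width);
        std::uniform_real_distribution<double> pos(0.0, 800.0);

        for (int i = 0; i < num_lines; i++) {
            double w  = width(rng);         // randomized within bounds
            double x0 = pos(rng), y0 = pos(rng);
            double x1 = pos(rng), y1 = pos(rng);
            // Backend-specific drawing would go here: e.g.
            // cairo_set_line_width(cr, w) + cairo_stroke(cr), or the
            // equivalent SkPaint/SkCanvas calls, using the same values.
            (void)w; (void)x0; (void)y0; (void)x1; (void)y1;
        }
    }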