support performance testing #15

Open
piccolbo opened this issue Feb 3, 2015 · 8 comments

@piccolbo (Collaborator) commented Feb 3, 2015

No description provided.

piccolbo added a commit that referenced this issue Feb 15, 2015
piccolbo added a commit that referenced this issue Feb 16, 2015

@piccolbo (Collaborator Author)

So far we have introduced a level of visibility into performance. I think the next steps would be:

  • express that performance in units that make more sense, or are more portable from machine to machine
  • expand the API with the possibility of failing a test on insufficient performance

These are quite advanced features and I don't know of any precedent right now. As such, we are not going to try to solve this in the next release or so; an attempt will be made in a dedicated branch.

@piccolbo (Collaborator Author) commented Apr 2, 2015

Actually, the performance measurement has been hidden, as it was confusing.

@piccolbo (Collaborator Author) commented Apr 2, 2015

Done some experiments with sorting. Observations include:

  • we could use performance modeling, exploiting peculiar R features, to do something that has not been done in other languages: run the function under measurement on a variety of input sizes and fit a model. The model could be user-provided, but a low-degree polylog should usually do; we need to think about regularization, though, or the fit will always favor the highest-degree polynomial (see the sketch after this list)
  • outliers, probably due to concurrent activity on the same system, are a problem for model fitting; robust fitting is a partial solution
  • the experiments were aimed at modeling elapsed time as a function of input size, which is hardware-dependent. It would be interesting to model the performance of the target function as a function of performance on reference tasks, such as CPU-, memory- and disk-bound tasks: T(C(S), M(S), D(S)). The functions C, M and D would be fitted for each specific system; then we'd have expectations on the coefficients of T that are independent of the machine
  • we need to extend these ideas to multiple arguments. What should we do with interactions? Is it reasonable to provide a default model?
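
To make the first bullet concrete, here is a minimal sketch of what fitting a polylog model to measured timings could look like. This is not quickcheck code: sort() stands in for an arbitrary function under test, and MASS::rlm provides the robust fitting mentioned above.

```r
# Not quickcheck API: time a function over a range of input sizes and
# robust-fit a low-degree polylog model to the results.
library(MASS)  # for rlm, a robust alternative to lm

sizes <- rep(2^(10:18), each = 5)  # repeat each size to expose timing noise
times <- sapply(sizes, function(n) system.time(sort(runif(n)))[["elapsed"]])

# low-degree polylog basis: 1, log(n), n, n*log(n)
fit <- rlm(times ~ log(sizes) + sizes + I(sizes * log(sizes)))
coef(fit)  # for sort() the n*log(n) coefficient should dominate
```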

piccolbo added a commit that referenced this issue Apr 27, 2015

@piccolbo (Collaborator Author)

Trying to detail the model a bit more. If we want to measure function performance as a function of platform performance (how fast the test machine is), and we want to characterize this over multiple dimensions, we need multiple observables or the system is underdetermined. system.time comes to the rescue, as it offers three different measurements: user, system and elapsed time. So we can think of function performance as a multivariate polylog of input size multiplied by the unit time cost of elementary operations, such as numeric, memory and disk operations. The number of operations can be, say, quadratic in the input size, but the difference between platforms is modeled as linear:

T ~ polylog(S) x C

We don't have C, so we run a number of benchmark expressions and collect the timings (again: user, system and elapsed). We model this similarly with

B = M x C

where M describes how many of each elementary operation each benchmark needs, and C is platform-dependent.
If we invert this (so that B^-1 = C^-1 M^-1) and right-multiply the former equation by it, we obtain

T B^-1 ~ polylog(S) x M^-1

where the unobservable, platform-dependent matrix C cancels out. Since M is fixed across machines, multiplying a polylog by M^-1 yields another function of the same class, so this is equivalent to

T B^-1  ~ polylog(S)

That is, if the above reasoning is correct, we can make the timings portable by pseudo-inverting the benchmark matrix, which is measurable, and right-multiplying the test timings by it. This is a linear transformation, so we can model the result with a poly, polylog or another user-defined model.
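
A hedged sketch of this normalization, assuming three ad hoc reference tasks (the task names and definitions are mine, not part of quickcheck); MASS::ginv computes the Moore-Penrose pseudo-inverse:

```r
# Sketch only: characterize the machine with benchmark timings B, then
# normalize test timings by the pseudo-inverse of B.
library(MASS)  # for ginv, the Moore-Penrose pseudo-inverse

# reference tasks intended to be cpu-, memory- and disk-bound respectively
benchmarks <- list(
  cpu  = function() sum(sin(seq_len(1e6))),
  mem  = function() { x <- matrix(0, 2000, 2000); x[] <- 1; invisible(x) },
  disk = function() { f <- tempfile(); writeLines(as.character(1:1e5), f)
                      readLines(f); unlink(f) })

# B: one row per benchmark; columns are user, system and elapsed time
B <- t(sapply(benchmarks, function(f) system.time(f())[1:3]))

# timings of the function under test at one input size
t_test <- system.time(sort(runif(1e6)))[1:3]

# the (hopefully) portable quantity: T B^-1
t_test %*% ginv(B)
```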

@piccolbo (Collaborator Author)

To clarify, quickcheck's goal is not to model performance data. The goal here is to create portable, performance-related assertions, which ultimately end up in tests. But we may need to provide some modeling tools to help people write such assertions, because of the portability requirement.

@piccolbo (Collaborator Author)

The plan is as follows: users won't model their algorithm's performance T directly, but T B^-1 or another portable transformation of it. They will provide their model coefficients as a vector in their tests. The test will use the model to predict T B^-1 for specific input sizes and compute B on the specific test machine. Then it will compare predicted and actual timings, and when the predictions are exceeded by a TBD amount, the test will fail.
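
Purely as an illustration of this flow (every name here is hypothetical, and the model is collapsed to a single a + b*n*log(n) component for brevity):

```r
# Hypothetical sketch of the planned test flow, not current quickcheck API.
library(MASS)

portable_time <- function(timing, B) drop(timing %*% ginv(B))

# coefs would ship with the test, fitted once on a reference machine;
# B is measured on the test machine as in the previous sketch
check_performance <- function(f, n, coefs, B, slack = 1.5) {
  predicted <- coefs[1] + coefs[2] * n * log(n)  # model of the portable timing
  actual <- portable_time(system.time(f(n))[1:3], B)[1]  # first component only
  if (actual > slack * predicted)
    stop("performance expectation exceeded: ", actual, " > ", slack * predicted)
  invisible(TRUE)
}

# e.g. check_performance(function(n) sort(runif(n)), 1e6, coefs, B)
```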

@piccolbo (Collaborator Author)

It may be worth considering whether performance is a function of input size only or of the input value itself. For textbook algorithms such as sort, input size is all that matters in most cases. But for an RNG, for instance, the input size is generally constant while the run time depends on the actual sample size (an argument). So the general approach would be to make the performance model depend on the actual arguments of a test, and let the modeler decide whether to consider the length of an input or some other function of it.
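
A two-line illustration of the RNG point: rnorm's input is a single scalar, so the "input size" is constant, yet run time scales with the value of that argument.

```r
# input "size" (a length-1 argument) is the same; run times differ greatly
system.time(rnorm(1e5))[["elapsed"]]
system.time(rnorm(1e7))[["elapsed"]]
```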

@piccolbo (Collaborator Author) commented May 1, 2015

A trivial example to bring this back to earth. To test the performance of an implementation of quicksort, qsort(), I can write a test like

```r
test(forall(x = ratomic(), {n = length(x); expect("time.limit", C * n * log(n))}))
```

where C is a constant that depends on the test machine's hardware and software platform, probably more on the hardware in this case. Now if we just made that C a function call C() that returns the right number for each machine, we could write a portable performance test. Such a function doesn't exist yet, and in any event it would not be a scalar function, as machine performance is not described by a single number. So the above complication is an attempt to replace deep knowledge of an architecture and an algorithm (the user would have to quantify disk accesses, memory accesses, etc.) with a more experimental approach, where a machine is characterized by running a number of benchmark expressions.
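
One crude, entirely hypothetical way to approximate that C() for a sorting test: calibrate the constant against a reference n*log(n) task on the current machine, then express the time limit relative to it.

```r
# Assumption-laden sketch, not quickcheck API: calibrate a machine-specific
# constant from a reference run, then build a machine-relative time limit.
C_machine <- local({
  n_ref <- 1e6
  t_ref <- system.time(sort(runif(n_ref)))[["elapsed"]]
  t_ref / (n_ref * log(n_ref))  # seconds per "n log n unit" on this machine
})

# time limit for the qsort() test, with some slack for noise
time_limit <- function(n, slack = 2) slack * C_machine * n * log(n)
```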
