Study to compare t-Digest and REQ sketch #416

Open

AlexanderSaydakov opened this issue Jan 5, 2024 · 13 comments

@AlexanderSaydakov (Contributor)

Compare the performance of t-Digest with its closest competitor in the library, the REQ sketch.
The REQ sketch is the closest competitor because it prioritizes either high rank accuracy (HRA mode) or low rank accuracy (LRA mode), unlike the other quantile sketches (KLL, classic), which have the same rank error for any rank.
There are a few obvious differences (a minimal setup for a head-to-head comparison is sketched after this list):

  • The REQ sketch can work with any data type that has a comparator; t-Digest is limited to numeric data (floating-point types).
  • The REQ sketch retains and returns only values observed in the input: there is no notion of distance (only less-than comparison) and no interpolation. t-Digest is based on computing means and does interpolation.
  • t-Digest prioritizes both high and low rank accuracy at the same time with the default scaling function. Perhaps this can be changed with different scaling functions.
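
A minimal sketch of such a setup, assuming the Python `datasketches` bindings expose `req_floats_sketch` and `tdigest_double` with `update()`/`get_rank()` mirroring the C++ API (class and parameter names are assumptions and may differ by version):

```python
# Minimal head-to-head setup (a sketch, not the study's actual harness).
# Assumes: datasketches-python with req_floats_sketch and tdigest_double.
import numpy as np
from datasketches import req_floats_sketch, tdigest_double

n = 2**20
rng = np.random.default_rng(42)
data = rng.uniform(size=n).astype(np.float32)

req = req_floats_sketch(12, True)  # k=12, HRA mode
td = tdigest_double(100)           # compression parameter; an assumption

for x in data:
    v = float(x)
    req.update(v)
    td.update(v)

data.sort()
for q in (0.5, 0.95, 0.99, 0.999):
    v = float(data[int(q * (n - 1))])  # value whose exact rank is ~q
    print(f"rank {q}: REQ err {req.get_rank(v) - q:+.5f}, "
          f"t-digest err {td.get_rank(v) - q:+.5f}")
```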
@AlexanderSaydakov (Contributor, Author)

[Plots: t-Digest vs REQ rank error at rank 0.95, rank error at rank 0.99, and serialized size]

Here is what I have so far. Surprisingly, the REQ sketch with k=10 and k=12 behaves very similarly.

@PavelVesely

We compared t-digest and ReqSketch in a paper (at KDD'21) with Graham and people from Splunk. I don't think t-digest has been updated much since then, so I hope the points below still hold.

Here's the upshot (see Section 6 of the paper):

  • We were able to construct inputs for t-digest that make its error arbitrarily bad, even though such inputs are highly unlikely to be encountered in practice (N.B. we needed numbers ranging from astronomically large to infinitesimally small). We were also able to come up with distributions such that, on i.i.d. samples from them, t-digest has worse error than ReqSketch while using similar space (again, these distributions may seem artificial due to requiring a large scale); see the plots in Section 5 of the paper. Thus, t-digest is not robust to adversarial inputs.
  • Of course, on more realistic distributions, t-digest works much better in terms of accuracy.
  • t-digest has worse update time; see Table 1. This is because the sampling used by KLL and ReqSketch does not require as many expensive operations as maintaining centroids in t-digest does.
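
As context for the error comparisons above, the observed rank error can be measured exactly against the sorted stream. A minimal sketch, assuming only that the sketch object exposes `get_rank(value)` returning a normalized rank in [0, 1]:

```python
# Maximum observed rank error of a sketch over a finished stream.
import numpy as np

def max_rank_error(sketch, data):
    s = np.sort(np.asarray(data))
    n = len(s)
    worst = 0.0
    for i in range(n):
        if i + 1 < n and s[i + 1] == s[i]:
            continue                       # take the last index of duplicates
        exact = (i + 1) / n                # exact inclusive rank of s[i]
        worst = max(worst, abs(sketch.get_rank(float(s[i])) - exact))
    return worst
```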

@PavelVesely

> Here is what I have so far. Surprisingly, the REQ sketch with k=10 and k=12 behaves very similarly.

I'd say that values of $k \le 12$ are somewhat small, but of course such small values are needed to get space comparable to t-digest (at least for short streams). Actually, in the KDD paper we used k = 4 to achieve similar space for a stream of length $2^{20}$.

The choice of the uniform distribution may not be the best for a comparison. I'd suggest also including the log-uniform distribution (uniform on the log scale) that we used in the paper, or a normal distribution.

@AlexanderSaydakov (Contributor, Author)

@PavelVesely although slightly off topic, do you understand why REQ k=12 and k=10 end up retaining about the same number of items and having about the same rank error? Do you think it might be a bug in the implementation?

@AlexanderSaydakov (Contributor, Author)

[Plots: t-Digest vs REQ rank error at rank 0.99, rank error at rank 0.95, and memory]

@AlexanderSaydakov (Contributor, Author)

[Plot: t-Digest vs REQ serialized size]

@PavelVesely

> @PavelVesely although slightly off topic, do you understand why REQ k=12 and k=10 end up retaining about the same number of items and having about the same rank error? Do you think it might be a bug in the implementation?

My guess is that for such a small k, rounding would yield sketches of pretty much the same size. Specifically, since k is the initial section size and the section size decreases by a factor of $\sqrt{2}$, the next section size (after the first doubling of the number of sections) is the same for k=10 and k=12: the nearest even number to $10/\sqrt{2} \approx 7.07$ is 8, which is also the nearest even number to $12/\sqrt{2} \approx 8.49$.
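
The rounding argument is easy to check numerically; a tiny illustration (not the library's actual code), assuming section sizes are kept even by rounding to the nearest even number:

```python
# Illustration of the section-size rounding collision for k=10 vs k=12.
import math

def next_section_size(k):
    # divide by sqrt(2), then round to the nearest even number
    return 2 * round(k / math.sqrt(2) / 2)

for k in (10, 12):
    print(k, "->", next_section_size(k))  # both print 8
```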

Apologies for the belated answer! (You can let me know by email if you'd like me to look at something, but I'm not watching the dev mailing list closely.)

@PavelVesely

@AlexanderSaydakov The plots are nice, although it's not surprising that t-digest is much better on the uniform distribution. Can you try another distribution that is more skewed?

@AlexanderSaydakov (Contributor, Author)

Sorry for the delay. I plan to do this eventually but have been busy with more urgent tasks.

@AlexanderSaydakov (Contributor, Author)

I measured the rank error with input from an exponential distribution (lambda = 1.5). I don't see a drastic difference from the uniform distribution.
[Plots: t-Digest vs REQ high-rank error and low-rank error, exponential(1.5) input]
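
For reproducibility, a sketch of how such an input stream could be generated (an assumption about the setup; numpy parameterizes the exponential by scale = 1/lambda), reusing the earlier harness:

```python
# Exponential(lambda = 1.5) input; numpy takes scale = 1/lambda.
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0 / 1.5, size=2**20)
# Feed `data` to both sketches as in the earlier snippet, then compare
# sketch.get_rank(np.quantile(data, q)) with q for q in (0.95, 0.99).
```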

@PavelVesely

Thanks for the update! In the KDD paper, we tried the log-uniform distribution, that is, choosing a random $R \in (0,1)$ and outputting the item $10^{R \cdot E}$, where $E$ is the maximum exponent. Generally, the larger $E$, the worse for t-digest.

(We actually tried some variations of the log-uniform distribution, like additionally choosing a random sign, or squaring $R$ in the exponent, that is, $10^{R^2 \cdot E}$, which made the error of t-digest even worse.)
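
A sketch of these generators as described (the choice of $E$ and the stream length are placeholders):

```python
# Log-uniform generators from the description above.
import numpy as np

def log_uniform(rng, n, E):
    # R uniform in (0,1); item = 10**(R*E)
    return 10.0 ** (rng.uniform(size=n) * E)

def log_uniform_hard(rng, n, E):
    # variation: random sign and squared exponent, 10**(R^2 * E)
    r = rng.uniform(size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * 10.0 ** (r**2 * E)

rng = np.random.default_rng(0)
data = log_uniform(rng, 2**20, E=300)  # larger E is harder for t-digest
```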

@PavelVesely

@AlexanderSaydakov @leerho Btw, are there any differences between your implementation of t-digest and the "official" implementation? I mean primarily algorithmic differences that would influence the accuracy on some datasets.

@AlexanderSaydakov (Contributor, Author)

We implemented merge slightly differently, since the reference implementation modifies the input sketches. My testing shows that our version does a bit better in terms of accuracy after merging. Other than that, I believe our implementation is equivalent to the reference one.
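
For completeness, a hedged illustration of a non-destructive merge, assuming the Python bindings expose a `merge(other)` method as in the C++ tdigest (the class and method names here are assumptions):

```python
# Hedged merge illustration; not the library's internal merge logic.
from datasketches import tdigest_double

a, b = tdigest_double(100), tdigest_double(100)
for i in range(1000):
    a.update(float(i))
    b.update(float(i) + 0.5)

a.merge(b)                   # as described above, b is left unmodified
print(a.get_quantile(0.5))   # median estimate over the combined stream
```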
