diff --git a/docs/development/extensions-core/test-stats.md b/docs/development/extensions-core/test-stats.md
index 9f069d87bf54..e0516e358895 100644
--- a/docs/development/extensions-core/test-stats.md
+++ b/docs/development/extensions-core/test-stats.md
@@ -1,6 +1,6 @@
 ---
 id: test-stats
-title: "Test Stats Aggregators"
+title: "Test stats aggregators"
 ---
-This Apache Druid extension incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details.
+The `druid-stats` extension for Apache Druid incorporates aggregators to compute test statistics, including z-scores and p-values.
+Please refer to [Democratizing Experimentation Data for Product Innovations](https://medium.com/paypal-tech/democratizing-experimentation-data-for-product-innovations-8b6e1cf40c27) for math background and details.
 Make sure to include `druid-stats` extension in order to use these aggregators.
 ## Z-Score for two sample ztests post aggregator
-Please refer to [https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf](http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf) for more details.
+Please refer to [Making Sense of the Two-Proportions Test](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [An Introduction to Statistics: Comparing Two Means](https://userweb.ucs.louisiana.edu/~jcb0773/Berry_statbook/427bookall-August2024.pdf) for more details.
 z = (p1 - p2) / S.E. (assuming null hypothesis is true)
@@ -41,6 +42,7 @@ S.E. = sqrt{ p1 * ( 1 - p1 )/n1 + p2 * (1 - p2)/n2) }
 (p1 – p2) is the observed difference between two sample proportions.
 ### zscore2sample post aggregator
+
 * **`zscore2sample`**: calculate the z-score using two-sample z-test while converting binary variables (***e.g.*** success or not) to continuous variables (***e.g.*** conversion rate).
 ```json
@@ -74,7 +76,7 @@ p2 = (successCount2) / (sample size 2)
 }
 ```
-## Example Usage
+## Example usage
 In this example, we use zscore2sample post aggregator to calculate z-score, and then feed the z-score to pvalue2tailedZtest post aggregator to calculate p-value.
diff --git a/docs/querying/sql-translation.md b/docs/querying/sql-translation.md
index e430caa8bf09..5125d17acde9 100644
--- a/docs/querying/sql-translation.md
+++ b/docs/querying/sql-translation.md
@@ -803,7 +803,7 @@ the query hits `maxStreamLength`: the maximum number of items to store in each s
 See [GitHub issue 11544](https://github.com/apache/druid/issues/11544) for more details.
 To workaround the issue, increase value of the maximum string length with the `approxQuantileDsMaxStreamLength` parameter in the query context.
 Since it is set to 1,000,000,000 by default, you don't need to override it in most cases.
-See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for how many bytes are required per stream length.
+See [accuracy information](https://datasketches.apache.org/docs/Quantiles/ClassicQuantilesSketch.html) in the DataSketches documentation for how many bytes are required per stream length.
 This query context parameter is a temporary solution to avoid the known issue. It may be removed in a future release after the bug is fixed.
 ## Unsupported features