Add geometric mean normalization for scores #239
Conversation
Force-pushed from 9c78771 to a62afaa
Codecov Report
@@                 Coverage Diff                  @@
##           feature/normalization     #239      +/-   ##
===========================================================
+ Coverage                  82.43%   86.23%    +3.80%
- Complexity                    323      337       +14
===========================================================
  Files                          26       28        +2
  Lines                         979      981        +2
  Branches                      153      153
===========================================================
+ Hits                          807      846       +39
+ Misses                        108       69       -39
- Partials                       64       66        +2
Force-pushed from a62afaa to 793f05a
...ensearch/neuralsearch/processor/combination/GeometricMeanScoreCombinationTechniqueTests.java
 * Verify score correctness by using alternative formula for geometric mean as n-th root of product of weighted scores,
 * more details in here https://en.wikipedia.org/wiki/Weighted_geometric_mean
 */
private float geometricMean(List<Float> scores, List<Double> weights) {
I still have doubts about the effectiveness of this test code. I believe we don't need test code based on random numbers. I would like to hear other opinions, though.
I don't have any concerns as long as the test is able to fail if the formula changes. Let's just make sure it doesn't become flaky because of floating-point precision loss.
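Below is a minimal, self-contained sketch (not the PR's actual test code) of how the alternative-formula check discussed above could work: one helper computes the weighted geometric mean as the n-th root of the product of weighted scores, another uses the equivalent log-space form standing in for the production combination technique, and the comparison uses a small delta instead of exact equality to avoid flakiness from floating-point precision loss. Only the geometricMean(List<Float>, List<Double>) signature comes from the excerpt; all other names and values here are illustrative.

```java
import java.util.List;

class GeometricMeanCheckSketch {

    // Weighted geometric mean via the alternative formula from the excerpt:
    // (prod_i scores_i ^ weights_i) ^ (1 / sum_i weights_i)
    static float geometricMean(List<Float> scores, List<Double> weights) {
        double weightedProduct = 1.0;
        double sumOfWeights = 0.0;
        for (int i = 0; i < scores.size(); i++) {
            weightedProduct *= Math.pow(scores.get(i), weights.get(i));
            sumOfWeights += weights.get(i);
        }
        return (float) Math.pow(weightedProduct, 1.0 / sumOfWeights);
    }

    // Mathematically equivalent log-space form: exp(sum_i w_i * ln(s_i) / sum_i w_i);
    // a hypothetical stand-in for whatever the combination technique under test computes.
    static float geometricMeanLogSpace(List<Float> scores, List<Double> weights) {
        double weightedLogSum = 0.0;
        double sumOfWeights = 0.0;
        for (int i = 0; i < scores.size(); i++) {
            weightedLogSum += weights.get(i) * Math.log(scores.get(i));
            sumOfWeights += weights.get(i);
        }
        return (float) Math.exp(weightedLogSum / sumOfWeights);
    }

    public static void main(String[] args) {
        List<Float> scores = List.of(0.5f, 0.2f, 0.9f);
        List<Double> weights = List.of(0.4, 0.3, 0.3);
        float expected = geometricMean(scores, weights);
        float actual = geometricMeanLogSpace(scores, weights);
        // Compare with a tolerance rather than exact equality so the check does not
        // become flaky due to floating-point precision loss.
        float delta = 0.0001f;
        if (Math.abs(expected - actual) > delta) {
            throw new AssertionError("geometric mean mismatch: " + expected + " vs " + actual);
        }
        System.out.println("weighted geometric mean = " + actual);
    }
}
```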
Force-pushed from e91da54 to f2fdcbe
Merged commit b867e69 into opensearch-project:feature/normalization
Description
Adding the geometric mean technique, which is a generalization of the mean based on the product and N-th root of N values (more details here). Weights are supported similarly to how they are handled in the arithmetic mean technique. An example of a pipeline with the processor config is sketched below.
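The following is a hedged sketch of what such a pipeline config might look like, based on the normalization-processor search pipeline API as later documented for OpenSearch; the pipeline name, the min_max normalization technique, and the weight values are illustrative placeholders, and exact field names on the feature branch at the time of this PR may have differed.

```json
PUT /_search/pipeline/norm-pipeline
{
  "description": "Post-processor that normalizes and combines hybrid query scores",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "geometric_mean",
          "parameters": {
            "weights": [0.4, 0.6]
          }
        }
      }
    }
  ]
}
```

With a config like this, scores from the hybrid query's sub-queries are normalized and then combined with the weighted geometric mean, with the weights applied in the same way as in the arithmetic mean technique.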
In addition to the main changes, there is some refactoring of the integration tests. I had to include it in this PR because, with a few new tests added for geometric mean, the auto-redeploy feature started acting more aggressively and the tests became flaky.
Issues Resolved
#228; part of the solution for #126
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.