Explain API changes #2403
93c1f96
to
8b52ed7
Compare
We need a unit test for each case:
- Disk-based search with valid rescore context.
- Radial search.
- ANN search (default case).
- Shard-level rescoring enabled.
- Shard-level rescoring disabled.
- Filter weight case where filtered IDs are less than k.
- Filter threshold value greater than cardinality.
- Missing native engine files.
- Valid context with matching document and disk-based search.
Please also validate the result against the Explanation object.
@Vikasht34 yes, the tests are not added here yet, hence it's in draft status. I will add coverage for all the possible cases.
8b52ed7
to
bffbdac
Compare
Signed-off-by: Neetika Singhal <[email protected]>
bffbdac
to
7c4f425
Compare
@navneet1v / @Vikasht34 would you please review the changes?
@neetikasinghal can you please add an entry in the changelog?
Yup, I generally add it towards the end of the review so that it's easier to rebase with the latest changes.
I looked at the code at a high level and still need to go through the explanation logic in more detail. One thing I want to add: I am not seeing any ITs related to the explain API. Can we please add them too?
@@ -46,6 +51,11 @@ public float scoreTranslation(float rawScore) {
    return 1 / (1 + rawScore);
}

@Override
public String explainScoreTranslation(float rawScore) {
    return "`1 / (1 + rawScore)`";
Can we make this a private static final String?
@@ -77,6 +87,11 @@ public float scoreTranslation(float rawScore) {
    return Math.max((2.0F - rawScore) / 2.0F, 0.0F);
}

@Override
public String explainScoreTranslation(float rawScore) {
    return "`Math.max((2.0F - rawScore) / 2.0F, 0.0F)`";
same as above
Yup, that's WIP as my setup for ITs was broken. I am able to fix that now; however, the PR already has coverage for all the UTs.
Signed-off-by: Neetika Singhal <[email protected]>
Will continue on other files.
@@ -177,6 +215,8 @@ public KNNVectorSimilarityFunction getKnnVectorSimilarityFunction() {

public abstract float scoreTranslation(float rawScore);

public abstract String explainScoreTranslation(float rawScore);
An abstract method is worth adding only when each concrete implementation has something unique about it. Here, every SpaceType enum constant must implement this method even when the explanation is a fixed string, which leads to redundant code everywhere.
In my opinion, when the implementations are this redundant, the way to reduce the duplication is through constructor overloads.
So in this case:
SpaceType(String name, String explanationFormula) {
this.name = name;
this.explanationFormula = explanationFormula;
}
private static final String SCORE_TRANSLATION_L2 = "`1 / (1 + rawScore)`";
private static final String SCORE_TRANSLATION_COSINE = "`Math.max((2.0F - rawScore) / 2.0F, 0.0F)`";
and the code reduces to:
L2("l2", SCORE_TRANSLATION_L2) {
@Override
public float scoreTranslation(float rawScore) {
return 1 / (1 + rawScore);
}
},
COSINE("cosine", SCORE_TRANSLATION_COSINE) {
@Override
public float scoreTranslation(float rawScore) {
return Math.max((2.0F - rawScore) / 2.0F, 0.0F);
}
},
I can introduce the static final variables to reduce code duplication; however, the constructor approach highlighted above won't work for inner product, where the explain formula differs:
@Override
public String explainScoreTranslation(float rawScore) {
if (rawScore >= 0) {
return "`1 / (1 + rawScore)`";
}
return "`-rawScore + 1`";
}
It depends on the raw score. Also, there could be a future space type whose formula depends on the raw score, hence I thought of introducing this function instead. Let me know your thoughts here.
This is how inner product will work: if explainScoreTranslation is different, it can be overridden.
INNER_PRODUCT("innerproduct") {
@Override
public float scoreTranslation(float rawScore) {
return rawScore >= 0 ? (1 / (1 + rawScore)) : (-rawScore + 1);
}
@Override
public String explainScoreTranslation(float rawScore) {
return rawScore >= 0 ? "`1 / (1 + rawScore)`" : "`-rawScore + 1`";
}
};
public enum SpaceType {
UNDEFINED("undefined") {
@Override
public float scoreTranslation(final float rawScore) {
throw new IllegalStateException("Unsupported method");
}
@Override
public String explainScoreTranslation(float rawScore) {
throw new IllegalStateException("Unsupported method");
}
},
L2("l2", "`1 / (1 + rawScore)`") {
@Override
public float scoreTranslation(float rawScore) {
return 1 / (1 + rawScore);
}
},
COSINE("cosine", "`Math.max((2.0F - rawScore) / 2.0F, 0.0F)`") {
@Override
public float scoreTranslation(float rawScore) {
return Math.max((2.0F - rawScore) / 2.0F, 0.0F);
}
},
INNER_PRODUCT("innerproduct") {
@Override
public float scoreTranslation(float rawScore) {
return rawScore >= 0 ? (1 / (1 + rawScore)) : (-rawScore + 1);
}
@Override
public String explainScoreTranslation(float rawScore) {
return rawScore >= 0 ? "`1 / (1 + rawScore)`" : "`-rawScore + 1`";
}
};
private final String name;
private final String explanationFormula;
SpaceType(String name) {
this.name = name;
this.explanationFormula = null;
}
SpaceType(String name, String explanationFormula) {
this.name = name;
this.explanationFormula = explanationFormula;
}
public abstract float scoreTranslation(float rawScore);
public String explainScoreTranslation(float rawScore) {
if (explanationFormula != null) {
return explanationFormula;
}
throw new UnsupportedOperationException("explainScoreTranslation not defined for this space type.");
}
}
Let me know if it makes sense.
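To make the trade-off concrete, here is a minimal, self-contained sketch of the same idea that can be compiled on its own. It is not the plugin's actual SpaceType: the names `SpaceTypeSketch`, `getName`, and `SpaceTypeSketchDemo` are hypothetical, and only two constants are kept. The formulas and the constructor shape are copied from the snippet above.

```java
// Hypothetical, trimmed-down stand-in for SpaceType; not the plugin's real enum.
enum SpaceTypeSketch {
    // Fixed formula: supplied through the two-argument constructor, no override needed.
    L2("l2", "`1 / (1 + rawScore)`") {
        @Override
        public float scoreTranslation(float rawScore) {
            return 1 / (1 + rawScore);
        }
    },
    // Formula depends on the raw score, so this constant overrides the default explanation.
    INNER_PRODUCT("innerproduct") {
        @Override
        public float scoreTranslation(float rawScore) {
            return rawScore >= 0 ? (1 / (1 + rawScore)) : (-rawScore + 1);
        }

        @Override
        public String explainScoreTranslation(float rawScore) {
            return rawScore >= 0 ? "`1 / (1 + rawScore)`" : "`-rawScore + 1`";
        }
    };

    private final String name;
    private final String explanationFormula;

    SpaceTypeSketch(String name) {
        this(name, null);
    }

    SpaceTypeSketch(String name, String explanationFormula) {
        this.name = name;
        this.explanationFormula = explanationFormula;
    }

    public String getName() {
        return name;
    }

    public abstract float scoreTranslation(float rawScore);

    // Default: return the fixed formula, or fail if the constant supplied none.
    public String explainScoreTranslation(float rawScore) {
        if (explanationFormula != null) {
            return explanationFormula;
        }
        throw new UnsupportedOperationException("explainScoreTranslation not defined for this space type.");
    }
}

class SpaceTypeSketchDemo {
    public static void main(String[] args) {
        float rawScore = -2.0f;
        for (SpaceTypeSketch type : SpaceTypeSketch.values()) {
            System.out.println(type.getName() + ": " + type.explainScoreTranslation(rawScore));
        }
    }
}
```

With this shape, a constant that has a single fixed formula needs only the extra constructor argument, while a constant whose formula depends on the raw score still overrides the method.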
 * This class captures details around knn explain queries that is used
 * by explain API to generate explanation for knn queries
 */
public class KnnExplanation {
There are a couple of things we can improve in this class:
- As of now, every client of this class has to fetch the maps and then put values into them, so the maps are final but still modifiable from outside the class. Ideally, a client should not know the internal data structure of KnnExplanation. Clients should only be exposed to APIs such as addAnnResult, addRawScores, and addKnnScores; how the values are added and which data structure is used should be up to the KnnExplanation class.
- Please make all maps unmodifiable outside of this class. No client other than KnnExplanation should be able to modify them.
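A minimal sketch of the encapsulation being asked for, under stated assumptions: the class name `KnnExplanationSketch`, the method signatures, and the key and value types of the maps are all hypothetical and are not taken from the PR. The use of `ConcurrentHashMap` is also an assumption, chosen in case segments are searched concurrently.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and map types are assumptions, not the PR's code.
class KnnExplanationSketch {
    // Internal state stays private; callers never receive a mutable reference.
    private final Map<Object, Integer> annResultPerLeaf = new ConcurrentHashMap<>();
    private final Map<Integer, Float> rawScores = new ConcurrentHashMap<>();
    private final Map<Integer, Float> knnScores = new ConcurrentHashMap<>();

    // Write path: clients call intent-revealing methods instead of map.put(...).
    public void addAnnResult(Object leafId, int resultCount) {
        annResultPerLeaf.put(leafId, resultCount);
    }

    public void addRawScore(int docId, float rawScore) {
        rawScores.put(docId, rawScore);
    }

    public void addKnnScore(int docId, float knnScore) {
        knnScores.put(docId, knnScore);
    }

    // Read path: point lookups plus read-only views of the underlying maps.
    public Integer getAnnResult(Object leafId) {
        return annResultPerLeaf.get(leafId);
    }

    public Map<Integer, Float> getRawScores() {
        return Collections.unmodifiableMap(rawScores);
    }

    public Map<Integer, Float> getKnnScores() {
        return Collections.unmodifiableMap(knnScores);
    }
}
```

Any attempt to call `put` on a returned view throws `UnsupportedOperationException`, so the only way to change the state is through the `add*` methods.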
makes sense, will update it.
public DocAndScoreQuery(int k, int[] docs, float[] scores, int[] segmentStarts, Object contextIdentity) {
public DocAndScoreQuery(int k, int[] docs, float[] scores, int[] segmentStarts, Object contextIdentity, KNNWeight knnWeight) {
The whole explain method now lives in the KNNWeight class, so DocAndScoreQuery has to take a reference to KNNWeight just to call its explain method. Please abstract the explain method out of KNNWeight so that, if we add another type of query in the future, we don't end up passing KNNWeight everywhere.
I understand the caveat of passing the KNNWeight class, but I didn't quite get your suggestion of abstracting the explain method out of KNNWeight.
Please move the explain method to a utility class that takes the context, k, and scorer, and call that utility method from both KNNWeight and DocAndScoreQuery.
.append(cardinality);
}
}
if (knnExplanation.getAnnResultPerLeaf().get(context.id()) != null
We are doing multiple lookups here: once to check whether the value is null, and again to get the value and check whether it is zero. Can we avoid these repeated map lookups?
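One way to do this, sketched in isolation: read the value once into a local variable and test that. The class and method names below are hypothetical, and the `Integer` value type of the per-leaf map is an assumption, since the full condition is not visible in the hunk above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the map's value type (Integer) is an assumption.
class SingleLookupSketch {
    // Returns true only when the leaf has a recorded ANN result count of exactly zero.
    static boolean hasEmptyAnnResult(Map<Object, Integer> annResultPerLeaf, Object leafId) {
        Integer annResult = annResultPerLeaf.get(leafId); // single map lookup
        return annResult != null && annResult == 0;
    }

    public static void main(String[] args) {
        Map<Object, Integer> annResultPerLeaf = new HashMap<>();
        annResultPerLeaf.put("leaf-0", 0);
        annResultPerLeaf.put("leaf-1", 5);
        System.out.println(hasEmptyAnnResult(annResultPerLeaf, "leaf-0"));
        System.out.println(hasEmptyAnnResult(annResultPerLeaf, "leaf-1"));
    }
}
```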
Description
Add explain support for exact, ANN, radial, disk-based, and filtered k-NN search. The score calculation explanation is currently added only for ANN search.
Proposal for explain is given here: #875 (comment)
ITs: WIP.
Related Issues
Resolves #875
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.