-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1. score_pairs refactor #333
1. score_pairs refactor #333
Conversation
2cf4075
to
e07b11a
Compare
The general functionality is ready. What is left to be done is to produce a better documentation. I'm on it. |
@mvargas33 is this ready to review? |
Yes, it is ! The code and the docs are ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a full review yet, but overall I'm +1 for the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a full review!
In general, please run spellcheck to avoid too many typos :-)
test/test_sklearn_compat.py
Outdated
@@ -148,7 +148,7 @@ def test_array_like_inputs(estimator, build_dataset, with_preprocessor): | |||
pairs = np.array([[X[0], X[1]], [X[0], X[2]]]) | |||
pairs_variants, _ = generate_array_like(pairs) | |||
for pairs_variant in pairs_variants: | |||
estimator.score_pairs(pairs_variant) | |||
estimator.pair_distance(pairs_variant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all metric learners will have pair_distance implemented
so use pair_score instead
perhaps you can test pair_distance only when it is implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now its tested with pair_score
, but it also checks that pair_distance
can be used.
In case a learner does not have pair_distance
, then the exception must match that it cannot be used with that specific learner.
But as this PR does not contemplate any learner non-mahalnobian, the exception msg is empty as a To-Do for #329
test/test_utils.py
Outdated
@@ -834,8 +834,8 @@ def test_error_message_tuple_size(estimator, _): | |||
|
|||
@pytest.mark.parametrize('estimator, _', metric_learners, | |||
ids=ids_metric_learners) | |||
def test_error_message_t_score_pairs(estimator, _): | |||
"""tests that if you want to score_pairs on triplets for instance, it returns | |||
def test_error_message_t_pair_distance(estimator, _): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as before: pair_distance will not always be implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generalized with a try/catch
. If there is an exception, it can only happen if pair_distance
is not implemented. Did this in test_utils.py
and test_sklearn_compat.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mvargas33, still some small things to fix but overall looks good now!
@perimosocordiae do you want to take a pass before we merge?
metric_learn/base_metric.py
Outdated
Returns the similarity score between pairs of points. Depending on the | ||
algorithm, this method can return the learned similarity score between | ||
pairs, or the opposite of the distance learned between pairs. The larger | ||
the score, the more similar the pair. All learners have access to this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: this is a bit heavy. I would recommend simply:
"Returns the similarity score between pairs of points (the larger the score, the more similar the pair). For metric learners that learn a distance, the score is simply the opposite of the distance between pairs."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed and kept the last sentence "All learners have access to this method".
metric_learn/base_metric.py
Outdated
1D arrays and cannot use a preprocessor. Besides, the returned function | ||
is independent of the metric learner and hence is not modified if the | ||
metric learner is. | ||
two points. The difference between `pair_score` and `pair_distance` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, this is in the "see also" of pair_score
, so it should only be "The difference with pair_score
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually of the old score_pairs
, so I changed it for "The difference with score_pairs
".
pair_score
and pair_distance
's "See Also" are ok
def test_raise_not_fitted_error_if_not_fitted(estimator, build_dataset, | ||
with_preprocessor): | ||
"""Test that a NotFittedError is raised if someone tries to use | ||
score_pairs, decision_function, get_metric, transform or | ||
pair_score, score_pairs, decision_function, get_metric, transform or | ||
get_mahalanobis_matrix on input data and the metric learner | ||
has not been fitted.""" | ||
input_data, labels, preprocessor, _ = build_dataset(with_preprocessor) | ||
estimator = clone(estimator) | ||
estimator.set_params(preprocessor=preprocessor) | ||
set_random_state(estimator) | ||
with pytest.raises(NotFittedError): | ||
with pytest.raises(NotFittedError): # Remove in 0.8.0 | ||
estimator.score_pairs(input_data) | ||
with pytest.raises(NotFittedError): | ||
estimator.pair_score(input_data) | ||
with pytest.raises(NotFittedError): | ||
estimator.decision_function(input_data) | ||
with pytest.raises(NotFittedError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to keep in mind for #329: some of these "raise_not_fitted" tests should actually be moved to base metric / Mahalanobis mixin / bilinear mixin tests. Indeed some of these methods will be present for all metric learners, some of them only for Mahalanobis but not bilinear (eg, transform), etc. The tests for pairs_classifiers (and other classifiers) should only be for methods specific to the associated class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @mvargas33!
Let's wait a bit to see if others have some time to review before we merge.
Merged, thanks @mvargas33! |
See #329
In the aim of making metric learn compatible with similarity learners like OASIS and pseudo-distance Mahalanobis learners (all current learners) I take a step forward deprecating
score_pairs
with aFutureWarning
, and replacing it withpair_score
andpair_distance
, as discussed in #329 .After thinking a while between
pairwise_similarity
andpair_similarity
names, I think that using ''pairwise'' gives the user the intuition of similarity between all pairs, so a matrix is expected as output, not a list. Just like sklearn'spairwise_distances
(See here).Right now it does not makes too much sense to have
pair_distances
andpair_similarity
only for Mahalanobis learners becausepair_similarity
it's just the inverse of the distance. But, this PR is needed to proceed with #329 and then with #330.As discussed before, this change makes the code very extendible for new types of learners, for instance a
StringEdit
learner could fit in the library seamlessly extending fromBaseLearner
regardless the algorithm learns a similarity or a pseudo-distance.It also does not affect
Classifiers
that much, as they can work withpair_distance
from now on.In contrast to this proposal, using
isinstance()
for each kind of learner inside the currentscore_pairs
does not look like a good solution, in my humble opinion.Ps: If the discussion goes in favor, I'll check all current tests, one by one, to make sure they don't become broken.