Releases: KevinMusgrave/pytorch-metric-learning
v1.2.0
New Loss Function: SubCenterArcFace
Thanks @chingisooinar!
v1.1.2
v1.1.1
v1.1.0
New features
CentroidTripletLoss
Implementation of On the Unreasonable Effectiveness of Centroids in Image Retrieval
VICRegLoss
Implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
AccuracyCalculator
- Added mean reciprocal rank as an accuracy metric. Available as "mean_reciprocal_rank".
- Added return_per_class argument for AccuracyCalculator. This is like avg_of_avgs, but returns the accuracy per class instead of averaging them for you. (See the example below.)
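For example, a minimal sketch of combining these options (the metric selection and the k value here are illustrative choices, not part of this release):

from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator

# report mean reciprocal rank and precision@1, broken down per class
ac = AccuracyCalculator(
    include=("mean_reciprocal_rank", "precision_at_1"),
    return_per_class=True,
    k=10,
)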
v1.0.0
Reference embeddings for tuple losses
You can separate the source of anchors and positives/negatives. In the example below, anchors will be selected from embeddings, and positives/negatives will be selected from ref_emb.
from pytorch_metric_learning.losses import TripletMarginLoss

loss_fn = TripletMarginLoss()
loss = loss_fn(embeddings, labels, ref_emb=ref_emb, ref_labels=ref_labels)
Efficient mode for DistributedLossWrapper
- efficient=True: each process uses its own embeddings for anchors, and the gathered embeddings for positives/negatives. Gradients will not be equal to those in non-distributed code, but the benefit is reduced memory and faster training.
- efficient=False: each process uses gathered embeddings for both anchors and positives/negatives. Gradients will be equal to those in non-distributed code, but at the cost of doing unnecessary operations (i.e. doing computations where both anchors and positives/negatives have no gradient).
The default is False. You can set it to True like this:
from pytorch_metric_learning import losses
from pytorch_metric_learning.utils import distributed as pml_dist
loss_func = losses.ContrastiveLoss()
loss_func = pml_dist.DistributedLossWrapper(loss_func, efficient=True)
Documentation: https://kevinmusgrave.github.io/pytorch-metric-learning/distributed/
Customizing k-nearest-neighbors for AccuracyCalculator
You can use a different type of faiss index:
import faiss
from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator
from pytorch_metric_learning.utils.inference import FaissKNN
knn_func = FaissKNN(index_init_fn=faiss.IndexFlatIP, gpus=[0,1,2])
ac = AccuracyCalculator(knn_func=knn_func)
You can also use a custom distance function:
from pytorch_metric_learning.distances import SNRDistance
from pytorch_metric_learning.utils.inference import CustomKNN
knn_func = CustomKNN(SNRDistance())
ac = AccuracyCalculator(knn_func=knn_func)
Issues resolved
#204
#251
#256
#292
#330
#337
#345
#347
#349
#353
#359
#361
#362
#363
#368
#376
#380
Contributors
Thanks to @yutanakamura-tky and @KinglittleQ for pull requests, and @mensaochun for providing helpful code in #380
v0.9.99
Bug fixes
- Accuracy calculation bug in GlobalTwoStreamEmbeddingSpaceTester (#301)
- Mixed precision bug in convert_to_weights (#300)
Features
- HierarchicalSampler
- Improved functionality for InferenceModel (#296 and #304); see the sketch after this list:
  - train_indexer now accepts a dataset
  - also added functions save_index, load_index, and add_to_indexer
- Added power argument to LpRegularizer (#299)
- Raise an exception if labels has more than 1 dimension (#307)
- Added a global flag for turning on/off collect_stats (#311)
- TripletMarginLoss smooth variant uses the input margin now (#315)
- Use package-specific logger, "PML", instead of root logger (#318)
- Cleaner key verification in the trainers (#102)
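A minimal sketch of the expanded InferenceModel workflow. The trunk model, dataset, and file path are placeholders, and the exact arguments for save_index and load_index are assumptions based on these notes rather than a definitive API reference:

from pytorch_metric_learning.utils.inference import InferenceModel

im = InferenceModel(trunk)            # trunk: an assumed, already-trained embedding network
im.train_indexer(train_dataset)       # train_indexer now accepts a dataset directly
im.save_index("index.faiss")          # persist the index to disk (path is illustrative)
im.load_index("index.faiss")          # reload it later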
Thanks to @elias-ramzi, @gkouros, @vltanh, and @Hummer12007
v0.9.98
AccuracyCalculator breaking change (issue #290)
The k parameter in AccuracyCalculator has a new behavior. The allowed values are:
- None. This means k will be set to the total number of reference embeddings.
- An integer greater than 0. This means k will be set to the input integer.
- "max_bin_count". This means k will be set to max(bincount(reference_labels)) - self_count, where self_count == 1 if the query and reference embeddings come from the same source.
The old behavior is described here.
If your dataset is large, you might find the k-nn search is now very slow. This is because the new default behavior is to set k to len(reference_embeddings). To avoid this, you can set k to a number, like k = 1000, or try k = "max_bin_count" to get behavior similar (though not identical) to the old default, as in the example below.
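For example, a sketch of the workarounds described above (the specific k values are illustrative):

from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator

ac = AccuracyCalculator(k=1000)             # cap the k-nn search at 1000 neighbors
ac = AccuracyCalculator(k="max_bin_count")  # or approximate the old default behavior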
Apologies for the drastic change. I'm hoping to have things stable and following semantic versioning when v1.0 arrives.
Bug fixes
- lmu.convert_to_triplets has been fixed (#291)
- Losses and miners should now be compatible with autocast (#293); see the sketch below
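A minimal sketch of using a loss inside autocast, assuming a CUDA setup; model, data, and labels are placeholder names, not part of these notes:

import torch
from pytorch_metric_learning import losses

loss_fn = losses.TripletMarginLoss()

with torch.cuda.amp.autocast():
    embeddings = model(data)            # model, data, and labels are assumed to exist
    loss = loss_fn(embeddings, labels)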
New features / improvements
- The loss used in Supervised Contrastive Learning. Documentation: SupConLoss. By @fjsj (#281, #288)
- Vectorized convert_to_triplets (#279)
v0.9.97
Bug fixes
- Small fix for NTXentLoss with no negative pairs #272
- Fixed .detach() bug in NTXentLoss #282
- Fixed parameter override bug in MatchFinder.get_matching_pairs() #286 by @joaqo
New features and improvements
AccuracyCalculator now uses torch instead of numpy
- All the calculations (except for NMI and AMI) are done with torch. Calculations will be done on the same device and dtype as the input query tensor.
- You can still pass numpy arrays into AccuracyCalculator.get_accuracy, but the arrays will be immediately converted to torch tensors, as in the sketch below.
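A rough sketch of calling get_accuracy with torch tensors; the random data and the argument order shown here (query, reference, query_labels, reference_labels, embeddings_come_from_same_source) follow the v0.9.x signature and are for illustration only:

import torch
from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator

ac = AccuracyCalculator()
query = torch.randn(100, 128)                   # computations stay on this device and dtype
reference = torch.randn(500, 128)
query_labels = torch.randint(0, 10, (100,))
reference_labels = torch.randint(0, 10, (500,))

accuracies = ac.get_accuracy(query, reference, query_labels, reference_labels, False)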
Faster custom label comparisons in AccuracyCalculator
- See #264 by @mlopezantequera
Numerical stability improvement for DistanceWeightedMiner
UniformHistogramMiner
This is like DistanceWeightedMiner, except that it works well with high-dimensional embeddings, and works with any distance metric (not just L2 normalized distance). See the documentation.
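A rough sketch of constructing the miner with a non-L2 distance; the parameter names used here (num_bins, pos_per_bin, neg_per_bin) are assumptions for illustration rather than a definitive API reference:

from pytorch_metric_learning.miners import UniformHistogramMiner
from pytorch_metric_learning.distances import SNRDistance

# sample pairs roughly uniformly across a histogram of SNR distances
miner = UniformHistogramMiner(
    num_bins=100,
    pos_per_bin=25,
    neg_per_bin=33,
    distance=SNRDistance(),
)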
PerAnchorReducer
This converts unreduced pairs to unreduced elements. For example, NTXentLoss returns losses per positive pair. If you used PerAnchorReducer with NTXentLoss, then the losses per pair would first be converted to losses per batch element, before being passed to the inner reducer. See the documentation.
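A minimal sketch of wrapping an inner reducer; the choice of AvgNonZeroReducer as the inner reducer, and passing it via the reducer keyword, are assumptions for illustration:

from pytorch_metric_learning.losses import NTXentLoss
from pytorch_metric_learning.reducers import PerAnchorReducer, AvgNonZeroReducer

# per-pair losses are first aggregated per batch element, then passed to the inner reducer
loss_fn = NTXentLoss(reducer=PerAnchorReducer(reducer=AvgNonZeroReducer()))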
BaseTester no longer converts embeddings from torch to numpy
This includes the get_all_embeddings function. If you want get_all_embeddings to return numpy arrays, you can set the return_as_numpy flag to True:
embeddings, labels = tester.get_all_embeddings(dataset, model, return_as_numpy=True)
The embeddings are converted to numpy only for the visualizer and visualizer_hook, if specified.
Reduced usage of .to(device) and .type(dtype)
Tensors are initialized on device and with the necessary dtype, and they are moved to device and cast to dtypes only when necessary. See this code snippet for details.
Simplified DivisorReducer
Replaced "divisor_summands" with "divisor".
v0.9.96
New Features
Thanks to @mlopezantequera for adding the following features!
Testers: allow any combination of query and reference sets (#250)
To evaluate different combinations of query and reference sets, use the splits_to_eval argument for tester.test().
For example, let's say your dataset_dict has two keys: "dataset_a" and "train".
- The default splits_to_eval = None is equivalent to:
splits_to_eval = [('dataset_a', ['dataset_a']), ('train', ['train'])]
- dataset_a as the query, and train as the reference:
splits_to_eval = [('dataset_a', ['train'])]
- dataset_a as the query, and dataset_a + train as the reference:
splits_to_eval = [('dataset_a', ['dataset_a', 'train'])]
Then pass splits_to_eval to tester.test:
tester.test(dataset_dict, epoch, model, splits_to_eval = splits_to_eval)
Note that this new feature makes the old reference_set init argument obsolete, so reference_set has been removed.
AccuracyCalculator: allow arbitrary label comparison functions (#254)
AccuracyCalculator now has an optional init argument, label_comparison_fn, which is a function that compares two numpy arrays of labels and returns a boolean array. The default is numpy.equal. If a custom function is used, then you must exclude clustering-based metrics ("NMI" and "AMI"). The following is an example of a custom function for two-dimensional labels. It returns True if the 0th column matches, and the 1st column does not match:
def example_label_comparison_fn(x, y):
    return (x[:, 0] == y[:, 0]) & (x[:, 1] != y[:, 1])

AccuracyCalculator(exclude=("NMI", "AMI"),
                   label_comparison_fn=example_label_comparison_fn)
Other Changes
- BaseTrainer and BaseTester now take in an optional dtype argument. This is the type that the dataset output will be converted to, e.g. torch.float16. If set to the default value of None, then no type casting will be done.
- Removed self.dim_reduced_embeddings from BaseTester and the associated code in HookContainer, due to lack of use.
- tester.test() now returns all_accuracies, whereas before, it returned nothing and you'd have to access all_accuracies either through the end_of_testing_hook or by accessing tester.all_accuracies. (See the example below.)
- tester.embeddings_and_labels is deleted at the end of tester.test() to free up memory.
v0.9.95
New
BatchEasyHardMiner
This new miner is an implementation of Improved Embeddings with Easy Positive Triplet Mining. See the documentation. Thanks @marijnl!
New metric added to AccuracyCalculator
The new metric is mean_average_precision, which is the commonly used k-nn based mAP in information retrieval.
Note that this differs from the already existing metric, mean_average_precision_at_r.
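For example, a sketch of requesting both metrics (the k value is an illustrative choice):

from pytorch_metric_learning.utils.accuracy_calculator import AccuracyCalculator

ac = AccuracyCalculator(
    include=("mean_average_precision", "mean_average_precision_at_r"),
    k=100,
)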
Bug fixes
- dtype casting in MultiSimilarityMiner changed to work with autocast. See #233 by @thinline72
- Added logic for dealing with zero rows in the weight matrix in DistanceWeightedMiner by ignoring them. For example, if the entire weight matrix is 0, then no triplets will be returned. Previously, the zero rows would cause a RuntimeError. See #230 by @tpanum