Change of bin loss computation to avoid learning from empty annotations. #278
TLDR: This fix leads to better performance in rotation prediction.
First of all, thank you very much for publishing your work; it was super helpful for mine as well. While working with it I found a slight hiccup in the loss function for the rotation bin classification. The fix is minor in code, but it has a rather big impact on the convergence and overall performance of the network when predicting object rotations and therefore, at inference, also when predicting velocities.
I was using the nuScenes dataset, so I cannot easily verify whether the same applies to the KITTI dataset. However, the GenericDataset class defines the parameter max_objs for both datasets. For nuScenes this parameter states that there can be at most max_objs annotations per image. If fewer annotations are present in an image, the rest is simply filled up with default labels (basically zeros for every parameter). For simplicity, let's call these the "placeholder annotations". This concept is completely fine as long as the placeholder annotations are not used for anything, except of course to not (!) predict anything there, because it would not make sense for the network to always predict the maximum number of objects. This principle of not predicting objects is, however, already trained for in the heatmap head.
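To make the padding scheme concrete, here is a minimal sketch, assuming a per-image label array of length max_objs and a mask marking the valid slots (the names follow the description above; the exact layout in GenericDataset differs):

```python
import numpy as np

max_objs = 128   # maximum number of annotations per image
num_real = 3     # e.g. only 3 objects are actually annotated in this image

# rotation targets: real annotations first, the rest are "placeholder annotations"
rot_target = np.zeros((max_objs,), dtype=np.float32)
rot_target[:num_real] = [0.3, -1.2, 2.1]  # illustrative angles

# mask marking which slots hold real annotations
mask = np.zeros((max_objs,), dtype=np.float32)
mask[:num_real] = 1
```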
I propose removing these placeholder annotations, or to be more precise, removing the indices from the output and target tensors where the mask tensor is zero. This has the same effect as masking the output with zeros in all the other loss functions, but for the cross-entropy loss it is different. When we mask the output of the rotation bin classification, we set the logits to zero, which after the softmax corresponds to equal confidence that the angle is in bin 1 or not in bin 1. But since the target by default is 0 (which means not in bin 1), the backward pass still optimizes the parameters towards classifying "not in bin 1". Thus we do not (!) ignore the placeholder annotations when we mask the output. In most other loss functions, for example the L1 loss
Loss = |pred - target| = |0 - 0| = 0,
masking the output has the effect of ignoring the placeholder annotations, but not so in the cross-entropy loss, since
e^0 = 1 != 0,
i.e. the masked (zero) logits still produce a non-zero loss and therefore non-zero gradients.
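To make the difference concrete, here is a minimal sketch, assuming a single rotation bin with two logits per object slot (the tensor names pred_bin, target_bin and mask are illustrative, not the repository's exact variables):

```python
import torch
import torch.nn.functional as F

pred_bin = torch.randn(1, 128, 2, requires_grad=True)  # logits: "in bin 1" vs. "not in bin 1"
target_bin = torch.zeros(1, 128, dtype=torch.long)     # placeholder targets default to 0
mask = torch.zeros(1, 128)
mask[0, :3] = 1                                        # only the first 3 slots hold real objects
target_bin[0, :3] = 1

# Masking the output: placeholder slots get zero logits, but
# cross_entropy(zero logits, target=0) = log(2) != 0, so the
# placeholders still push the network towards "not in bin 1".
masked_pred = pred_bin * mask.unsqueeze(2)
loss_masked = F.cross_entropy(masked_pred.view(-1, 2), target_bin.view(-1))

# Removing the masked-out indices: only real annotations contribute.
valid = mask.view(-1) > 0
loss_valid = F.cross_entropy(pred_bin.view(-1, 2)[valid], target_bin.view(-1)[valid])

print(loss_masked.item(), loss_valid.item())  # the two losses differ
```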
Also, in other loss functions that share this problem, for example the WeightedBCELoss, you masked the unreduced loss instead of the output, which has the same effect as removing the indices.
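A sketch of that pattern, masking the unreduced loss (again with illustrative names, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(pred, target, mask, eps=1e-4):
    # per-element loss first, then zero out the placeholder slots
    loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none')
    loss = loss * mask
    # normalize by the number of real annotations only
    return loss.sum() / (mask.sum() + eps)
```

Because the unreduced loss is multiplied by the mask, the placeholder slots contribute exactly zero to the sum and to the gradients, which is equivalent to removing those indices.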