As can be seen below, in scene `scene0011_00`, which is in the val split, the utterance for one chair is: "This is a brown chair. There are many identical chairs setting around the table it sets at."
Obviously, there are at least 4 chairs that match this utterance. Such ambiguous descriptions in the training set may provide some supervision signals to facilitate the model's learning of vision-language alignment, but encountering such ambiguous descriptions in the validation set does not help us evaluate the model's performance.