According to BERT's architecture, the pre-training loss is the sum of the mean masked LM likelihood and the mean next sentence prediction likelihood. Does this implementation include the next sentence prediction loss when calculating the loss? Does the use of the '[SEP]' tag have any effect on the training loss?
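For context, here is a minimal PyTorch sketch of the combined objective described above, i.e. summing a mean masked-LM cross-entropy and a mean next-sentence-prediction cross-entropy. The tensor names and shapes are hypothetical stand-ins, not taken from this repository's code.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch of 2 sequences, length 8, BERT-base vocab size.
batch_size, seq_len, vocab_size = 2, 8, 30522

# Stand-ins for model outputs; a real model would produce these from the
# encoder's final hidden states and the pooled [CLS] representation.
mlm_logits = torch.randn(batch_size, seq_len, vocab_size)  # per-token vocab scores
nsp_logits = torch.randn(batch_size, 2)                    # is-next / not-next scores

# Labels: -100 marks non-masked positions so cross_entropy ignores them.
mlm_labels = torch.full((batch_size, seq_len), -100)
mlm_labels[:, 3] = torch.randint(0, vocab_size, (batch_size,))  # one masked token each
nsp_labels = torch.randint(0, 2, (batch_size,))

# Mean masked-LM loss over masked positions only.
mlm_loss = F.cross_entropy(
    mlm_logits.view(-1, vocab_size), mlm_labels.view(-1), ignore_index=-100
)
# Mean next-sentence-prediction loss over the batch.
nsp_loss = F.cross_entropy(nsp_logits, nsp_labels)

# BERT's pre-training objective: sum of the two mean losses.
total_loss = mlm_loss + nsp_loss
print(total_loss)
```

Whether this particular implementation adds the NSP term is exactly what the question asks; the sketch only illustrates the objective from the BERT paper.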