Description
I've encountered a significant issue in the MuJoCo HalfCheetah environment related to entropy calculation when using `active_mask`. Specifically, the entropy is incorrectly summed across actions rather than averaged, producing unusually high entropy values of roughly 6000 to 8000 in my tests with the HAPPO algorithm.
Impact on Rewards
Furthermore, this error in entropy calculation substantially affects the rewards obtained. With the flawed method of summing entropy, the observed rewards in my experiments peak at around 5000. However, when the entropy is correctly averaged, the rewards converge to about 7500.
Code
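The original snippet did not survive here, so the following is only a minimal sketch of the bug being described, not the repository's actual code. It assumes a per-action entropy tensor and an `active_mask` of the same shape; the function name `masked_entropy_mean` and the NumPy setup are my own illustration.

```python
import numpy as np

def masked_entropy_mean(entropy, active_mask):
    """Average per-action entropy over the active entries.

    entropy:     (batch, n_actions) per-action entropy values
    active_mask: (batch, n_actions), 1.0 where the action is active, 0.0 otherwise
    """
    # Buggy variant (for comparison): (entropy * active_mask).sum()
    # grows with the number of active actions, which would explain the
    # inflated 6000-8000 entropy values reported above.
    # Corrected variant: normalize by the count of active entries.
    return (entropy * active_mask).sum() / active_mask.sum()
```

With this normalization the reported value stays on the scale of a single action's entropy regardless of batch size or how many actions are masked out.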