Description
I've encountered a significant issue in the MuJoCo HalfCheetah environment related to entropy calculation when using `active_mask`. Specifically, the entropy is incorrectly summed across actions rather than averaged, producing unusually high entropy values of roughly 6000 to 8000 in my tests with the HAPPO algorithm.
Impact on Rewards
Furthermore, this error in entropy calculation substantially affects the rewards obtained. With the flawed method of summing entropy, the observed rewards in my experiments peak at around 5000. However, when the entropy is correctly averaged, the rewards converge to about 7500.
Code
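The original snippet did not survive here, so the following is only a minimal sketch of the bug being described, not the repository's actual code. It assumes a per-action entropy tensor and an `active_mask` of the same shape; the function name `masked_entropy_mean` and the NumPy setup are my own illustration.

```python
import numpy as np

def masked_entropy_mean(entropy, active_mask):
    """Average per-action entropy over the active entries.

    entropy:     (batch, n_actions) per-action entropy values
    active_mask: (batch, n_actions), 1.0 where the action is active, 0.0 otherwise
    """
    # Buggy variant (for comparison): (entropy * active_mask).sum()
    # grows with the number of active actions, which would explain the
    # inflated 6000-8000 entropy values reported above.
    # Corrected variant: normalize by the count of active entries.
    return (entropy * active_mask).sum() / active_mask.sum()
```

With this normalization the reported value stays on the scale of a single action's entropy regardless of batch size or how many actions are masked out.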