Replies: 4 comments
-
Indeed, we need to call setSimTime. Keep in mind that this PyBullet DeepMimic implementation is still preliminary and under construction, so it still requires a bit more work until it is fully functional. Help is welcome, though!
-
Okay, I can make a pull request regarding it. I believe I have made the changes in the necessary places. Also
-
Hi, @erwincoumans. I am also following the DeepMimic training with PyBullet. I directly used the PPO algorithm provided by OpenAI Baselines to train the PyBullet humanoid to mimic the reference walking motion, but the policy does not succeed in mimicking the reference motion.
-
I'm working on this on-and-off. All of it needs more work. |
-
Hi,
In the GetReward() function of the humanoid.py script, the reward is computed by comparing the current pose of the agent with the pose of _kinematicHumanoid, which is initialized from the motion data. Now, in the reset() function of humanoid_deepmimic_gym_env.py, a pose is randomly sampled from the motion file and the agent is initialized in that pose; the action is then performed and the dynamics are simulated for 8 steps (corresponding to querying the policy at 30 Hz), which brings the agent to a new state. However, the sim time is never updated, so in the reward computation the agent is always compared against the same reference pose. I believe setSimTime() should be called in the step function too. If this sounds right, I am happy to submit a pull request to rectify the issue.
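To illustrate the proposed fix, here is a minimal, self-contained sketch of the timing logic. The class and helper names below are illustrative stand-ins, not the actual PyBullet code; only `setSimTime`, the 8 substeps, and the 30 Hz policy rate come from the thread, and the 240 Hz physics step is an assumption.

```python
# Hypothetical sketch: advance the simulation clock inside step() so that
# the reward comparison uses the reference pose at the *current* time.
# `_set_sim_time` stands in for humanoid.setSimTime(t), which would update
# the _kinematicHumanoid pose from the motion data.

class HumanoidDeepMimicEnvSketch:
    def __init__(self, time_step=1.0 / 240.0, substeps=8):
        self._timeStep = time_step    # assumed 240 Hz physics step
        self._numSubsteps = substeps  # 8 substeps -> 30 Hz policy queries
        self._simTime = 0.0
        self.reference_time = 0.0

    def reset(self, start_time=0.0):
        # In the real env a pose is sampled from the motion file at
        # `start_time` (not shown here).
        self._simTime = start_time
        self._set_sim_time(self._simTime)

    def step(self):
        # Simulate the dynamics for the substeps, then advance the clock
        # before the reward is computed -- the call the thread says is missing.
        for _ in range(self._numSubsteps):
            self._simTime += self._timeStep
        self._set_sim_time(self._simTime)

    def _set_sim_time(self, t):
        # Placeholder for setSimTime(t) on the kinematic humanoid.
        self.reference_time = t
```

With this change, each 8-substep `step()` moves the reference pose forward by 8/240 s (one 30 Hz policy period), so GetReward() compares against a moving target rather than the initial pose.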