
Updated Dense Reward for Maze tasks #216

Closed
siddarth-c wants to merge 2 commits

Conversation

siddarth-c

Description

Updated the dense reward of the Maze environments from `exp(-distance)` to `-distance`.

Fixes #175
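
For illustration, a minimal sketch of the two dense-reward formulations being compared (not the repository code; `achieved_goal` and `desired_goal` stand for the xy positions used by the Maze environments):

```python
import numpy as np

def dense_reward_old(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    # Previous formulation: bounded in (0, 1], the signal flattens far from the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.exp(-distance)

def dense_reward_new(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    # Proposed formulation: negative distance, penalizes distance linearly everywhere.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -distance
```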

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Screenshots


[image attached]

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Apr 8, 2024

Have you validated that the new version works as expected?
Also, please compare training performance across the two versions.

Thanks

@siddarth-c
Author

siddarth-c commented Apr 14, 2024

Sorry for the wait. I've validated the updated reward function. Episodes with the new reward are, on average, 2.5 times shorter, with goals reached in roughly 50 timesteps compared to the previous 120, meaning the agent heads straight to the goal rather than wandering around it.

comparision.mp4

@Kallinteris-Andreas (Collaborator) left a comment


We need more information before changing the environment.
Please provide:

  1. The code you tested (link the GitHub repo).
  2. Graphs showing learning behavior: "steps to reach goal (/terminate)", "episodic returns", "distance of robot from the goal over time", and any others you believe are applicable.
  3. Results for AntMaze as well.

Thanks!
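
As a rough sketch of how such metrics could be collected, assuming Gymnasium-Robotics' `PointMaze_UMaze-v3` environment id and using a random policy as a stand-in for a trained agent (both are placeholders, not part of this PR):

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the maze environments)
import numpy as np

env = gym.make("PointMaze_UMaze-v3", reward_type="dense")

episodic_returns, episode_lengths, final_distances = [], [], []
for _ in range(20):  # number of evaluation episodes (arbitrary choice)
    obs, _ = env.reset()
    ep_return, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # stand-in for a trained policy
        obs, reward, terminated, truncated, _ = env.step(action)
        ep_return += float(reward)
        steps += 1
    final_distances.append(
        float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
    )
    episodic_returns.append(ep_return)
    episode_lengths.append(steps)

print("mean episodic return:", np.mean(episodic_returns))
print("mean steps to terminate:", np.mean(episode_lengths))
print("mean final distance to goal:", np.mean(final_distances))
```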

@@ -274,9 +274,9 @@ def add_xy_position_noise(self, xy_pos: np.ndarray) -> np.ndarray:
     def compute_reward(
         self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info
     ) -> float:
-        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
+        distance = np.linalg.norm(achieved_goal - desired_goal, ord = 2, axis=-1)
@Kallinteris-Andreas (Collaborator)


Why was ord=2 added? That is the default behavior anyway.

@siddarth-c (Author)


Correct, I wanted to state it explicitly.
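
For context, a sketch of how the dense and sparse branches of `compute_reward` might look after this change (simplified, not the exact repository code; the sparse branch and its assumed 0.45 goal threshold are shown only for context and are not modified by this PR):

```python
import numpy as np

def compute_reward(self, achieved_goal, desired_goal, info) -> float:
    # Euclidean (L2) distance between the agent's xy position and the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, ord=2, axis=-1)
    if self.reward_type == "dense":
        # This PR: return the negative distance instead of np.exp(-distance).
        return -distance
    elif self.reward_type == "sparse":
        # Unchanged: binary reward once within the (assumed) goal radius.
        return (distance <= 0.45).astype(np.float64)
```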

siddarth-c marked this pull request as draft on April 16, 2024.
@siddarth-c
Author

Here is a repo with the code and a few plots illustrating the behaviour.
AntMaze did not learn within 1e6 timesteps and I can't afford to run longer, but the difference in behaviour is quite evident in PointMaze.

Thanks!

@Kallinteris-Andreas
Collaborator

Your charts are wrong; for example, episodic_return with the "new reward" reaches positive values, which is impossible since it is a sum of non-positive values.

Also, there is no indication of how many runs were tested.
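
The reviewer's point can be verified with a quick sanity check (a minimal illustration, not taken from the PR): with reward = -distance, every per-step reward is less than or equal to zero, so an episodic return, being a sum of those rewards, can never be positive.

```python
import numpy as np

# With the new dense reward, each step's reward is -distance <= 0,
# so the episodic return (a sum of non-positive terms) cannot be positive.
rng = np.random.default_rng(0)
step_distances = rng.uniform(0.0, 5.0, size=200)  # arbitrary per-step distances
step_rewards = -step_distances
episodic_return = step_rewards.sum()
assert episodic_return <= 0.0
print(episodic_return)
```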

siddarth-c closed this on May 10, 2024.
Development

Successfully merging this pull request may close these issues.

[Question] Maze Dense Reward