
Updated Dense Reward for Maze tasks #216

Closed
siddarth-c wants to merge 2 commits

Conversation

siddarth-c

Description

Updated the dense reward of the Maze environments from `exp(-distance)` to `-distance`.

Fixes #175
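
For illustration, a minimal sketch of the two dense-reward formulations being compared (not the repository code; `achieved_goal` and `desired_goal` stand for the xy positions used by the Maze environments):

```python
import numpy as np

def dense_reward_old(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    # Previous formulation: bounded in (0, 1], the signal flattens far from the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.exp(-distance)

def dense_reward_new(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    # Proposed formulation: negative distance, penalizes distance linearly everywhere.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -distance
```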

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Screenshots


[image attached]

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Apr 8, 2024

Have you validated that the new version works as expected?
Also, please compare training performance across the two versions.

Thanks

@siddarth-c
Author

siddarth-c commented Apr 14, 2024

Sorry for the wait. I've validated the updated reward function. Episodes with the new reward are, on average, 2.5 times shorter, with goals reached in roughly 50 timesteps compared to the previous 120, meaning the agent heads straight to the goal rather than wandering around it.

comparision.mp4

@Kallinteris-Andreas (Collaborator) left a comment


We need more information before changing the environment.
Please provide:

  1. The code you tested (link the GitHub repo).
  2. Graphs showing learning behavior: "steps to reach goal (/terminate)", "episodic returns", "distance of robot from the goal over time", and any others you believe are applicable.
  3. Results for AntMaze as well.

Thanks!
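
As a rough sketch of how such metrics could be collected, assuming Gymnasium-Robotics' `PointMaze_UMaze-v3` environment id and using a random policy as a stand-in for a trained agent (both are placeholders, not part of this PR):

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the maze environments)
import numpy as np

env = gym.make("PointMaze_UMaze-v3", reward_type="dense")

episodic_returns, episode_lengths, final_distances = [], [], []
for _ in range(20):  # number of evaluation episodes (arbitrary choice)
    obs, _ = env.reset()
    ep_return, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # stand-in for a trained policy
        obs, reward, terminated, truncated, _ = env.step(action)
        ep_return += float(reward)
        steps += 1
    final_distances.append(
        float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
    )
    episodic_returns.append(ep_return)
    episode_lengths.append(steps)

print("mean episodic return:", np.mean(episodic_returns))
print("mean steps to terminate:", np.mean(episode_lengths))
print("mean final distance to goal:", np.mean(final_distances))
```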

@@ -274,9 +274,9 @@ def add_xy_position_noise(self, xy_pos: np.ndarray) -> np.ndarray:
     def compute_reward(
         self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info
     ) -> float:
-        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
+        distance = np.linalg.norm(achieved_goal - desired_goal, ord = 2, axis=-1)
@Kallinteris-Andreas (Collaborator)


Why was ord=2 added? That is the default behavior anyway.

@siddarth-c (Author)


Correct, I wanted to state it explicitly.
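
For context, a sketch of how the dense and sparse branches of `compute_reward` might look after this change (simplified, not the exact repository code; the sparse branch and its assumed 0.45 goal threshold are shown only for context and are not modified by this PR):

```python
import numpy as np

def compute_reward(self, achieved_goal, desired_goal, info) -> float:
    # Euclidean (L2) distance between the agent's xy position and the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, ord=2, axis=-1)
    if self.reward_type == "dense":
        # This PR: return the negative distance instead of np.exp(-distance).
        return -distance
    elif self.reward_type == "sparse":
        # Unchanged: binary reward once within the (assumed) goal radius.
        return (distance <= 0.45).astype(np.float64)
```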

siddarth-c marked this pull request as draft on April 16, 2024.
@siddarth-c
Author

Here is a repo with the code and a few plots illustrating the behaviour.
AntMaze did not learn within 1e6 timesteps and I can't afford to run longer, but the difference in behaviour is quite evident in PointMaze.

Thanks!

@Kallinteris-Andreas
Collaborator

Your charts are wrong; for example, episodic_return with the "new reward" reaches positive values, which is impossible since it is a sum of non-positive values.

Also, there is no indication of how many runs were tested.
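
The reviewer's point can be verified with a quick sanity check (a minimal illustration, not taken from the PR): with reward = -distance, every per-step reward is less than or equal to zero, so an episodic return, being a sum of those rewards, can never be positive.

```python
import numpy as np

# With the new dense reward, each step's reward is -distance <= 0,
# so the episodic return (a sum of non-positive terms) cannot be positive.
rng = np.random.default_rng(0)
step_distances = rng.uniform(0.0, 5.0, size=200)  # arbitrary per-step distances
step_rewards = -step_distances
episodic_return = step_rewards.sum()
assert episodic_return <= 0.0
print(episodic_return)
```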

siddarth-c closed this on May 10, 2024.
Development

Successfully merging this pull request may close these issues.

[Question] Maze Dense Reward