Updated Dense Reward for Maze tasks #216
Conversation
Have you validated that the new version works as expected? Thanks!
Sorry for the wait. I've validated the updated reward function. Episodes under the new reward are, on average, 2.5 times shorter, with goals reached in about 50 timesteps compared to the previous 120. In other words, the agent heads to the goal directly instead of wandering around near it. comparision.mp4
We need more information before changing the environment. Please provide:
- the code you tested (link the GitHub repo)
- graphs showing learning behavior: "steps to reach goal (/terminate)", "episodic returns", "distance of robot from the goal over time", and any others you believe are applicable (a minimal logging sketch follows this comment)
- results for AntMaze as well

Thanks!
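For concreteness, a minimal sketch of the kind of evaluation logging that would produce those plots; `env`, `policy`, and `num_episodes` are illustrative placeholders, and the dict observation keys follow the standard goal-conditioned Gymnasium API:

```python
import numpy as np

# Sketch only: `env` is assumed to be a goal-conditioned Gymnasium maze
# environment (dict observations with "achieved_goal"/"desired_goal");
# `policy` and `num_episodes` stand in for the trained policy under test.
steps_to_goal, episodic_returns, distance_curves = [], [], []

for _ in range(num_episodes):
    obs, _ = env.reset()
    episode_return, steps, distances = 0.0, 0, []
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += reward
        steps += 1
        # distance of robot from the goal over time
        distances.append(
            np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
        )
    steps_to_goal.append(steps)  # steps to reach goal (/terminate)
    episodic_returns.append(episode_return)
    distance_curves.append(distances)
```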
```diff
@@ -274,9 +274,9 @@ def add_xy_position_noise(self, xy_pos: np.ndarray) -> np.ndarray:
     def compute_reward(
         self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info
     ) -> float:
-        distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
+        distance = np.linalg.norm(achieved_goal - desired_goal, ord = 2, axis=-1)
```
Why was `ord=2` added? That is the default behavior anyway.
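For reference, a quick illustrative check (not part of the PR) that the two calls agree, since `ord=None` with a vector `axis` is already the 2-norm:

```python
import numpy as np

achieved = np.array([[0.0, 0.0], [1.0, 2.0]])
desired = np.array([[3.0, 4.0], [1.0, 2.0]])

# With axis=-1 the norm is taken over vectors, and the default ord=None
# is the 2-norm for vectors, so adding ord=2 does not change the result.
d_default = np.linalg.norm(achieved - desired, axis=-1)
d_explicit = np.linalg.norm(achieved - desired, ord=2, axis=-1)
assert np.allclose(d_default, d_explicit)  # both are [5.0, 0.0]
```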
Correct, I wanted to state it explicitly.

Here is a repo with the code and a few plots for understanding the behaviour. Thanks!
Your charts are wrong; for example, there is also no indication of how many runs were tested.
Description
Updated the dense reward of Maze environments from `exp(-distance)` to `-distance`
Fixes #175
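As a sketch of the change (function names here are illustrative, not from the codebase): the old reward is bounded in (0, 1] and its gradient vanishes far from the goal, while the new one penalizes the agent in direct proportion to its distance:

```python
import numpy as np

def dense_reward_old(achieved_goal, desired_goal):
    # Previous reward: exp(-distance), bounded in (0, 1]; gives almost
    # no learning signal at large distances from the goal.
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return np.exp(-distance)

def dense_reward_new(achieved_goal, desired_goal):
    # Updated reward: -distance, a penalty proportional to the distance,
    # with constant gradient magnitude everywhere.
    distance = np.linalg.norm(achieved_goal - desired_goal, ord=2, axis=-1)
    return -distance
```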
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist:
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)