[Question] Maze Dense Reward #175

Open · llewynS opened this issue Sep 11, 2023 · 3 comments
Labels: good first issue

llewynS commented Sep 11, 2023

Question

Looking at the dense reward function for Maze Env:

return np.exp(-np.linalg.norm(desired_goal - achieved_goal))

After optimisation, the agent seems to prefer parking the ball as close as possible to the goal without actually touching it.

This makes sense: there is no bonus for reaching the goal, and the reward is positive at every time step, so the agent collects more return by hovering next to the goal than by entering it and ending the episode.
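
As a back-of-the-envelope illustration (the distances and step counts here are hypothetical, chosen only to show the effect of a positive per-step reward with no terminal bonus):

import numpy as np

# Hypothetical scenario: per-step dense reward is exp(-distance).
horizon = 300  # episode length if the agent never reaches the goal

# Strategy A: hover just outside the goal (distance ~0.5) for the whole episode.
hover_return = horizon * np.exp(-0.5)            # ~182.0

# Strategy B: approach for 50 steps (average distance ~1.0), then touch the
# goal (distance 0) and terminate, forfeiting all further reward.
reach_return = 50 * np.exp(-1.0) + np.exp(0.0)   # ~19.4

print(f"hover return: {hover_return:.1f}")
print(f"reach return: {reach_return:.1f}")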

Why is the dense reward formulated this way?

llewynS changed the title from "[Question] Question title" to "[Question] Maze Dense Reward" on Sep 11, 2023
Kallinteris-Andreas (Collaborator) commented Dec 30, 2023

  1. Are you using continuing_task=True (which is the default)? See the sketch after this list.
  2. Are you resetting the environment when it returns terminated=True?
  3. Have you experimented with other reward functions?
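
For reference, a minimal creation sketch showing where these options are passed (the env id PointMaze_UMaze-v3 and the kwargs shown are assumptions about a typical setup, not taken from this issue):

import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (importing registers the maze environments)

env = gym.make(
    "PointMaze_UMaze-v3",
    continuing_task=True,   # default: reaching the goal does not end the episode
    reward_type="dense",    # the exp(-distance) reward discussed in this issue
)

obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
obs, info = env.reset()  # reset only once the episode has actually ended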

onnoeberhard commented

Somewhat related: the description of the maze environments says the returned reward is the negative Euclidean distance between the achieved goal position and the desired goal. This is wrong (it is the exponential of the negative distance).

Kallinteris-Andreas (Collaborator) commented

@onnoeberhard

def compute_reward(
    self, achieved_goal: np.ndarray, desired_goal: np.ndarray, info
) -> float:
    distance = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    if self.reward_type == "dense":
        return np.exp(-distance)
    elif self.reward_type == "sparse":
        return (distance <= 0.45).astype(np.float64)

You are correct; can you make a PR to fix it?
You can use the Gymnasium/MuJoCo documentation as a reference: https://gymnasium.farama.org/main/environments/mujoco/ant/#rewards
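
If someone wants the documented negative-distance behaviour in the meantime, one way to experiment (per point 3 above) is a reward wrapper. This is a hypothetical sketch: the wrapper name is made up, and it assumes the goal-conditioned dict observations that the maze environments return.

import gymnasium as gym
import numpy as np

class NegativeDistanceReward(gym.Wrapper):
    # Hypothetical wrapper: replaces the env's reward with
    # -||achieved_goal - desired_goal||, matching the documented description.
    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        distance = np.linalg.norm(
            obs["achieved_goal"] - obs["desired_goal"], axis=-1
        )
        return obs, -distance, terminated, truncated, info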
