This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

Question: Is this some form of reward engineering? #34

Open
WorksWellWithOthers opened this issue Dec 5, 2020 · 1 comment

WorksWellWithOthers commented Dec 5, 2020

The snippet below would break in any environment whose state does not unpack into exactly four values.

  1. If it's not essential, can we just remove it?
  2. If it is essential, could someone explain why and/or reference the paper it comes from?
    This seems specific to CartPole, and I wasn't sure whether the implementation is only meant to solve CartPole.
r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8  
r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5  
reward = r1 + r2

scprotz commented Feb 1, 2021

@WorksWellWithOthers This is indeed a form of reward engineering, and it is specific to CartPole: it turns the returned state into a shaped numeric reward. Other environments would not need this, and may already return a distinct reward of their own.
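
For anyone adapting the code to other environments, here is a minimal sketch of one way to isolate the shaping so it only applies to CartPole. The helper names `shaped_cartpole_reward` and `select_reward` and the `use_cartpole_shaping` flag are hypothetical and not part of this repository. The threshold constants are assumed to match the values Gym's `CartPoleEnv` exposes as `env.x_threshold` and `env.theta_threshold_radians`.

```python
import math

# Assumed to mirror Gym's CartPoleEnv limits (env.x_threshold and
# env.theta_threshold_radians); pass the env's own values if they differ.
X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360


def shaped_cartpole_reward(state,
                           x_threshold=X_THRESHOLD,
                           theta_threshold=THETA_THRESHOLD_RADIANS):
    """Hypothetical helper: the shaped reward from the snippet in this issue.

    Expects a CartPole state (x, x_dot, theta, theta_dot).
    """
    x, _x_dot, theta, _theta_dot = state
    # Highest when the cart is centred, lowest at the track limit.
    r1 = (x_threshold - abs(x)) / x_threshold - 0.8
    # Highest when the pole is upright, lowest at the angle limit.
    r2 = (theta_threshold - abs(theta)) / theta_threshold - 0.5
    return r1 + r2


def select_reward(state, env_reward, use_cartpole_shaping):
    """Hypothetical helper: shape only when asked to and the state has 4 values."""
    if use_cartpole_shaping and len(state) == 4:
        return shaped_cartpole_reward(state)
    # Any other environment keeps the reward it already returned.
    return env_reward
```

With something like this, the training loop would call `select_reward(next_state, reward, use_cartpole_shaping=True)` for CartPole and pass `False` everywhere else, so the four-value unpacking never runs on states of a different shape.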
