PPO converges slowly #1
There's a related issue reported: ray-project/ray#8088
Using SAC instead (squashed Gaussian distribution) resolves some of this, although the firing solutions still seem to converge slowly. What I also found is that there appear to be some dependencies on the Box bounds, which I haven't found documented: if the absolute value of an … Also, I've found that, so far, during rollout the sampled actions fall in the range [0.0, 1.0] regardless of how I set the …
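For the Box-bounds dependency, the usual workaround (a sketch only, not necessarily what either setup here is running) is to expose a normalized `[-1, 1]` action space to the algorithm and rescale the sampled action back to the real bounds inside the env or a wrapper; the wrapper name below is illustrative, and newer gym versions also ship `gym.wrappers.RescaleAction` for the same purpose:

```python
import gym
import numpy as np


class NormalizedActions(gym.ActionWrapper):
    """Present a [-1, 1] Box to the agent; rescale the sampled action
    back to the wrapped env's original bounds before stepping."""

    def __init__(self, env):
        super().__init__(env)
        self.low = env.action_space.low
        self.high = env.action_space.high
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=env.action_space.shape, dtype=np.float32)

    def action(self, action):
        # Clip to the normalized range, then map [-1, 1] -> [low, high].
        action = np.clip(action, -1.0, 1.0)
        return self.low + (action + 1.0) * 0.5 * (self.high - self.low)
```

Depending on the RLlib version, there is also a `normalize_actions` config option that does a similar rescaling on the algorithm side.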
Thank you @matej-macak-qb. I took a similar approach of needing to start with a simple env. Glad we're getting more observations and analysis pointing toward the underlying issues. I also understand there's work scheduled on RLlib to try to resolve this.
Also related: ray-project/ray#8218
Got a problem with RLlib while training with a custom environment. This uses a simple env where the action space is defined as a single parameter in the range `[0.0, 60.0]`:
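The actual env and training script aren't reproduced above; as a stand-in, here is a minimal, hypothetical sketch of that kind of setup (the env name, the one-step reward, the target value, and the PPO config are all illustrative assumptions, not the code from this issue):

```python
import gym
import numpy as np
from gym.spaces import Box
import ray
from ray import tune
from ray.tune.registry import register_env


class SingleParamEnv(gym.Env):
    """Hypothetical stand-in: one continuous action in [0.0, 60.0],
    a trivial observation, and a reward that peaks at an arbitrary
    target so PPO has something to converge toward."""

    def __init__(self, config=None):
        self.action_space = Box(low=0.0, high=60.0, shape=(1,), dtype=np.float32)
        self.observation_space = Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self._target = 42.0  # made-up target value, not from the issue

    def reset(self):
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        # One-step episode: reward is highest when the action hits the target.
        a = float(np.asarray(action).flatten()[0])
        reward = -abs(a - self._target) / 60.0
        return np.zeros(1, dtype=np.float32), reward, True, {}


if __name__ == "__main__":
    ray.init()
    register_env("single_param_env", lambda cfg: SingleParamEnv(cfg))
    tune.run(
        "PPO",
        stop={"training_iteration": 50},
        config={
            "env": "single_param_env",
            "num_workers": 1,
        },
    )
```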