This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

PPO converges slowly #1

Open
ceteri opened this issue Apr 27, 2020 · 5 comments

ceteri commented Apr 27, 2020

Got a problem with RLlib while training with a custom environment.
It uses a simple env where the action space is defined as a single parameter in the range [0.0, 60.0]:

```python
self.action_space = spaces.Box(np.float32(0.0), high, shape=(1,))
```
When using OpenAI Gym to [run this environment](https://github.com/DerwenAI/gym_projectile/blob/master/example.py) through many steps, it seems to work correctly.
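
For reference, a minimal sketch of that standalone Gym loop (assuming `gym_projectile` registers `projectile-v0` on import, as in the linked example):

```python
import gym
import gym_projectile  # assumption: importing the package registers "projectile-v0"

env = gym.make("projectile-v0")
obs = env.reset()

for _ in range(10):
    action = env.action_space.sample()          # shape (1,), theta in [0.0, 60.0]
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```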

However, when using PPO to [train a policy](https://github.com/DerwenAI/gym_projectile/blob/c56d3ab15248c4767721fabb8e3731b0522b62cc/train.py#L21):

```python
import ray.rllib.agents.ppo as ppo
SELECT_ENV = "projectile-v0"

config = ppo.DEFAULT_CONFIG.copy()
agent = ppo.PPOTrainer(config, env=SELECT_ENV)

for _ in range(n_iter):
    result = agent.train()
```
... then the actions sampled by RLlib appear to stay very close to the `low` value of the Box, as long as `low` is zero.
In contrast, if `low` is non-zero, then that value is used for every action and never varies.

For example, the env is a simple physics simulation of projectile trajectories: the action is a launch angle `theta`, and the observation includes the resulting projectile `range`. The problem I'm seeing with RLlib is that `theta` never goes much above zero, so the `range` also stays in the neighborhood of zero. Based on the standalone Gym simulation, the median `range` should be up in the thousands, but RLlib biases it far too low:

```
(pid=57245) location: [2124, 1]
(pid=57245) location: [2124, 0]
(pid=57245) location: [2124, 27]
(pid=57245) location: [2124, 0]
(pid=57245) location: [2124, 74]
(pid=57245) location: [2124, 30]
(pid=57245) location: [2124, 74]
(pid=57245) location: [2124, 34]
(pid=57245) location: [2124, 67]
(pid=57245) location: [2124, 0]
(pid=57245) location: [2124, 0]
(pid=57245) location: [2124, 86]
(pid=57245) location: [2124, 0]
```
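
A direct way to see the same bias is to query the trained policy outside the env's own logging (a sketch; `agent` is the `PPOTrainer` from the snippet above, and `compute_action` is the single-observation inference call on the Trainer):

```python
import gym
import gym_projectile  # assumption: registers "projectile-v0"

env = gym.make(SELECT_ENV)
obs = env.reset()

for _ in range(20):
    action = agent.compute_action(obs)          # stays pegged near the Box's low bound
    obs, reward, done, info = env.step(action)
    print("theta:", action, "obs:", obs)
    if done:
        obs = env.reset()
```
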
Am I configuring the `agent.train()` part incorrectly?

I have also noticed that RLlib's use of Gym environments is *very* sensitive to odd and relatively undocumented edge cases: RLlib's preprocessing will throw exceptions for what are otherwise valid configurations of the action space and observation space.

ceteri commented Apr 27, 2020

There's a related issue reported: ray-project/ray#8088


ceteri commented Apr 27, 2020

Using SAC instead (with its squashed Gaussian action distribution) resolves some of this, although the firing solutions still seem to converge slowly.

I also found what appear to be some undocumented dependencies on the Box bounds: if the absolute value of an `action_space` Box bound is > 1.0 and the lower bound is > 0.0, then SAC also has the problem of pegging the action to its lower bound.

Also, I've found that, so far, during rollout the sampled actions stay in the range [0.0, 1.0] regardless of how I set the `action_space` Box. Maybe I've omitted some required configuration for the rollout part? In any case, when I make the action space itself range over [0.0, 1.0], both training and rollout behave properly.
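
For concreteness, the workaround amounts to something like this hypothetical wrapper (a sketch, not code from the repo): keep the Box that the agent sees in [0.0, 1.0] and map the sampled value back to the physical range, e.g. [0.0, 60.0], before the env uses it.

```python
import gym
import numpy as np
from gym import spaces

class RescaleAction(gym.ActionWrapper):
    """Expose a [0.0, 1.0] Box to the agent; map actions back to the env's own bounds."""

    def __init__(self, env):
        super().__init__(env)
        self.orig_low = env.action_space.low
        self.orig_high = env.action_space.high
        self.action_space = spaces.Box(0.0, 1.0, shape=env.action_space.shape, dtype=np.float32)

    def action(self, action):
        # linear map from [0.0, 1.0] back to [low, high], e.g. [0.0, 60.0]
        return self.orig_low + action * (self.orig_high - self.orig_low)
```

The wrapped env would then be registered with RLlib in place of the raw one, so the policy only ever sees the unit-range Box.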

ceteri self-assigned this Apr 27, 2020

matej-macak-qb commented Apr 27, 2020

@ceteri in the case of a continuous `action_space` bounded to [0.0, 1.0] there is still slow convergence, per the issue I raised in ray-project/ray#8088. I have confirmed this to be an issue with the IMPALA algorithm as well.


ceteri commented Apr 27, 2020

Thank you @matej-macak-qb
@sven1977 pointed me toward what you've researched, and that's much appreciated.

I took a similar approach of starting with a simple env.

Glad we're getting more observations and analysis pointing toward the underlying issues. I understand there's work scheduled on RLlib to try to resolve this.


ceteri commented Apr 30, 2020

Also related: ray-project/ray#8218
