PPO on continuous actions #77
Comments
Thanks for your suggestion. We actually plan to add continuous control PPO soon.
Following up on this issue. Do you have an ETA for when this feature might be implemented? I would be interested in contributing to this if possible.
Continuous action PPO is scheduled for sometime towards the end of the year. If you implement it yourself we would be delighted to accept the contribution!
Awesome! Thanks for the opportunity! I will start working on this and add any development updates/questions to this issue.
BTW, please don't forget to check
Quick question, did you mean to close this issue, or does Pearl support PPO with continuous action spaces so it can be closed?
Oops, sorry, I didn't mean to close the issue. Thanks for pointing it out. No, Pearl still does not support PPO with continuous action spaces. Thanks.
Development Update
I have spent the last few days better understanding Pearl and the different modules (replay buffer, policy learner, etc.). I also got PPO for discrete action spaces working in two Gymnasium environments (CartPole-v1 & LunarLander-v2). The implementation of PPO for continuous action spaces is coded, and I am currently troubleshooting some bugs. I plan to be wrapped up with this development in early September.
Next Steps
Questions
Good to hear of your progress, @kuds.
Yes, please.
I could only find it being used in
Development Update
I finished working through the bugs for PPO in continuous action spaces. I am cleaning up my changes and adding new unit tests for the
Next Steps
Questions
Hi Kuds, I think sum and mean both work if one uses optimizers that normalize the gradient, such as Adam and RMSprop, but mean seems to be better if one uses SGD. Ideally, GAE normalization should be provided as an option and applied in the actor loss computation. Thanks for your work!
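To make the sum-versus-mean point concrete, here is a minimal sketch of a clipped PPO actor loss with both choices exposed as options. It is plain Python rather than Pearl code: the function names `normalize` and `ppo_actor_loss` and the parameters `clip_eps`, `normalize_advantages`, and `reduction` are hypothetical and chosen only for illustration.

```python
import math
from statistics import fmean, pstdev


def normalize(advantages, eps=1e-8):
    """Shift and scale advantages to zero mean and (roughly) unit std."""
    mu = fmean(advantages)
    sigma = pstdev(advantages)  # population standard deviation of the batch
    return [(a - mu) / (sigma + eps) for a in advantages]


def ppo_actor_loss(
    new_log_probs,
    old_log_probs,
    advantages,
    clip_eps=0.2,                # hypothetical default clipping range
    normalize_advantages=False,  # optional advantage normalization
    reduction="mean",            # "mean" or "sum" over the batch
):
    """Clipped surrogate actor loss for one batch (illustrative only)."""
    if reduction not in ("mean", "sum"):
        raise ValueError("reduction must be 'mean' or 'sum'")
    if normalize_advantages:
        advantages = normalize(advantages)

    per_sample = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Negate because the surrogate is maximized but losses are minimized.
        per_sample.append(-min(ratio * adv, clipped * adv))

    total = sum(per_sample)
    return total if reduction == "sum" else total / len(per_sample)
```

With a batch of N samples the `sum` loss is exactly N times the `mean` loss, so the two gradients differ only by that constant factor. Adam and RMSprop divide each step by a running estimate of the gradient magnitude, which largely cancels the factor, whereas a plain SGD step scales with it, so the effective learning rate under `sum` depends on the batch size.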
I noticed that the PPO agent initialization forces
is_action_continuous=False
whereas the PPO algorithm and other libraries implementing PPO allow continuous actions. Can this be added to Pearl as well?
https://github.com/facebookresearch/Pearl/blob/main/pearl/policy_learners/sequential_decision_making/ppo.py#L99
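For context, the main thing that changes with continuous actions is how the action probability is computed: a common choice is a diagonal Gaussian policy whose log-density replaces the categorical log-probability in the PPO ratio. The sketch below assumes that choice and uses plain Python; it is not Pearl's implementation, and `gaussian_log_prob` and `probability_ratio` are hypothetical names.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a diagonal Gaussian policy.

    `action`, `mean`, and `log_std` are equal-length sequences with one
    entry per action dimension (an assumed parameterization).
    """
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        var = math.exp(2.0 * ls)
        total += -0.5 * (a - mu) ** 2 / var - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def probability_ratio(action, new_mean, new_log_std, old_mean, old_log_std):
    """PPO ratio pi_new(a|s) / pi_old(a|s) for a continuous action."""
    new_lp = gaussian_log_prob(action, new_mean, new_log_std)
    old_lp = gaussian_log_prob(action, old_mean, old_log_std)
    return math.exp(new_lp - old_lp)
```

The resulting ratio feeds into the same clipped surrogate objective used for discrete actions; only the distribution producing the log-probabilities differs.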