
train baseline - memory consumption increases with iterations #157

Open

DaoudPiracha opened this issue Jul 22, 2019 · 4 comments

@DaoudPiracha

Upon running the train_baseline command in Python 3, the workers seem to consume RAM with each iteration without properly freeing it up.

On some runs I have seen memory consumption grow by roughly 40 MB per iteration, which, scaled to e.g. 10,000 iterations, amounts to 400 GB. Since this is in RAM, it makes longer experiments impossible to run.
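
For reference, here is a minimal sketch of how I tracked the per-iteration growth; it is not the actual measurement code, and `trainer` / `num_iterations` are hypothetical placeholders for whatever train_baseline sets up. It assumes `psutil` is installed:

```python
# Hypothetical monitoring sketch: sums the resident set size (RSS) of all
# processes that look like Ray workers and logs the change per iteration.
# `trainer` and `num_iterations` are placeholders, not names from the repo.
import psutil

def ray_worker_rss_mb():
    """Total RSS (in MB) of processes whose name contains 'ray'."""
    total = 0
    for proc in psutil.process_iter(["name", "memory_info"]):
        name = proc.info["name"] or ""
        mem = proc.info["memory_info"]
        if mem and "ray" in name.lower():
            total += mem.rss
    return total / 1e6

prev = ray_worker_rss_mb()
for i in range(num_iterations):
    trainer.train()  # one training iteration
    cur = ray_worker_rss_mb()
    print(f"iter {i}: workers at {cur:.0f} MB ({cur - prev:+.0f} MB)")
    prev = cur
```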

Additionally, note that this appears to be distinct from Ray's object store: upon termination the object store was quite small (~20 MB), whereas each worker.__PolicyEvaluator() had ~2 GB allocated, with multiple such workers present.
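
To make that distinction concrete, here is a rough sketch (my own, not from the runs, Linux only) of how the two can be told apart: Ray's plasma object store is backed by shared memory (/dev/shm), while per-worker allocations show up as each worker process's private RSS:

```python
# Hypothetical sketch: /dev/shm usage approximates the object store size,
# while per-worker heap growth shows up in each Ray worker's RSS.
import psutil

shm = psutil.disk_usage("/dev/shm")
print(f"shared memory (~ object store): {shm.used / 1e6:.0f} MB used")

for proc in psutil.process_iter(["pid", "name", "memory_info"]):
    name = proc.info["name"] or ""
    mem = proc.info["memory_info"]
    if mem and "ray" in name.lower():
        print(f"worker pid={proc.info['pid']}: rss={mem.rss / 1e6:.0f} MB")
```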

@eugenevinitsky (Owner)

Oh, that's bad! Are you actually seeing runs fail as a result? Normally I see a RAM increase, but eventually RLlib somehow clears it up. That said, this seems more like an RLlib issue than an issue with this library (I suspect). Would you mind reposting it in their GitHub issues?

@DaoudPiracha (Author) commented Jul 22, 2019

Yes, unfortunately it typically fails on most longer runs. I'll repost on the RLlib GitHub as well.

I'm hitting this issue after simply cloning the repo as-is and running train_baseline.

@eugenevinitsky (Owner)

Hi, that's really good to know! Thank you for updating us on this. I'll examine it as well when I get a chance, but I suspect it's an RLlib issue rather than something on our end; I don't think any memory is persisted across environment rollouts.

@DaoudPiracha (Author) commented Jul 22, 2019

Sounds good. For now, could you share the environment/setup in which RLlib clears up memory automatically, possibly as a Docker container?
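
In case it helps, something like this would capture the version details I'm after (a hypothetical sketch using only standard version introspection, nothing from the repo itself):

```python
# Hypothetical environment-capture sketch: prints the interpreter and Ray
# versions (plus TensorFlow, if installed) and the full package list.
import subprocess
import sys

import ray

print("python:", sys.version)
print("ray:", ray.__version__)
try:
    import tensorflow as tf
    print("tensorflow:", tf.__version__)
except ImportError:
    pass
subprocess.run([sys.executable, "-m", "pip", "freeze"], check=True)
```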
