
Pandas Ran out of memory again! #69

Open · saulshanabrook opened this issue Aug 18, 2020 · 4 comments
Labels
bug Something isn't working

Comments

@saulshanabrook
Contributor

So the pandas test suite ran out of memory again in Kubernetes. It used up ~13 GB and was then killed, because the pods only have that much memory available.

I am a bit hesitant to just raise the pod memory limit again... If anyone knows whether this is a reasonable amount of memory for Pandas to use during testing (cc @datapythonista), that would be helpful! It's also possible that the tracing has some sort of memory leak that is blowing things up for pandas, although none of the other test suites seem to have the same problem.

Maybe I can run the Pandas test suite with some flags to skip the high-memory tests? These are the flags I currently use:

CMD [ "pytest", "pandas", "--skip-slow", "--skip-network", "--skip-db", "-m", "not single", "-r", "sxX", "--strict", "--suppress-tests-failed-exit-code" ]

I copied it from the test-fast script (or whatever it's called) in the Pandas repo.

saulshanabrook added the bug label on Aug 18, 2020
@datapythonista
Member

That's strange. There is a --run-high-memory flag to run the high-memory tests, which are not run by default. I'm not sure how much memory the suite requires, but surely not 13 GB.
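
To illustrate that default (a sketch only; the flag is the one mentioned above):

# default run: the high-memory tests are not run
pytest pandas
# opt in to the high-memory tests explicitly
pytest --run-high-memory pandas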

I'm not sure if test-fast is kept up to date; I'd probably use the settings from the CI config, ci/azure/posix.yml.

pytest -m "not slow and not network and not clipboard" pandas is what I'd use. I don't think that should be very different from what you've got, but you can give it a try, just in case.
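
If it helps, here's a rough sketch of how that could look dropped into the Docker CMD above (untested; keeping your -r/--strict/exit-code flags, and keeping the "not single" marker from your original command is my assumption):

# hypothetical adaptation of the CMD above to marker-based selection
CMD [ "pytest", "pandas", "-m", "not slow and not network and not clipboard and not single", "-r", "sxX", "--strict", "--suppress-tests-failed-exit-code" ]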

I'm not sure if we've got many more markers you can play with, but if you uninstall the optional libraries you'll run fewer tests. You can set up an environment with just numpy, dateutil and pytz and give it a try; that should run only the core tests. If you're wondering, this approach makes the test suite quite unreliable, since tests can end up being silently skipped. I wouldn't recommend it, but that's how it works in pandas right now.
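
A rough sketch of such a minimal environment (untested; package names are assumed from the install docs, and cython, pytest and hypothesis are added because building and testing pandas from a checkout needs them):

# hypothetical minimal environment: required runtime deps plus build/test tools
python -m venv pandas-min && . pandas-min/bin/activate
pip install numpy python-dateutil pytz cython pytest hypothesis
# from a pandas checkout, install without pulling in optional dependencies
pip install -e . --no-deps
pytest pandas -m "not slow and not network and not clipboard"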

@saulshanabrook
Contributor Author

I'm not sure if we've got many more markers you can play with, but if you uninstall the optional libraries you'll run fewer tests. You can set up an environment with just numpy, dateutil and pytz and give it a try; that should run only the core tests.

Interesting, I will give that a go. Just those three? Is that documented or used anywhere, or is it just something you try locally when you want to run fewer tests?

@jorisvandenbossche
Member

Just those three? Is that documented or used anywhere, or is it just something you try locally when you want to run fewer tests?

Those three are documented as the minimal required dependencies (e.g. https://pandas.pydata.org/docs/dev/getting_started/install.html#dependencies), and since we automatically skip tests for optional dependencies (not something that is explicitly documented, I think), having only those installed is the way to run only the tests that don't rely on any optional dependency.
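
To see that skipping in action (a sketch; -rs is a standard pytest flag that lists skipped tests together with their skip reasons):

# in the minimal environment, report every skipped test and why it was skipped,
# e.g. tests skipped because an optional dependency is not installed
pytest pandas -rs -m "not slow and not network and not clipboard"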

@saulshanabrook
Contributor Author

saulshanabrook commented Nov 1, 2020

Still happening :( It is now exceeding 14 GB... #94 (comment)
