test(python): Test S3 functionality using moto server #10164

Merged on Sep 5, 2023 (5 commits)

cjackal
Contributor

@cjackal cjackal commented Jul 29, 2023

As requested by @ritchie46 in #10008

@github-actions github-actions bot added the internal and python labels Jul 29, 2023
@cjackal cjackal changed the title from "test(python): [WIP] test scan_* functionality on s3" to "test(python): [WIP] test remote I/O functionality over s3" Jul 29, 2023
@cjackal cjackal marked this pull request as ready for review July 29, 2023 17:36
storage_options={"endpoint_url": f"http://{host}:{port}"},
)
assert df.columns == ["category", "calories", "fats_g", "sugars_g"]
assert df.collect().shape == (27, 4)
Contributor Author

@cjackal cjackal Jul 29, 2023

It seems the scan_* functions are not working properly yet: the tests pass on the schema-reading part but fail on the .collect() part. This is not moto's fault; scan_parquet currently fails on master too.

(Try pl.scan_parquet("s3://saturn-public-data/nyc-taxi/data/yellow_tripdata_2019-01.parquet").collect() and you will see the same error.)
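
For readers reproducing this, the two code paths the test exercises can be sketched as follows. This is a minimal illustration, not the PR's test code: the helper names `s3_storage_options` and `scan_mock_parquet` are hypothetical, and it assumes that an S3-compatible server (such as moto's) is already listening on the given host and port and that `endpoint_url` is honoured in `storage_options`, as in the snippet above.

```python
def s3_storage_options(host: str, port: int) -> dict:
    """Build storage options that redirect S3 requests to a local endpoint."""
    return {"endpoint_url": f"http://{host}:{port}"}


def scan_mock_parquet(bucket: str, key: str, host: str, port: int):
    """Lazily scan a Parquet object served by a local S3-compatible server.

    Hypothetical helper: the bucket and key must already exist on that server.
    """
    # Imported here so the pure helper above has no third-party dependency.
    import polars as pl

    return pl.scan_parquet(
        f"s3://{bucket}/{key}",
        storage_options=s3_storage_options(host, port),
    )
```

Reading `.columns` on the returned LazyFrame only needs the schema, whereas `.collect()` fetches the data, which is the step reported as failing here.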

@cjackal cjackal changed the title from "test(python): [WIP] test remote I/O functionality over s3" to "test(python): test remote I/O functionality over s3" Jul 29, 2023
Comment on lines 24 to 29
# Tooling
flask!=2.2.0,!=2.2.1 # Required for moto.server w/o installing all moto[server] dependencies
flask-cors # Required for moto.server w/o installing all moto[server] dependencies
hypothesis==6.82.0
maturin==1.1.0
moto[s3]==4.1.13 # Need moto.server to mock s3fs - see aio-libs/aiobotocore#755
Contributor Author

As commented here, we need moto.server, not just moto[s3]. But moto[server] is a dependency mess because of its CloudWatch support; in particular, it pins pydantic to ~1.8, which breaks other tests. Since we only need the S3 part of the server, I have taken the crude route of installing flask manually for now.

Member

Could we maybe avoid this problem with a moto GitHub action? https://github.com/getmoto/moto

Contributor Author

I've thought about this a bit, and I'm not sure I follow. Do you mean launching the moto server outside the test venv (with a moto GitHub action) and removing the moto dependencies from requirements-dev.txt? Wouldn't that prevent test_cloud.py from running in a local dev setup?
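
To make the trade-off in this thread concrete, here is a minimal sketch of launching the moto server from inside the test environment, which is what keeps the tests runnable locally. It is an illustration under stated assumptions rather than the PR's implementation: `wait_for_port` and `moto_s3_server` are hypothetical names, and the sketch assumes moto's `ThreadedMotoServer` is importable, which requires flask and flask-cors as discussed above.

```python
import socket
import time
from contextlib import contextmanager


def wait_for_port(host, port, timeout=10.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # server not accepting connections yet; retry
    return False


@contextmanager
def moto_s3_server(host="127.0.0.1", port=5000):
    """Run moto's standalone server in a background thread (hypothetical helper)."""
    # Imported lazily: this import needs flask and flask-cors to be installed.
    from moto.server import ThreadedMotoServer

    server = ThreadedMotoServer(ip_address=host, port=port)
    server.start()
    try:
        if not wait_for_port(host, port):
            raise RuntimeError(f"moto server did not come up on {host}:{port}")
        yield f"http://{host}:{port}"
    finally:
        server.stop()
```

A test would wrap its S3 calls in `with moto_s3_server() as endpoint:` and pass `endpoint` as the `endpoint_url`. Running the server as a separate CI service instead would drop the flask dependencies but, as noted, would leave local runs without a server.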

@cjackal
Contributor Author

cjackal commented Jul 30, 2023

The new unit tests in this PR have already proved their worth by detecting #10174 🎊

@cjackal
Contributor Author

cjackal commented Jul 31, 2023

The new test suites now pass after #10175.

@cjackal
Contributor Author

cjackal commented Aug 1, 2023

A humble ping to reviewers @ritchie46 @stinodego @alexander-beedie that this PR is ready for review.

I was a little worried that these tests might be slow (perhaps slow enough to hold up the overall CI workflow), but the overhead turns out to be negligible: the few seconds spent launching a server are marginal compared with running all ~3000 tests.

Member

@ritchie46 ritchie46 left a comment

Thanks for the PR! I have left some comments.

@cjackal cjackal force-pushed the moto branch 8 times, most recently from 64e2d6a to 7577ccb August 16, 2023 13:38
@ritchie46
Member

Thanks a lot for this work @cjackal. I think this is very valuable. I want to leave the final review to @stinodego as he is our CI expert.

@stinodego
Member

I'll test this out later today or tomorrow!

@cjackal
Contributor Author

cjackal commented Sep 4, 2023

@stinodego Just a comment on the latest nontrivial commit:
Since we run pytest with --dist loadgroup, I grouped the moto-related tests under a cloud tag so that they all run in a single worker. This both speeds up the test run and avoids launching the moto server multiple times (when there are multiple workers).
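
The grouping described here can be sketched with pytest-xdist's `xdist_group` mark. This is a minimal illustration and may not match the committed code exactly: the group name `cloud` comes from the comment above, while the test function is a hypothetical placeholder.

```python
import pytest

# Module-level mark: every test in this file joins the "cloud" group.
# Under `pytest -n <workers> --dist loadgroup`, pytest-xdist sends all tests
# that share an xdist_group name to the same worker.
pytestmark = pytest.mark.xdist_group("cloud")


def test_cloud_placeholder() -> None:
    """Hypothetical stand-in for a test that talks to the mock S3 server."""
    endpoint = "http://127.0.0.1:5000"  # assumed address of the local server
    assert endpoint.startswith("http://")
```

Because fixtures are instantiated per worker, keeping the group on one worker means a module-scoped server fixture is set up only once.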

Member

@stinodego stinodego left a comment

I did some minor refactoring - this should now be good to go!

Thanks for the effort @cjackal !

@stinodego stinodego changed the title from "test(python): test remote I/O functionality over s3" to "test(python): Test S3 functionality using moto server" Sep 5, 2023
@stinodego stinodego merged commit 30e0be3 into pola-rs:main Sep 5, 2023
12 checks passed
@cjackal cjackal deleted the moto branch September 9, 2023 17:30