Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): show inefficient apply warning in ipython #10312

Merged
merged 5 commits into from
Aug 6, 2023

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Aug 5, 2023

fixes the bug reported here #9968 (comment)

Essentially, the bytecode is slightly different between python and ipython 😧 I think many, many polars users will be running jupyter notebooks, so it's probably important to get this right

The hardest part was coming up with tests which fail on main

Gist of the test changes:

  • introduce tests in test_udfs which runs ipython in a subprocess and checks the printed output
  • modify TEST_CASES so that the function is a string, rather than a function, so it can be used in the above (and change func to eval(func) in the existing tests)

The only source code change is:

-    LOAD_ATTR = {"LOAD_ATTR"} if _MIN_PY311 else {"LOAD_METHOD"}
+    LOAD_ATTR = {"LOAD_ATTR", "LOAD_METHOD"} if _MIN_PY311 else {"LOAD_METHOD"}

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Aug 5, 2023
@MarcoGorelli MarcoGorelli marked this pull request as ready for review August 5, 2023 14:41
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 5, 2023

Super-clean fix; great pick-up! With our current usage the expansion of LOAD_ATTR matching will be fine, even if not in IPython, but if there's a (cheap!) way to constrain it, should we?

I think many, many polars users will be running jupyter notebooks

Unquestionably :)

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Aug 5, 2023

good idea, thanks! yup, can just check if get_ipython throws an exception

Copy link
Collaborator

@alexander-beedie alexander-beedie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, looks like something needs a fix-up after adding the IPython check?

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Aug 5, 2023

oh wow, check this out:

running in python repl, py311: the third element is LOAD_METHOD

>>> pprint.pprint(list(dis.get_instructions(lambda x: np.sin(x)+1)))
[Instruction(opname='RESUME', opcode=151, arg=0, argval=0, argrepr='', offset=0, starts_line=1, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=0, end_col_offset=0)),
 Instruction(opname='LOAD_GLOBAL', opcode=116, arg=0, argval='np', argrepr='np', offset=2, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=50, end_col_offset=52)),
 Instruction(opname='LOAD_METHOD', opcode=160, arg=1, argval='sin', argrepr='sin', offset=14, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=50, end_col_offset=56)),
 Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=36, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=57, end_col_offset=58)),
 Instruction(opname='PRECALL', opcode=166, arg=1, argval=1, argrepr='', offset=38, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=50, end_col_offset=59)),
 Instruction(opname='CALL', opcode=171, arg=1, argval=1, argrepr='', offset=42, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=50, end_col_offset=59)),
 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1, argrepr='1', offset=52, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=60, end_col_offset=61)),
 Instruction(opname='BINARY_OP', opcode=122, arg=0, argval=0, argrepr='+', offset=54, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=50, end_col_offset=61)),
 Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=58, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=1, col_offset=0, end_col_offset=0))]
>

running from script, note that now, the third element is LOAD_ATTR!:

(.311venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ cat t.py
import numpy as np
import dis
import pprint

pprint.pprint(list(dis.get_instructions(lambda x: np.sin(x)+1)))

(.311venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ python t.py
[Instruction(opname='RESUME', opcode=151, arg=0, argval=0, argrepr='', offset=0, starts_line=5, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=0, end_col_offset=0)),
 Instruction(opname='LOAD_GLOBAL', opcode=116, arg=1, argval='np', argrepr='NULL + np', offset=2, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=50, end_col_offset=52)),
 Instruction(opname='LOAD_ATTR', opcode=106, arg=1, argval='sin', argrepr='sin', offset=14, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=50, end_col_offset=56)),
 Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=24, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=57, end_col_offset=58)),
 Instruction(opname='PRECALL', opcode=166, arg=1, argval=1, argrepr='', offset=26, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=50, end_col_offset=59)),
 Instruction(opname='CALL', opcode=171, arg=1, argval=1, argrepr='', offset=30, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=50, end_col_offset=59)),
 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=1, argrepr='1', offset=40, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=60, end_col_offset=61)),
 Instruction(opname='BINARY_OP', opcode=122, arg=0, argval=0, argrepr='+', offset=42, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=50, end_col_offset=61)),
 Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=46, starts_line=None, is_jump_target=False, positions=Positions(lineno=5, end_lineno=5, col_offset=0, end_col_offset=0))]

But before Python3.11, it seems it was consistently LOAD_METHOD

I think we should probably match on both (LOAD_METHOD and LOAD_ATTR) for py>=311, right?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 6, 2023

I think we should probably match on both (LOAD_METHOD and LOAD_ATTR) for py>=311, right?

Yup, let's do it - it's definitely safe the way the code is currently put together, and not limiting it to IPython will cover our bases (as this can clearly happen elsewhere 🤔)

@alexander-beedie alexander-beedie merged commit 19e622d into pola-rs:main Aug 6, 2023
12 checks passed
@henryharbeck
Copy link
Contributor

Hi @MarcoGorelli, sorry to be commenting on a closed PR.

Not 100% sure this is a problem, but just wanted to draw to your attention.

I can see that this PR adds a dependency on ipython in py-polars/tests/test_udfs.py, but it is not currently a dev requirement.
Should ipython be added to the dev requirements?
Or is it probably okay as is, because this file (as I understand it) is standalone and not meant to run on the main branch (as per the CI config)?

From a "clean" environment, the described usage in py-polars/tests/test_udfs.py won't work due to ipython not being installed.

Steps to reproduce

# assumes the starting dir is py-polars
deactivate # if applicable
make clean
make requirements
source .venv/bin/activate
PYTHONPATH=polars/utils pytest tests/test_udfs.py # usage described in file
# 39 failed, 40 passed - the failed tests are the ones ending in `in_ipython`

and then when I do the below, all tests pass

pip install ipython
PYTHONPATH=polars/utils pytest tests/test_udfs.py

Came to my attention as I have (finally) made a start on adding more numpy funcs.

@MarcoGorelli
Copy link
Collaborator Author

thanks for noticing this - yeah might be good to add it to the dev requirements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants