
gh-125038: Iterator checks are added for some FOR_ITER bytecodes, crash fixed #125051

Closed


efimov-mikhail (Contributor) commented Oct 7, 2024

Fixes a SIGSEGV in generators that occurs after the generator's iterator is replaced through gi_frame.f_locals.

The fix applies to the _FOR_ITER bytecode implementation. Similar checks are added to the _FOR_ITER_TIER_TWO and INSTRUMENTED_FOR_ITER bytecode implementations.

@@ -2804,7 +2804,10 @@ dummy_func(
         replaced op(_FOR_ITER, (iter -- iter, next)) {
             /* before: [iter]; after: [iter, iter()] *or* [] (and jump over END_FOR.) */
             PyObject *iter_o = PyStackRef_AsPyObjectBorrow(iter);
-            PyObject *next_o = (*Py_TYPE(iter_o)->tp_iternext)(iter_o);
+            PyObject *next_o = NULL;
+            if (PyIter_Check(iter_o)) {
Member

Calling PyIter_Check feels a little wasteful. Instead, I'd assign (*Py_TYPE(iter_o)->tp_iternext) to a variable and check if it's NULL. If it is NULL, we should raise a TypeError. Your code instead makes it immediately end the loop, which doesn't feel like the right behavior.
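As a rough Python-level model of the two behaviors being compared (not CPython's actual implementation; the helpers for_iter_step_silent, for_iter_step_checked and run_loop are hypothetical), the first variant treats a non-iterator like an exhausted loop, while the second looks up __next__ on the type, the Python counterpart of the tp_iternext slot, and raises TypeError when it is missing:

```python
_EXHAUSTED = object()  # sentinel standing in for "jump over END_FOR"


def for_iter_step_silent(it):
    """Model of the first draft: a non-iterator silently ends the loop."""
    next_method = getattr(type(it), "__next__", None)
    if next_method is None:
        return _EXHAUSTED
    try:
        return next_method(it)
    except StopIteration:
        return _EXHAUSTED


def for_iter_step_checked(it):
    """Model of the suggested fix: a missing __next__ raises TypeError."""
    tp = type(it)
    next_method = getattr(tp, "__next__", None)  # roughly tp->tp_iternext
    if next_method is None:
        # Message text taken from the draft discussed in this thread.
        raise TypeError(
            f"'for' requires an object with __iter__ method, got {tp.__name__}"
        )
    try:
        return next_method(it)
    except StopIteration:
        return _EXHAUSTED


def run_loop(it, step):
    """Drive a 'for' loop over `it` using one of the step models above."""
    out = []
    while (value := step(it)) is not _EXHAUSTED:
        out.append(value)
    return out
```

With range(3), which is iterable but not an iterator, the first model returns an empty list while the second raises TypeError, so the bad state is reported instead of hidden.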

Contributor Author

Hmm, I've tried adding a TypeError with the following code:

            PyTypeObject *type = Py_TYPE(iter_o);
            iternextfunc iternext = type->tp_iternext;
            if (iternext == NULL) {
                _PyErr_Format(tstate, PyExc_TypeError,
                              "'for' requires an object with "
                              "__iter__ method, got %.100s",
                              type->tp_name);
                DECREF_INPUTS();
                ERROR_IF(true, error);
            }
            PyObject *next_o = (*iternext)(iter_o);

And here is the result of my test case with this change:

-> % ./python -m unittest -v test.test_generators.GeneratorTest.test_issue125038
test_issue125038 (test.test_generators.GeneratorTest.test_issue125038) ... ERROR

======================================================================
ERROR: test_issue125038 (test.test_generators.GeneratorTest.test_issue125038)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mikhail.efimov/projects/cpython/Lib/test/test_generators.py", line 274, in test_issue125038
    l = list(g)
  File "/home/mikhail.efimov/projects/cpython/Lib/test/test_generators.py", line 272, in <genexpr>
    g = (x for x in range(10))
                    ~~~~~^^^^
TypeError: 'for' requires an object with __iter__ method, got range

----------------------------------------------------------------------
Ran 1 test in 0.001s

It seems to me that this message doesn't make the problem much clearer.
Do you have any suggestions for a better error message?

Member

That seems like a pretty good error message, much better than silently stopping the loop.

efimov-mikhail changed the title gh-125038: PyIter_Checks are added for some FOR_ITER bytecodes to gh-125038: Iterator checks are added for some FOR_ITER bytecodes, crash fixed Oct 7, 2024
        except TypeError:
            return "TypeError"

        # This should not raise
Member

I think you want a with self.assertRaisesRegex(...) block here.
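A minimal sketch of the with self.assertRaisesRegex(...) pattern. To stay runnable on any Python 3 version it uses next() on a range as a stand-in for the generator/f_locals scenario; the class and method names are hypothetical, and the regex matches the message CPython gives when next() is called on a non-iterator.

```python
import unittest


class NonIteratorTest(unittest.TestCase):
    def test_next_on_non_iterator_raises(self):
        # Passes only if the block raises TypeError with a matching message.
        with self.assertRaisesRegex(TypeError, "is not an iterator"):
            next(range(3))

    def test_next_on_iterator_works(self):
        # iter() turns the iterable into an iterator, so no exception is expected.
        self.assertEqual(next(iter(range(3))), 0)
```

Unlike a try/except that returns a marker string, the context manager fails the test both when no exception is raised and when the message does not match.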

gaogaotiantian (Member) commented Oct 7, 2024

We should confirm with @markshannon whether we should fix the interpreter or f_locals. Maybe the interpreter's current assumption is valid? PEP 667 does not implement everything dict-like for FrameLocalsProxy, but maybe this is necessary? If I understand correctly, we are adding a small overhead to a very commonly used instruction.

JelleZijlstra (Member)
Member

I looked at PEP 667 and it seems to assume that mutating f_locals is supported, which makes me think that we need to add this check. I wouldn't personally mind blocking the f_locals mutation instead if you can find a principled way to do it.

> We are adding a small overhead to a very commonly used instruction if I understood it correctly.

Yes, that's right; we're adding a NULL check and a branch. However, FOR_ITER has a couple of specialized versions, so the overhead should be minimal when iterating over the most common types. Still, it may be worth benchmarking how bad the overhead is.
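One way to get a first impression of the overhead, assuming you run the same script on CPython builds with and without the patch: time a plain loop over a list (which, to our understanding, uses the specialized FOR_ITER_LIST instruction on 3.12+) and over an iterator type assumed to stay on the generic FOR_ITER path, such as a reversed list iterator. This is a rough timeit sketch, not a substitute for pyperformance or pyperf.

```python
import timeit

N = 100_000
DATA = list(range(N))


def loop_list():
    # Direct list iteration: expected to hit a specialized FOR_ITER variant.
    for _ in DATA:
        pass


def loop_reversed():
    # list_reverseiterator: assumed to use the generic (unspecialized) FOR_ITER.
    for _ in reversed(DATA):
        pass


def best_time(func, repeat=5, number=10):
    """Return the fastest of `repeat` runs, each calling `func` `number` times."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


if __name__ == "__main__":
    for name, func in [("list", loop_list), ("reversed", loop_reversed)]:
        print(f"{name:>8}: {best_time(func):.4f} s")
```

If the added NULL check matters, it should show up mainly in the second case, since specialized loops do not go through the generic instruction.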

JelleZijlstra (Member) left a comment

This looks correct to me, but I'll leave it to Mark to decide whether there's a better solution.

JelleZijlstra added the "needs backport to 3.13" (bugs and security fixes) label Oct 8, 2024
…e-125038.ffSLCz.rst


News improvement

Co-authored-by: Jelle Zijlstra <[email protected]>
efimov-mikhail (Contributor Author)

Thanks for your advice, it definitely looks better now.

But I have a question.
Is it possible to make the code from the original issue work?
Why can't we assign a different value to the iterator object directly through the frame's f_locals?
I couldn't find anything about this in PEP 667.

efimov-mikhail (Contributor Author) commented Oct 8, 2024

With this change, the following code now raises a TypeError:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = range(20)

But very similar code works:

g = (x for x in range(10))
g.gi_frame.f_locals['.0'] = iter(range(20))
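The difference between the two snippets is the iterator protocol: range is an iterable (it defines __iter__) but not an iterator (it has no __next__, so its type's tp_iternext slot is empty), whereas iter(range(20)) returns a separate range_iterator object that has both. A quick check:

```python
from collections.abc import Iterable, Iterator

r = range(20)
it = iter(r)

# A range can produce iterators but is not an iterator itself.
print(isinstance(r, Iterable), isinstance(r, Iterator))  # True False
print(hasattr(r, "__next__"))  # False

# iter() returns a distinct iterator object that does implement __next__.
print(isinstance(it, Iterator), type(it).__name__)  # True range_iterator
print(next(it))  # 0
```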

Maybe framelocalsproxy_setitem could be changed?
Something like this:

if (_PyUnicode_EqualToASCIIString(key, ".0")) {
    PyObject *it = PyObject_GetIter(value);
    if (it != NULL) {
        value = it;
    }
} 
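To make the proposal concrete, here is a Python model that uses a plain dict subclass in place of FrameLocalsProxy (the class name is hypothetical). It coerces values stored under '.0' with iter() and, like the C sketch, keeps the original value when the object is not iterable. In the C version that fallback would also need PyErr_Clear(), because PyObject_GetIter() sets a TypeError when it fails.

```python
class IterCoercingLocals(dict):
    """Hypothetical stand-in for FrameLocalsProxy with the proposed '.0' rule.

    Only item assignment is modeled; dict.update() and the constructor
    bypass __setitem__ and are not coerced.
    """

    HIDDEN_ITER_KEY = ".0"

    def __setitem__(self, key, value):
        if key == self.HIDDEN_ITER_KEY:
            try:
                value = iter(value)
            except TypeError:
                # Not iterable: keep the original value. The C version would
                # need PyErr_Clear() here to discard the pending exception.
                pass
        super().__setitem__(key, value)
```

With this rule, the object read back from '.0' is no longer the object that was written, the kind of hidden mutation that makes a proxy less predictable for debuggers.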

I've tried this change, but it breaks two tests, test_proxy_key_stringlikes_overwrite and test_proxy_key_stringlikes_ftrst_write, which were written by @encukou and @ncoghlan.

JelleZijlstra (Member)

Setting .0 to iter(range(20)) should work for now, yes. I don't know what you're trying to use this for at a higher level, but I'd recommend trying to find a solution that doesn't involve mutating f_locals; that's a low-level interface that should probably only be used by tools like debuggers.

I would oppose adding special handling in the FrameLocalsProxy that mutates objects; doing so would complicate the implementation and make it less useful for tools like debuggers.

efimov-mikhail (Contributor Author)

Actually, I have no real-world use case that requires changing a generator's underlying iterator.
And I understand that f_locals is a low-level interface.

The main purpose of my questions is to get some clarity.
It's totally fine if the current behavior is intended.
Maybe this case should just be documented.
And, of course, raising an appropriate exception is much better than crashing.

efimov-mikhail (Contributor Author)

I've moved my test from test_generators.py to test_frame.py, renamed it, and added more test cases to emphasize current code behavior and save it explicitly in tests.
IMHO, there is still room for improvement.

savannahostrowski and others added 10 commits October 9, 2024 13:08
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
…008 implementation issues (python#125151)

Skip test_fma_zero_result on NetBSD due to IEEE 754-2008 implementation issues
…124974)

Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
…pire far in the future by default (pythonGH-107594)

This allows testing Y2038 with system time set to after that,
so that actual Y2038 issues can be exposed, and not masked
by expired certificate errors.

Signed-off-by: Alexander Kanavin <[email protected]>
efimov-mikhail (Contributor Author)

See #125178
