Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass the __dict__ item of a class __dict__ #291

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

pierreglaser
Copy link
Member

@pierreglaser pierreglaser commented Jun 13, 2019

Closes #282

@pierreglaser pierreglaser changed the title FIX pass the __dict__ item of a class __dict__ pass the __dict__ item of a class __dict__ Jun 13, 2019
@pierreglaser pierreglaser force-pushed the pass-__dict__-attribute branch from ca0c8c8 to cdd0aa7 Compare June 17, 2019 09:41
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once rebased on top of #290, please run the downstream ci tests.

tests/cloudpickle_test.py Outdated Show resolved Hide resolved
@pierreglaser
Copy link
Member Author

pierreglaser commented Jun 17, 2019

Arg, this code is not safe with regards to reference cycles when __dict__ is not natively memoized by pickle...

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will decrease coverage by 37%.
The diff coverage is 33.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #291       +/-   ##
===========================================
- Coverage   92.84%   55.83%   -37.01%     
===========================================
  Files           2        2               
  Lines         852      849        -3     
  Branches      177      176        -1     
===========================================
- Hits          791      474      -317     
- Misses         37      347      +310     
- Partials       24       28        +4
Impacted Files Coverage Δ
cloudpickle/cloudpickle_fast.py 0% <0%> (-95.16%) ⬇️
cloudpickle/cloudpickle.py 76.2% <100%> (-15.8%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...ff598c7. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will increase coverage by 0.23%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
+ Coverage   92.84%   93.07%   +0.23%     
==========================================
  Files           2        2              
  Lines         852      852              
  Branches      177      177              
==========================================
+ Hits          791      793       +2     
+ Misses         37       36       -1     
+ Partials       24       23       -1
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 92% <100%> (ø) ⬆️
cloudpickle/cloudpickle_fast.py 96.03% <100%> (+0.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...5914c7e. Read the comment docs.

@pierreglaser
Copy link
Member Author

Some updates:

now that __dict__ is part of type_kwargs, we must make sure it does not contain unsafe reference cycles with its class. An unsafe reference cycle will appear in the default case (when __dict__ is not overridden), because in this case, __dict__ is a getset_descriptor, that must be pickled by reference to its class, it is non-avoidable.

The guard

if not isinstance(obj, types.GetSetDescriptorType) and getattr(__dict__, '__objclass__', None), is not obj

serves the purpose of finding this default case.
I don't think there is any other way to generate an unsafe cyclic reference, but @ogrisel if you can think of an explicit pattern that could create infinite recursion, let me know.

Also, now that we special-case the situation where __dict__ is a getset_descripor, #290 and this PR are orthogonal. This PR would work without #290, my bad for the erroneous comments 😕 I still think #290 is a nice to have with very little codebase burden, but I won't mind if you'd better revert #290 from cloudpickle master.

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will increase coverage by 0.23%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
+ Coverage   92.84%   93.07%   +0.23%     
==========================================
  Files           2        2              
  Lines         852      852              
  Branches      177      177              
==========================================
+ Hits          791      793       +2     
+ Misses         37       36       -1     
+ Partials       24       23       -1
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 92% <100%> (ø) ⬆️
cloudpickle/cloudpickle_fast.py 96.03% <100%> (+0.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...62e1873. Read the comment docs.

@pierreglaser
Copy link
Member Author

The failure of distributed also appears on cloudpickle master (see #295 )

@ogrisel
Copy link
Contributor

ogrisel commented Jun 18, 2019

In #282 you write:

After #253 (comment), I started examinating these lines. I think the original use-case was to handle classes like namedtuple overloading dict.

Can you be more explicit? Do we have a test for what you describe?

cloudpickle/cloudpickle.py Outdated Show resolved Hide resolved
cloudpickle/cloudpickle.py Outdated Show resolved Hide resolved
@pierreglaser
Copy link
Member Author

pierreglaser commented Jun 18, 2019

Can you be more explicit? Do we have a test for what you describe?

In Python 2.7, the __dict__ attribute of namedtuple classes created like this:

from collections import namedtuple
namedtuple_class = namedtuple('MyTuple', ['a', 'b', 'c'])

is overriden so that instances of these classes:

t = namedtuple_class('foo', 'bar', 'qux')

have their __dict__ look like this:

>>> t.__dict__
OrderedDict([('a', 'foo'), ('b', 'bar'), ('c', 'qux')]) 
>>> type(namedtuple.__dict__['__dict__'])
property

(corresponding line in cpython/Lib/collections.py:
https://github.com/python/cpython/blob/1b57ab5c6478b93cf4150bd8c475022252776598/Lib/collections.py#L290)

Visibly, cloudpickle was aware of that use case

# If type overrides __dict__ as a property, include it in the type
# kwargs. In Python 2, we can't set this attribute after construction.
__dict__ = clsdict.pop('__dict__', None)
if isinstance(__dict__, property):
type_kwargs['__dict__'] = __dict__

It turns out that commenting the lines lines 648-649 does not break the test suite: there is no need to save "__dict__" in the case of namedtuples (the if isinstance(__dict__, property) check clearly looks for namedtuple classes) because the __dict__ property gets re-generated automatically at unpickling time when cloudpickle recreates the namedtuple class.

So IMO, those lines in cloudpickle do not serve any use-case. In addition, they are pretty hard to understand at first sight. So I wanted to clarify the way we handle the presence of a "__dict__" item in a class __dict__.

There are two ways we can do that:

  • popping the "__dict__" from the class __dict__ and do not try to save it. It will break dynamically created classes that override the __dict__ attribute of their instances. Admittedly, it rarely happens.
  • popping the "__dict__" from the class __dict__ and actually save it, which is somewhat tricky (hence the 20 lines of comments in this PR).

I am fine with both ways. I simply stumbled across those lines many times, and was always confused by their purpose/the use-case they served - this PR was an attempt to rationalize and clearly document them. Suggestions welcomed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloudpickle drops __dict__ attribute of classes.
2 participants