-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve type hints for str vs bytes #72
base: master
Are you sure you want to change the base?
Conversation
lxml-stubs/etree.pyi
Outdated
_StrOrBytes = Union[str, bytes] | ||
_ValueType = Union[str, bytes, QName] | ||
_InputDictAnyStr = Dict[_StrOrBytes, _StrOrBytes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type aliases for parameters that allow str or bytes on both Python 2 and Python 3.
) -> _Element: ... | ||
def write( | ||
self, | ||
file: _FileSource, | ||
encoding: _AnyStr = ..., | ||
method: _AnyStr = ..., | ||
encoding: Optional[_StrOrBytes] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be marked as Optional since None is allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, even the unicode
(or str
) type is allowed as value. Although passing the string name "unicode"
is probably a better idea, so your proposal seems sufficient. In fact, I'd rather see this deprecated and accept only strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike the global tostring
function this only accepts unicode or bytes.
elm_tree = etree.fromstring("<test/>").getroottree()
print(type(elm_tree))
with open("test.xml", "wb") as out_file:
elm_tree.write(out_file, encoding = str)
Traceback (most recent call last):
File "C:\Source\p5\taccs\unicode_xml.py", line 82, in <module>
elm_tree.write(out_file, encoding = str)
File "src/lxml/etree.pyx", line 2055, in lxml.etree._ElementTree.write
TypeError: unbound method str.upper() needs an argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's start the deprecation by making sure users pass only strings.
pretty_print: bool = ..., | ||
xml_declaration: Any = ..., | ||
with_tail: Any = ..., | ||
standalone: bool = ..., | ||
doctype: _StrOrBytes = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated for new parameters.
7736c43
to
2851617
Compare
@scoder Who should I contact to get reviews of this pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's a good improvement.
I like the idea of separating input and output declarations.
https://en.wikipedia.org/wiki/Robustness_principle
) -> _Element: ... | ||
def write( | ||
self, | ||
file: _FileSource, | ||
encoding: _AnyStr = ..., | ||
method: _AnyStr = ..., | ||
encoding: Optional[_StrOrBytes] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, even the unicode
(or str
) type is allowed as value. Although passing the string name "unicode"
is probably a better idea, so your proposal seems sufficient. In fact, I'd rather see this deprecated and accept only strings.
lxml-stubs/etree.pyi
Outdated
encoding: _AnyStr = ..., | ||
method: _AnyStr = ..., | ||
encoding: Optional[_StrOrBytes] = ..., | ||
method: str = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be unicode
in Py2, essentially the same as encoding
above (but not None).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, changed to _StrOrBytes
. Also updated the tostring
overrides.
lxml-stubs/etree.pyi
Outdated
compression: int = ..., | ||
exclusive: bool = ..., | ||
inclusive_ns_prefixes: List[Union[str, bytes]] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this different from the declaration below in write_c14n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to be Iterable[_StrOrBytes]
like write_c14n
.
def __setitem__(self, key: _TagName, value: _ValueType) -> None: ... | ||
def __delitem__(self, key: _TagName) -> None: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good catch.
_ListAnyStr = Union[List[str], List[bytes]] | ||
_DictAnyStr = Union[Dict[str, str], Dict[bytes, bytes]] | ||
_StrOrBytes = Union[str, bytes] | ||
_ValueType = Union[str, bytes, QName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a very generic and unclear name. Isn't it just the same as _TagName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct it has the same type as _TagName
but it felt odd using _TagName
for both parameters below. I have no problem changing it if you prefer. We could also change this to _ValueType = _TagName
so that we maintain readability.
def set(self, key: _TagName, value: _ValueType) -> None: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that both are different for the _Attrib
dict (and .get()
/.set()
), but the first really is a _TagName
.
lxml-stubs/etree.pyi
Outdated
def update( | ||
self, | ||
sequence_or_dict: Union[ | ||
_Attrib, Mapping[_AnyStr, _AnyStr], Sequence[Tuple[_AnyStr, _AnyStr]] | ||
_Attrib, Mapping[_ValueType, _ValueType], Sequence[Tuple[_ValueType, _ValueType]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key and value are not quite the same here, I think. The key would probably accept a QName
, i.e. should be _TagName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use _TagName
for the key and first tuple value.
@@ -409,7 +429,7 @@ class XMLParser(_FeedParser): | |||
class HTMLParser(_FeedParser): | |||
def __init__( | |||
self, | |||
encoding: Optional[_AnyStr] = ..., | |||
encoding: Optional[_StrOrBytes] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this appears quite a few times, it seems worth another alias, e.g. _Identifier
.
lxml-stubs/etree.pyi
Outdated
def end(self, tag: _StrOrBytes) -> None: ... | ||
def pi(self, target: _StrOrBytes, data: Optional[_StrOrBytes] = ...) -> Any: ... | ||
def start(self, tag: _StrOrBytes, attrib: Dict[_StrOrBytes, _StrOrBytes]) -> None: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth using _TagName
for the tag
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated start
and end
to take tag: _TagName
28132d4
to
80e9ad4
Compare
80e9ad4
to
1ebe412
Compare
@scoder Could you take another look at this, I believe I have handled most issues but there are a few outstanding questions. |
Triggering a fresh test run |
Triggering a fresh test run |
I've fixed the dependency induced test failures in the master branch, but this PR still has a couple of own failures. Could you please see if you can resolve them? |
No description provided.