This repository has been archived by the owner on Aug 7, 2024. It is now read-only.
Update dependency lxml to v4.9.1 [SECURITY] #167
This PR contains the following updates:
lxml: ==4.4.1 -> ==4.9.1
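To make the effect of the pin change concrete, the sketch below lists which of the advisories in this PR a given lxml version is still exposed to. The fix versions for CVE-2021-28957 (4.6.3) and CVE-2021-43818 (4.6.5) come from the advisory text; 4.9.1 for CVE-2022-2309 is inferred from the v4.9.1 release notes. CVE-2020-27783 is left out because the advisory text here does not state a fixed version. The names `FIXED_IN`, `parse_version` and `unpatched_advisories` are hypothetical, and the parser assumes plain dotted numeric versions.

```python
# Fix versions as stated (or, for CVE-2022-2309, inferred) in this PR description.
FIXED_IN = {
    "CVE-2021-28957": (4, 6, 3),
    "CVE-2021-43818": (4, 6, 5),
    "CVE-2022-2309": (4, 9, 1),
}


def parse_version(text):
    """Turn a plain dotted version such as '4.9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def unpatched_advisories(version_text):
    """Return the advisory IDs whose fix version is newer than version_text."""
    version = parse_version(version_text)
    return sorted(cve for cve, fixed in FIXED_IN.items() if version < fixed)


old_pin = unpatched_advisories("4.4.1")
new_pin = unpatched_advisories("4.9.1")
```

Under these assumptions, the old pin is reported as exposed to all three listed advisories and the new pin to none.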
GitHub Vulnerability Alerts
CVE-2020-27783
An XSS vulnerability was discovered in python-lxml's clean module. The module's parser didn't properly imitate browsers, which caused different behaviors between the sanitizer and the user's page. A remote attacker could exploit this flaw to run arbitrary HTML/JS code.
CVE-2021-28957
An XSS vulnerability was discovered in the python lxml clean module versions before 4.6.3. When disabling the safe_attrs_only and forms arguments, the Cleaner class does not remove the formaction attribute, allowing JS to bypass the sanitizer. A remote attacker could exploit this flaw to run arbitrary JS code on users who interact with incorrectly sanitized HTML. This issue is patched in lxml 4.6.3.
CVE-2021-43818
Impact
The HTML Cleaner in lxml.html lets certain crafted script content pass through, as well as script content in SVG files embedded using data URIs.
Users that employ the HTML cleaner in a security relevant context should upgrade to lxml 4.6.5.
Patches
The issue has been resolved in lxml 4.6.5.
Workarounds
None.
References
The issues are tracked under the report IDs GHSL-2021-1037 and GHSL-2021-1038.
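The two cleaner advisories above name specific bypass vectors: the formaction attribute and SVG content embedded via data URIs. The sketch below is an independent audit aid built on the standard library's html.parser, not lxml's Cleaner and not a substitute for upgrading. The class name `RiskyMarkupScanner` and the helper `scan` are hypothetical.

```python
from html.parser import HTMLParser


class RiskyMarkupScanner(HTMLParser):
    """Records attributes matching the bypass vectors named in the advisories."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (tag, attribute, reason) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        for name, value in attrs:
            normalized = (value or "").strip().lower()
            if name == "formaction":
                self.findings.append((tag, name, "formaction attribute"))
            elif normalized.startswith("data:image/svg+xml"):
                self.findings.append((tag, name, "SVG data URI"))


def scan(html_text):
    """Return all findings for a piece of (already sanitized) HTML."""
    scanner = RiskyMarkupScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.findings


sample = (
    '<button formaction="javascript:alert(1)">x</button>'
    '<img src="data:image/svg+xml;base64,AAAA">'
)
findings = scan(sample)
```

A check like this could run in tests over the output of whatever sanitizer is in use, so a regression shows up as a non-empty findings list.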
CVE-2022-2309
NULL Pointer Dereference allows attackers to cause a denial of service (or application crash). This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14. libxml2 2.9.9 and earlier are not affected. It allows triggering crashes through forged input data, given a vulnerable code sequence in the application. The vulnerability is caused by the iterwalk function (also used by the canonicalize function). Such code shouldn't be in wide-spread use, given that parsing + iterwalk would usually be replaced with the more efficient iterparse function. However, an XML converter that serialises to C14N would also be vulnerable, for example, and there are legitimate use cases for this code sequence. If untrusted input is received (also remotely) and processed via iterwalk function, a crash can be triggered.
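The affected libxml2 range given in the advisory (2.9.10 through 2.9.14) can be checked at runtime. This is a minimal sketch, assuming lxml's `etree.LIBXML_VERSION` tuple is the version of libxml2 in use; the helper name `libxml2_in_affected_range` is hypothetical.

```python
def libxml2_in_affected_range(version):
    """Return True if a libxml2 version tuple lies in 2.9.10 .. 2.9.14."""
    return (2, 9, 10) <= tuple(version[:3]) <= (2, 9, 14)


try:
    from lxml import etree

    # Version of libxml2 that lxml is running against, e.g. (2, 9, 14).
    runtime_affected = libxml2_in_affected_range(etree.LIBXML_VERSION)
except ImportError:
    # lxml is not installed, so nothing to check in this environment.
    runtime_affected = None
```

A result of True only means the libxml2 version matches the range named in the advisory; whether an application is exposed also depends on it calling iterwalk() or canonicalize() on untrusted input.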
Release Notes
lxml/lxml (lxml)
v4.9.1
Compare Source
==================
Bugs fixed
A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.
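The CVE-2022-2309 advisory notes that parsing followed by iterwalk() would usually be replaced with iterparse(). A minimal sketch of that pattern follows; the function name `count_tags` is hypothetical, and the import falls back to the standard library's ElementTree, which offers a compatible iterparse(), when lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:  # fallback with a compatible iterparse()
    import xml.etree.ElementTree as etree


def count_tags(xml_bytes):
    """Count element tags while parsing, without a separate tree walk."""
    counts = {}
    for _event, elem in etree.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()  # release content that is no longer needed
    return counts


tag_counts = count_tags(b"<root><item/><item/></root>")
```

This processes elements as they are completed during parsing, so no second pass over a finished tree is needed.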
v4.9.0
Compare Source
==================
Bugs fixed
lxml.html was corrected. Patch by xmo-odoo.
Other changes
Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.
Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35
(libxml2 2.9.12+ and libxslt 1.1.34 on Windows).
GH#343: Windows-AArch64 build support in Visual Studio.
Patch by Steve Dower.
v4.8.0
Compare Source
==================
Features added
GH#337: Path-like objects are now supported throughout the API instead of just strings.
Patch by Henning Janssen.
The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.
Bugs fixed
lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.
Patch by Tobias Deiminger.
Other changes
v4.7.1
Compare Source
==================
Features added
parser.feed() now encodes the input data to the native UTF-8 encoding directly, instead of going through Py_UNICODE / wchar_t encoding first, which previously required duplicate recoding in most cases.
Bugs fixed
The standard namespace prefixes were mishandled during "C14N2" serialisation on Python 3.
See https://mail.python.org/archives/list/lxml@python.org/thread/6ZFBHFOVHOS5GFDOAMPCT6HM5HZPWQ4Q/
lxml.objectify previously accepted non-XML numbers with underscores (like "1_000") as integers or float values in Python 3.6 and later. It now adheres to the number format of the XML spec again.
LP#1939031: Static wheels of lxml now contain the header files of zlib and libiconv
(in addition to the already provided headers of libxml2/libxslt/libexslt).
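The lxml.objectify item above comes down to Python's own number parsing: since Python 3.6, int() and float() accept underscores as digit separators. The sketch below shows that behaviour next to a stricter check. It is not lxml.objectify's implementation; `parse_xml_integer` is a hypothetical helper, and the pattern covers only plain integers (optional sign plus ASCII digits).

```python
import re

# Optional sign followed by ASCII digits only; no underscores allowed.
_XML_INTEGER = re.compile(r"[+-]?[0-9]+")


def parse_xml_integer(text):
    """Parse an integer, rejecting Python-only spellings such as '1_000'."""
    candidate = text.strip()
    if not _XML_INTEGER.fullmatch(candidate):
        raise ValueError(f"not a valid XML integer: {text!r}")
    return int(candidate)


# Python itself accepts the underscore form since 3.6.
python_value = int("1_000")
strict_value = parse_xml_integer("1000")
```

With a check like this, "1_000" is rejected with a ValueError instead of being silently read as 1000.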
Other changes
v4.6.5
Compare Source
==================
Bugs fixed
A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script
content through SVG images (CVE-2021-43818).
A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script
content through CSS imports and other crafted constructs (CVE-2021-43818).
v4.6.4
Compare Source
==================
Features added
GH#317: A new property system_url was added to DTD entities.
Patch by Thirdegree.
GH#314: The STATIC_* variables in setup.py can now be passed via env vars.
Patch by Isaac Jurado.
v4.6.3
Compare Source
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.
v4.6.2
Compare Source
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes more sneaky
"style" content.
v4.6.1
Compare Source
==================
Bugs fixed
JavaScript to pass through. The cleaner now removes more sneaky "style" content.
v4.6.0
Compare Source
==================
Features added
GH#310: lxml.html.InputGetter supports __len__() to count the number of input fields.
Patch by Aidan Woolley.
lxml.html.InputGetter has a new .items() method to ease processing all input fields.
lxml.html.InputGetter.keys() now returns the field names in document order.
GH-309: The API documentation is now generated using sphinx-apidoc.
Patch by Chris Mayo.
Bugs fixed
LP#1869455: C14N 2.0 serialisation failed for unprefixed attributes
when a default namespace was defined.
TreeBuilder.close() raised AssertionError in some error cases where it should have raised XMLSyntaxError. It now raises a combined exception to keep up backwards compatibility, while switching to XMLSyntaxError as an interface.
v4.5.2
Compare Source
==================
Bugs fixed
Cleaner() now validates that only known configuration options can be set.
LP#1882606: Cleaner.clean_html() discarded comments and PIs regardless of the corresponding configuration option, if remove_unknown_tags was set.
LP#1880251: Instead of globally overwriting the document loader in libxml2, lxml now sets it per parser run, which improves the interoperability with other users of libxml2 such as libxmlsec.
LP#1881960: Fix build in CPython 3.10 by using Cython 0.29.21.
The setup options "--with-xml2-config" and "--with-xslt-config" were accidentally renamed
to "--xml2-config" and "--xslt-config" in 4.5.1 and are now available again.
v4.5.1
Compare Source
==================
Bugs fixed
LP#1570388: Fix failures when serialising documents larger than 2GB in some cases.
LP#1865141, GH#298: QName values were not accepted by the el.iter() method.
Patch by xmo-odoo.
LP#1863413, GH#297: The build failed to detect libraries on Linux that are only
configured via pkg-config.
Patch by Hugh McMaster.
v4.5.0
Compare Source
==================
Features added
indent() was added to insert tail whitespace for pretty-printing an XML tree.
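A short sketch of indent() in use, assuming lxml 4.5.0 or later. The import falls back to the standard library's ElementTree, which has an indent() function with a compatible signature from Python 3.9; the sample document is made up for illustration.

```python
try:
    from lxml import etree
except ImportError:  # stdlib ElementTree provides indent() from Python 3.9
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><a><b/></a><c/></root>")

# indent() modifies the tree in place by setting text/tail whitespace.
etree.indent(root, space="  ")

pretty = etree.tostring(root, encoding="unicode")
```

Because the whitespace is stored in the tree itself, any later serialisation of `root` comes out indented without extra serialiser options.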
Bugs fixed
deletion disappeared silently instead of sticking with the node that was removed.
Other changes
MacOS builds are 64-bit-only by default.
Set CFLAGS and LDFLAGS explicitly to override it.
Linux/MacOS Binary wheels now use libxml2 2.9.10 and libxslt 1.1.34.
LP#1840234: The package version number is now available as lxml.__version__.
v4.4.3
Compare Source
==================
Bugs fixed
itertext() was missing tail text of comments and PIs since 4.4.0.
v4.4.2
Compare Source
==================
Bugs fixed
ElementInclude incorrectly rejected repeated non-recursive includes as recursive.
Patch by Rainer Hausdorf.
Configuration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.