Initial implementation #1

Merged
merged 28 commits into from
Jul 12, 2024
Changes from 12 commits
Commits
788dbe2
Initial implementation and docs
Gallaecio Jul 1, 2024
612dd72
Remove the AI mention from the docs
Gallaecio Jul 2, 2024
83d0200
Support formaction and formmethod, and raise NotImplementedError for …
Gallaecio Jul 2, 2024
564f5fc
docs/conf.py: remove leftover
Gallaecio Jul 2, 2024
d4d0b61
Use from None to hide internal exception
Gallaecio Jul 2, 2024
c227a4d
Solve issues reported by CI
Gallaecio Jul 2, 2024
29d1f6f
Solve additional CI issues
Gallaecio Jul 2, 2024
ec0a7c9
Add doctest to GitHub Actions
Gallaecio Jul 2, 2024
3475f17
Complete test coverage
Gallaecio Jul 2, 2024
ba3eb10
Install pytest for mypy
Gallaecio Jul 2, 2024
991ab50
Run pre-commit
Gallaecio Jul 2, 2024
e62c738
Allow method override
Gallaecio Jul 2, 2024
f37e269
Update docs/usage.rst
Gallaecio Jul 3, 2024
51e733f
request_from_form → form2request
Gallaecio Jul 3, 2024
0d0ded7
Add parsel support
Gallaecio Jul 3, 2024
9830c9f
Only raise NotImplementedError for the dialog method
Gallaecio Jul 3, 2024
0337a40
Do not make form and data position-only
Gallaecio Jul 3, 2024
e098d54
Support text/plain enctype, only raise NotImplementedError for mutipa…
Gallaecio Jul 3, 2024
1907424
Remove cast usages
Gallaecio Jul 3, 2024
259ddde
Allow overriding enctype
Gallaecio Jul 3, 2024
e54ae78
Shorten attribute override docs
Gallaecio Jul 3, 2024
6974d95
Minor refactoring
Gallaecio Jul 3, 2024
ba5da92
Update exception messages and test expectations after adding multipar…
Gallaecio Jul 3, 2024
4d6e25d
Cover method and enctype in the docstring
Gallaecio Jul 3, 2024
f9ec401
Improve the docstring
Gallaecio Jul 3, 2024
864a7e3
Minor doc improvements
Gallaecio Jul 12, 2024
aa5ec68
Fix typo (of → or)
Gallaecio Jul 12, 2024
689d7ee
Clarify a test scenario
Gallaecio Jul 12, 2024
3 changes: 3 additions & 0 deletions .bandit.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
skips:
- B101 # assert_used, needed for mypy
exclude_dirs: ['tests']
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[report]
exclude_lines =
    if TYPE_CHECKING:
4 changes: 4 additions & 0 deletions .flake8
@@ -1,6 +1,10 @@
[flake8]
extend-select = TC, TC1
ignore =
max-line-length = 88
per-file-ignores =
    # F401: Imported but unused
    form2request/__init__.py:F401
    # D100-D104: Missing docstring
    docs/conf.py:D100
    tests/__init__.py:D104
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -41,7 +41,7 @@ jobs:
fail-fast: false
matrix:
python-version: ['3.12']
-tox-job: ["pre-commit", "mypy", "docs", "twinecheck"]
+tox-job: ["pre-commit", "mypy", "docs", "doctest", "twinecheck"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
/.coverage
/coverage.xml
/dist/
/.tox/
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -17,8 +17,20 @@ repos:
- flake8-debugger
- flake8-docstrings
- flake8-string-format
- flake8-type-checking
- repo: https://github.com/asottile/pyupgrade
rev: v3.16.0
hooks:
- id: pyupgrade
args: [--py38-plus]
- repo: https://github.com/pycqa/bandit
rev: 1.7.9
hooks:
- id: bandit
args: [-r, -c, .bandit.yml]
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.18.0
hooks:
- id: blacken-docs
additional_dependencies:
- black==24.4.2
4 changes: 2 additions & 2 deletions README.rst
@@ -20,8 +20,8 @@ form2request

.. description starts
-``form2request`` is an AI-powered Python 3.8+ library to build HTTP requests
-out of HTML forms.
+``form2request`` is a Python 3.8+ library to build HTTP requests out of HTML
+forms.

.. description ends
Binary file removed dist/form2request-0.0.0.tar.gz
Binary file not shown.
6 changes: 5 additions & 1 deletion docs/api.rst
@@ -2,4 +2,8 @@
API reference
=============

.. autofunction:: form2request.request_from_form

.. autoclass:: form2request.Request
   :members:
   :undoc-members:
15 changes: 13 additions & 2 deletions docs/conf.py
@@ -9,11 +9,22 @@

html_theme = "sphinx_rtd_theme"

autodoc_member_order = "groupwise"

intersphinx_disabled_reftypes = []
intersphinx_mapping = {
"lxml": ("https://lxml.de/apidoc/", None),
"parsel": ("https://parsel.readthedocs.io/en/stable", None),
"python": ("https://docs.python.org/3", None),
"scrapy": ("https://docs.scrapy.org/en/latest", None),
}

nitpick_ignore = [
*(
("py:class", cls)
for cls in (
# https://github.com/sphinx-doc/sphinx/issues/11225
"FormdataType",
"FormElement",
"HtmlElement",
)
),
]
211 changes: 210 additions & 1 deletion docs/usage.rst
@@ -2,4 +2,213 @@
Usage
=====

:ref:`Given an HTML form <form>`:

.. _fromstring-example:

>>> from lxml.html import fromstring
>>> html = b"""<form><input type="hidden" name="foo" value="bar" /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

You can use :func:`~form2request.request_from_form` to generate :ref:`form
submission request data <request>`:

>>> from form2request import request_from_form
>>> request_from_form(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

:func:`~form2request.request_from_form` supports :ref:`user-defined form data
<data>` and :ref:`choosing a specific form submission button (or none)
<click>`.


.. _form:

Getting a form
==============

:func:`~form2request.request_from_form` requires an
:class:`lxml.html.FormElement` object.

You can build one using :func:`lxml.html.fromstring` to parse an HTML document
and :meth:`lxml.html.HtmlElement.xpath` to find a form element in that
document, as :ref:`seen above <fromstring-example>`.

Here are some examples of XPath expressions that can be useful to find a form
element using :meth:`~lxml.html.HtmlElement.xpath`:

- To find a form by one of its attributes, such as ``id`` or ``name``, use
``//form[@<attribute>="<value>"]``. For example, to find ``<form id="foo"
…``, use ``//form[@id="foo"]``.

- To find a form by index, by order of appearance in the HTML code, use
``(//form)[n]``, where ``n`` is a 1-based index. For example, to find the
2nd form, use ``(//form)[2]``.
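
As a minimal sketch of the two XPath patterns above, assuming a made-up page
with two forms (the ``search`` and ``login`` names are illustrative only):

```python
from lxml.html import fromstring

# Hypothetical page with two forms, used only for illustration.
html = b"""
<html><body>
  <form id="search" action="/search"><input name="q" /></form>
  <form name="login" action="/login"><input name="user" /></form>
</body></html>
"""
root = fromstring(html, base_url="https://example.com")

# Find a form by one of its attributes.
by_id = root.xpath('//form[@id="search"]')[0]
by_name = root.xpath('//form[@name="login"]')[0]

# Find a form by position; XPath indexes start at 1, not 0.
second = root.xpath("(//form)[2]")[0]
```

Here ``second`` and ``by_name`` point at the same ``login`` form, since it is
the second form in document order.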

If you prefer, you could use the XPath of an element inside the form, and then
visit parent elements until you reach the form element. For example:

.. code-block:: python

    element = root.xpath('//input[@name="zip_code"]')[0]
    while True:
        if element.tag == "form":
            break
        element = element.getparent()
    form = element
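
As an alternative to a manual loop, lxml elements also provide
``iterancestors()``, which performs the same upward walk and makes the case
where no enclosing form exists explicit. A minimal sketch, assuming a made-up
form with a ``zip_code`` input:

```python
from lxml.html import fromstring

# Hypothetical markup: the input is nested one level below the form.
html = b"""<form id="address"><div><input name="zip_code" /></div></form>"""
root = fromstring(html, base_url="https://example.com")

element = root.xpath('//input[@name="zip_code"]')[0]

# iterancestors("form") yields enclosing <form> elements, nearest first.
form = next(element.iterancestors("form"), None)
if form is None:
    raise ValueError("The element is not inside a form")
```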

If you use an lxml-based library or framework, chances are they also let you
get a :class:`~lxml.html.FormElement` object. For example, when using
:doc:`parsel <parsel:index>`:

>>> from parsel import Selector
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")[0].root
>>> type(form)
<class 'lxml.html.FormElement'>

A similar example, with a :doc:`Scrapy <scrapy:index>` response:

>>> from scrapy.http import TextResponse
>>> response = TextResponse("https://example.com", body=html)
>>> form = response.css("form")[0].root
>>> type(form)
<class 'lxml.html.FormElement'>


.. _data:

Setting form data
=================

Review comment (Member): Love the docs and the explanations, great work! It's something which is currently missing in Scrapy's FormRequest docs.

While there are forms made entirely of hidden fields, like :ref:`the one above
<fromstring-example>`, most often you will work with forms that expect
user-defined data:

>>> html = b"""<form><input type="text" name="foo" /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

Use the second parameter of :func:`~form2request.request_from_form` to define
the corresponding data:

>>> request_from_form(form, {"foo": "bar"})
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

You may sometimes find forms where more than one field has the same ``name``
attribute:

>>> html = b"""<form><input type="text" name="foo" /><input type="text" name="foo" /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

To specify values for all same-name fields, instead of a dictionary, use an
iterable of key-value tuples:

>>> request_from_form(form, (("foo", "bar"), ("foo", "baz")))
Request(url='https://example.com?foo=bar&foo=baz', method='GET', headers=[], body=b'')
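
The reason a dictionary is not enough here is that it cannot hold the same key
twice. The following sketch uses only the standard library to illustrate the
point; it mirrors the URL shown above but is not necessarily how form2request
builds it internally:

```python
from urllib.parse import urlencode

# Key-value tuples keep every value for a repeated field name.
data = (("foo", "bar"), ("foo", "baz"))
query = urlencode(data)
url = f"https://example.com?{query}"

# Converting the same pairs to a dictionary keeps only the last value.
as_dict = dict(data)
```

With these inputs, ``url`` keeps both ``foo`` values while ``as_dict`` only
retains ``baz``.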

Sometimes, you might want to prevent a field value from being included in the
generated request data. This can happen, for example, when the field is
removed or disabled through JavaScript, or when the field or a parent element
has the ``disabled`` attribute (currently not supported by form2request):

Review comment (Member): TIL


>>> html = b"""<form><input name="foo" value="bar" disabled /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

To remove a field value, set it to ``None``:

>>> request_from_form(form, {"foo": None})
Request(url='https://example.com', method='GET', headers=[], body=b'')
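
Conceptually, user-defined data acts as a set of overrides on top of the
values found in the form, with ``None`` meaning that the field should be
dropped. The helper below, ``merge_form_data``, is hypothetical and only
illustrates that idea; it is not form2request's actual implementation, and the
real ordering of fields may differ:

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

Pair = Tuple[str, Optional[str]]


def merge_form_data(
    defaults: Iterable[Tuple[str, str]],
    overrides: Union[Dict[str, Optional[str]], Iterable[Pair]],
) -> List[Tuple[str, str]]:
    """Apply overrides to form defaults; a None value removes the field."""
    if isinstance(overrides, dict):
        overrides = overrides.items()
    override_pairs = list(overrides)
    overridden = {name for name, _ in override_pairs}
    # Keep only the defaults that the caller did not mention.
    merged = [(name, value) for name, value in defaults if name not in overridden]
    # Append the caller's values, skipping the ones marked for removal.
    merged.extend(
        (name, value) for name, value in override_pairs if value is not None
    )
    return merged


removed = merge_form_data([("foo", "bar"), ("lang", "en")], {"foo": None})
replaced = merge_form_data([("foo", "")], {"foo": "bar"})
```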

By default, if a form uses an unsupported method:

>>> html = b"""<form method="foo"></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

a :exc:`NotImplementedError` exception is raised:

>>> request_from_form(form)
Traceback (most recent call last):
...
NotImplementedError: Found unsupported form method 'FOO'.

If the reason for the bad method is that the right method is set through
JavaScript code, you can use the ``method`` parameter to set the right value:

>>> request_from_form(form, method="GET")
Request(url='https://example.com', method='GET', headers=[], body=b'')


.. _click:

Configuring form submission
===========================

When an HTML form is submitted, the way the submission is triggered has an
impact on the resulting request data.

Given a submit button with ``name`` and ``value`` attributes:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

If you submit the form by clicking that button, those attributes are included
in the request data, which is what :func:`~form2request.request_from_form` does
by default:

>>> request_from_form(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

However, sometimes it is possible to submit a form without clicking a submit
button, even when there is such a button. In such cases, the button data should
not be part of the request data. To get that behavior, set ``click`` to
``False``:

>>> request_from_form(form, click=False)
Request(url='https://example.com', method='GET', headers=[], body=b'')

You may also find forms with more than one submit button:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /><input type="submit" name="foo" value="baz" /></form>"""
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

By default, :func:`~form2request.request_from_form` clicks the first submission
element:

>>> request_from_form(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

To change that, set ``click`` to the element that should be clicked:

>>> submit_baz = form.xpath('.//*[@value="baz"]')[0]
>>> request_from_form(form, click=submit_baz)
Request(url='https://example.com?foo=baz', method='GET', headers=[], body=b'')


.. _request:

Using request data
==================

:class:`~form2request.Request` is a simple data container that you can use to
build an actual request object:

>>> request_data = request_from_form(form)

Here are some examples for popular Python libraries and frameworks:

>>> from requests import Request
>>> request = Request(request_data.method, request_data.url, headers=request_data.headers, data=request_data.body)
>>> request
<Request [GET]>


>>> from scrapy import Request
>>> request = Request(request_data.url, method=request_data.method, headers=request_data.headers, body=request_data.body)
>>> request
<GET https://example.com?foo=bar>
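
The same pattern should work with the standard library. The sketch below uses
a stand-in ``RequestData`` named tuple (hypothetical, mirroring the four fields
shown above) so that it runs without form2request installed, and it assumes
that ``headers`` is a list of ``(name, value)`` pairs:

```python
from typing import List, NamedTuple, Tuple
from urllib.request import Request as UrllibRequest


class RequestData(NamedTuple):
    """Stand-in with the same four fields as the request data shown above."""

    url: str
    method: str
    headers: List[Tuple[str, str]]
    body: bytes


request_data = RequestData("https://example.com?foo=bar", "GET", [], b"")

request = UrllibRequest(
    request_data.url,
    # An empty body is passed as None so that no payload is attached.
    data=request_data.body or None,
    headers=dict(request_data.headers),
    method=request_data.method,
)
```

Note that converting the headers to a dictionary would drop repeated header
names, so this conversion is only a reasonable shortcut when each header
appears once.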
2 changes: 2 additions & 0 deletions form2request/__init__.py
@@ -1 +1,3 @@
"""Build HTTP requests out of HTML forms."""

from ._base import Request, request_from_form