Initial implementation (#1)
Co-authored-by: Mikhail Korobov <[email protected]>
Gallaecio and kmike authored Jul 12, 2024
1 parent dd97de2 commit 3c00688
Showing 16 changed files with 1,300 additions and 13 deletions.
3 changes: 3 additions & 0 deletions .bandit.yml
@@ -0,0 +1,3 @@
skips:
- B101 # assert_used, needed for mypy
exclude_dirs: ['tests']
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[report]
exclude_lines =
    if TYPE_CHECKING:
4 changes: 4 additions & 0 deletions .flake8
@@ -1,6 +1,10 @@
[flake8]
extend-select = TC, TC1
ignore =
max-line-length = 88
per-file-ignores =
    # F401: Imported but unused
    form2request/__init__.py:F401
    # D100-D104: Missing docstring
    docs/conf.py:D100
    tests/__init__.py:D104
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -41,7 +41,7 @@ jobs:
fail-fast: false
matrix:
python-version: ['3.12']
-tox-job: ["pre-commit", "mypy", "docs", "twinecheck"]
+tox-job: ["pre-commit", "mypy", "docs", "doctest", "twinecheck"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
/.coverage
/coverage.xml
/dist/
/.tox/
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -17,8 +17,20 @@ repos:
    - flake8-debugger
    - flake8-docstrings
    - flake8-string-format
    - flake8-type-checking
- repo: https://github.com/asottile/pyupgrade
  rev: v3.16.0
  hooks:
  - id: pyupgrade
    args: [--py38-plus]
- repo: https://github.com/pycqa/bandit
  rev: 1.7.9
  hooks:
  - id: bandit
    args: [-r, -c, .bandit.yml]
- repo: https://github.com/adamchainz/blacken-docs
  rev: 1.18.0
  hooks:
  - id: blacken-docs
    additional_dependencies:
    - black==24.4.2
4 changes: 2 additions & 2 deletions README.rst
@@ -20,8 +20,8 @@ form2request

.. description starts
-``form2request`` is an AI-powered Python 3.8+ library to build HTTP requests
-out of HTML forms.
+``form2request`` is a Python 3.8+ library to build HTTP requests out of HTML
+forms.

.. description ends
Binary file removed dist/form2request-0.0.0.tar.gz
Binary file not shown.
6 changes: 5 additions & 1 deletion docs/api.rst
@@ -2,4 +2,8 @@
API reference
=============

.. autofunction:: form2request.form2request

.. autoclass:: form2request.Request
    :members:
    :undoc-members:
17 changes: 15 additions & 2 deletions docs/conf.py
@@ -9,11 +9,24 @@

html_theme = "sphinx_rtd_theme"

autodoc_member_order = "groupwise"

intersphinx_disabled_reftypes = []
intersphinx_mapping = {
    "lxml": ("https://lxml.de/apidoc/", None),
    "parsel": ("https://parsel.readthedocs.io/en/stable", None),
    "python": ("https://docs.python.org/3", None),
    "scrapy": ("https://docs.scrapy.org/en/latest", None),
}

nitpick_ignore = [
    *(
        ("py:class", cls)
        for cls in (
            # https://github.com/sphinx-doc/sphinx/issues/11225
            "FormdataType",
            "FormElement",
            "HtmlElement",
            "Selector",
            "SelectorList",
        )
    ),
]
219 changes: 218 additions & 1 deletion docs/usage.rst
@@ -2,4 +2,221 @@
Usage
=====

:ref:`Given an HTML form <form>`:

.. _parsel-example:

>>> from parsel import Selector
>>> html = b"""<form><input type="hidden" name="foo" value="bar" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

You can use :func:`~form2request.form2request` to generate form submission
request data:

>>> from form2request import form2request
>>> req = form2request(form)
>>> req
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

:func:`~form2request.form2request` does not make requests, but you can use its
output to build requests with any HTTP client software, e.g. with the requests_
library:

.. _requests: https://requests.readthedocs.io/en/latest/

.. _requests-example:

>>> import requests
>>> requests.request(req.method, req.url, headers=dict(req.headers), data=req.body) # doctest: +SKIP
<Response [200]>

:func:`~form2request.form2request` supports :ref:`user-defined form data
<data>`, :ref:`choosing a specific submit button (or none) <click>`, and
:ref:`overriding form attributes <override>`.


.. _form:

Getting a form
==============

:func:`~form2request.form2request` requires an HTML form object. You can get
one using :doc:`parsel <parsel:index>`, as :ref:`seen above <parsel-example>`,
or you can use :doc:`lxml <lxml:index>`:

.. _fromstring-example:

>>> from lxml.html import fromstring
>>> root = fromstring(html, base_url="https://example.com")
>>> form = root.xpath("//form")[0]

If you use a library or framework based on :doc:`parsel <parsel:index>` or
:doc:`lxml <lxml:index>`, chances are they also let you get a form object. For
example, when using a :doc:`Scrapy <scrapy:index>` response:

>>> from scrapy.http import TextResponse
>>> response = TextResponse("https://example.com", body=html)
>>> form = response.css("form")

Here are some examples of XPath expressions that can be useful to get a form
using parsel’s :meth:`Selector.xpath <parsel.selector.Selector.xpath>` or
lxml’s :meth:`HtmlElement.xpath <lxml.html.HtmlElement.xpath>`:

- To find a form by one of its attributes, such as ``id`` or ``name``, use
  ``//form[@<attribute>="<value>"]``. For example, to find
  ``<form id="foo">``, use ``//form[@id="foo"]``.

  When using :meth:`Selector.css <parsel.selector.Selector.css>`, ``#<id>``
  (e.g. ``#foo``) finds by ``id``, and ``[<attribute>="<value>"]`` (e.g.
  ``[name=foo]`` or ``[name="foo bar"]``) finds by any other attribute.

- To find a form by index, by order of appearance in the HTML code, use
  ``(//form)[n]``, where ``n`` is a 1-based index. For example, to find the
  2nd form, use ``(//form)[2]``.

If you prefer, you could use the XPath of an element inside the form, and then
visit parent elements until you reach the form element. For example:

.. code-block:: python

    element = root.xpath('//input[@name="zip_code"]')[0]
    # Walk up the tree until the enclosing <form> element is reached.
    while element.tag != "form":
        element = element.getparent()
    form = element

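
The loop above assumes that the starting element is inside a form. lxml's
``getparent()`` returns ``None`` at the root of the tree, so an element outside
any form would end in an ``AttributeError``. A minimal sketch of a more
defensive variant follows; ``find_enclosing_form`` is a hypothetical helper
name used for illustration, not part of form2request or lxml:

```python
def find_enclosing_form(element):
    """Return the closest <form> ancestor of element (or element itself).

    Returns None when no enclosing form exists. Works with any node that
    exposes ``tag`` and ``getparent()``, such as lxml.html elements.
    Hypothetical helper, for illustration only.
    """
    while element is not None and element.tag != "form":
        element = element.getparent()  # lxml returns None above the root
    return element
```

With lxml, ``find_enclosing_form(root.xpath('//input[@name="zip_code"]')[0])``
would return the form element, or ``None`` if the input is not inside a form.
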
.. _data:

Setting form data
=================

While there are forms made entirely of hidden fields, like :ref:`the one above
<fromstring-example>`, most often you will work with forms that expect
user-defined data:

>>> html = b"""<form><input type="text" name="foo" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

Use the ``data`` parameter of :func:`~form2request.form2request` to define
the corresponding data:

>>> form2request(form, {"foo": "bar"})
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

You may sometimes find forms where more than one field has the same ``name``
attribute:

>>> html = b"""<form><input type="text" name="foo" /><input type="text" name="foo" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

To specify values for all same-name fields, instead of a dictionary, use an
iterable of key-value tuples:

>>> form2request(form, (("foo", "bar"), ("foo", "baz")))
Request(url='https://example.com?foo=bar&foo=baz', method='GET', headers=[], body=b'')
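
The reason a dictionary is not enough here is that it can hold only one value
per key. As a rough illustration, using the standard library's
``urllib.parse.urlencode`` (which is not necessarily what form2request uses
internally), a sequence of key-value tuples keeps every repeated key, while
converting it to a dictionary keeps only the last value:

```python
from urllib.parse import urlencode

# A sequence of pairs preserves every value, in order.
pairs = (("foo", "bar"), ("foo", "baz"))
query_from_pairs = urlencode(pairs)

# Converting to a dict collapses repeated keys: only the last value survives.
query_from_dict = urlencode(dict(pairs))

print(query_from_pairs)  # foo=bar&foo=baz
print(query_from_dict)  # foo=baz
```
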

.. _remove-data:

Sometimes you might want to prevent a field value from being included in the
generated request data, for example because the field is removed or disabled
through JavaScript, or because the field or a parent element has the
``disabled`` attribute (which form2request does not currently support):

>>> html = b"""<form><input name="foo" value="bar" disabled /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

To remove a field value, set it to ``None``:

>>> form2request(form, {"foo": None})
Request(url='https://example.com', method='GET', headers=[], body=b'')
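
To make the semantics of this section concrete, here is a simplified model of
how user-defined data could be combined with the values found in the HTML.
``merge_form_data`` is a hypothetical function written for this illustration.
It is not the form2request implementation, and details such as the resulting
field order are assumptions:

```python
def merge_form_data(form_fields, data):
    """Combine (name, value) pairs read from a form with user-defined data.

    form_fields: list of (name, value) pairs found in the HTML.
    data: dict or iterable of (name, value) pairs; None removes the field.
    Hypothetical helper, for illustration only.
    """
    pairs = list(data.items()) if isinstance(data, dict) else list(data)
    overridden = {name for name, _ in pairs}
    # Keep the HTML values only for fields the user did not mention.
    merged = [(name, value) for name, value in form_fields if name not in overridden]
    # Append the user values, skipping fields explicitly set to None.
    merged.extend((name, value) for name, value in pairs if value is not None)
    return merged


html_fields = [("foo", "bar"), ("lang", "en")]
removed = merge_form_data(html_fields, {"foo": None})
repeated = merge_form_data(html_fields, (("foo", "a"), ("foo", "b")))
print(removed)  # [('lang', 'en')]
print(repeated)  # [('lang', 'en'), ('foo', 'a'), ('foo', 'b')]
```
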


.. _click:

Choosing a submit button
========================

When an HTML form is submitted, the way form submission is triggered has an
impact on the resulting request data.

Given a submit button with ``name`` and ``value`` attributes:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

If you submit the form by clicking that button, those attributes are included
in the request data, which is what :func:`~form2request.form2request` does
by default:

>>> form2request(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

However, a form can sometimes be submitted without clicking a submit button,
even when it has one. In such cases, the button data should not be part of the
request data, so set ``click`` to ``False``:

>>> form2request(form, click=False)
Request(url='https://example.com', method='GET', headers=[], body=b'')

You may also find forms with more than one submit button:

>>> html = b"""<form><input type="submit" name="foo" value="bar" /><input type="submit" name="foo" value="baz" /></form>"""
>>> selector = Selector(body=html, base_url="https://example.com")
>>> form = selector.css("form")

By default, :func:`~form2request.form2request` clicks the first submit button:

>>> form2request(form)
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

To change that, set ``click`` to the element that should be clicked:

>>> submit_baz = form.css("[value=baz]")
>>> form2request(form, click=submit_baz)
Request(url='https://example.com?foo=baz', method='GET', headers=[], body=b'')
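
The three ``click`` behaviors described in this section can be summarized with
a small model. ``clicked_button_data`` below is a hypothetical function written
for illustration. It operates on plain ``(name, value)`` tuples rather than on
real form elements, and its ``None`` default is an assumption, so treat it as a
sketch rather than as how form2request is implemented:

```python
def clicked_button_data(submit_buttons, click=None):
    """Return the (name, value) pairs contributed by the clicked button.

    submit_buttons: (name, value) tuples in document order.
    click: None (first button), False (no button), or one of the tuples.
    Hypothetical helper, for illustration only.
    """
    if click is False:
        return []
    if click is None:
        return list(submit_buttons[:1])
    if click not in submit_buttons:
        raise ValueError(f"{click!r} is not a submit button of this form")
    return [click]


buttons = [("foo", "bar"), ("foo", "baz")]
default_click = clicked_button_data(buttons)
no_click = clicked_button_data(buttons, click=False)
second_click = clicked_button_data(buttons, click=("foo", "baz"))
print(default_click, no_click, second_click)
```
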


.. _override:

Overriding form attributes
==========================

You can override the method_ and enctype_ attributes of a form:

.. _enctype: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#enctype
.. _method: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#method

>>> form2request(form, method="POST", enctype="text/plain")
Request(url='https://example.com', method='POST', headers=[('Content-Type', 'text/plain')], body=b'foo=bar')


.. _request:

Using request data
==================

The output of :func:`~form2request.form2request`,
:class:`~form2request.Request`, is a simple request data container:

>>> req = form2request(form)
>>> req
Request(url='https://example.com?foo=bar', method='GET', headers=[], body=b'')

While :func:`~form2request.form2request` does not make requests, you can use
its output request data to build an actual request with any HTTP client
software, like the requests_ library (see an example :ref:`above
<requests-example>`) or the :doc:`Scrapy <scrapy:index>` web scraping
framework:

.. _Scrapy: https://docs.scrapy.org/en/latest/

>>> from scrapy import Request
>>> Request(req.url, method=req.method, headers=req.headers, body=req.body)
<GET https://example.com?foo=bar>
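
The same attributes map onto other HTTP clients as well. As a sketch, this is
how they could be passed to the standard library's ``urllib.request.Request``.
The literal values below are stand-ins for ``req.url``, ``req.method``,
``req.headers`` and ``req.body`` of a POST request, and nothing is sent over
the network:

```python
from urllib.request import Request as UrllibRequest

# Stand-ins for the attributes of a form2request Request object.
url = "https://example.com"
method = "POST"
headers = [("Content-Type", "text/plain")]
body = b"foo=bar"

# urllib expects headers as a mapping, so convert the list of pairs.
# Note that dict() keeps only the last value of a repeated header name.
urllib_req = UrllibRequest(url, data=body, headers=dict(headers), method=method)

# Passing urllib_req to urllib.request.urlopen() would actually send it.
print(urllib_req.get_method(), urllib_req.full_url)
```
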
2 changes: 2 additions & 0 deletions form2request/__init__.py
@@ -1 +1,3 @@
"""Build HTTP requests out of HTML forms."""

from ._base import Request, form2request