Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation #1

Merged
merged 28 commits into from
Jul 12, 2024
Merged

Initial implementation #1

merged 28 commits into from
Jul 12, 2024

Conversation

Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Jul 1, 2024

Implementation differences compared to Scrapy

While the code is heavily based on Scrapy’s, and intended to replace it in the future, it has some notable implementation differences worth pointing out for discussion:

  • The API is a single form2request function that returns a Request dataclass.
  • The default method, if a form does not indicate one or indicates a non-standard one, is GET, in line with the standard and browser behavior. In Scrapy, it is POST if non-empty user data is specified.
  • The dialog method raises NotImplementedError.
  • There is no utility for getting a FormElement through parameters (formname, formnid, formnumber, formxpath, formcss). I thought the API would be cleaner if instead we expect a FormElement as input. To compensate, I documented how to get a FormElement in all the different ways Scrapy supports using XPath. Parsel support has also been added, and the docs cover how to use Scrapy.
  • clickdata and dont_click are merged into a single click keyword-only parameter. Similar to form, click accepts an HtmlElement instead of parameters (clickdata) that resolve to one. It also supports None (default, same as before), False (equivalent to dont_click=True) and True (like None, but raises a ValueError if there is no button).
  • Clicked buttons can now override action and method with their corresponding, form-prefixed attributes. I found about the existence of those attributes while reviewing the implementation, and found it trivial to add support.
  • Removed some Scrapy code (_value, _select_value) that does not seem necessary anymore with the minimum supported version of lxml.
  • When enctype (or formenctype in a submit button) is multipart/form-data and method is POST, NotImplementedError is raised.
  • Users can override enctype.
  • Added support for text/plain enctype, since it was trivial, although it should rarely be used in the wild.

Let’s discuss these choices, specially naming and the not implementation of form selection and click selection parameters.

To do

  • Base implementation.
  • Docs.
  • Remove the AI mention from the description.
  • Support formaction.
  • Full test coverage.
  • Cover all implementation differences when compared to Scrapy and the reasoning behind.
  • Allow overriding enctype to handle scenarios where a different value is set via JavaScript.

docs/conf.py Outdated Show resolved Hide resolved
form2request/_base.py Outdated Show resolved Hide resolved
@Gallaecio Gallaecio marked this pull request as ready for review July 2, 2024 13:32
docs/api.rst Outdated Show resolved Hide resolved
docs/usage.rst Outdated Show resolved Hide resolved
Sometimes, you might want to prevent a value from a field from being included
in the generated request data. For example, because the field is removed or
disabled through JavaScript, or because the field or a parent element has the
``disabled`` attribute (currently not supported by form2request):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL

docs/usage.rst Outdated Show resolved Hide resolved
form2request/_base.py Outdated Show resolved Hide resolved
form2request/_base.py Outdated Show resolved Hide resolved

.. _data:

Setting form data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the docs and the explanations, great work! It's something which is currently missing in Scrapy's FormRequest docs.

form2request/_base.py Outdated Show resolved Hide resolved
dist/form2request-0.0.0.tar.gz Outdated Show resolved Hide resolved
docs/usage.rst Show resolved Hide resolved
form2request/_base.py Outdated Show resolved Hide resolved
tests/test_main.py Outdated Show resolved Hide resolved
Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @Gallaecio!

@Gallaecio Gallaecio merged commit 3c00688 into scrapy:main Jul 12, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants