Using More Than One Custom Field Duplicates The Output #1025

Open
matugm opened this issue Sep 16, 2024 · 0 comments · May be fixed by #1031
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

matugm commented Sep 16, 2024

katana version:

v1.1.0

Current Behavior:

Given the following test page:

dsfsdfdsf
<title>Test Page</title>
<h2>testingaaaa</h2>

aaaaaa
<a href='/c1'>B</a>
lñsdfjlsdfsfkdj

<p>[email protected]</p>

And these custom fields:

- name: email
  type: regex
  part: response
  regex:
    - ([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)

- name: h2
  type: regex
  group: 1
  regex:
    - <h2>(.*)</h2>

- name: title
  type: regex
  group: 1
  regex:
    -  <title>(.*)</title>
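For reference, here is a minimal Python sketch of what these three patterns match on the test page when each is applied once. It uses Python's `re` module, not katana's own matching code, so it only illustrates the expected match count. The address `user@example.com` is a placeholder (an assumption), since the original address is hidden by the page's email protection; the `FIELDS` and `extract` names are hypothetical.

```python
import re

# Test page from the report. The email address is a placeholder (assumption):
# the original value is hidden by the page's email protection.
PAGE = """dsfsdfdsf
<title>Test Page</title>
<h2>testingaaaa</h2>

aaaaaa
<a href='/c1'>B</a>
lñsdfjlsdfsfkdj

<p>user@example.com</p>
"""

# Patterns copied from the custom field definitions above.
FIELDS = {
    "email": r"([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)",
    "h2": r"<h2>(.*)</h2>",
    "title": r"<title>(.*)</title>",
}


def extract(page):
    """Return all group-1 matches for each field, in page order."""
    return {name: re.findall(pattern, page) for name, pattern in FIELDS.items()}


matches = extract(PAGE)
for name, values in matches.items():
    for value in values:
        print(f"[{name}] {value}")
```

Each pattern matches exactly once on this page, so one line per field is the output I would expect from a single crawl of it.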

With this katana command:

katana -u http://localhost:8000/test.html -v -f email,h2,title

Katana produces duplicate output lines; each field is printed three times:

[INF] Started standard crawling for => http://localhost:8000/test.html
[email] [email protected]
[h2] testingaaaa
[title] Test Page
[h2] testingaaaa
[title] Test Page
[email] [email protected]
[email] [email protected]
[h2] testingaaaa
[title] Test Page

Expected Behavior:

Katana shouldn't output more matches than actually exist on the page. For example, the title appears only once on the page, but katana prints it three times. With a single custom field (for example, just -f title) it works as expected, but the bug appears when I supply multiple custom fields (-f email,h2,title).
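
To make the expectation concrete, the sketch below counts the result lines from the output above and collapses them into the three lines I would expect for this page. It is only an illustration in Python: it is not katana's code, and it is not necessarily how the linked pull request addresses the problem. The `[INF]` log line is left out, and the email value is copied exactly as the page renders it.

```python
from collections import Counter

# Result lines copied from the output in this report, in the order printed.
observed = [
    "[email] [email protected]",
    "[h2] testingaaaa",
    "[title] Test Page",
    "[h2] testingaaaa",
    "[title] Test Page",
    "[email] [email protected]",
    "[email] [email protected]",
    "[h2] testingaaaa",
    "[title] Test Page",
]

# How many times each line was printed.
counts = Counter(observed)

# Order-preserving de-duplication: dict keys keep first-seen order.
expected = list(dict.fromkeys(observed))

for line in expected:
    print(f"{line}  (printed {counts[line]} times, expected once)")
```

Collapsing identical lines is enough for this page because every value occurs once; a page with two genuinely identical headings would need a different check.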

Thanks for the great tool. Hopefully it can be fixed! :)

matugm added the Type: Bug label on Sep 16, 2024
dogancanbakir self-assigned this on Sep 16, 2024
dogancanbakir linked a pull request on Sep 16, 2024 that will close this issue