Implement extraction of asset version from response body #79

Open
wants to merge 41 commits into
base: main

Conversation

elvin-tajirzada

No description provided.

@GeorginaReeder

Thanks for your contribution, @elvin-tajirzada, we appreciate it!

We also have a Discord server that you're welcome to join. It's a great place to connect with fellow contributors and stay updated with the latest developments!

@ehsandeep ehsandeep requested a review from Mzack9999 April 9, 2024 10:09
Member

@Mzack9999 Mzack9999 left a comment

@elvin-tajirzada Thanks for the PR. I'm trying to better understand the use case of the new addition. Unless I'm missing something, isn't the URL content already available within the body?

@elvin-tajirzada
Author

I need the URL to extract the version from the scripts. For example, assume a site uses jQuery. Right now the jQuery version is not detected, because the version string lives inside jQuery's own script file (script tag: <script src="/bootstrap/js/jquery.js"></script>). I need the URL to reach the /bootstrap/js/jquery.js endpoint. Unfortunately, the URL is not included in the body of every site.
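
A minimal sketch of the idea described above, not the PR's actual code: resolve each script `src` against the page URL, then look for a version string in the script's content. The function names (`resolveScriptURLs`, `extractVersion`), both regular expressions, and the `example.com` URL are assumptions made for illustration; the HTTP fetch of the script is left out and replaced by a literal string.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// scriptSrcRe is a simplified, hypothetical matcher for <script src="...">.
// A real implementation would more likely use an HTML tokenizer.
var scriptSrcRe = regexp.MustCompile(`(?i)<script[^>]+src=["']([^"']+)["']`)

// versionRe is a hypothetical pattern for a jQuery banner comment,
// e.g. "jQuery v3.6.0" or "jQuery JavaScript Library v1.12.4".
var versionRe = regexp.MustCompile(`(?i)jquery(?: javascript library)? v?(\d+\.\d+\.\d+)`)

// resolveScriptURLs finds script src values in body and resolves each one
// against pageURL, so relative paths become absolute, fetchable URLs.
func resolveScriptURLs(pageURL, body string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range scriptSrcRe.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed src values
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, nil
}

// extractVersion returns the first version found in a script's content,
// or an empty string when nothing matches.
func extractVersion(scriptBody string) string {
	if m := versionRe.FindStringSubmatch(scriptBody); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	body := `<html><script src="/bootstrap/js/jquery.js"></script></html>`
	urls, err := resolveScriptURLs("https://example.com/app/index.html", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)

	// In practice the script at each URL would be fetched over HTTP;
	// a literal banner comment stands in for the response body here.
	fmt.Println(extractVersion("/*! jQuery v3.6.0 | (c) OpenJS Foundation */"))
}
```

Without the page URL, the relative path `/bootstrap/js/jquery.js` cannot be turned into a request, which is why the body alone is not enough for this use case.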

@Gby56

Gby56 commented Jun 9, 2024

Hi! I think I'm currently doing something similar to properly analyze a full webpage: I use a headless browser to get the list of all loaded assets, then download and analyze each of them to detect whether a JavaScript bundle contains React, jQuery, and so on.
The problem is that, by default, wappalyzer seems to only tokenize HTML and doesn't try to regex JS files. I think you fixed that here?

Or is it just extracting the version, but not the actual technology, from the content?

@elvin-tajirzada
Author

Yes, it is just extracting the version. It doesn't extract the actual technology from the content.

@Gby56

Gby56 commented Jun 10, 2024

OK, I see. My idea might fit in a different PR then: adding a new fingerprinting function that indicates whether the input to analyze is HTML or a JS file, so that we skip the HTML tokenizer, etc.
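
One way this could look, as a rough sketch only: pick the analysis path from the response's content type, and run regex patterns directly on JS content instead of tokenizing it as HTML. Everything named here (`contentKind`, `detectKind`, `fingerprint`, and the two patterns in `jsPatterns`) is hypothetical and not part of the library; real patterns would come from the fingerprint database.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// contentKind says which analysis path a response should take.
type contentKind int

const (
	kindHTML contentKind = iota
	kindJS
)

// jsPatterns is a hypothetical, hand-written pattern set for illustration.
var jsPatterns = map[string]*regexp.Regexp{
	"jQuery": regexp.MustCompile(`(?i)jquery v?\d+\.\d+\.\d+`),
	"React":  regexp.MustCompile(`__REACT_DEVTOOLS_GLOBAL_HOOK__`),
}

// detectKind classifies a response by its Content-Type header value.
func detectKind(contentType string) contentKind {
	ct := strings.ToLower(contentType)
	if strings.Contains(ct, "javascript") || strings.Contains(ct, "ecmascript") {
		return kindJS
	}
	return kindHTML
}

// fingerprint runs only the regex patterns for JS content and skips the
// HTML tokenizer entirely. The HTML path is omitted in this sketch.
func fingerprint(kind contentKind, body string) []string {
	if kind != kindJS {
		return nil // the existing HTML analysis would run here
	}
	var found []string
	for name, re := range jsPatterns {
		if re.MatchString(body) {
			found = append(found, name)
		}
	}
	sort.Strings(found) // map iteration order is random; keep output stable
	return found
}

func main() {
	kind := detectKind("application/javascript; charset=utf-8")
	fmt.Println(fingerprint(kind, "/*! jQuery v3.6.0 | (c) OpenJS Foundation */"))
}
```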

@elvin-tajirzada
Author

Yes. Right now my approach is used in our project, but that idea could be implemented as well.

6 participants