Skip to content

Latest commit

 

History

History
27 lines (14 loc) · 1.19 KB

README.md

File metadata and controls

27 lines (14 loc) · 1.19 KB

WebCrawley

A web crawling qualifier library.

Goal

The purpose of this project is to offer a simple and dynamic way to detect which websites are qualified according to your configurations.

How it works

This library fetches the HTML code from the input urls to visit, and applies to each one a CSS query (so that the crawler library can be easily configured to reach different elements).

A website is qualified if at least one keyword (there can be many in the input) are searched for in the results of the previous query.

Depending on the CSS query, the keywords are search for in different places:

  1. Inside the element's inner HTML (find an example here).

  2. Inside the element's specified attributes (find an example here).

The HTML and the keywords are all transformed into lowercase before checking. Which means that the results from the qualifier are case insensitive.

The applied qualifiers can be a mix of checking the inner HTML and the attribute's value. It's your choice how to use it!

Example

You can find an overall example here.