This repository has been archived by the owner on Mar 3, 2020. It is now read-only.
Features
Fabio Cicerchia edited this page May 23, 2014
- Command Line Interface
- Catch and handle all events bound to DOM elements (regardless of how they were set)
- Follow any 3xx redirect, JavaScript document.location change and meta redirect (can be disabled)
- Ignore duplicate URLs / requests and external URLs
- Test case files, with support for:
  - Cookies
  - File uploads
  - GET parameters
  - HTTP headers
  - POST parameters
- HTTP authentication
- Proxy settings
- Politeness Policy
- Generate a report for each page crawled, with:
  - Screenshot
  - HTTP headers
  - HTTP method
  - Data sent (GET and POST)
  - Page output
  - Execution time
  - Console messages
  - Alerts, Confirmations & Prompts
  - Errors
  - List of successful and failed requests
- Pool system that limits the number of workers running at the same time and queues the rest
- Multiple crawlers working asynchronously, each on one URL
- Support for the following HTML tags: a, area, base, form, frame, iframe, img, input, link, script
- URL normalisation
- Process the web page using PhantomJS
- Process all the output content types
- Keep the connection alive for lower CPU and memory load on the server
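
To make the "URL normalisation" and "Ignore duplicate URLs / requests and external URLs" features more concrete, here is a minimal sketch of how a crawler might combine them. It is written in Python purely for illustration (the tool itself runs on PhantomJS), and the names `normalise_url` and `Frontier`, as well as the specific normalisation rules, are assumptions rather than the project's actual behaviour.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Default ports that can be dropped without changing the target (assumed rule).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalise_url(url):
    """Return a canonical form of `url` so equivalent URLs compare equal.

    Hypothetical rules: lowercase scheme and host, drop default ports,
    use "/" for an empty path, sort query parameters, drop the fragment.
    User info (user:password@) is discarded in this sketch.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Sorting makes ?a=1&b=2 and ?b=2&a=1 identical after normalisation.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is never sent to the server, so it is removed.
    return urlunsplit((scheme, netloc, path, query, ""))


class Frontier:
    """Tracks which URLs have been seen and rejects external ones."""

    def __init__(self, start_url):
        self.host = urlsplit(normalise_url(start_url)).hostname
        self.seen = set()

    def accept(self, url):
        """Return True only for new URLs on the same host as the start URL."""
        canonical = normalise_url(url)
        if urlsplit(canonical).hostname != self.host:
            return False  # external URL
        if canonical in self.seen:
            return False  # duplicate
        self.seen.add(canonical)
        return True


if __name__ == "__main__":
    frontier = Frontier("http://example.com/")
    for candidate in (
        "http://example.com/a?b=2&a=1",
        "HTTP://EXAMPLE.com:80/a?a=1&b=2#top",  # same page after normalisation
        "http://other.org/",                    # external host
    ):
        print(candidate, frontier.accept(candidate))
```

A real crawler would likely also key duplicates on the HTTP method and the data sent, since the feature list mentions duplicate requests as well as duplicate URLs; this sketch compares URLs only.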