
How it works

  1. Start processing a URL
  2. Open a system process to PhantomJS (see the first sketch after this list)
  3. Open the URL
  4. If there is a JS event, put it into a dedicated stack
  5. Inject a custom event listener (second sketch below)
     1. Override any existing event listeners
  6. Collect all the relevant info from the page for the report
  7. Once the load completes, execute the events in the stack
  8. Start processing the web page
  9. Get all the links from the page content
  10. Normalise the collected URLs and filter out duplicates (third sketch below)
  11. Get all the JS events bound to DOM elements
  12. Clone the web page for each new combination found in the page (confirm)
  13. Put the web page instance into a dedicated stack for each JS event
  14. Process all the web pages in the stack
  15. Get all the links from the page content
  16. Reiterate until there are no more JS events
  17. If there is an error, retry up to 5 times (fourth sketch below)
  18. Collect all the data sent by the parser
  19. Create test cases for POST data with normalised fields (fifth sketch below)
  20. Get the POST test cases for the current URL
  21. Launch a new crawler for each test case
  22. Store details in report file
  23. Increase the counter of crawlers to be launched, based on the links found
  24. Check whether each link has already been processed
  25. If not, launch a new process for each link
  26. If there are no more links to be processed, check whether any sub-crawlers are still running (last sketch below)
  27. If none are, terminate the process
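
The sketches below illustrate some of the steps above. They are minimal TypeScript approximations of the ideas, not the project's actual code, and every function and file name in them is hypothetical.

First, steps 2-3 and 18: the crawler spawns one PhantomJS system process per URL and collects whatever the parser prints back. `parser.js` and `onParserData` are assumed names.

```typescript
import { spawn } from "child_process";

// Hypothetical entry point: spawn one PhantomJS worker per URL (steps 2-3).
function launchPhantomWorker(url: string): void {
  const child = spawn("phantomjs", ["parser.js", url]); // "parser.js" is an assumed script name

  // Step 18: collect all the data sent back by the parser on stdout.
  child.stdout.on("data", (chunk: Buffer) => onParserData(chunk.toString()));

  child.on("exit", (code) => {
    console.log(`worker for ${url} exited with code ${code}`);
  });
}

function onParserData(payload: string): void {
  // Stand-in: the real crawler would feed this into the report and the link queue.
  console.log("parser output:", payload);
}
```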
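Steps 4-5 and 7 happen inside the page: the injected script overrides `addEventListener` so every binding is recorded on a stack, then replays the stack once the load completes. A minimal sketch; the stack name and the replay-via-`dispatchEvent` strategy are assumptions.

```typescript
// Runs in the page context after injection. `eventStack` is an assumed name.
type StackedEvent = { target: EventTarget; type: string };
const eventStack: StackedEvent[] = [];

const originalAdd = EventTarget.prototype.addEventListener;
EventTarget.prototype.addEventListener = function (
  type: string,
  listener: EventListenerOrEventListenerObject | null,
  options?: boolean | AddEventListenerOptions
): void {
  eventStack.push({ target: this, type });          // step 4: record the binding on the stack
  originalAdd.call(this, type, listener, options);  // step 5: still attach it, so the page keeps working
};

// Step 7: once the load completes, fire every recorded event.
window.addEventListener("load", () => {
  for (const e of eventStack) {
    if (e.type === "load") continue; // avoid re-triggering this very handler
    e.target.dispatchEvent(new Event(e.type));
  }
});
```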
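Step 10 sketched with WHATWG `URL` semantics (the wiki does not spell out the normalisation rules): resolve relative links against the page URL, drop fragments, skip non-HTTP schemes, and deduplicate with a set.

```typescript
// Step 10: normalise and deduplicate collected URLs. The exact rules are assumptions.
function normalise(href: string, base: string): string | null {
  try {
    const u = new URL(href, base);   // resolve relative links against the page URL
    u.hash = "";                     // a fragment never names a different resource
    return /^https?:$/.test(u.protocol) ? u.toString() : null; // skip mailto:, javascript:, ...
  } catch {
    return null;                     // unparsable href
  }
}

function uniqueLinks(hrefs: string[], base: string): string[] {
  const seen = new Set<string>();
  for (const href of hrefs) {
    const url = normalise(href, base);
    if (url !== null) seen.add(url);
  }
  return [...seen];
}
```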
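Step 17 as a generic retry helper. The limit of 5 comes from the list above; treating it as five total attempts (rather than five retries after the first failure) is an assumption.

```typescript
// Step 17: retry a failing task up to 5 times before giving up.
async function withRetries<T>(task: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      console.warn(`attempt ${attempt}/${maxAttempts} failed`);
    }
  }
  throw lastError; // all attempts exhausted
}
```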
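Steps 19-20 sketched: one POST test case per form, with the fields normalised into a canonical (sorted) order so equivalent forms compare equal. The data shapes and the sort-by-name rule are assumptions.

```typescript
// Steps 19-20: build POST test cases with normalised fields. All shapes are assumptions.
interface PostTestCase {
  url: string;
  method: "POST";
  data: Record<string, string>;
}

function buildPostTestCases(
  forms: { action: string; fields: Record<string, string> }[]
): PostTestCase[] {
  return forms.map((form) => {
    const data: Record<string, string> = {};
    for (const name of Object.keys(form.fields).sort()) { // canonical field order
      data[name] = form.fields[name];
    }
    return { url: form.action, method: "POST", data };
  });
}
```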
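Finally, steps 23-27 sketched: a set of processed URLs plus a counter of live sub-crawlers; the process exits only once no links remain and the last sub-crawler has finished. All names here are assumptions, and the sub-crawler launch is stubbed out.

```typescript
// Steps 23-27: bookkeeping for sub-crawlers and process termination.
const processed = new Set<string>();
let runningCrawlers = 0;

function maybeCrawl(url: string): void {
  if (processed.has(url)) return;   // step 24: skip links already processed
  processed.add(url);
  runningCrawlers++;                // step 23: one more crawler in flight
  launchCrawler(url, () => {        // step 25: launch a new process for the link
    runningCrawlers--;
    if (runningCrawlers === 0) {
      process.exit(0);              // steps 26-27: no links left, no sub-crawlers running
    }
  });
}

// Stand-in for spawning a real sub-crawler process.
function launchCrawler(url: string, onDone: () => void): void {
  setTimeout(onDone, 100);
}
```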