## Usage

```shell
python main.py http://www.example.com
```
## Features

- Creates a list of all pages indexed on a website
- Creates a list of indexed pages with their relative depths and respective predecessors
- Creates an image of the website network
- Saves the indexed URLs to a JSON file
- Handles complex data parsing and broken hyperlinks robustly
- Enforces limits on the maximum depth and maximum number of pages indexed
- Supports relative links, e.g. `href="/source"`

## Dependencies

- Requests (or Requests[security], for true SSL connections)
- The JSON library
- Matplotlib, to plot the graph
- NetworkX, to create the graph from the list data

## Examples

An image of the codeacademy.com network:

An image of the google.com network at 100 pages:

**Note:** Crawling can sometimes take a really long time, depending on the maximum number of pages specified. The default is 100 pages.
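The crawl described above (breadth-first indexing with depth/page limits, relative-link resolution, and tolerance for broken hyperlinks) can be sketched roughly as follows. This is a minimal illustration, not the project's actual `main.py`: the class and function names are hypothetical, and the page fetcher is injected as a callable so the sketch works without network access (in practice it would wrap `requests.get`).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags, resolving relative links."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # urljoin handles relative links such as href="/source"
                    self.links.append(urljoin(self.base_url, value))

def crawl(start_url, fetch, maxdepth=3, maxpages=100):
    """Breadth-first crawl; returns {url: (depth, predecessor)}.

    `fetch` is a callable url -> HTML string (e.g. wrapping requests.get),
    injected here so the sketch stays testable offline.
    """
    indexed = {start_url: (0, None)}
    queue = deque([start_url])
    while queue and len(indexed) < maxpages:
        url = queue.popleft()
        depth, _ = indexed[url]
        if depth >= maxdepth:
            continue
        try:
            html = fetch(url)
        except Exception:
            continue  # skip broken hyperlinks instead of crashing
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in indexed and len(indexed) < maxpages:
                indexed[link] = (depth + 1, url)
                queue.append(link)
    return indexed
```

The returned dictionary directly provides the "pages with their relative depths and respective predecessors" list, and can be dumped to a JSON file with `json.dump`.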
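Turning the crawl results into a network image with NetworkX and Matplotlib could look like the sketch below. It assumes the crawl output maps each URL to a `(depth, predecessor)` pair, as the feature list describes; the function name is illustrative, not the project's actual API.

```python
import networkx as nx

def build_graph(indexed):
    """Build a directed graph from {url: (depth, predecessor)} crawl data.

    Each URL becomes a node annotated with its depth; an edge runs from
    each predecessor page to the page discovered through it.
    """
    g = nx.DiGraph()
    for url, (depth, pred) in indexed.items():
        g.add_node(url, depth=depth)
        if pred is not None:
            g.add_edge(pred, url)
    return g
```

The resulting graph can then be rendered and saved, e.g. with `nx.draw(g, with_labels=True)` followed by `matplotlib.pyplot.savefig("network.png")`.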
## Future Work

- Set up a web app that performs crawling on a given user query and then presents an interactive visualization.
- Use d3.js to visualize the website tree.

## Author

bagarwa2