Recursively crawl the popular blogging website https://medium.com using Node.js, harvest all hyperlinks that belong to medium.com, and store them in a database.
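A minimal sketch of the harvesting step is shown below, assuming axios for HTTP requests and cheerio for HTML parsing; neither library, nor the function names, is confirmed by this project, so treat it as an illustration rather than the actual implementation:

```js
// crawl.js - sketch: fetch a page and recursively collect medium.com links (assumes axios + cheerio)
const axios = require('axios');
const cheerio = require('cheerio');

// Fetch one page and return every absolute URL on it that belongs to medium.com
async function harvestLinks(pageUrl) {
  const { data: html } = await axios.get(pageUrl);
  const $ = cheerio.load(html);
  const links = new Set();

  $('a[href]').each((_, el) => {
    try {
      const url = new URL($(el).attr('href'), pageUrl); // resolve relative hrefs
      if (url.hostname === 'medium.com' || url.hostname.endsWith('.medium.com')) {
        links.add(url.href);
      }
    } catch (_) {
      /* ignore malformed hrefs */
    }
  });

  return [...links];
}

// Depth-limited recursive crawl with a visited set to avoid loops
async function crawl(startUrl, maxDepth = 2, visited = new Set()) {
  if (maxDepth < 0 || visited.has(startUrl)) return;
  visited.add(startUrl);

  const links = await harvestLinks(startUrl);
  // TODO: store each link in the database here (see the schema sketch below)
  for (const link of links) {
    await crawl(link, maxDepth - 1, visited);
  }
}
```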
What needs to be stored?
- Every unique URL encountered.
- The total reference count of every URL.
- A complete list of the unique query parameters associated with each URL (a possible document schema is sketched after this list).
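One way to model these requirements in MongoDB is a single collection keyed by the URL without its query string, with the parameter names collected separately. The sketch below assumes Mongoose and illustrative field names; it is one possible reading of the spec, not the project's actual model:

```js
// models/url.js - hypothetical Mongoose model for harvested URLs
const mongoose = require('mongoose');

const urlSchema = new mongoose.Schema({
  url: { type: String, required: true, unique: true }, // every unique URL encountered (without its query string)
  referenceCount: { type: Number, default: 0 },        // total number of times the URL was referenced
  parameters: { type: [String], default: [] }          // unique query parameter names seen for this URL
});

const Url = mongoose.model('Url', urlSchema);

// Called for every harvested link: upsert the document, bump the reference count,
// and add any query parameter names that have not been seen before
async function recordUrl(rawUrl) {
  const parsed = new URL(rawUrl);
  const baseUrl = parsed.origin + parsed.pathname;
  const paramNames = [...parsed.searchParams.keys()];

  await Url.updateOne(
    { url: baseUrl },
    {
      $inc: { referenceCount: 1 },
      $addToSet: { parameters: { $each: paramNames } }
    },
    { upsert: true }
  );
}

module.exports = { Url, recordUrl };
```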
Requirements:
1) Node.js
2) MongoDB
After the MongoDB server is up and running, run the following commands:
1) npm install
2) npm start dev-server
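On startup the dev server is expected to connect to the local MongoDB instance before it begins crawling. A sketch of what that bootstrap might look like is below; the connection string and file name are assumptions, only the port 3000 comes from the section that follows:

```js
// server.js - hypothetical startup: connect to MongoDB, then start the Express app
const express = require('express');
const mongoose = require('mongoose');

const app = express();

async function start() {
  // Assumed local connection string; the real one may come from config or environment variables
  await mongoose.connect('mongodb://localhost:27017/medium-crawler');
  app.listen(3000, () => console.log('Server listening on http://localhost:3000'));
}

start().catch((err) => {
  console.error('Failed to start server:', err);
  process.exit(1);
});
```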
Once the server starts:
1) It will start crawling and upload the harvested data to the database.
2) Users can retrieve all of the stored data by sending a GET request to 'http://localhost:3000/api/url/getAllData' (see the route sketch after this list).
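A sketch of what such a route could look like, assuming Express and the Url model sketched earlier; the file layout and mounting path are illustrative, only the endpoint itself comes from the description above:

```js
// routes/url.js - hypothetical route exposing all stored URL documents
const express = require('express');
const router = express.Router();
const { Url } = require('../models/url'); // the model sketched earlier

// GET /api/url/getAllData -> returns every stored URL document
router.get('/getAllData', async (req, res) => {
  try {
    const docs = await Url.find({});
    res.json(docs);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

module.exports = router;
// mounted in the app with: app.use('/api/url', router);
```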
Alternatively, to run the project with Docker:
1) docker-compose build
2) docker-compose up
Once the server starts, it follows the same steps described in the section above.
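The Docker commands above assume a docker-compose.yml that runs the app alongside a MongoDB container. The sketch below shows one possible shape of that file; the service names, image tags, and the MONGO_URL variable are assumptions, not taken from the repository:

```yaml
# docker-compose.yml - hypothetical two-service setup (app + MongoDB)
version: "3"
services:
  mongo:
    image: mongo
    ports:
      - "27017:27017"
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MONGO_URL=mongodb://mongo:27017/medium-crawler
    depends_on:
      - mongo
```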