Recursively crawl the popular blogging website https://medium.com using Node.js, harvest all hyperlinks that belong to medium.com, and store them in a database.
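A minimal sketch of the harvesting step is shown below, assuming axios for HTTP requests and cheerio for HTML parsing; neither library, nor the function names, is confirmed by this project, so treat it as an illustration rather than the actual implementation:

```js
// crawl.js - sketch: fetch a page and recursively collect medium.com links (assumes axios + cheerio)
const axios = require('axios');
const cheerio = require('cheerio');

// Fetch one page and return every absolute URL on it that belongs to medium.com
async function harvestLinks(pageUrl) {
  const { data: html } = await axios.get(pageUrl);
  const $ = cheerio.load(html);
  const links = new Set();

  $('a[href]').each((_, el) => {
    try {
      const url = new URL($(el).attr('href'), pageUrl); // resolve relative hrefs
      if (url.hostname === 'medium.com' || url.hostname.endsWith('.medium.com')) {
        links.add(url.href);
      }
    } catch (_) {
      /* ignore malformed hrefs */
    }
  });

  return [...links];
}

// Depth-limited recursive crawl with a visited set to avoid loops
async function crawl(startUrl, maxDepth = 2, visited = new Set()) {
  if (maxDepth < 0 || visited.has(startUrl)) return;
  visited.add(startUrl);

  const links = await harvestLinks(startUrl);
  // TODO: store each link in the database here (see the schema sketch below)
  for (const link of links) {
    await crawl(link, maxDepth - 1, visited);
  }
}
```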
What needs to be stored?
- Every unique URL encountered.
- The total reference count of every URL.
- A complete list of the unique query parameters associated with each URL (a possible document schema is sketched after this list).
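One way to model these requirements in MongoDB is a single collection keyed by the URL without its query string, with the parameter names collected separately. The sketch below assumes Mongoose and illustrative field names; it is one possible reading of the spec, not the project's actual model:

```js
// models/url.js - hypothetical Mongoose model for harvested URLs
const mongoose = require('mongoose');

const urlSchema = new mongoose.Schema({
  url: { type: String, required: true, unique: true }, // every unique URL encountered (without its query string)
  referenceCount: { type: Number, default: 0 },        // total number of times the URL was referenced
  parameters: { type: [String], default: [] }          // unique query parameter names seen for this URL
});

const Url = mongoose.model('Url', urlSchema);

// Called for every harvested link: upsert the document, bump the reference count,
// and add any query parameter names that have not been seen before
async function recordUrl(rawUrl) {
  const parsed = new URL(rawUrl);
  const baseUrl = parsed.origin + parsed.pathname;
  const paramNames = [...parsed.searchParams.keys()];

  await Url.updateOne(
    { url: baseUrl },
    {
      $inc: { referenceCount: 1 },
      $addToSet: { parameters: { $each: paramNames } }
    },
    { upsert: true }
  );
}

module.exports = { Url, recordUrl };
```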
Requirements:
1) Node.js
2) MongoDB
After the MongoDB server is up and running, run the following commands:
1) npm install
2) npm start dev-server
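On startup the dev server is expected to connect to the local MongoDB instance before it begins crawling. A sketch of what that bootstrap might look like is below; the connection string and file name are assumptions, only the port 3000 comes from the section that follows:

```js
// server.js - hypothetical startup: connect to MongoDB, then start the Express app
const express = require('express');
const mongoose = require('mongoose');

const app = express();

async function start() {
  // Assumed local connection string; the real one may come from config or environment variables
  await mongoose.connect('mongodb://localhost:27017/medium-crawler');
  app.listen(3000, () => console.log('Server listening on http://localhost:3000'));
}

start().catch((err) => {
  console.error('Failed to start server:', err);
  process.exit(1);
});
```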
Once the server starts:
1) It will start crawling and upload the harvested data to the database.
2) Users can retrieve all of the stored data by sending a GET request to 'http://localhost:3000/api/url/getAllData' (see the route sketch after this list).
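A sketch of what such a route could look like, assuming Express and the Url model sketched earlier; the file layout and mounting path are illustrative, only the endpoint itself comes from the description above:

```js
// routes/url.js - hypothetical route exposing all stored URL documents
const express = require('express');
const router = express.Router();
const { Url } = require('../models/url'); // the model sketched earlier

// GET /api/url/getAllData -> returns every stored URL document
router.get('/getAllData', async (req, res) => {
  try {
    const docs = await Url.find({});
    res.json(docs);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

module.exports = router;
// mounted in the app with: app.use('/api/url', router);
```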
Alternatively, to run the project with Docker:
1) docker-compose build
2) docker-compose up
Once the server starts, it follows the same steps described in the section above.
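The Docker commands above assume a docker-compose.yml that runs the app alongside a MongoDB container. The sketch below shows one possible shape of that file; the service names, image tags, and the MONGO_URL variable are assumptions, not taken from the repository:

```yaml
# docker-compose.yml - hypothetical two-service setup (app + MongoDB)
version: "3"
services:
  mongo:
    image: mongo
    ports:
      - "27017:27017"
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MONGO_URL=mongodb://mongo:27017/medium-crawler
    depends_on:
      - mongo
```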