GitHub - kunlizhang/espy-crawler

EEPY CRAWL: INFO

We did not use any third party extensions.

We partitioned the crawl and ran PageRank and the Indexer on each partition. We then built one large compiled index table by appending the per-partition results, and merged all crawl tables (and all PageRank tables) using merge-script.py. The resulting zip files, along with the entire set of crawls in a separate folder, are in Drive. Before compiling, unzip the relevant worker folders for the indexer, PageRank, and crawler, and add all tables to the worker folders (already partitioned by pre-determined IDs). If running locally, add the worker folders to the local repository; if running on EC2, add them to the repository root directory.
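The merge step can be pictured with a minimal sketch. This is not the project's merge script. It assumes, purely for illustration, that a table is a directory holding one file per row with the row key as the file name; the function name `merge_table_dirs` and the keep-first collision policy are hypothetical as well.

```python
import shutil
from pathlib import Path


def merge_table_dirs(partition_dirs, merged_dir):
    """Copy every row file from each partition directory into merged_dir.

    Assumption (not confirmed by the README): a table is a directory with
    one file per row, and the file name is the row key. On a key collision
    the first copy seen is kept. Returns (copied, skipped) counts.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = skipped = 0
    for part in partition_dirs:
        # Sort for a deterministic order within each partition.
        for row_file in sorted(Path(part).iterdir()):
            if not row_file.is_file():
                continue
            target = merged / row_file.name
            if target.exists():
                skipped += 1  # key already merged from an earlier partition
                continue
            shutil.copy2(row_file, target)
            copied += 1
    return copied, skipped
```

If the partitions were split by disjoint IDs, as the paragraph above suggests, the skipped count should stay at zero; a non-zero value would point to overlapping partitions.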

EEPY CRAWL: COMPILE

To compile the code on EC2, run ./crawler-ec2-script.sh. To compile locally, run ./script.sh.

EEPY CRAWL: RUN

To run the search engine on EC2 after running the script:

    sudo java -Xmx80g -cp bin cis5550.frontend.EepyCrawlSearch 80 > program.log 2>&1 &
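For readers who want to launch the frontend from a script, the command above could be wrapped roughly as follows. This is a sketch: the helper names `build_search_command` and `start_search` and their parameters are hypothetical, and only the Java class name, heap size, port, and log file name come from the command itself.

```python
import subprocess


def build_search_command(heap_gb=80, port=80, use_sudo=True):
    """Return the argv list for the search frontend, mirroring the README command."""
    cmd = ["java", f"-Xmx{heap_gb}g", "-cp", "bin",
           "cis5550.frontend.EepyCrawlSearch", str(port)]
    return ["sudo"] + cmd if use_sudo else cmd


def start_search(log_path="program.log", **kwargs):
    """Start the frontend in the background; stdout and stderr go to log_path."""
    with open(log_path, "w") as log:
        # The child keeps its own copy of the file handle after we close ours.
        return subprocess.Popen(build_search_command(**kwargs),
                                stdout=log, stderr=subprocess.STDOUT)
```

On a smaller machine, something like `start_search(heap_gb=4, port=8080, use_sudo=False)` may be more practical, since binding to port 80 normally requires root on Linux and 80 GB of heap assumes a large instance.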

Job-related commands (after running the script) are listed below:

Crawler:
    java -cp bin cis5550.flame.FlameSubmit localhost:9000 new-crawler.jar cis5550.jobs.NewCrawler

PageRank:
    java -cp bin cis5550.flame.FlameSubmit localhost:9000 pagerank.jar cis5550.jobs.NewPageRank

Indexer:
    java -cp bin cis5550.flame.FlameSubmit localhost:9000 indexer.jar cis5550.jobs.Indexer
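Since the three job commands differ only in the jar and main class, they can be generated from a small lookup table. The sketch below is illustrative: `JOBS`, `flame_submit_command`, and `submit` are hypothetical names, while the jar names, class names, and coordinator address are copied from the commands above.

```python
import subprocess

# Jar and main class for each job, copied from the README commands.
JOBS = {
    "crawler": ("new-crawler.jar", "cis5550.jobs.NewCrawler"),
    "pagerank": ("pagerank.jar", "cis5550.jobs.NewPageRank"),
    "indexer": ("indexer.jar", "cis5550.jobs.Indexer"),
}


def flame_submit_command(job, coordinator="localhost:9000"):
    """Return the argv list that submits the named job via FlameSubmit."""
    jar, main_class = JOBS[job]  # raises KeyError for an unknown job name
    return ["java", "-cp", "bin", "cis5550.flame.FlameSubmit",
            coordinator, jar, main_class]


def submit(job, coordinator="localhost:9000"):
    """Run the submission and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(flame_submit_command(job, coordinator), check=True)
```

For example, `submit("indexer")` would run the same command as the Indexer entry, assuming the compile script has already produced `bin` and the jar files in the current directory.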

Folders and files (380 commits):

.idea
merge-tables-sandbox
pages
src/cis5550
.DS_Store
.gitignore
Fa24-CIS5550-Project-eepy-crawl.iml
README.md
crawler-ec2-script.sh
crawler-script.sh
dns
eepy-crawl.iml
kvsBenchmark.sh
log.properties
merge-tables.py
script.sh
