AiStudy Documents Crawler

This project is a web crawler that systematically collects educational resources from the Shanghai Primary and Secondary School Digital Teaching System (上海市中小学数字教学系统).

The crawler uses Puppeteer, a Node.js library, to drive a Chromium browser and simulate user interaction, extracting download links as it navigates the site. It then calls the curl command-line utility to download those resources recursively to the local system.
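The two stages described above (link extraction with Puppeteer, then download with curl) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `app/start.js`: the function names, the `a[href]` selector, and the file-extension filter are all hypothetical, and the real site may need login or extra navigation steps.

```javascript
const { execFile } = require('node:child_process');
const path = require('node:path');

// Hypothetical filter: keep links whose path ends in a document extension.
function isDownloadLink(href) {
  return /\.(pdf|docx?|pptx?|zip)$/i.test(new URL(href).pathname);
}

// Stage 1: open a page in headless Chromium and collect candidate links.
// `pageUrl` and the selector are assumptions made for illustration.
async function collectLinks(pageUrl) {
  const puppeteer = require('puppeteer'); // loaded lazily; requires `npm i`
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: 'networkidle2' });
    const hrefs = await page.$$eval('a[href]', (as) => as.map((a) => a.href));
    return hrefs.filter(isDownloadLink);
  } finally {
    await browser.close(); // always release the browser process
  }
}

// Stage 2: build the curl argument list for one link.
// -L follows redirects, --fail turns HTTP errors into a non-zero exit code,
// --create-dirs creates the destination directory if it is missing.
function buildCurlArgs(url, destDir) {
  const fileName = decodeURIComponent(path.posix.basename(new URL(url).pathname));
  return ['-L', '--fail', '--create-dirs', '-o', path.join(destDir, fileName), url];
}

// Run curl for one link; resolves when curl exits successfully.
function download(url, destDir) {
  return new Promise((resolve, reject) => {
    execFile('curl', buildCurlArgs(url, destDir), (err) => (err ? reject(err) : resolve()));
  });
}
```

A caller would await `collectLinks(someUrl)` and then call `download(link, 'downloads')` for each result. Passing the arguments to `execFile` as an array, rather than building a shell string, avoids quoting problems with file names that contain spaces or non-ASCII characters.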

Demo video: README_demo.mov

Installation

# Clone the repository and change into its root directory first
npm i # Installs project dependencies, including a compatible Chrome build
npm run start # Runs the start script, which executes `app/start.js`

Examples

Crawl first

prompt> npm run start

Directly download or crawl first? (d/C) 
Run in headless mode? (Y/n) 
subjectIndex [1-17]: 1
Crawl documents or answer sheets? (D/a) 
subjectIndex [1-2]: 1
Startup grade [Default: 0]: 
Offset [Default: 100]: 
Startup semester [Default: 0]: 
Offset [Default: 100]: 
Startup unit [Default: 0]: 
Offset [Default: 100]: 
Startup course [Default: 0]: 
Offset [Default: 100]:
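The transcript above suggests two conventions: in a prompt such as `(d/C)` the capitalised letter is the default taken on empty input, and each level (grade, semester, unit, course) takes a startup index and an offset with defaults 0 and 100. The README does not spell out the exact semantics, so the sketch below assumes one plausible reading, in which the crawler visits up to `offset` items beginning at the startup index. All function names are hypothetical.

```javascript
// Resolve a single-letter prompt such as "(d/C)": empty input selects the
// capitalised option. Returns the chosen letter in lowercase.
function resolveChoice(input, options) {
  const answer = input.trim().toLowerCase();
  const fallback = options.find((o) => o === o.toUpperCase()).toLowerCase();
  if (answer === '') return fallback;
  const lowered = options.map((o) => o.toLowerCase());
  if (!lowered.includes(answer)) {
    throw new Error(`Expected one of ${options.join('/')}`);
  }
  return answer;
}

// Parse a numeric prompt, falling back to the default on empty input.
function resolveNumber(input, defaultValue) {
  const text = input.trim();
  if (text === '') return defaultValue;
  const value = Number.parseInt(text, 10);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`Invalid number: ${input}`);
  }
  return value;
}

// Assumed semantics: visit items[start] through items[start + offset - 1].
// With the defaults (0 and 100) a short list is returned in full.
function selectWindow(items, start = 0, offset = 100) {
  return items.slice(start, start + offset);
}
```

Under this reading, pressing Enter at every prompt crawls everything from the first grade onward, while a startup index of 1 with an offset of 2 would limit a level to its second and third entries.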

Direct download

prompt> npm run start

Directly download or crawl first? (d/C) d
劳动 - 6.json
sitemapName:

No available linkmaps

prompt> npm run start

Directly download or crawl first? (d/C) d
No linkmaps available!
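
The two direct-download transcripts above differ only in whether saved linkmap files (such as `劳动 - 6.json`) exist. A minimal sketch of that check might look like the following; it assumes linkmaps are plain `.json` files in a single directory, and the `linkmapDir` parameter and function names are hypothetical rather than taken from the project.

```javascript
const fs = require('node:fs');

// List saved linkmap files. `linkmapDir` is a hypothetical location;
// the project's real storage path is not stated in this README.
function listLinkmaps(linkmapDir) {
  if (!fs.existsSync(linkmapDir)) return [];
  return fs
    .readdirSync(linkmapDir)
    .filter((name) => name.toLowerCase().endsWith('.json'))
    .sort();
}

// Produce the text shown before the sitemapName prompt: either the list of
// available linkmaps or the message from the second transcript.
function describeLinkmaps(linkmapDir) {
  const names = listLinkmaps(linkmapDir);
  return names.length === 0 ? 'No linkmaps available!' : names.join('\n');
}
```

Returning an empty list for a missing directory lets a first run, where nothing has been crawled yet, take the same path as a directory that exists but holds no linkmaps.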

Repository: Mccranky83/aistudy-docs-crawler

Folders and files

Latest commit

History

Repository files navigation

AiStudy Documents Crawler

Installation

Examples

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages