Home
nOOBIE edited this page May 28, 2018 · 9 revisions
With this scraper you can scrape every news article from onlinekhabar.com and save the results locally on your computer in formats such as CSV or JSON.
Operating System: Linux
Language: Python 3
Libraries: Scrapy, plus system and chdir (functions provided by Python's standard os module)
This scraper was built purely for research purposes.
Please note that the scraping relies entirely on CSS selectors, so you might need to update the scraper if the page layout of onlinekhabar.com changes in the future.
- Clone this repository to your computer.
- Launch a terminal.
- Navigate to the folder that contains the file scrapy.cfg.
- Run the scraper. This is the sample command:
scrapy crawl onlinekhabr -a category="blog" -a address="https://www.onlinekhabar.com/content/opinion" -o blogdata.csv
- The command creates a folder named blog (taken from category="blog"). If you want a different folder for your news, change the category value.
- It also creates a CSV file named blogdata.csv that contains the scraped news articles. If you want the news in JSON format, use -o blogdata.json instead.
- You can also change the address to scrape other news sections, such as entertainment or business. Copy the link for the category you want from onlinekhabar.com, and don't forget to change category='yourcategory' in the command to match.
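The options described above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function names build_crawl_command and load_articles are hypothetical, and the sketch assumes the spider name, the -a arguments, and the -o output option work exactly as in the sample command. It builds the command for a given category and address, and reads the resulting CSV or JSON file back in without assuming any particular column names.

```python
import csv
import json
import shlex
from pathlib import Path

# Spider name copied from the sample command; adjust it if the project changes.
SPIDER_NAME = "onlinekhabr"


def build_crawl_command(category, address, output):
    """Return the argument list for one `scrapy crawl` run (hypothetical helper)."""
    return [
        "scrapy", "crawl", SPIDER_NAME,
        "-a", f"category={category}",   # also used as the folder name
        "-a", f"address={address}",     # category page to start scraping from
        "-o", output,                   # .csv or .json selects the output format
    ]


def load_articles(path):
    """Load scraped items from a .csv or .json output file as a list of dicts."""
    path = Path(path)
    if path.suffix == ".json":
        # Scrapy's JSON output is a single array of objects.
        return json.loads(path.read_text(encoding="utf-8"))
    with path.open(newline="", encoding="utf-8") as handle:
        # The CSV output starts with a header row, which becomes the dict keys.
        return list(csv.DictReader(handle))


command = build_crawl_command(
    "blog", "https://www.onlinekhabar.com/content/opinion", "blogdata.csv"
)
# Print the command in a form that can be pasted into the terminal.
print(shlex.join(command))
```

To run the crawl from Python rather than the terminal, the list returned by build_crawl_command could be passed to subprocess.run from inside the folder that contains scrapy.cfg. After the crawl finishes, load_articles("blogdata.csv") returns one dictionary per scraped article.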