"This Python script automates scraping of data science job listings from Shine.com, targeting posts less than or equal to 7 days old. It extracts detailed job information, handles pagination, and uses Selenium for dynamic content. Configure via config.ini."

bharadwaj008/shine-jobs-scraping

shine-jobs-scraping

Data Science Jobs Scraper

Overview

This script automates the process of scraping job listings from Shine.com for data science positions. It navigates through the job listings, extracting detailed information including job title, company name, experience required, job type, and job description.

Features

  • Targets recent job listings (posted within the last 7 days).
  • Handles pagination up to a specified number of pages.
  • Uses Selenium to extract dynamically loaded content that a plain HTTP request would miss.
  • Logs operations and errors in detail.
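
The recency filter and the page cap listed above can be sketched as follows. This is a minimal illustration, not the code in scrape_jobs.py: the names `parse_age_days`, `is_recent`, and `collect_recent` are hypothetical, the `fetch_page` callable stands in for whatever Selenium-based page loader the script uses, and the relative-age labels ("3 days ago", "1 week ago") are an assumed format rather than something confirmed from Shine.com.

```python
import re
from typing import Callable, Dict, List, Optional

MAX_AGE_DAYS = 7  # the README targets posts at most 7 days old

# Approximate day counts per unit; hours round down to "posted today".
_DAYS_PER_UNIT = {"hour": 0, "day": 1, "week": 7, "month": 30}


def parse_age_days(label: str) -> Optional[int]:
    """Convert an assumed relative-age label to whole days, or None if unrecognised."""
    text = label.strip().lower()
    if text in {"today", "just now"}:
        return 0
    if text == "yesterday":
        return 1
    match = re.search(r"(\d+)\+?\s*(hour|day|week|month)s?\s+ago", text)
    if match is None:
        return None
    return int(match.group(1)) * _DAYS_PER_UNIT[match.group(2)]


def is_recent(label: str, max_age_days: int = MAX_AGE_DAYS) -> bool:
    """True when the label parses and the post is no older than max_age_days."""
    age = parse_age_days(label)
    return age is not None and age <= max_age_days


def collect_recent(
    fetch_page: Callable[[int], List[Dict[str, str]]],
    max_pages: int,
) -> List[Dict[str, str]]:
    """Visit pages 1..max_pages, stop early on an empty page, keep recent jobs."""
    recent: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        jobs = fetch_page(page)  # hypothetical loader returning dicts with a "posted" key
        if not jobs:
            break
        recent.extend(job for job in jobs if is_recent(job["posted"]))
    return recent
```

Labels that cannot be parsed are treated as not recent, so a change in the site's wording drops listings instead of silently admitting old ones.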

Requirements

To install the required packages, run the following command:

    pip install -r requirements.txt

Configuration

Edit the config.ini file to specify the maximum number of pages (MaxPages) and the path to your Selenium WebDriver (DriverPath).

Example:

    [DEFAULT]
    MaxPages = 10
    DriverPath = path/to/your/chromedriver.exe
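
For reference, a file in this shape can be read with Python's standard `configparser` module. The sketch below only shows one way to load the two keys. The `Settings` type and `load_settings` function are hypothetical names, and the validation rules are assumptions, not behaviour taken from scrape_jobs.py.

```python
import configparser
from typing import NamedTuple


class Settings(NamedTuple):
    max_pages: int
    driver_path: str


def load_settings(path: str = "config.ini") -> Settings:
    """Read MaxPages and DriverPath from the [DEFAULT] section of config.ini."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse; empty means not found.
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"Could not read configuration file: {path}")

    defaults = parser["DEFAULT"]
    max_pages = defaults.getint("MaxPages")  # raises ValueError if not an integer
    driver_path = defaults.get("DriverPath")

    # Assumed validation: both keys are required and MaxPages must be positive.
    if max_pages is None or max_pages < 1:
        raise ValueError("MaxPages must be a positive integer")
    if not driver_path:
        raise ValueError("DriverPath must be set")
    return Settings(max_pages=max_pages, driver_path=driver_path)
```

`configparser` matches option names case-insensitively, so `MaxPages` and `maxpages` refer to the same key.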

Running the Script

To run the script, use the following command:

    python scrape_jobs.py

Ensure that Python and all required packages are installed, and that you are in the directory containing the script.

Logging

Errors and information are logged to BharadwajKamepalli_Errors.log, which includes details about URL accesses, data extraction issues, and other runtime events.
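
A file logger of this kind can be set up with the standard `logging` module. The following is a sketch under assumptions: only the log file name comes from this README, while the `build_logger` helper, the logger name, and the message format are illustrative choices and may differ from what the script does.

```python
import logging

LOG_FILE = "BharadwajKamepalli_Errors.log"  # file name given in this README


def build_logger(log_path: str = LOG_FILE, name: str = "shine_scraper") -> logging.Logger:
    """Return a logger that appends INFO-and-above records to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once so repeated calls do not duplicate lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this helper, a call such as `build_logger().info("Accessed URL: %s", url)` records a page visit, and `logger.error(...)` records an extraction failure in the same file.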

Limitations

  • The script is dependent on the structure of the website. Changes to the website may require updates to the script.
  • Designed to run on websites without advanced anti-bot protections.

Author

Bharadwaj Kamepalli
