Added an Indeed Scrapper #970

Closed
wants to merge 7 commits into from
17 changes: 17 additions & 0 deletions dev-documentation.md
@@ -1616,3 +1616,20 @@ First create an object of class `Dictionary`.
| `.word_of_the_day_definition()` | Returns the definition of the word of the day. |

---

## Indeed

First create an object of class `Indeed`.
```python
from scrape_up import indeed

indeed_job = indeed.get_url(position="business manager", location="Geneva")
indeed_job.get_record()
```

| Methods | Details |
| ---------------------- | ------------------------------------------------------------------------------------------------ |
| `.get_url()` | Returns the URL of the job having a specific position and location. |
| `.get_record()` | Returns the company details like job title, company name, location, job post date, and salary. |

---
18 changes: 18 additions & 0 deletions documentation.md
@@ -733,3 +733,21 @@ boxoffice = imdb.BoxOffice()
| Methods | Details |
| --------------- | ------------------------------------------------------------------------------ |
| `.top_movies()` | Returns the top box office movies, weekend and total gross, and weeks released.|


#### Indeed

Create an object of class `Indeed`.
```python
from scrape_up import indeed

indeed_job = indeed.get_url(position="business manager", location="Geneva")
indeed_job.get_record()
```

| Methods | Details |
| ---------------------- | ------------------------------------------------------------------------------------------------ |
| `.get_url()` | Returns the URL of the job having a specific position and location. |
| `.get_record()` | Returns the company details like job title, company name, location, job post date, and salary. |
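
The `.get_url()` row above describes building a search URL from a position and a location. A position such as "business manager" contains a space, so the values probably need percent-encoding before they go into the query string. The following is a minimal sketch of one way to do that with the standard library; `build_search_url` is a hypothetical helper name, not part of the PR, and the `q` and `l` parameter names are taken from the URL template in `indeed.py`.

```python
from urllib.parse import urlencode


def build_search_url(position, location):
    """Return a search URL with the query values percent-encoded.

    Hypothetical helper for illustration; the base URL and the `q`/`l`
    parameter names are assumed from the template used in the PR.
    """
    query = urlencode({"q": position, "l": location})
    return f"https://www.indeed.com/jobs?{query}"


print(build_search_url("business manager", "Geneva"))
# https://www.indeed.com/jobs?q=business+manager&l=Geneva
```

`urlencode` encodes spaces as `+`, which is the usual form for query strings.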

---
2 changes: 2 additions & 0 deletions src/scrape_up/Indeed/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
from .indeed import Indeed
__all__ = ["Indeed"]
70 changes: 70 additions & 0 deletions src/scrape_up/Indeed/indeed.py
@@ -0,0 +1,70 @@
import csv
from datetime import datetime

import requests
from bs4 import BeautifulSoup


class Indeed:
    """
    Create an instance of `Indeed` class.
    ```python
    indeed = Indeed()
    ```
    | Methods | Details |
    | ---------------------- | ------------------------------------------------------------------------------------------------ |
    | `.get_url()` | Returns the URL of the job having a specific position and location. |
    | `.get_record()` | Returns the company details like job title, company name, location, job post date, and salary. |
    """

    def __init__(self, position=None, location=None):
        # Default to None so that `Indeed()` works as shown in the docstring.
        self.position = position
        self.location = location

    def get_url(self, position, location):
        # Build the search URL for the given position and location.
        template = 'https://www.indeed.com/jobs?q={}&l={}'
        url = template.format(position, location)
        return url

    def get_record(self, card):
        # Extract one job record from a single search-result card.
        atag1 = card.h2.a.span
        job_title = atag1.get('title')
        atag2 = card.h2.a
        job_url = 'https://indeed.com' + atag2.get('href')

        company = card.find('span', 'companyName').text.strip()
        location = card.find('div', 'companyLocation').text.strip()
        summary = card.find('div', 'job-snippet').text.strip()
        posted_date = card.find('span', 'date').text.strip()
        today = datetime.today().strftime('%Y-%m-%d')

        try:
            salary = card.find('div', 'metadata estimated-salary-container').text.strip()
        except AttributeError:
            # find() returned None: no salary is listed on this card.
            salary = ''

        record = (job_title, job_url, location, company, posted_date, today, summary, salary)
        return record

    def main(self, position, location):
        # Scrape every result page, then write all records to a CSV file.
        records = []
        url = self.get_url(position, location)

        while True:
            response = requests.get(url)
            soup = BeautifulSoup(response.text, 'html.parser')
            cards = soup.find_all('div', 'job_seen_beacon')
            for card in cards:
                record = self.get_record(card)
                records.append(record)
            try:
                url = 'https://indeed.com' + soup.find('a', {'aria-label': 'Next'}).get('href')
            except AttributeError:
                # find() returned None: there is no "Next" link, so this was the last page.
                break

        with open(f'{position}-{location}.csv', 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow(['Job_Title', 'Job_Url', 'Location', 'Company', 'Post_Date', 'Extraction_Date', 'Summary', 'Salary'])
            writer.writerows(records)


if __name__ == '__main__':
    # Create a demo CSV file to access the records.
    Indeed().main('business manager', 'Geneva')
Member


Do not write any function to return or process the data. Just scrape the data from the platform and serve it as JSON.
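
One possible reading of this request, sketched under assumptions: instead of writing a CSV file, the scraped records are returned as a JSON string. `FIELDS` and `records_to_json` are hypothetical names, not part of the PR or of `scrape_up`; the field order follows the tuple built in `get_record()`, and the sample record is made up for illustration.

```python
import json

# Field names mirror the CSV header in the PR, in the same order as the
# tuple returned by get_record(). The lowercase keys are an assumption.
FIELDS = (
    "job_title", "job_url", "location", "company",
    "post_date", "extraction_date", "summary", "salary",
)


def records_to_json(records):
    """Serialize an iterable of 8-item record tuples to a JSON array string."""
    return json.dumps([dict(zip(FIELDS, record)) for record in records], indent=2)


# Hypothetical record for illustration only (not real scraped data).
sample = [(
    "Business Manager", "https://indeed.com/viewjob?jk=example", "Geneva",
    "Example Corp", "Posted 3 days ago", "2023-06-01", "Example summary.", "",
)]
print(records_to_json(sample))
```

With this shape, `main()` could return the JSON string rather than opening a file, which leaves any storage decision to the caller.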

Member


Also resolve the merge conflicts @Subhoshri