melisacar/ktb-monthly-scraper

KTB Monthly Scraper

  • This project extracts monthly tourism statistics for Türkiye and Istanbul from a PDF available on a specified webpage.
  • The extracted data is structured and ready for further analysis or storage in a database.

Features

  • Web scraping: Automatically fetches the latest PDF from the specified website.
  • PDF processing: Extracts relevant data (date, location, and visitor count) from the PDF.
  • Data structuring: Outputs the extracted information in a structured format as a pandas.DataFrame.
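
As a rough illustration of the web scraping step, the sketch below collects PDF links from a page's HTML using only the standard library. It is not taken from this repository: the `PdfLinkParser` class, the `find_pdf_links` helper, the sample HTML, and the `example.com` URLs are all assumptions made for the example, and the project's own code may locate the PDF differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return every PDF link found in the given HTML, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Hypothetical page content, standing in for the downloaded HTML.
sample_html = """
<ul>
  <li><a href="/files/stats-2024-01.pdf">January</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="https://example.com/files/stats-2024-02.PDF">February</a></li>
</ul>
"""
links = find_pdf_links(sample_html, "https://example.com/reports/")
print(links)
```

Resolving each link against the page URL with `urljoin` means relative and absolute links are handled the same way. Which link counts as the "latest" depends on how the source page is laid out, so that choice is left out of the sketch.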

Requirements

  • Install the dependencies using:
pip install -r requirements.txt

Usage

  • Run the script: Execute the main_check() function to process the latest available data.
  • Output: The script outputs a structured DataFrame with the following columns:
    • tarih (Date)
    • ist_tr (Location: "Türkiye" or "İstanbul")
    • ziyaretci_sayisi (Visitor Count)
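
To make the output shape concrete, here is a minimal sketch of rows with these three columns. The helper names `make_records` and `to_frame`, the use of `datetime.date` for `tarih`, and the visitor numbers are assumptions for illustration; the real `main_check()` output may use a different date type, and the numbers are placeholders rather than actual statistics.

```python
from datetime import date

COLUMNS = ["tarih", "ist_tr", "ziyaretci_sayisi"]


def make_records(month, turkiye_visitors, istanbul_visitors):
    """Return one row per location for a given month (hypothetical helper)."""
    return [
        {"tarih": month, "ist_tr": "Türkiye", "ziyaretci_sayisi": turkiye_visitors},
        {"tarih": month, "ist_tr": "İstanbul", "ziyaretci_sayisi": istanbul_visitors},
    ]


def to_frame(records):
    """Turn the rows into a pandas.DataFrame with a fixed column order."""
    # Imported here so the record-building step runs even without pandas.
    import pandas as pd

    return pd.DataFrame(records, columns=COLUMNS)


# Placeholder numbers for illustration only, not real statistics.
records = make_records(date(2024, 1, 1), 1000, 400)
print(records)
```

Calling `to_frame(records)` in an environment with pandas installed gives a two-row frame in long format, one row for Türkiye and one for İstanbul, which is convenient for appending new months or loading into a database table.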

Error Handling

  • If you need to install a Python library that isn't available through Homebrew, use a virtual environment. A virtual environment lets you manage Python packages independently of the system-wide Python installation. To create and activate one:
python3.13 -m venv venv
# Here the path to venv is /Users/user_name/your_folder/ktb-monthly-scraper/venv
source /Users/user_name/your_folder/ktb-monthly-scraper/venv/bin/activate
# Replace xyz with the name of the package you want to install
python3 -m pip install xyz
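
If it is unclear whether the environment was activated, a quick check from Python can help. The sketch below is not part of the project; `in_virtualenv` is a hypothetical helper that relies on the standard `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints `False` after activation, the shell is probably still using a different `python3` than the one inside `venv/bin`.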

Check for Existing Services Using the Port

  1. Open the Terminal
  2. Run the following command to list any processes using port 5432 (PostgreSQL's default port):
lsof -i :5432
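
The same check can be approximated from Python, which may be handy inside a script. This is a minimal sketch, assuming the service listens on the local IPv4 loopback address; `port_in_use` is a hypothetical helper, not something this project provides.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


print("port 5432 in use:", port_in_use(5432))
```

Unlike `lsof`, this only reports whether a connection succeeds; it does not show which process owns the port.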

About

Ministry of Culture and Tourism
