(for database credentials: place the database.ini file in the repo directory on your machine)
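A minimal sketch of reading those credentials with configparser and building a SQLAlchemy engine for the Supabase-hosted PostgreSQL database; the [postgresql] section name, the key names, and the psycopg2 driver are assumptions, so match them to the actual database.ini in this repo:

from configparser import ConfigParser
from sqlalchemy import create_engine

parser = ConfigParser()
parser.read("database.ini")
cfg = parser["postgresql"]  # assumed section name

# Assumes host/port/database/user/password keys and the psycopg2 driver
engine = create_engine(
    f"postgresql+psycopg2://{cfg['user']}:{cfg['password']}@{cfg['host']}:{cfg['port']}/{cfg['database']}"
)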
Current development:
- web scraping/crawling
- ORM database management
- PostgreSQL database
- currently hosted by Supabase
- Wiki documentation
Future development:
- NLP
- Website
- Aggregation and other data analysis
Due to the use of scrapy-playwright (which lets scrapy load JS elements), we recommend installing WSL/Ubuntu to run the scrapy spider.
However, you may still want a conda environment on Windows for quick debugging and development in your IDE (e.g., VSCode, PyCharm).
Other utilities to consider include this for commands like view(response) in the scrapy shell.
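For reference, view(response) is run from inside the scrapy shell, e.g.:
scrapy shell [insert url]
view(response)  # opens the fetched page in your default browser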
There may be more dependencies to install, including:
- playwright install-deps
- etc
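In particular, scrapy-playwright needs the Playwright browser binaries, which are installed separately inside the environment:
playwright install  # downloads the browser binaries used by scrapy-playwright
playwright install-deps  # installs the system libraries those browsers need (Linux/WSL)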
- Open Anaconda Prompt
- Create conda environment from environment-[os].yml: this has all the necessary libraries and packages (including ipykernel)
NOTE: environment.yml will need to be updated if we need to use more packages
(base) > conda env create -f environment-[os].yml
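Then activate it; db_env below is the environment name referenced in the Jupyter kernel section, so adjust it if your environment file uses a different name:
(base) > conda activate db_env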
- Main packages:
- SQLAlchemy 2.0
- Camelot
- Selenium
- Beautiful Soup 4
- ipykernel
- lxml
- html5lib
- pandas, numpy
- scrapy
- scrapy-playwright
- Use the following command to update environment.yml:
conda env export > environment.yml
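If the exported file should stay usable across operating systems, one option is to drop the build strings:
conda env export --no-builds > environment.yml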
(currently migrating from selenium/bs4 to scrapy)
- In the web_crawling folder (the one containing settings.py and a nested web_crawling folder), run:
scrapy crawl munispider
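For reference, a playwright-enabled spider looks roughly like the sketch below. This is only an illustration (not the actual munispider) and assumes the project settings already enable the scrapy-playwright download handler and the asyncio Twisted reactor:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # meta={"playwright": True} has scrapy-playwright render the page
        # (including its JS elements) in a headless browser before parse() runs
        yield scrapy.Request("https://example.com", meta={"playwright": True})

    def parse(self, response):
        yield {"title": response.css("title::text").get()}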
- Check out the wiki for updated information
- Open Anaconda Prompt
- Install nb_conda_kernels in the base environment: this allows you to access conda environments in Jupyter Notebook
(as long as ipykernel is installed in them)
(base) > conda install nb_conda_kernels
- When running .ipynb, switch kernel to "Python [conda env: db_env]"
- A quick test to make sure the environment/kernel is working:
import sqlalchemy
sqlalchemy.__version__
>> '2.0.12'
- Download the Chrome driver. Change the path to the Chrome driver in get_html() in scraper_functions.py.
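get_html() is this repo's own function, but pointing Selenium at a local chromedriver typically looks like the sketch below (assumes Selenium 4's Service API; the path and URL are placeholders):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at the downloaded chromedriver binary
driver = webdriver.Chrome(service=Service("/path/to/chromedriver"))
driver.get("https://example.com")
html = driver.page_source
driver.quit()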
- Main function is create_csv_url() in data_processing.py
from data_processing import *
create_csv_url("CA", "Los Angeles County", [insert url], [insert table number])  # web-scrape
create_csv_url("CA", "Los Angeles County")  # if csv exists
- Follow prompts
- Main function is read_database() in database_functions.py
from database_functions import *
read_database()
- Follow prompts
- Main function is read_pdf() in scraper_functions.py
- Install Ghostscript for your OS here
- Run the function with your parameters
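read_pdf() is defined in this repo, so check its parameters there; the underlying table extraction presumably goes through Camelot (listed above), which, with Ghostscript installed, is called roughly like this:

import camelot

# Extract tables from page 1; the "lattice" flavor expects ruled tables and relies on Ghostscript
tables = camelot.read_pdf("example.pdf", pages="1", flavor="lattice")
df = tables[0].df  # first detected table as a pandas DataFrame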