Fall 2016 Capstone Project for Nora Barry, Laura Buchanan, Jacqueline Gutman
Final paper in RevealEstate.pdf
Python version 3.5.2
Set up a Python3 virtual environment. Install necessary packages for analysis by running
pip3 install -r requirements.txt -U
or
conda install --yes --file hpc_requirements.txt
Download and organize NYC Department of Finance data into data directory by running
source download_finance_data.sh
Download PLUTO data and data dictionary by running
source download_pluto_data.sh
Download the PyProj GitHub repository, used to transform between X/Y coordinates and latitude/longitude coordinates, by running
source get_pyproj.sh
Review and follow instructions for downloading Digital Tax Map data in the DTM README
Extract distance to subway data and other open NYC data (add details here).
Merge PLUTO and Department of Finance data for specified years and boroughs by running
python3 merge_pluto_finance_new.py --borough all --year all
to obtain and merge data for all boroughs and all years (5 boroughs, 2003-2016), or by running
python3 merge_pluto_finance_new.py --borough {BOROUGH} {BOROUGH} --year {YEAR} {YEAR}
for whatever boroughs and years desired. See
python3 merge_pluto_finance_new.py --help
for details.
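To make the merge step more concrete, here is a minimal sketch of joining lot records to sales records on a borough-block-lot (BBL) key. It assumes the merge is keyed on BBL, which this README does not confirm; merge_pluto_finance_new.py may work differently. The function names, column names, and sample records are hypothetical and only illustrate the idea.

```python
def make_bbl(borough, block, lot):
    """Build a 10-digit BBL key: 1-digit borough, 5-digit block, 4-digit lot."""
    return "{:d}{:05d}{:04d}".format(int(borough), int(block), int(lot))


def merge_on_bbl(pluto_rows, finance_rows):
    """Inner-join sales records to lot records that share a BBL key."""
    # Index the lot records by BBL for constant-time lookup.
    pluto_by_bbl = {
        make_bbl(r["borough"], r["block"], r["lot"]): r for r in pluto_rows
    }
    merged = []
    for sale in finance_rows:
        key = make_bbl(sale["borough"], sale["block"], sale["lot"])
        lot_record = pluto_by_bbl.get(key)
        if lot_record is not None:
            row = dict(lot_record)  # copy so the inputs are not mutated
            row.update(sale)        # sale fields win on name clashes
            row["bbl"] = key
            merged.append(row)
    return merged


# Made-up records for illustration; not real PLUTO or Department of Finance data.
pluto = [{"borough": 1, "block": 15, "lot": 7, "year_built": 1920}]
sales = [
    {"borough": "1", "block": "15", "lot": "7", "sale_price": 500000},
    {"borough": "3", "block": "100", "lot": "1", "sale_price": 250000},
]
merged = merge_on_bbl(pluto, sales)
print(merged)
```

Zero-padding the block and lot before joining means that string and integer inputs produce the same key. The second sale is dropped because no lot record shares its BBL.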
Fit a linear regression, random forest, or other supported model to the merged data set by running
python3 regression_loop.py --data {path_to_merged_csv} --model {lr, rf, ada, bag, et, gb, en, hr, br, ll, lasso, ridge, sgd, svr, linsvr} --iters {50}
See
python3 regression_loop.py --help
for details.
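The braces in the usage line above are placeholders: --model takes exactly one of the listed codes. As an illustration, a command-line interface of this shape could be declared with argparse as sketched below. This is a hypothetical reconstruction, not the actual parser in regression_loop.py. The default of 50 for --iters and the file name merged.csv are assumptions.

```python
import argparse

# Model codes copied from the usage line; what each code maps to is
# defined by the script itself and is not assumed here.
MODEL_CODES = ["lr", "rf", "ada", "bag", "et", "gb", "en", "hr", "br", "ll",
               "lasso", "ridge", "sgd", "svr", "linsvr"]


def build_parser():
    """Declare a hypothetical regression_loop-style command-line interface."""
    parser = argparse.ArgumentParser(
        description="Hypothetical regression_loop-style CLI")
    parser.add_argument("--data", required=True,
                        help="path to the merged CSV")
    parser.add_argument("--model", required=True, choices=MODEL_CODES,
                        help="one model code from the list")
    # Assumption: 50 is the default iteration count.
    parser.add_argument("--iters", type=int, default=50,
                        help="number of iterations")
    return parser


# Parse an example argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["--data", "merged.csv", "--model", "rf"])
print(args.data, args.model, args.iters)
```

With choices set, argparse rejects any code outside the list and prints the valid options, which is the behavior the brace notation suggests.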
If using the Mercer HPC environment, check the configuration with
source setup_conda.sh
and then submit a job by running
git pull; qsub jobs/{job}.pbs