dsci560_app

APP GitHub:

https://github.com/Alleria1809/dsci560_app.git

yelp_crawler.ipynb

Crawl information and attributes from Yelp using Selenium.

EDA.ipynb

Exploratory Data Analysis steps for collected data, e.g. encoding, statistical analysis, plotting, and other visualizations.

prediction_modeling.ipynb

Use different models to predict the risk levels.

record_linkage.ipynb

Read data from the LA open dataset and the crawled Yelp data. Use the RLTK package to process the two datasets. Apply blocking and entity linking techniques to combine the data.
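The idea of blocking followed by entity linking can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the RLTK API that the notebook uses. The sample records, the choice of ZIP code as the blocking key, the `difflib` name similarity, and the 0.6 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical sample records; the real notebook reads the LA open dataset
# and the crawled Yelp data instead.
la_records = [
    {"id": "la1", "name": "Joe's Pizza", "zip": "90007"},
    {"id": "la2", "name": "Sunny Thai Kitchen", "zip": "90012"},
]
yelp_records = [
    {"id": "y1", "name": "Joes Pizza", "zip": "90007"},
    {"id": "y2", "name": "Sunny Thai", "zip": "90012"},
    {"id": "y3", "name": "Joe's Pizza", "zip": "90012"},
]


def block_by_zip(records):
    """Group records by ZIP code (an assumed blocking key)."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["zip"]].append(record)
    return blocks


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left, right, threshold=0.6):
    """Pair each left record with its best-scoring right record in the same block."""
    right_blocks = block_by_zip(right)
    matches = []
    for left_rec in left:
        # Only compare against records that share the blocking key.
        candidates = right_blocks.get(left_rec["zip"], [])
        if not candidates:
            continue
        best = max(candidates, key=lambda r: name_similarity(left_rec["name"], r["name"]))
        if name_similarity(left_rec["name"], best["name"]) >= threshold:
            matches.append((left_rec["id"], best["id"]))
    return matches


linked_pairs = link(la_records, yelp_records)
```

Blocking keeps the comparison count low: `la1` is never compared with `y3`, even though the names are identical, because the two records fall into different ZIP blocks.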

segmentation.ipynb

Run PCA to reduce dimensionality. Run KMeans to cluster the data. Use t-SNE to generate 2-D visualizations. Apply LDA topic modeling to detect keywords in the restaurant comments of each cluster.
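The first two steps, PCA followed by KMeans, might look roughly like the sketch below. It assumes scikit-learn and NumPy, which the README does not name explicitly, and it runs on synthetic data rather than the project's encoded restaurant features. The component count and cluster count are illustrative choices only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the encoded restaurant features (hypothetical data):
# two well-separated groups of 50 rows with 10 columns each.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 10)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 10)),
])

# Reduce the 10 feature columns to 2 principal components.
reduced = PCA(n_components=2, random_state=0).fit_transform(features)

# Cluster the reduced points; n_clusters=2 is an illustrative choice.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
```

The resulting `labels` array assigns each row to a cluster, which can then be used to color a 2-D plot or to group comments before topic modeling.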

NN.ipynb

Use the TensorFlow framework to build neural network models for multiclass classification.

recommendation.ipynb

Generate a tag set for each restaurant. Compute Jaccard similarities between tag sets. Implement recommendation algorithms for both recommendation functions: recommending by input features and recommending by restaurant name.
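As a rough sketch of how Jaccard similarity over tag sets can drive both recommendation functions, consider the example below. The restaurant names, tags, and function names are hypothetical and are not taken from the notebook.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical tag sets; the notebook builds these from the collected data.
TAG_SETS = {
    "Cafe A": {"coffee", "brunch", "wifi"},
    "Cafe B": {"coffee", "brunch", "outdoor"},
    "Grill C": {"steak", "bar", "outdoor"},
}


def recommend_by_features(features, tag_sets, top_n=2, exclude=None):
    """Rank restaurants by similarity between their tags and the given features."""
    scored = [
        (name, jaccard(features, tags))
        for name, tags in tag_sets.items()
        if name != exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


def recommend_by_name(name, tag_sets, top_n=2):
    """Rank other restaurants by similarity to the named restaurant's tags."""
    return recommend_by_features(tag_sets[name], tag_sets, top_n, exclude=name)


by_name = recommend_by_name("Cafe A", TAG_SETS)
by_features = recommend_by_features({"steak", "outdoor"}, TAG_SETS)
```

Recommending by name reuses the feature-based function with the named restaurant's own tag set as the query, so both entry points share a single scoring rule.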

Project Video

https://drive.google.com/file/d/1i-z4BUMXxMZFXgBARAiYcsB-Vs2owMNM/view?usp=sharing

Presentation

Please refer to Final_Presentation.pdf.