A big data system to handle weather and crime data | Final Project MPCS 53013 Big Data
The code in this repo implements a lambda architecture to feed a large-scale data application that takes weather and crime data, ingests it to HDFS, then automatically runs batch views of the data for user availability while also allowing for real-time updates on the fly.
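The core lambda-architecture idea above can be sketched as a query-time merge: precomputed batch views answer for all data up to the last batch run, and the speed layer fills in everything since. A minimal illustration in Python (names and values are hypothetical; in the real system both views live in HBase, not in dicts):

```python
# Minimal sketch of the lambda-architecture query merge.
# Hypothetical data: crime counts per district.

# Batch view: precomputed by the batch layer up to the last run.
batch_view = {"district_01": 120, "district_02": 85}

# Speed view: real-time increments accumulated since the last batch run.
speed_view = {"district_01": 3, "district_03": 1}

def query(district):
    """Answer = batch-view value + speed-layer delta for the same key."""
    return batch_view.get(district, 0) + speed_view.get(district, 0)

print(query("district_01"))  # 123: 120 from batch + 3 from speed
print(query("district_03"))  # 1: appeared only since the last batch run
```

When a new batch run completes, its output absorbs the speed-layer deltas and the speed view is reset; this is what lets the system serve both precomputed views and on-the-fly updates.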
You can see the Speed Layer interface here. Fair warning: it's not very pretty, given the time constraints on this project; the effort went, necessarily, into the back-end functionality.
- Assumes a Hadoop HDFS file system hosted on Google Cloud
- Apache Kafka for Serving Layer data collection
- Apache Storm topology for Serving Layer ingestion
- Apache Thrift data structure for fact-based, schema-on-read data storage
- Apache Pig for Batch Layer pre-computed view construction
- Apache HBase for pre-computed view storage, Serving Layer data storage, and data access in Speed Layer
- Basic HTML with Python back-end for Speed Layer data access
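The "fact-based, schema-on-read" storage style mentioned above means immutable, timestamped facts are appended at write time, and a schema is only imposed when the data are read. A rough Python sketch of that idea (field names and values are invented for illustration, not taken from the repo's Thrift schema):

```python
from collections import namedtuple

# A fact is an immutable, timestamped observation about one entity.
Fact = namedtuple("Fact", ["entity", "property", "value", "timestamp"])

# Facts as they might be appended during ingestion (hypothetical values).
facts = [
    Fact("station_725300", "temperature_f", 41.2, 1511900000),
    Fact("station_725300", "wind_speed_kt", 9.8, 1511900000),
    Fact("district_01", "crime_type", "THEFT", 1511903600),
]

def view(entity):
    """Schema-on-read: reconstruct an entity's state from its facts,
    letting later timestamps win. No schema is enforced at write time."""
    state = {}
    for f in sorted((f for f in facts if f.entity == entity),
                    key=lambda f: f.timestamp):
        state[f.property] = f.value
    return state

print(view("station_725300"))  # {'temperature_f': 41.2, 'wind_speed_kt': 9.8}
```

Because facts are never mutated, the batch layer can always recompute views from scratch, which is what makes the Pig-built precomputed views safe to regenerate.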
- `set-up` contains the necessary shell code for running various aspects of the system
- `frontEnd` contains the Speed Layer code for data access
- `ingestFiles` contains the ingestion code for HDFS serialization
- `thriftFiles` contains the Thrift schema for serialization
- `pigFiles` contains all the Pig code for Batch Layer runs
- `stormFiles` contains the code for the Serving Layer Storm topology
- `jars` contains the necessary jars and uberjars for Java applications
- `mvn` and `pig` contain the necessary open-source application jars for implementation
Data are from NOAA's Global Summary of the Day (ftp://ftp.ncdc.noaa.gov/pub/data/gsod/) and the City of Chicago Data Portal.
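For reference, the GSOD archive is organized by year, with one gzipped file per station-year named by the station's USAF and WBAN identifiers. A small helper that builds those FTP paths (the layout and the example identifiers for Chicago O'Hare are assumptions about the public archive; verify against the server before relying on them):

```python
GSOD_BASE = "ftp://ftp.ncdc.noaa.gov/pub/data/gsod"

def gsod_url(usaf, wban, year):
    """Build the FTP URL for one station-year GSOD file.
    Assumed layout: <base>/<year>/<usaf>-<wban>-<year>.op.gz"""
    return f"{GSOD_BASE}/{year}/{usaf}-{wban}-{year}.op.gz"

# Hypothetical example: Chicago O'Hare (USAF 725300, WBAN 94846) for 2017.
print(gsod_url("725300", "94846", 2017))
# ftp://ftp.ncdc.noaa.gov/pub/data/gsod/2017/725300-94846-2017.op.gz
```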