Disaster Response Pipeline Project

Instructions:

Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command in the app's directory to run your web app:
    python run.py
Then go to http://0.0.0.0:3001/

Summary:

In this project, I've analyzed disaster data from Figure Eight to build a model for an API that classifies disaster messages.

The data set contains real messages that were sent during disaster events. I've created a machine learning pipeline to categorize these events so that messages can be sent to an appropriate disaster relief agency.

This project has an app inside the app folder. Using it, an emergency worker can input a new message and get classification results in several categories. The web app also displays visualizations of the data.

File Description

ETL Pipeline Preparation.ipynb: Notebook contains ETL Pipeline.
ML Pipeline Preparation.ipynb: Notebook contains ML Pipeline.
etl.db: ETL database.
categories.csv: Categories data set.
messages.csv: Messages data set.
classifier.pkl: Trained model pickle file.
train_classifier.py: Python file for model training.
transformation.py: Helper file for train_classifier.py.
disaster_categories.csv: Disaster Categories data set.
disaster_messages.csv: Disaster Messages data set.
process_data.py: Python ETL script.
app: Flask Web App
run.py: Flask Web App main script.
img: Image Folder
requirements.txt: Text file containing list of packages used.
LICENSE: Project LICENSE file.

Dataset

This disaster data is from Figure Eight. The dataset has two files: messages.csv and categories.csv.

Data Cleaning

The two datasets were first merged into df on the id column.
Categories were split into separate category columns.
Category values were converted to the numbers 0 or 1.
The categories column in df was replaced with the new category columns.
Duplicates were removed based on the message column.
df was exported to the etl.db database.
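
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the project's process_data.py: the semicolon-separated `name-value` layout of the categories string, the table name `messages`, the helper name `clean_data`, and the toy rows are all assumptions made for the example. An in-memory SQLite database stands in for etl.db.

```python
import sqlite3

import pandas as pd


def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge, expand the categories string, and drop duplicate messages."""
    # Merge the two datasets on the shared id column.
    df = messages.merge(categories, on="id")

    # Split the single categories string into one column per category.
    # Assumed format: "related-1;water-0;..."
    split = df["categories"].str.split(";", expand=True)
    split.columns = [value.rsplit("-", 1)[0] for value in split.iloc[0]]

    # Keep only the trailing digit of each value and store it as an integer.
    for column in split.columns:
        split[column] = split[column].str[-1].astype(int)

    # Replace the original categories column with the new category columns.
    df = pd.concat([df.drop(columns="categories"), split], axis=1)

    # Remove duplicates based on the message column.
    return df.drop_duplicates(subset="message")


# Toy data (hypothetical rows, only to exercise the function).
messages = pd.DataFrame(
    {"id": [1, 2, 3], "message": ["need water", "need water", "road blocked"]}
)
categories = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "categories": ["related-1;water-1", "related-1;water-1", "related-1;water-0"],
    }
)
df = clean_data(messages, categories)

# Export to SQLite; pass a file path instead of ":memory:" to write a real database.
conn = sqlite3.connect(":memory:")
df.to_sql("messages", conn, index=False, if_exists="replace")
stored = pd.read_sql("SELECT * FROM messages", conn)
conn.close()
```

Reading the table back into `stored` is only a quick check that the export round-trips.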

Modeling Process

Wrote a tokenization function to process text data.
Built a machine learning pipeline using TfidfVectorizer, RandomForestClassifier, and Pipeline.
Split the data into training and test sets.
Using the pipeline, trained and evaluated a simple RandomForestClassifier.
Then, using hyperparameter tuning with 5-fold cross-validation, fitted 100 models to find the best random forest model for predicting the disaster response categories. The best random forest parameters were:


{'clf__criterion': 'entropy',
 'clf__max_depth': 40,
 'clf__max_features': 'auto',
 'clf__random_state': 42}

Using this best model, we created train_classifier.py.
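
The modeling steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the contents of train_classifier.py: the regex tokenizer is a stand-in for the project's tokenization function, the parameter grid is hypothetical, and the messages and labels are toy data. The grid was chosen to have 20 candidates so that 5-fold cross-validation performs 100 fits, which matches the count above if "100 models" means one fit per candidate per fold. The reported `max_features` value `'auto'` meant `'sqrt'` for classifiers in older scikit-learn releases and is no longer accepted in recent ones, so the sketch uses `'sqrt'`.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def tokenize(text):
    """Stand-in tokenizer: lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_pipeline():
    """TF-IDF features followed by a random forest (handles multi-output labels)."""
    return Pipeline(
        [
            ("tfidf", TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
            ("clf", RandomForestClassifier(n_estimators=10, random_state=42)),
        ]
    )


# Hypothetical grid: 2 * 5 * 2 = 20 candidates, so cv=5 runs 100 fits.
param_grid = {
    "clf__criterion": ["gini", "entropy"],
    "clf__max_depth": [10, 20, 30, 40, None],
    "clf__max_features": ["sqrt", "log2"],
}

# Toy messages with two label columns (water, infrastructure); illustration only.
base_messages = [
    "we need water and food",
    "water supply is cut off",
    "please send drinking water",
    "no water in the village",
    "the bridge has collapsed",
    "roads are blocked by debris",
    "our house was destroyed",
    "building damaged after the quake",
]
base_labels = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]]
X = base_messages * 2
y = np.array(base_labels * 2)

# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Tune hyperparameters with 5-fold cross-validation, then predict on the test set.
search = GridSearchCV(build_pipeline(), param_grid, cv=5)
search.fit(X_train, y_train)
predictions = search.predict(X_test)
print(search.best_params_)
```

After fitting, `search.best_params_` holds a dictionary of the same shape as the one reported above, and `search.best_estimator_` is the refitted pipeline that could be pickled.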

Screenshots

Model results

Our final RandomForestClassifier model with 5-fold cross-validation has the following results.

Effect of Imbalance:

The dataset is imbalanced. The following image gives an idea of the imbalance.

For classes with few samples, the model will not generalize well. Because all the categories have the same precision, we should focus on recall when comparing categories.
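
To illustrate why recall is the more informative number for a rare category, here is a small dependency-free sketch. The labels and predictions are made up for the example (they are not results from this project), and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class of one binary category."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# A rare category: only 2 of 10 messages are positive (made-up values).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# The model finds one of the two positives and raises no false alarms.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case accuracy and precision both look strong, yet half of the relevant messages are missed, which only recall reveals.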

ranamahmud/disaster-response-pipelines-udacity