# Importing All Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from warnings import filterwarnings
filterwarnings(action='ignore')
- Imports essential libraries for data manipulation (
pandas
,numpy
) and visualization (matplotlib
). - Suppresses warnings to streamline the output.
# Loading Datasets
pd.set_option('display.max_columns', 10, 'display.width', 1000)
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
train.head()
- Reads the training and testing datasets from CSV files into pandas DataFrames.
- Adjusts display options for better readability of DataFrame outputs.
- Displays the first few rows of the training dataset for a quick overview.
# Display shape
train.shape
- Outputs the number of rows and columns in the training dataset.
- Rows: 891
- Columns: 12
Details about handling missing values, encoding categorical variables, and feature engineering are covered later in the notebook. (Placeholder for additional steps once reviewed.)
The notebook implements a machine learning pipeline using RandomForestClassifier
from scikit-learn to predict survival. It involves splitting data, training the model, and evaluating performance metrics like confusion matrix and classification report.
Matplotlib is used to create visualizations (plots to be detailed once reviewed).
- Finalize preprocessing steps and model evaluation details.
- Add summary metrics and insights derived from the analysis.
- license