First step: partition the original dataset into a training set (60%) and a validation set (40%). The model is fit to the training data and evaluated on the validation set.
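The 60/40 partition can be sketched as below. This is a minimal illustration using only the standard library; the function name `partition`, the seed value, and the stand-in `records` list are assumptions for the example, not the tool or settings actually used for this analysis.

```python
import random

def partition(rows, train_frac=0.6, seed=1):
    """Shuffle the rows reproducibly, then cut them into training and validation sets."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed (assumed) for repeatability
    cut = round(len(shuffled) * train_frac)    # index where the training set ends
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the dataset: 100 row identifiers.
records = list(range(100))
train, valid = partition(records)
print(len(train), len(valid))
```

Shuffling before cutting matters: if the file is sorted (for example by city), taking the first 60% of rows would give a training set that does not resemble the validation set.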
Based on the estimation results, the fitted model (coefficients rounded to two decimals) is: FARE = -23.95 + 7.04*COUPON - 1.80*NEW + 0.01*HI + 0.01*S_INCOME + 0.00*E_INCOME + 0.00*S_POP + 0.00*E_POP + 0.07*DISTANCE - 0.00*PAX - 33.71*VACATION_YES - 39.39*SW_YES + 19.16*SLOT_CTRL + 22.93*GATE_CONS
We then interpret how distance (in miles) and service by Southwest Airlines affect the average airfare.
If other variables are held constant (all else being equal), a one-mile increase in distance increases the fare by $0.07 on average.
If other variables are held constant (all else being equal), a flight run by Southwest Airlines is $39.39 cheaper on average than a flight run by another airline.
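The two interpretations above can be checked numerically by plugging values into the fitted equation. The sketch below uses the rounded coefficients reported in this section; the route values in `base` are hypothetical, and because several coefficients round to 0.00, the predicted fare itself is only approximate. The differences between predictions are what matter here.

```python
# Rounded coefficients as reported in the fitted equation.
INTERCEPT = -23.95
COEF = {
    "COUPON": 7.04, "NEW": -1.80, "HI": 0.01, "S_INCOME": 0.01,
    "E_INCOME": 0.00, "S_POP": 0.00, "E_POP": 0.00, "DISTANCE": 0.07,
    "PAX": -0.00, "VACATION_YES": -33.71, "SW_YES": -39.39,
    "SLOT_CTRL": 19.16, "GATE_CONS": 22.93,
}

def predict_fare(route):
    """Evaluate the linear equation; predictors missing from `route` count as 0."""
    return INTERCEPT + sum(COEF[name] * route.get(name, 0.0) for name in COEF)

# Hypothetical route, used only to illustrate the "all else being equal" comparison.
base = {"COUPON": 1.2, "NEW": 3, "DISTANCE": 1000, "SW_YES": 0}
one_more_mile = dict(base, DISTANCE=1001)   # same route, one mile longer
southwest = dict(base, SW_YES=1)            # same route, served by Southwest

mile_effect = predict_fare(one_more_mile) - predict_fare(base)
sw_effect = predict_fare(southwest) - predict_fare(base)
print(round(mile_effect, 2), round(sw_effect, 2))
```

Because the model is linear, these differences do not depend on the hypothetical values chosen for the other predictors.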
We used backward variable selection to reduce the number of predictors. Backward elimination removed the variables COUPON and NEW.
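For readers unfamiliar with the procedure, here is a minimal sketch of backward elimination. It assumes AIC as the stopping criterion and NumPy least squares for fitting; the report does not state which criterion or software was used, so both are assumptions. The data are synthetic and generated only to make the example runnable (the `NOISE` column is a made-up irrelevant predictor).

```python
import numpy as np

def aic(X, y):
    """AIC of an OLS fit with intercept, up to an additive constant."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    rss = float(np.sum((y - A @ beta) ** 2))       # residual sum of squares
    return n * np.log(rss / n) + 2 * A.shape[1]

def backward_eliminate(data, y, names):
    """Start from all predictors; repeatedly drop the one whose removal lowers AIC most."""
    selected = list(names)
    best = aic(np.column_stack([data[c] for c in selected]), y)
    while len(selected) > 1:
        trials = []
        for c in selected:
            rest = [d for d in selected if d != c]
            trials.append((aic(np.column_stack([data[d] for d in rest]), y), c))
        score, drop = min(trials)
        if score >= best:          # no removal improves AIC: stop
            break
        best = score
        selected.remove(drop)
    return selected

# Synthetic illustration data (not the airfare dataset).
rng = np.random.default_rng(0)
n = 200
data = {
    "DISTANCE": rng.uniform(100, 2500, n),
    "SW_YES": rng.integers(0, 2, n).astype(float),
    "NOISE": rng.normal(size=n),                   # unrelated to the response
}
y = 50 + 0.07 * data["DISTANCE"] - 40 * data["SW_YES"] + rng.normal(0, 10, n)

kept = backward_eliminate(data, y, list(data))
print(kept)
```

Predictors with a strong relationship to the response survive because removing them sharply increases the residual error, while weak predictors are candidates for removal.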
We compared the full and reduced models on RMSE and adjusted R^2. Result: the two models perform similarly. Even without COUPON and NEW, we achieve the same level of performance; in other words, COUPON and NEW add little to the prediction of FARE.
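The two comparison metrics can be computed as in the sketch below. The formulas are the standard ones; the function names and the small example vectors are hypothetical and are not results from this analysis.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in fare units."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def adjusted_r2(actual, predicted, n_predictors):
    """R^2 penalized for the number of predictors in the model."""
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical fares and predictions, for illustration only.
actual = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [110.0, 140.0, 205.0, 245.0, 310.0]
print(rmse(actual, predicted), adjusted_r2(actual, predicted, n_predictors=2))
```

Adjusted R^2 is the more suitable of the two for comparing models with different numbers of predictors, since plain R^2 never decreases when a variable is added.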
- Vacation routes and routes served by Southwest have lower fares; therefore, companies should put more flights on routes that have these two factors.
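- Routes with new carriers also tend to have slightly lower fares (coefficient -1.80). The positive COUPON coefficient (7.04) indicates that fares are higher, not lower, on routes with more coupons. Backward elimination dropped both variables, so neither effect matters much for prediction.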
- Cities with larger populations will sell more tickets. Companies should price based on these factors, offering more coupons and adding more routes to high-population cities.
- High-income cities have higher fares; therefore, companies should add more routes to these cities.