The project uses a dataset of mental health statements from Kaggle, each tagged with a mental health status. It covers data preprocessing, exploratory data analysis, model training, and evaluation.
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA)
- Text preprocessing using spaCy
- TF-IDF vectorization
- Oversampling using SMOTE
- Multiple classification models comparison
- Model evaluation and fine-tuning
- Model saving
- Python 3.11.x
- Libraries: numpy, pandas, scikit-learn, spacy, imbalanced-learn, matplotlib, seaborn, tqdm, scipy, and pickle (part of the Python standard library, so it needs no installation)
- Clone this repository
- Install required packages:
pip install -r requirements.txt
- Download the spaCy English model:
python -m spacy download en_core_web_sm
- Run the Jupyter notebook
Sentiment_Analysis.ipynb
- The notebook will guide you through the entire process from data loading to model evaluation
- Logistic Regression
- Decision Tree Classifier
- Extra Tree Classifier
- AdaBoost Classifier
- Random Forest Classifier
- Extra Trees Classifier
- Gradient Boosting Classifier
- Bagging Classifier
- SGD Classifier
- SVC
- MLP Classifier
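The models listed above map onto scikit-learn classes, and a comparison loop over them might look like the sketch below. It makes several assumptions that are not taken from the notebook: the data is synthetic (`make_classification` stands in for the vectorized statements), the hyperparameters are mostly defaults, and macro-averaged F1 is used as the comparison metric. Note that `ExtraTreeClassifier` (a single randomized tree) and `ExtraTreesClassifier` (an ensemble of them) are distinct classes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Synthetic three-class data standing in for the real features (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Extra Tree": ExtraTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "SGD": SGDClassifier(random_state=42),
    "SVC": SVC(random_state=42),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
}

# Fit every model on the same split and record its macro-F1 on the test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")

# Print the models from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} macro-F1 = {score:.3f}")
```

Macro-F1 weights every class equally, which is one reasonable choice when class sizes differ; accuracy or weighted F1 would slot into the same loop.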
The best-performing model is then fine-tuned with RandomizedSearchCV.
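A randomized search of this kind can be sketched as below. Which model performed best is not stated here, so `LogisticRegression` is used purely as a stand-in; the search range for `C`, the `f1_macro` scoring, the number of iterations, and the synthetic data are likewise assumptions. The last lines round-trip the tuned estimator through `pickle`, illustrating the model-saving step in memory rather than on disk.

```python
import pickle

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic three-class data standing in for the real features (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8, n_classes=3, random_state=42
)

# Sample the regularization strength C on a log scale (hypothetical range).
param_distributions = {"C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_distributions,
    n_iter=10,           # number of random parameter settings to try
    scoring="f1_macro",
    cv=3,                # 3-fold cross-validation per setting
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))

# Serialize and restore the refitted best estimator with pickle.
restored = pickle.loads(pickle.dumps(search.best_estimator_))
```

Unlike an exhaustive grid search, `RandomizedSearchCV` evaluates only `n_iter` sampled settings, which keeps the cost fixed even when the search space is continuous.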
- Piyawat Nulek ([email protected])