Analyze the Influence of Features Extracted from Audios on the Results of IELTS Speaking Test

This is my implementation of the paper "Analyze the Influence of Features Extracted from Audios on the Results of IELTS Speaking Test". The paper was accepted for presentation at the 2024 International Conference on Advanced Technologies for Communications (ATC) and will soon be published in the IEEE Xplore Digital Library.

Introduction

The demand for international English certifications like IELTS, TOEIC, and TOEFL has significantly increased in Vietnam's educational and employment sectors. While self-practice in reading and listening is common, developing speaking and writing skills requires supervision. Unfortunately, access to reputable preparation centers is limited, especially in rural areas, where high tuition fees can be prohibitive. Additionally, instructors struggle to correct speaking errors for multiple students, highlighting the need for an automated assessment system.

Our study aims to fill this research gap by enhancing English language learning for Vietnamese learners. We utilize data from a test preparation center in Ho Chi Minh City to extract meaningful features from audio recordings of speaking tests. Through statistical analysis, we examine how these features affect IELTS speaking scores. We also employ Machine Learning algorithms to predict scores based on the extracted features, yielding promising results. Our research provides valuable insights into speaking criteria assessment and identifies areas for improvement, enabling Vietnamese learners and teachers to adopt effective strategies for enhancing their scores.

Data collection

Attribute	Meaning
Mean hesitation	The average degree of hesitation in speech, with values ranging from 0.0 to 1.0. It is calculated from the time gap between consecutive words (subtracting the end time of the preceding word from the start time of the following word).
Mean conf	The average confidence level, which ranges from 0.0 to 1.0.
Grammar point	The grammatical score of the speech, calculated using Grammarly’s algorithm.
Unique word	The percentage of unique words spoken in the speech. Reflects the speaker’s vocabulary.
Rare word	The percentage of words spoken that are rare or less common. Reflects the speaker’s proficiency in advanced vocabulary.
Number of words	The total number of words in the speech (including repeated words).
Score	The IELTS speaking score, which is also the target variable.
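
As a rough illustration of how some of the attributes above could be derived, the sketch below computes them from word-level recognition output. It is not the notebook code from this repository: the input layout (a list of dicts with `word`, `start`, `end`, and `conf` keys) and the function name `extract_features` are assumptions. The hesitation value is the raw mean gap in seconds, because the normalization to the 0.0 to 1.0 range is not described here, and the unique-word value is returned as a fraction rather than a percentage.

```python
from statistics import mean


def extract_features(words):
    """Compute simple speech features from word-level recognition output.

    `words` is a list of dicts with keys "word", "start", "end", "conf".
    This layout is an assumption made for the example.
    """
    if not words:
        raise ValueError("need at least one recognized word")

    # Gap = start of the following word minus end of the preceding word.
    # Negative gaps (overlapping timestamps) are clamped to 0; this is an
    # assumption, not something stated in the attribute table.
    gaps = [
        max(0.0, nxt["start"] - prev["end"])
        for prev, nxt in zip(words, words[1:])
    ]
    tokens = [w["word"].lower() for w in words]

    return {
        "mean_hesitation": mean(gaps) if gaps else 0.0,  # seconds, not normalized
        "mean_conf": mean(w["conf"] for w in words),
        "unique_word": len(set(tokens)) / len(tokens),   # fraction of unique words
        "number_of_words": len(tokens),                  # repeated words included
    }


# Made-up sample input for illustration only.
sample = [
    {"word": "i", "start": 0.00, "end": 0.20, "conf": 1.0},
    {"word": "think", "start": 0.50, "end": 0.90, "conf": 0.8},
    {"word": "i", "start": 1.00, "end": 1.10, "conf": 0.9},
    {"word": "agree", "start": 1.20, "end": 1.60, "conf": 0.5},
]
features = extract_features(sample)
print(features)
```

Rare word and Grammar point are left out of the sketch because they depend on an external word-frequency list and on Grammarly, respectively.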

Data Analysis

Pearson Correlation Coefficients between the extracted features and the target value

Dependent Variable	r	p-value
Mean conf	0.136	8.581e-5
Mean hesitation	-0.017	0.61452
Grammar point	0.187	6.197e-8
Unique word	0.299	1.649e-18
Rare word	0.256	7.635e-14
Number of words	0.298	2.162e-18

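The statistical data indicate that, when the target variable is treated as continuous, the extracted features correlate only weakly with it. The strongest correlations are for Unique word and Number of words, with correlation coefficients ≈0.3 at a significance level <0.001. Rare word, Grammar point, and Mean conf are also statistically significant (p < 0.001) but have smaller coefficients, while Mean hesitation shows no significant linear correlation (p ≈ 0.61).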
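
For readers who want to reproduce this kind of check, here is a minimal sketch of the Pearson coefficient and its t-statistic using only the standard library. The function names and the sample values are made up for illustration and are not taken from the dataset. Turning the t-statistic into a p-value requires the t distribution with n − 2 degrees of freedom; `scipy.stats.pearsonr` returns both the coefficient and the p-value if SciPy is available.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of the same length (at least 3)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical feature values and scores, for illustration only.
unique_word = [0.40, 0.45, 0.50, 0.55, 0.60]
score = [5.0, 5.5, 5.5, 6.0, 6.5]

r = pearson_r(unique_word, score)
print(r, t_statistic(r, len(score)))
```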

One-way ANOVA Analysis between IELTS speaking score and feature values

Dependent Variable	F	p-value
Mean conf	11.499	5.630e-14
Mean hesitation	17.857	8.812e-22
Grammar point	4.548	5.409e-5
Unique word	13.602	1.060e-16
Rare word	11.565	4.621e-14
Number of words	15.198	9.332e-19

The boxplots overlap substantially and do not clearly separate the score levels. Even so, the one-way ANOVA F-test gives very low p-values for every feature (p < 0.0001), which indicates that the feature means differ significantly between at least some score groups. In addition, when comparing the subset with IELTS speaking scores above 6.5 against the entire dataset in a spider chart, we observe differences in the values of Grammar point, Unique word, Rare word, Number of words, and Mean hesitation.
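
The F values in the table are ratios of between-group to within-group variance, where each group holds the feature values of recordings that share one speaking score. The sketch below shows that computation with the standard library only. The function name and the sample groups are hypothetical. The p-value additionally needs the F distribution with (k − 1, N − k) degrees of freedom, which `scipy.stats.f_oneway` provides if SciPy is available.

```python
def one_way_anova_f(groups):
    """F-statistic of a one-way ANOVA over a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical feature values grouped by speaking score, for illustration only.
values_by_score = {
    5.5: [1.0, 2.0, 3.0],
    6.0: [2.0, 3.0, 4.0],
    6.5: [3.0, 4.0, 5.0],
}
f_value = one_way_anova_f(list(values_by_score.values()))
print(f_value)
```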

Model Prediction

Model	MAE	MSLE	MAPE (%)
Random Forest	4.485	0.014	7.829
Logistic Regression	4.606	0.013	8.150
SVM	5.515	0.013	7.840
LightGBM	5.447	0.014	9.277
CatBoost	5.442	0.014	9.269
XGBoost	4.769	0.011	8.220
AdaBoost	5.909	0.019	10.641
SMOTE+Random Forest	4.879	0.016	9.002
SMOTE+Logistic Regression	6.667	0.027	12.552
SMOTE+SVM	5.394	0.019	9.762
SMOTE+LightGBM	5.565	0.014	9.644
SMOTE+CatBoost	5.551	0.013	9.621
SMOTE+XGBoost	5.113	0.012	8.889
SMOTE+AdaBoost	5.939	0.022	11.289
SMOTE+NearMiss+Random Forest	5.061	0.018	9.293
SMOTE+NearMiss+Logistic Regression	7.121	0.029	13.410
SMOTE+NearMiss+SVM	5.818	0.022	10.571
SMOTE+NearMiss+LightGBM	5.549	0.014	9.595
SMOTE+NearMiss+CatBoost	5.537	0.013	9.574
SMOTE+NearMiss+XGBoost	5.001	0.012	8.735
SMOTE+NearMiss+AdaBoost	6.030	0.023	11.401

We observe that Machine Learning models can predict IELTS speaking scores from the features extracted from the audio. The results table shows that the Random Forest model achieved the best overall performance, with the lowest MAE (4.485) and the lowest MAPE (7.829). A MAPE of 7.829% means that predictions deviate from the true scores by about 0.313 to 0.626 on average (7.829% of a score of 4.0 and of 8.0, respectively). XGBoost has the lowest MSLE (0.011), indicating that it handles relative errors well. Resampling with SMOTE (oversampling) and NearMiss (undersampling) generally did not improve model performance and often resulted in higher error metrics, suggesting that these methods may not be suitable for this dataset.
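
To make the three metrics concrete, the sketch below implements their standard definitions with the standard library and reproduces the 0.313 to 0.626 figure from the MAPE. The function names and the sample scores are illustrative assumptions. The notebooks may compute the metrics differently, for example on a different score scale.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + value)."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100 * sum(
        abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def expected_abs_error(score, mape_percent):
    """Average absolute deviation implied by a MAPE at a given true score."""
    return score * mape_percent / 100


# Hypothetical true and predicted scores, for illustration only.
y_true = [4.0, 6.0, 8.0]
y_pred = [4.5, 6.0, 7.5]
print(mae(y_true, y_pred), msle(y_true, y_pred), mape(y_true, y_pred))

# 7.829% of a score of 4.0 and of 8.0, rounded to three decimals.
print(round(expected_abs_error(4.0, 7.829), 3))
print(round(expected_abs_error(8.0, 7.829), 3))
```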

Conclusion

This study presents the first data set collected on the characteristics of IELTS speaking practice audios and is also the first to analyze the influence of these characteristics on speaking test scores. Despite the considerable time and effort required for data collection, labeling, and feature extraction, we have achieved promising results. Our proposed model, capable of predicting IELTS speaking test scores based on extracted features, has the potential to reduce the cost of obtaining English proficiency certificates for Vietnamese individuals. Additionally, analyzing the effects of these features on IELTS test scores can help learners and teachers identify specific elements of speaking skills that need improvement.

Repository: HaMy-DS/IELTS-speaking-features-extraction-prediction
