Test development

The Speechace test is fully automated through extensive use of natural language processing techniques. It rates students to within ±0.54 points of qualified IELTS examiners. This breakthrough was achieved through painstaking data gathering, manual rating, machine learning modeling, and user testing over an 18-month period.

The test was originally introduced as a practice activity in Speechace’s IELTSAce (https://play.google.com/store/apps/details?id=com.ielts.speechace.ieltsace&hl=en_US&gl=US) app, which helps IELTS students prepare for the speaking section of the IELTS exam. The app is available for both Android and iOS devices and has been downloaded by over 1 million students.

Audio samples collected from the app were transcribed using industry-leading speech recognition and then manually graded by 3 qualified IELTS examiners on a variety of parameters, including pronunciation, vocabulary, grammar, coherence, and relevance. If two raters differed by more than one point, the third rater was asked to arbitrate. Here are the key statistics observed with regard to inter-rater agreement:

* % of items on which raters gave exactly the same grade: 21.8%
* % of items on which raters were within 0.5 IELTS points: 72.7%
* % of items on which raters were within 1 IELTS point: 98.2%
* Cohen’s kappa: 0.794
* Pearson’s correlation between raters: 0.883
* RMSE: 0.674
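
These agreement metrics are straightforward to reproduce. Below is a minimal Python sketch, assuming two NumPy arrays of per-item band scores from a pair of raters; the sample scores are invented for illustration and are not Speechace data:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical example: two raters' band scores for the same six items
rater_a = np.array([6.0, 6.5, 7.0, 5.5, 8.0, 6.0])
rater_b = np.array([6.5, 6.5, 7.5, 5.5, 7.5, 6.0])

# Exact-agreement and tolerance-based agreement rates
exact = np.mean(rater_a == rater_b)
within_half = np.mean(np.abs(rater_a - rater_b) <= 0.5)
within_one = np.mean(np.abs(rater_a - rater_b) <= 1.0)

# Cohen's kappa expects categorical labels; doubling the half-band
# scores maps them onto integer categories
kappa = cohen_kappa_score((rater_a * 2).astype(int),
                          (rater_b * 2).astype(int))

# Pearson correlation and RMSE between the two raters
r, _ = pearsonr(rater_a, rater_b)
rmse = np.sqrt(np.mean((rater_a - rater_b) ** 2))

print(f"exact={exact:.1%}  ±0.5={within_half:.1%}  ±1.0={within_one:.1%}")
print(f"kappa={kappa:.3f}  pearson={r:.3f}  rmse={rmse:.3f}")
```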

Once the data was graded, we ran algorithms that evaluated thousands of English syntax and semantic rules to determine which rules mattered most for assessing pronunciation, fluency, vocabulary, grammar, cohesion, and relevance. We then built deep learning models on the filtered set of rules to accurately predict IELTS scores on a 9-point scale for any arbitrary audio sample. The test-retest reliability of our models was found to be 0.82.
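
As a rough illustration of that kind of pipeline (filter-style selection of the most predictive rules, followed by a small neural regressor), here is a self-contained Python sketch. It is not Speechace’s actual implementation: the feature matrix, the univariate selector, the network size, and the half-band rounding are all illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

# Hypothetical data: one row per audio sample, one column per
# syntax/semantic rule score; y holds examiner-assigned IELTS bands
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2000))
y = rng.uniform(4.0, 9.0, size=500)

# Keep only the rules most correlated with the band score, then fit a
# small feed-forward network on the filtered feature set
model = make_pipeline(
    SelectKBest(f_regression, k=200),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500,
                 random_state=0),
)
model.fit(X, y)

# Clip predictions to the 9-point scale and round to half bands
pred = np.clip(model.predict(X[:5]), 0.0, 9.0)
print(np.round(pred * 2) / 2)
```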
