Analysis of Progress in Speech Recognition Models

Artificial Intelligence

Date: June 2024

Miguel Angel Peñaloza Perez

Miguel Angel Peñaloza Perez

This paper focuses on the collection and modeling from scaling laws of a dataset oriented to speech recognition models, with the aim of estimating trends in their capabilities. The dataset was created by reviewing 172 studies related to speech recognition, collecting variables such as the number of model parameters, floating-point operations (FLOPS), and word error rates (WER). The main challenges in developing this research include difficulties encountered when the number of FLOPS is not reported in the reviewed studies. It was found that the architectures with the lowest error rates (WER) are Transformer (2.6% WER) and E-Branchformer (1.81% WER). Change rates in trends for different benchmarks were estimated (Common Voice Spanish 5 months, LibrarySpeech Test Clean 7 months, LibrarySpeech Test Other 10 months). Finally, the high uncertainty of the estimates was noted due to the small sample size, and potential future research directions were suggested.

This paper focuses on the collection and modeling from scaling laws of a dataset oriented to speech recognition models, with the aim of estimating trends in their capabilities. The dataset was created by reviewing 172 studies related to speech recognition, collecting variables such as the number of model parameters, floating-point operations (FLOPS), and word error rates (WER). The main challenges in developing this research include difficulties encountered when the number of FLOPS is not reported in the reviewed studies. It was found that the architectures with the lowest error rates (WER) are Transformer (2.6% WER) and E-Branchformer (1.81% WER). Change rates in trends for different benchmarks were estimated (Common Voice Spanish 5 months, LibrarySpeech Test Clean 7 months, LibrarySpeech Test Other 10 months). Finally, the high uncertainty of the estimates was noted due to the small sample size, and potential future research directions were suggested.