End-to-End Bengali Speech Recognition using DeepSpeech

Corresponding Author : M. M. H. Nahid (nahid-cse@sust.edu)

Authors : M. M. (nahid-cse@sust.edu)

Keywords : Bengali Speech Recognition, Bengali ASR, DeepSpeech, LSTM, CTC

Abstract :

One of the main challenges in Bengali speech recognition is modeling phoneme alignment, because the exact prior probabilities of phoneme transitions within an individual word are not known. In this paper, we investigate the DeepSpeech network for recognizing individual Bengali speech samples. Recurrent LSTM layers at the heart of this network model the internal phoneme representation. We have added convolutional layers at the bottom, which obviate the need to assume anything about internal phoneme alignment. We have trained the model with a connectionist temporal classification (CTC) loss function and constructed the transcript using a beam search decoder. We have tested our method on the Bengali real number speech dataset, where it achieves a word error rate of 8.20% and a character error rate of 3.00%. These results significantly outperform existing methods on this dataset.
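For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how word error rate and character error rate are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length. It is not the authors' evaluation code. The function names and the example strings are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions and deletions each cost 1.
    """
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical transcripts, not taken from the dataset.
reference = "one two three four"
hypothesis = "one too three four"
print(word_error_rate(reference, hypothesis))
print(char_error_rate(reference, hypothesis))
```

In this toy example one of four words is wrong, so the word error rate is 0.25, while only one of 18 characters differs, which is why a character error rate is typically lower than the word error rate for the same output.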

Published on August 4th, 2019 in Volume 1 Issue 1, Computer Science, Electrical and Electronics