Arabic Speech Recognition Systems
Eljagmani, Hamda M. M.
Arabic automatic speech recognition is one of the more difficult topics in current speech recognition research. The difficulty lies in the scarcity of research on Arabic speech recognition and of data available for experiments. Moreover, to achieve an optimal word error rate (WER), an Arabic speech recognition system has to be fully trained on the individual user. A speaker dependent system can achieve this by being trained explicitly for that one speaker, but it requires a large amount of training data and must be retrained for every new speaker. For these reasons, speaker dependent systems are too time-consuming to build and are unsuitable for Arabic speech recognition, where such training sets are not readily available. The problem of data quantity can be tackled by using speaker independent systems. However, because a speaker independent system has no relation between its training and test speakers, its performance is lower than that of a speaker dependent system. Additionally, the word error rate is usually high when Arabic automatic speech recognition systems trained on native speakers are later used by non-native speakers. This is due to both acoustic and pronunciation differences and to varying accents. The challenge in non-native speech recognition is to maximize recognition performance with the small amount of non-native data available. The novelty of this work lies in the application of an open source research toolkit (CMU Sphinx) to train, build, evaluate and adapt an Arabic speech recognition system. First, an Arabic digits speech recognition system is built as both a speaker dependent and a speaker independent system to show how the relation between the training set and the test set affects the recognizer's performance.
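The word error rate referred to above is conventionally computed by aligning the recognized word sequence against the reference transcript with a word-level edit distance: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions and N is the number of reference words. The following is a minimal Python sketch of that definition, assuming whitespace-tokenized transcripts; the function name and the transliterated digit words are illustrative only and are not taken from the thesis data or from CMU Sphinx.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match / substitution
            )
    return dp[-1][-1] / len(ref)


# Hypothetical transliterated digit strings, for illustration only.
wer = word_error_rate("wahid ithnan thalatha", "wahid thalatha arbaa")
word_accuracy = 1.0 - wer
```

Here one reference word is dropped and one extra word is recognized, so the WER is 2/3. Word accuracy is taken as 1 − WER, which can become negative when insertions are numerous.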
Furthermore, different test sets are used to evaluate the speaker independent system in order to see how variability among speakers affects recognition performance. Second, an Arabic digits speech recognition system is trained on native Arabic speakers and tested on both native and non-native Arabic speakers to show how pronunciation differences between non-native and native speakers directly affect the performance of the system. Finally, the Maximum Likelihood Linear Regression (MLLR) adaptation technique is proposed to improve the accuracy of both the speaker independent system and the native Arabic digits system when it is used by non-native speakers. The technique starts by sampling speech data from the new speaker and then updates the acoustic model according to the features extracted from that speech, in order to minimize the mismatch between the acoustic model and the selected speaker. The results show that acoustic model adaptation is beneficial to both systems. The systems were evaluated using word-level recognition. Overall absolute improvements in recognition rate of 13% and 6.29% were obtained for the speaker independent system and for the native Arabic digits system adapted to foreign-accented speakers, respectively.
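MLLR adapts the Gaussian means of a speaker independent acoustic model with an affine transform shared across Gaussians: the adapted mean is A·μ + b = W·ξ, where ξ = [1, μ] is the extended mean vector. The sketch below estimates one global transform W for diagonal-covariance Gaussians using the standard closed-form row-by-row solution w_i = G_i⁻¹ k_i. It assumes the occupation probabilities γ_m(t) of each Gaussian m at frame t are already available (in a real system they come from aligning the new speaker's adaptation data against the model). The function names are hypothetical and are not CMU Sphinx APIs; the toolkit ships its own MLLR tools.

```python
import numpy as np


def estimate_mllr_mean_transform(means, variances, gammas, frames):
    """Estimate a single global MLLR mean transform W of shape (d, d+1).

    means:     (M, d) Gaussian means of the speaker independent model
    variances: (M, d) diagonal covariances of those Gaussians
    gammas:    (T, M) occupation probabilities gamma_m(t), assumed given
    frames:    (T, d) feature vectors o(t) from the new speaker
    """
    M, d = means.shape
    # Extended means xi_m = [1, mu_m], so W @ xi_m = b + A @ mu_m.
    xi = np.hstack([np.ones((M, 1)), means])  # (M, d+1)
    occ = gammas.sum(axis=0)                  # (M,) total occupancy per Gaussian
    weighted_obs = gammas.T @ frames          # (M, d) sum_t gamma_m(t) o(t)
    W = np.zeros((d, d + 1))
    for i in range(d):
        inv_var = 1.0 / variances[:, i]       # (M,) 1 / sigma_{m,i}^2
        # G_i = sum_m occ_m / sigma_{m,i}^2 * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k_i = sum_m (1 / sigma_{m,i}^2) * (sum_t gamma_m(t) o_i(t)) * xi_m
        k = xi.T @ (inv_var * weighted_obs[:, i])
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(means, W):
    """Apply the transform to every Gaussian mean: mu_hat = W @ [1, mu]."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Sharing one transform across all Gaussians is what lets MLLR work with the small amount of speaker-specific data this abstract emphasizes: only d·(d+1) parameters are estimated, rather than a new mean for every Gaussian. Gaussians with no adaptation frames are still moved by the shared transform.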