Digital Automatic Speech Recognition using Kaldi
Automatic speech recognition (ASR) is one of the most important technologies used for human-machine interaction. The main goal of an ASR system is to recognize natural languages as spoken by humans. The difficulty of this task depends on many factors, such as noise, speaker variability, and the problems posed by continuous speech. For that reason, many researchers and foundations have designed various licensed toolkits and software packages specialized in building speech recognition systems, including Julius, Sphinx-4, RWTH ASR, and HTK. In this thesis, the Kaldi toolkit, one of the most notable speech recognition tools, written in C++ and released under the Apache License v2.0, is used to build, train, and evaluate a digital ASR system. First, the speech recognition system is explained in detail and built using the TIDIGITS corpus. Second, different training approaches (including discriminative training methods) are studied and applied to improve the accuracy of the system. The accuracy of the ASR system is evaluated using two evaluation metrics: the word error rate (WER) and the sentence error rate (SER). The overall system performance ranges from 99.05% to 99.55%, depending on the training method applied.
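The two evaluation metrics can be illustrated with a short sketch. It assumes the common definitions: WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognized and reference transcripts, divided by the number of reference words; SER is the fraction of utterances containing at least one error. The function names and the digit strings below are hypothetical and chosen only for illustration; this is not the scoring code used in the thesis, and the sample utterances are not taken from TIDIGITS.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) in percent for (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_ref_words = 0
    wrong_sentences = 0
    for ref, hyp in pairs:
        ref_words = ref.split()
        errors = edit_distance(ref_words, hyp.split())
        total_errors += errors
        total_ref_words += len(ref_words)
        if errors > 0:
            wrong_sentences += 1
    if not pairs or total_ref_words == 0:
        raise ValueError("at least one non-empty reference transcript is required")
    wer = 100.0 * total_errors / total_ref_words
    ser = 100.0 * wrong_sentences / len(pairs)
    return wer, ser


if __name__ == "__main__":
    # Made-up digit strings: one correct, one substitution, one deletion.
    sample = [
        ("one two three", "one two three"),
        ("four five six seven", "four nine six seven"),
        ("eight oh", "eight"),
    ]
    wer, ser = wer_and_ser(sample)
    print(f"WER = {wer:.2f}%  SER = {ser:.2f}%")
```

In this made-up sample, 2 word errors against 9 reference words give a WER of about 22.22%, and 2 of the 3 utterances contain an error, giving an SER of about 66.67%. SER is never lower than WER on short utterances such as digit strings, because a single wrong word marks the whole utterance as wrong.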