Performance Evaluation of Speech Recognition System Using Conventional and Hybrid Features and Hidden Markov Model Classifier
Extracting auditory information from a speech signal is considered a computationally demanding task. However, past research in mathematics, acoustics, and speech technology has provided many methods for signal processing and modeling. Although all of these methods have their strengths and weaknesses, each remains a serious attempt toward a speech recognition system. Using a multivariate statistical machine learning method, the Hidden Markov Model (HMM), this work investigates the performance of selected conventional and new hybrid feature extraction algorithms in both clean and noisy environments. The conventional features include MFCC, LPCC, PLP, and RASTA-PLP, while the new hybrid features include LPR, MLP, MLR, and MPR. The whole speech system was designed in MATLAB and evaluated using an isolated-word human voice corpus (TIDIGITS). This data set consists of eleven words (zero to nine and the letter O), sampled at 8 kHz and digitized with a resolution of 16 bits, recorded from 208 different adult speakers (men and women), each of whom uttered each word twice. By modeling dependency in multiple dimensions through transition probabilities organized in a Markov mesh, the HMM pattern-matching technique treats each observation as statistically dependent on its neighboring observations. In the training session, the HMM generates several reference models, which are stored for later use. With a statistical model in hand, several important tasks related to speech recognition can be performed. In the testing session, the statistical models are applied to find the highest probability, which determines the decision that recognizes the unknown word. The trained models are then used to evaluate the behavior of the proposed speech recognition system on the word error rate (WER) scale, and all results are compared with previously published models.
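The testing step described above, scoring an unknown utterance against every stored word model and choosing the most probable one, can be illustrated with a minimal sketch. This is not the MATLAB system evaluated in this work: it assumes discrete observation symbols (for example, vector-quantized feature frames) rather than continuous feature vectors, and the function names, the two toy word models, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-probabilities."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain for a discrete-output HMM.

    obs   : list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def recognize(obs, models):
    """Return the word whose model gives the observation the highest likelihood."""
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))


# Hypothetical two-state, left-to-right models over two symbols (0 and 1).
# Each entry is (start, trans, emit); the numbers are illustrative only.
models = {
    "zero": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "one": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], models))
```

Working in the log domain avoids numerical underflow on long observation sequences. A system that operates directly on continuous features such as MFCC would replace the emission table with a continuous density, for example a Gaussian mixture per state.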
The results showed that the acoustic signals extracted using the LPC and LPR algorithms gave the best recognition rates, 99.9949% and 99.9733%, in quiet conditions, while in noisy conditions the RASTA-PLP algorithm provided the best recognition rates of 98.9999%, 98.7945%, 94.7672%, and 93.9809% at 30, 20, 10, and 5 dB, respectively. As far as the validity of the commonly used models is concerned, the comparison with the measurements reveals that the applicability of those models to the studied environment is still debatable. The main technical contribution of this research is a way of estimating the parameters of four new hybrid feature extraction algorithms and comparing them with conventional features. This research can therefore serve as a useful reference for engineers designing ASR applications.