Performance Evaluation of Speech Recognition System Using Conventional and Hybrid Features and Hidden Markov Model Classifier
Abstract
Extracting auditory information from a speech signal is considered a
computationally demanding task. However, past research in mathematics,
acoustics, and speech technology has provided many methods for signal processing
and modeling. Although all of these methods have their strengths and weaknesses,
each remains a serious attempt toward a working speech recognition system.
Using a multivariate statistical machine learning model (the Hidden Markov Model,
HMM), this work investigates the performance of selected conventional and new
hybrid feature extraction algorithms in both clean and noisy environments. The
conventional features include MFCC, LPCC, PLP, and RASTA-PLP, while the new
hybrid features include LPR, MLP, MLR, and MPR. The whole speech system was
designed in MATLAB and evaluated using an isolated-word human voice corpus
(TIDIGITS). This data set consists of eleven words (zero to nine and the
letter O), sampled at 8 kHz and digitized with a resolution of 16 bits. It was
recorded from 208 different adult speakers (men and women), each of whom uttered
each word twice.
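The corpus description above implies a simple utterance count, which the following minimal sketch works out. It assumes that every speaker recorded every word the stated number of times and that the audio is single-channel; neither is confirmed by the text, and the constant names are illustrative only.

```python
# Corpus figures as stated in the abstract; names are illustrative, not from the source.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "oh"]  # "oh" stands for the letter O
N_SPEAKERS = 208
REPETITIONS = 2
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16

# Assumption: every speaker uttered every word REPETITIONS times.
total_utterances = len(WORDS) * N_SPEAKERS * REPETITIONS

# Assumption: mono audio, so one sample per sampling instant.
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8

print(total_utterances)   # 11 * 208 * 2
print(bytes_per_second)   # 8000 samples/s * 2 bytes/sample
```

Under these assumptions the corpus holds 4576 utterances, and each second of audio occupies 16000 bytes before any feature extraction.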
The HMM pattern matching technique treats each observation as statistically
dependent on its neighboring observations, expressing this dependency in
multiple dimensions through transition probabilities organized in a Markov mesh.
In the training session, the HMM generates several reference models, which are
stored for later use. With a statistical model in hand, we can perform several
important tasks related to speech recognition. In the testing session, the
statistical models are applied to find the highest probability, which drives the
decision that identifies the unknown word. The trained models are then used to
evaluate the behavior of the proposed speech recognition system in terms of word
error rate (WER), and all the results are compared with previously published
models.
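The train-then-score procedure described above can be illustrated with a minimal sketch. It is not the system built in this work: it assumes discrete observation symbols and hand-set toy parameters instead of models trained on real features, and the names `forward_log_likelihood`, `recognize`, `word_error_rate`, and `MODELS` are hypothetical. Each word has one reference model, the scaled forward algorithm scores the unknown observation sequence against every model, and the word whose model gives the highest likelihood is chosen.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale at every step to avoid underflow; the logs of the scales
        # add up to the log-likelihood of the whole sequence.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like

def recognize(obs, models):
    """Pick the word whose reference model assigns obs the highest likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(obs, *models[word]))

def word_error_rate(references, hypotheses):
    """Isolated-word WER: fraction of utterances recognized as the wrong word."""
    errors = sum(ref != hyp for ref, hyp in zip(references, hypotheses))
    return errors / len(references)

# Toy two-state left-to-right models over symbols {0, 1}: (start, trans, emit).
# These parameters are invented for illustration, not trained.
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "zero": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "one":  ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognize([0, 0, 1, 1], MODELS))
print(recognize([1, 1, 0, 0], MODELS))
```

For isolated words, where each utterance produces exactly one hypothesis, the recognition rate is simply one minus the WER, which is how the percentages reported in the results can be read.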
The results showed that the acoustic features extracted using the LPC and LPR
algorithms gave the best recognition rates, 99.9949% and 99.9733% respectively,
in quiet conditions, while in noisy conditions the RASTA-PLP algorithm provided
the best recognition rates of 98.9999%, 98.7945%, 94.7672%, and 93.9809% at 30,
20, 10, and 5 dB, respectively. As far as the validity of the commonly used
models is concerned, the comparison with the measurements reveals that the
applicability of those models to the studied environment is still debatable. The
main technical contribution of this research is a way of estimating the
parameters of four new hybrid feature extraction algorithms and comparing them
with conventional features. This research can therefore serve as a useful
reference for engineers designing ASR applications.
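The noisy test conditions are given only as decibel levels. The sketch below assumes that these levels are signal-to-noise ratios and that noise is added to the clean recording, which the abstract does not state. It shows one common way to scale a noise signal so that the mixture reaches a target SNR; `mix_at_snr` and `measured_snr_db` are hypothetical helper names, and the sine tone and Gaussian noise are stand-ins for real speech and real noise.

```python
import math
import random

def mean_power(signal):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaled so the result has the requested SNR in dB.

    Assumes clean and noise have the same length and non-zero power.
    """
    target_ratio = 10 ** (snr_db / 10)  # P_clean / P_noise after scaling
    gain = math.sqrt(mean_power(clean) / (mean_power(noise) * target_ratio))
    return [c + gain * n for c, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))

# Stand-in signals: a 440 Hz tone at 8 kHz and white Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for target in (30, 20, 10, 5):
    noisy = mix_at_snr(clean, noise, target)
    print(target, round(measured_snr_db(clean, noisy), 3))
```

Scaling the noise rather than the speech keeps the clean signal level fixed across conditions, so differences in recognition rate can be attributed to the noise level alone.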