
dc.contributor.advisor: Kepuska, Veton Z.
dc.contributor.author: Abdulaziz, Azhar Sabah
dc.date.accessioned: 2018-06-26T18:03:55Z
dc.date.available: 2018-06-26T18:03:55Z
dc.date.created: 2018-07
dc.date.issued: 2018-05
dc.date.submitted: July 2018
dc.identifier.uri: http://hdl.handle.net/11141/2511
dc.description: Thesis (Ph.D.) - Florida Institute of Technology, 2018 [en_US]
dc.description.abstract: Automatic speech recognition (ASR) is a set of complex algorithms that convert a spoken utterance into text. Acoustic features extracted from the speech signal are matched against a trained network of linguistic and acoustic models. ASR performance degrades significantly when the ambient noise differs from that of the training data. Many approaches have been introduced to address this problem, with varying degrees of complexity and improvement. These solutions generally fall into three categories: strengthening the features, training a general acoustic model, and transforming models to match noisy features. Acoustic noise is added to the training speech data after collection for two reasons: first, because the data are usually recorded in one specific environment, and second, to control the environments during the training and testing phases. The speech and noise signals are usually combined in the electrical domain using straightforward linear addition. Although this procedure is commonly used, it is investigated in depth in this research. It is proven that linear addition is no more than an approximation of the real acoustic combination, and that it is valid if the speech and noise are non-coherent signals. An adaptive model switching (AMS) solution is proposed, in which the ASR measures the noise level and then picks the model expected to produce the fewest errors. This solution is a trade-off between model generalization and model transformation, so that both error and speed costs are kept as low as possible. The short time of silence (STS), a signal-to-noise ratio (SNR) level detector, was designed specifically for the proposed system. The proposed AMS approach is a general recipe that could be applied to other ASR systems, although it was tested on a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) recognizer.
The AMS ASR outperformed both model generalization and multiple-decoder maximum score voting in accuracy and decoding speed. The average error rate reduction was around 34.11%, with a relative decoding speed improvement of about 37.79%, both compared to the baseline ASR. [en_US]
dc.format.mimetype: application/pdf
dc.language.iso: en_US [en_US]
dc.rights: Copyright held by author. [en_US]
dc.title: Automatic Speech Recognition Adaptation for Various Noise Levels [en_US]
dc.type: Dissertation [en_US]
dc.date.updated: 2018-05-30T18:43:08Z
thesis.degree.name: Doctor of Philosophy in Computer Engineering [en_US]
thesis.degree.level: Doctoral [en_US]
thesis.degree.discipline: Computer Engineering [en_US]
thesis.degree.department: Electrical and Computer Engineering [en_US]
thesis.degree.grantor: Florida Institute of Technology [en_US]
dc.type.material: text
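
The abstract describes two mechanisms: combining speech and noise by linear addition at a controlled SNR, and switching to the model that best fits the measured noise level. The following is only a minimal sketch of those two ideas, not the dissertation's implementation. All function names are hypothetical, the SNR estimator is a simple stand-in that assumes a noise-only lead-in (it is not the STS detector), and the model table keyed by training SNR is an illustrative assumption. Subtracting the noise power from the total power relies on the speech and noise being non-coherent, the same condition the abstract gives for linear addition.

```python
import math

def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Linearly add noise to speech, scaling the noise to reach snr_db.

    Assumes both signals have non-zero power and equal length.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def estimate_snr_db(noisy, silence_len):
    """Rough SNR estimate (a stand-in, not the STS detector).

    Assumes the first silence_len samples contain noise only and that
    speech and noise are non-coherent, so their powers simply add.
    """
    noise_pow = max(power(noisy[:silence_len]), 1e-12)
    speech_pow = max(power(noisy[silence_len:]) - noise_pow, 1e-12)
    return 10 * math.log10(speech_pow / noise_pow)

def pick_model(snr_db, models):
    """Return the model whose training SNR (dict key, in dB) is closest."""
    nearest = min(models, key=lambda trained_snr: abs(trained_snr - snr_db))
    return models[nearest]
```

In this sketch, a recognizer would call `estimate_snr_db` once per utterance and decode with the single model returned by `pick_model`, rather than running several decoders and voting on their scores.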

