Analysis, A Technique, and Incremental Learning of Wake-Up-Word Speech Recognition
Hasanain, Ahmad Zuhair S.
Even though cutting-edge speaker-independent Automatic Speech Recognition (ASR) systems demand large amounts of training data, they barely handle time-varying speaking rates, tolerate variations in utterance, or remain robust to noise. In contrast, our Wake-Up-Word (WUW) technique is tuned to these challenges in lightweight ASR systems with a minimal number of initial training samples. It is crucial that users of ASR systems be able to roll out new WUW calls swiftly and modify the ASR vocabulary at any time, for example when adding a foreign WUW or adapting to phonetic change. We tested our proposed methodologies on the acoustic WUW-II corpus, and they achieved roughly 89% (±0.5%) for both Out-Of-Vocabulary (OOV) and In-Vocabulary (INV) word recognition rates. We recommend Bidirectional Dynamic Time Warping (BDTW), a chronological contrast model, and a semi-supervised training procedure. Not only can BDTW produce an accurate time alignment of phonemic states, but it can also be used for WUW isolation, whereby the boundaries of similar-sounding patterns are precisely located so that WUWs can be segmented and retrieved autonomously from continuous speech streams. The suggested distance/similarity model derives the time warping from the contrast between the phones that make up the WUWs themselves, exploiting acoustic evidence up front. Hence, the proposed solutions need minimal prior knowledge about the language. Additionally, a single utterance is enough to initialize a speaker-independent ASR system when incremental learning is enabled after each test, such that the matching utterance of a speaker is the most probable hypothesis, given that each session contains at least one similar utterance. This work also presents an overview and analysis of the fundamentals and implementations and, most importantly, the results of empirical tests and directions for future work.
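To make the idea of locating WUW boundaries in a continuous stream more concrete, the following is a minimal sketch of one possible reading of a two-direction DTW search. It is not the BDTW algorithm of this work, whose details the abstract does not give. It assumes scalar features and an absolute-difference local cost (a real system would use frame vectors such as MFCCs), and the names `best_end` and `locate` are hypothetical. A forward pass with a free starting point finds where the best match ends, and the same pass on the reversed sequences finds where it starts.

```python
import math


def best_end(template, stream):
    """Subsequence DTW: return (cost, end_index) of the best match of
    `template` ending anywhere in `stream`. The start is unconstrained."""
    n, m = len(template), len(stream)
    prev = [0.0] * (m + 1)  # row 0 is all zeros: a match may start anywhere
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = abs(template[i - 1] - stream[j - 1])  # assumed local cost
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    cost = min(prev[1:])
    end = prev[1:].index(cost)  # first stream index with the minimal cost
    return cost, end


def locate(template, stream):
    """Assumed two-direction search: the forward pass gives the end boundary,
    the pass over the reversed sequences gives the start boundary."""
    cost_fwd, end = best_end(template, stream)
    cost_bwd, end_rev = best_end(template[::-1], stream[::-1])
    start = len(stream) - 1 - end_rev  # map reversed index back
    return start, end, cost_fwd, cost_bwd


if __name__ == "__main__":
    stream = [0, 0, 1, 3, 2, 0, 0]
    template = [1, 3, 2]
    print(locate(template, stream))  # (2, 4, 0.0, 0.0)
```

In this toy case the template occurs exactly once, so both passes agree on the same segment. With several similar occurrences in the stream the two passes could pick different matches, so a practical variant would need to reconcile them, for instance by backtracking from the forward end point.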