Improving Grapheme-to-Phoneme Translation with the use of NALU Attention Mechanisms
Smith, Brian Sheldon
In recent years, Natural Language Processing in the field of machine learning has seen major improvements. Data scientists have shown that neural networks are capable of breaking down the semantics of sentences, translating languages, and answering complex questions with fast recall. While impressive, these feats all hinge on having access to a massive amount of clean text, or datasets with almost perfect grammar and spelling. Without this, neural networks will usually fail to converge on a meaningful result. To partially alleviate this dependency, Grapheme-to-Phoneme conversion can be employed: the conversion of words from their spellings to a form that more closely matches their pronunciations. Since most spelling errors preserve a word's phonetic pronunciation (a misspelling such as "enuff" yields roughly the same phoneme sequence as "enough"), converting words to phonemes should improve network convergence on datasets that contain occasional spelling errors. Phoneme conversion is a well-researched topic, with state-of-the-art models achieving a 20% word error rate. This error rate stems from model training being stopped early to retain accuracy on out-of-vocabulary words. To alleviate this, this paper employs Neural Arithmetic Logic Units. A recent study on these neurons shows that they have greatly increased generalization capabilities over standard neural network layers. When used in a recurrent attention mechanism, phoneme conversion models overfit at a much slower rate, allowing for a word error rate of less than 10%.
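As a rough illustration of the building block described above, the following is a minimal NumPy sketch of a Neural Arithmetic Logic Unit in the style of Trask et al.; the class name, initialization scales, and dimensions are illustrative choices, not the exact configuration used in this work. The effective weight matrix tanh(W_hat) * sigmoid(M_hat) is biased toward entries in {-1, 0, 1}, and a learned gate mixes an additive path with a multiplicative path computed in log space, which is the property credited with improved extrapolation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NALU:
    """Minimal sketch of a Neural Arithmetic Logic Unit (illustrative only)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Unconstrained parameters; the effective weight tanh(W_hat)*sigmoid(M_hat)
        # is pushed toward {-1, 0, 1}, which aids numeric extrapolation.
        self.W_hat = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.M_hat = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.G = rng.normal(scale=0.1, size=(in_dim, out_dim))  # gate weights

    def __call__(self, x, eps=1e-7):
        W = np.tanh(self.W_hat) * sigmoid(self.M_hat)
        a = x @ W                                # additive (NAC) path
        m = np.exp(np.log(np.abs(x) + eps) @ W)  # multiplicative path via log space
        g = sigmoid(x @ self.G)                  # learned gate between the paths
        return g * a + (1.0 - g) * m
```

In the attention mechanism discussed in this work, a unit like this would replace the standard dense layers when scoring alignments, with the gate deciding per output dimension whether the additive or multiplicative path dominates.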