Improving Grapheme-to-Phoneme Translation with NALU Attention Mechanisms
Abstract
In recent years, Natural Language Processing has seen major improvements driven by machine learning. Researchers have shown that neural networks
are capable of breaking down the semantics of sentences, translating languages, and answering complex questions with fast recall. While impressive, these feats all hinge
on having access to a massive amount of clean text, or data sets with almost perfect
grammar and spelling. Without this, neural networks will usually fail to converge
on a meaningful result. To partially this dependency, Grapheme-to-Phoneme conversion can be employed. This is the conversion of words from their spellings to
a form that more closely matches their pronunciations. Since most spelling errors
hold their phonetic pronunciation, word conversion to phonemes should improve
network convergence in datasets that contain occasional spelling errors. Phoneme
conversion is a well-researched topic, with state-of-the-art models having a 20%
word error rate. This error rate stems from model training being stopped early to
retain accuracy on out-of-vocabulary words. To alleviate this, this paper employs
Neural Arithmetic Logic Units. A recent study of these units shows
that they have greatly increased generalization capabilities over standard neural
network layers. When used in a recurrent attention mechanism, phoneme conversion models overfit at a much slower rate, allowing for a word error rate of less
than 10%.
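
For context, the sketch below illustrates a single NALU layer following the standard formulation (Trask et al., 2018). It is a minimal PyTorch example with illustrative class and parameter names, not the recurrent attention-level integration this paper describes.

```python
# Minimal sketch of a Neural Arithmetic Logic Unit (NALU) layer.
# Assumes the standard NALU formulation; names here are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NALU(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, eps: float = 1e-7):
        super().__init__()
        self.eps = eps
        # Parameters for the additive (NAC) path and the learned gate.
        self.W_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.M_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.G = nn.Parameter(torch.empty(out_dim, in_dim))
        for p in (self.W_hat, self.M_hat, self.G):
            nn.init.xavier_uniform_(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # NAC path: weights pushed toward {-1, 0, 1}, supporting addition/subtraction.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        add = F.linear(x, W)
        # Multiplicative path: the same weights applied in log space.
        mul = torch.exp(F.linear(torch.log(torch.abs(x) + self.eps), W))
        # Gate interpolates between the additive and multiplicative paths.
        g = torch.sigmoid(F.linear(x, self.G))
        return g * add + (1 - g) * mul

# Example usage: layer = NALU(64, 64); y = layer(torch.randn(8, 64))
```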