dc.description.abstract | One of the difficulties of genetic research is the asymmetrical relationship between data
collection techniques and data analysis techniques. The goal of this research was to test
a novel application of non-negative matrix factorization, which would allow researchers
to more easily identify co-mutations. Those co-mutations can then be further verified by frequency analysis. This pruning process allows researchers to identify more
fruitful research opportunities, saving time, energy, and funding. Past research has utilized non-negative matrix factorization to extract factors which meaningfully express
underlying data features. This study extends that work in five ways. First, a novel cost function was used to convert
raw genetic data into numerical values appropriate for matrix operations. Second, this
research utilized the alternating non-negative least squares matrix factorization variant
for its faster convergence time compared to the more traditional multiplicative update
approach. Third, whereas data sets have traditionally been factored at a single factor count, this study extends previously established methods by analyzing factorizations over
multiple factor counts. Fourth, this study presents evidence that factors produced
by non-negative matrix factorization contain co-mutations, which were verified by a
statistical analysis. Fifth, this study demonstrated that non-negative matrix factorization can partition a data set, without supervision, into chronologically separated
clusters. This research indicates that non-negative matrix factorization is a scalable
algorithm for identifying genetic co-mutations within a practical computational time
frame. | en_US |