On the Characterization of Natural Language Structure and Literary Stylometry - A Network Science Approach
Al Rozz, Younis Anas Younis
Natural language processing (NLP) techniques have advanced considerably in recent years, and linguists and scientists have used them to address many challenges related to written language and literature. Problems such as finding the genetic relationships among languages, attributing the author of a text, and categorizing texts by genre have traditionally been treated with conventional statistical methods, for instance bag of words (BoW), N-grams, word frequencies, and the lexical distance between words. By considering written language as a complex system, network science tools and techniques can be used to address these problems. This dissertation proposes a unified methodology for this task that (i) proposes a framework for characterizing written language as a complex system; (ii) defines three language-related fields to be addressed by the proposed methodology; and (iii) for each field: reviews the related literature to establish a solid background on the subject; collects and processes the data, then constructs the networks; extracts network measures and statistics to build the dataset; deploys machine learning algorithms to cluster and classify the datasets; and compares and contrasts the results with those obtained from traditional methods.
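The network-construction and measure-extraction steps can be illustrated with a minimal sketch. It assumes a word adjacency (co-occurrence) network, in which words appearing within a fixed window of each other are linked, and a small set of global measures (node count, edge count, average degree, average clustering coefficient). The abstract does not specify the network model or the measures used in the dissertation, so the function names, the `window` parameter, and the chosen measures are all hypothetical and serve only as an illustration.

```python
import re
from collections import defaultdict


def build_cooccurrence_network(text, window=1):
    """Build an undirected word adjacency network (illustrative assumption).

    Two distinct words are linked if they occur within `window` tokens
    of each other. Returns a dict mapping each word to its neighbor set.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    adj = defaultdict(set)
    for i, word in enumerate(tokens):
        adj[word]  # register the word as a node even if it gets no edges
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other != word:  # ignore self-loops
                adj[word].add(other)
                adj[other].add(word)
    return dict(adj)


def network_features(adj):
    """Extract a few global measures to use as one row of a feature dataset."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    avg_degree = 2 * m / n if n else 0.0

    # Local clustering coefficient: fraction of a node's neighbor pairs
    # that are themselves linked; nodes with fewer than 2 neighbors score 0.
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    avg_clustering = total / n if n else 0.0

    return {
        "nodes": n,
        "edges": m,
        "avg_degree": avg_degree,
        "avg_clustering": avg_clustering,
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran"
    print(network_features(build_cooccurrence_network(sample)))
```

Under these assumptions, each text yields one feature vector, and vectors from many texts could then be passed to a clustering or classification algorithm, in the spirit of the last steps of the methodology.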