Characterization of Written Text Using Data and Network Science
Hamoodat, Harith A. Hamdon
The success of humans cannot be attributed to language, but language and humans are certainly inseparable. Since the first language appeared, language has evolved continually across space and social groups, forming around 7,000 languages today. The origin and evolution of languages remain unclear, and the state of the art in language evolution still lacks a comprehensive characterization. In general, this problem is tackled mainly by statistically measuring changes in parts of a language (e.g., words and sounds). Given the current availability of data and computational power, this dissertation proposes a comprehensive data-driven characterization of language evolution using vocabulary in two main areas. First, we extracted and classified the structural and chronological relations between languages using their vocabularies. Second, we studied spatio-temporal effects on language vocabulary and their relation to socio-economic factors (i.e., educational attainment). The results demonstrate that the proposed method can uncover the relations between languages from both structural and chronological perspectives. We also found that vocabulary levels can reveal the educational attainment of the resident population in specific areas and times.