A Network-Driven Approach for Characterizing Emoji Usage in Social Media
Abstract
With the rapid growth in the number of online users, people tend to use emojis
to enrich their text with emotions. In this dissertation, after an overview of the
previous studies on emojis, we provide a brief introduction to emojis and the way
they are constructed from Unicode code points. Then, we extract the emojis
from messages collected from Twitter on different topics. In the first step towards
analyzing the emoji usage on social media, we create a directed, weighted
co-occurrence network of emojis for each topic. By analyzing these networks, we
find that emoji usage has a similar structure regardless of topic. Then we
show that most of the emojis are grouped in the top 5 communities of those
networks. Later on, we show that most of the emojis are used in positive sentiment
tweets. As a further exploration, by analyzing the distribution of the position of
emojis, we find that most of the emojis are used at the end of tweets, and
that this holds independently of the sentiment of the emojis. We also show that the
semantics of emojis vary across categories. In order to find the
cultural differences reflected by emojis, we consider languages and countries as two
indicators of culture. We divide the whole data set by the language of
the tweets and call the resulting subsets the subject-based language data sets.
Then, we create the network of each subject-based language data set. Following
this, we extract the node betweenness and PageRank scores of the emojis. After calculating the
rank correlation between the pairs of the subject-based language data sets, we
cluster them using Ward’s method of hierarchical clustering. We show that some
languages are similar even though the language families they come from would
suggest less similarity. We follow the same procedure
for countries and show the similarity between the subject-based country data sets
and some social and economic indices. We also introduce a novel validation
approach to show that our method of structural hierarchical clustering
can find meaningful clusters.
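
As a rough illustration of the pipeline summarized above, the following minimal sketch builds a directed weighted co-occurrence network, scores its nodes with PageRank, and compares two data sets by rank correlation. It rests on several assumptions that the abstract does not state: an edge runs from an emoji to every distinct emoji that appears later in the same tweet, the edge weight is the number of such co-occurrences, PageRank uses a damping factor of 0.85, and the rank correlation is Spearman's. The function names, the toy data, and the placeholder labels ("joy", "heart", and so on) are hypothetical and stand in for real emoji sequences.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tweets):
    """Count directed edges (a -> b) where a precedes b in the same tweet.

    Assumption: direction follows the order of appearance; self-loops are skipped.
    """
    edges = Counter()
    for emojis in tweets:
        for i, j in combinations(range(len(emojis)), 2):
            if emojis[i] != emojis[j]:
                edges[(emojis[i], emojis[j])] += 1
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = Counter()
    for (src, _), weight in edges.items():
        out_weight[src] += weight
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


def spearman(x, y):
    """Spearman rank correlation of two equal-length lists (ties get average ranks).

    Raises ZeroDivisionError if either list is constant.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
            for k in range(i, j + 1):
                result[order[k]] = average
            i = j + 1
        return result

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (std_x * std_y)


# Toy data: each tweet is the ordered list of emojis it contains (hypothetical).
tweets_a = [["joy", "heart"], ["joy", "heart", "fire"], ["sob", "joy"]]
tweets_b = [["heart", "joy"], ["fire", "joy"], ["sob", "heart", "joy"]]

edges_a, edges_b = cooccurrence_edges(tweets_a), cooccurrence_edges(tweets_b)
rank_a, rank_b = pagerank(edges_a), pagerank(edges_b)

# Compare the two data sets on the emojis they share.
shared = sorted(set(rank_a) & set(rank_b))
rho = spearman([rank_a[e] for e in shared], [rank_b[e] for e in shared])
print(f"Spearman correlation of PageRank scores: {rho:.3f}")
```

Repeating the comparison for every pair of data sets yields a correlation matrix. One plausible way to continue, again an assumption rather than the procedure used in this work, is to convert each correlation to a distance such as 1 - rho and pass the distances to a Ward-linkage hierarchical clustering routine, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`.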