Estimating Effectiveness of Twitter Messages with a Personalized Machine Learning Approach
Abstract
Retweeting is the most effective indicator of how widely a tweet spreads, and many aspects
of retweeting behavior on Twitter have been studied, such as whether a given reader will
retweet a given tweet. However, the total number of retweets a tweet receives, which is a
quantitative measure of its quality, has not been well addressed by existing work. To
estimate the number of retweets and associated factors, this paper proposes a procedure to
develop a personalized model for one author. The training data comes from the author’s
past tweets. We propose 3 types of new features based on the contents of the tweets: Entity,
Pair, and Cluster features, and combine them with features used in prior work. The
experiments on 7 authors demonstrate that, compared to using the previous features only, the
Pair feature yields a statistically significant improvement in the correlation coefficient
between the predicted and the actual number of retweets. We studied all combinations of the 3
types of features, and the combination of the Pair and Cluster features has the best
performance overall. As an application, this work can serve as a personalized tool that lets
an author evaluate a tweet before posting it and revise it to attract more attention.