Social networks are huge continuous sources of information that can be used to analyze people’s behavior and thoughts.
Our goal is to extract such information and predict the political inclinations of users.
In particular, we investigate the importance of syntactic features of the texts that users write when they post on social media. Our hypothesis is that people belonging to the same political party write in similar ways, so they can be classified on the basis of the words they use.
We analyze tweets because Twitter is commonly used in Italy to discuss politics; moreover, it provides an official API that makes data extraction easy. We applied many classifiers to different kinds of features and NLP vectorization methods in order to find the method that best confirms our hypothesis.
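To make the idea of classifying users by the words they use more concrete, here is a minimal sketch of one possible setup: a bag-of-words representation fed to a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the pipeline from the paper, which compares many classifiers and vectorization methods. The class name `NaiveBayes`, the `tokenize` helper, the party labels, and the toy English texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # Log prior of the label plus log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: two made-up parties and four made-up posts.
train_texts = [
    "lower taxes for small business",
    "cut taxes and reduce spending",
    "protect workers and public health",
    "invest in public schools and health",
]
train_labels = ["party_a", "party_a", "party_b", "party_b"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("taxes on business"))
print(model.predict("public health workers"))
```

In a realistic setting, the training texts would be the collected tweets of accounts whose party is known, and each prediction would be compared against that ground truth to measure accuracy.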
To evaluate their accuracy, we selected a set of current Italian deputies with consistent activity on Twitter as ground truth and then predicted their political party. The results of our analysis also gave us interesting insights into current Italian politics. Here are the clusters of users:
Results in understanding political alignment are quite good, as reported in the confusion matrix here:
Our study is described in detail in the paper published at the IEEE Big Data 2018 conference and linked at:
The article can be downloaded here if you don’t have access to the IEEE library.
You can also look at the slides on SlideShare:
You can cite the paper as follows:
M. Di Giovanni, M. Brambilla, S. Ceri, F. Daniel and G. Ramponi, “Content-based Classification of Political Inclinations of Twitter Users,” 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 4321-4327.