Content-based Classification of Political Inclinations of Twitter Users

Social networks are huge continuous sources of information that can be used to analyze people’s behavior and thoughts.

Our goal is to extract such information and predict political inclinations of users.

In particular, we investigate the importance of syntactic features of texts written by users when they post on social media. Our hypothesis is that people belonging to the same political party write in similar ways, thus they can be classified properly on the basis of the words that they use.

We analyze tweets because Twitter is commonly used in Italy for discussing about politics; moreover, it provides an official API that can be easily exploited for data extraction. Many classifiers were applied to different kinds of features and NLP vectorization methods in order to obtain the best method capable of confirming our hypothesis.

To evaluate their accuracy, a set of current Italian deputies with consistent activity in Twitter has been selected as ground truth, and we have then predicted their political party. Using the results of our analysis, we also got interesting insights into current Italian politics. Here are the clusters of users:

ieee-big-data-2018-twitter-elections-clusters

Results in understanding political alignment are quite good, as reported in the confusion matrix here: ieee-big-data-2018-twitter-elections-parties

Our study is described in detail in the paper published in the IEEE Big Data 2018 conference and linked at:

DOI: 10.1109/BigData.2018.8622040

The article can be downloaded here, if you don’t have access to IEEE library.

You can also look at the slides on SlideShare:

You can cite the paper as follows:

M. Di Giovanni, M. Brambilla, S. Ceri, F. Daniel and G. Ramponi, “Content-based Classification of Political Inclinations of Twitter Users,” 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 4321-4327.
doi: 10.1109/BigData.2018.8622040

Predictive Analysis on U.S. Midterm Elections on Twitter with RNN

We implemented an analysis (meaning both a method and a system) that aim to gauge local support for the two major US political parties in the 68 most competitive House of Representative districts during the 2018 U.S. mid-term elections.

The analysis attempts to mirror the “Generic Ballot” poll, i.e., a survey of voters of a particular district which aims to measure local popularity of national parties by querying participants on the likelihood they would vote for a “generic” Democrat or Republican candidate. We collect the tweets containing national parties and politicians in the 68 most competitive districts. By most competitive we mean that they are rated as: toss up50%-50%, or lean by the Cook Political Report.

This means we are addressing an extremely challenging analysis and prediction problem, while disregarding the simpler cases (everyone is good at predicting the obvious!).

Our solution employs the Twitter Search API to query for tweets mentioning a national leader or party, posted form a limited geographic region (i.e., each specific congressional district). For example, the following query extracts tweets on Republicans:

TRUMP OR REPS OR Republicans OR Republican OR MCCCONNELL OR ‘MIKE PENCE’ OR ‘PAUL RYAN’ OR #Republicans OR #REPS OR @realDonaldTrumpOR @SpeakerRyan OR @senatemajldr OR @VP OR GOP OR @POTUS

To limit the search to each congressional district, we use the geocode field in the search query of the API, which queries a circular area based on the coordinates of the center and the radius. Because of the irregular shape of the congressional districts, multiple queries are needed for each of them, therefore we built a custom set of bubbles that approximate the district shape.

For the analysis of the tweets, we adopted a Recurrent Neural Network, namely a RNN-LSTM binary classifier trained on tweets.

To build training and testing data we collected tweets of users with clear political affiliation, including candidates, political activists, and also lesser know users, well versed in the political vernacular.
The accounts selected yielded around 280,000 tweets in 6 months before election day, labeled based on the author’s political affiliation.

Notice that the method is a general political-purpose language-independent analysis framework, that can be applied to any national or local context.

Further details and the results can be found on this Medium post.

This work has been published as a short scientific paper presented at IEEE Big Data Conference in Seattle, WA on December 2018 and on a previous Medium post by Antonio Lopardo.

You can also download a poster format reporting the work:

poster-midterm

In case you want to cite the work, you can do it in this way:

A. Lopardo and M. Brambilla, “Analyzing and Predicting the US Midterm Elections on Twitter with Recurrent Neural Networks,” 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 5389–5391.
doi: 10.1109/BigData.2018.8622441.
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8622441&isnumber=8621858

The online running prototype, the full description of the project, its results, and source code are available at http://www.twitterpoliticalsentiment.com/USA/.

Understanding Polarized Political Events through Social Media Analysis

Predicting the outcome of elections is a topic that has been extensively studied in political polls, which have generally provided reliable predictions by means of statistical models. In recent years, online social media platforms have become a potential alternative to traditional polls, since they provide large amounts of post and user data, also referring to socio-political aspects.

In this context, we designed a research that aimed at defining a user modeling pipeline to analyze dis cussions and opinions shared on social media regarding polarized political events (such as a public poll or referendum).

The pipeline follows a four-step methodology.

 

  • First, social media posts and users metadata are crawled.
  • Second, a filtering mechanism is applied to filter out spammers and bot users.
  • Third, demographics information is extracted out of the valid users, namely gender, age, ethnicity and location information.
  • Fourth, the political polarity of the users with respect to the analyzed event is predicted.

In the scope of this work, our proposed pipeline is applied to two referendum scenarios:

  • independence of Catalonia in Spain
  • autonomy of Lombardy in Italy

We used these real-world examples to assess the performance of the approach with respect to the capability of collecting correct insights on the demographics of social media users and of predicting the poll results based on the opinions shared by the users.

Cursor_and_KDWEB_2018_paper_1_pdf

Experiments show that the method was effective in predicting the political trends for the Catalonia case, but not for the Lombardy case. Among the various motivations for this, we noticed that in general Twitter was more representative of the users opposing the referendum than the ones in favor.

The work has been presented at the KDWEB workshop at the ICWE 2018 conference.

A preprint of the paper can be downloaded from ArXiv and cited as reported here:

Roberto Napoli, Ali Mert Ertugrul, Alessandro Bozzon, Marco Brambilla. A User Modeling Pipeline for Studying Polarized Political Events in Social Media. KDWeb Workshop 2018, co-located with ICWE 2018, Caceres, Spain, June 2018. arXiv:1807.09459