We implemented an analysis (meaning both a method and a system) that aim to gauge local support for the two major US political parties in the 68 most competitive House of Representative districts during the 2018 U.S. mid-term elections.
The analysis attempts to mirror the “Generic Ballot” poll, i.e., a survey of voters of a particular district which aims to measure local popularity of national parties by querying participants on the likelihood they would vote for a “generic” Democrat or Republican candidate. We collect the tweets containing national parties and politicians in the 68 most competitive districts. By most competitive we mean that they are rated as: toss up, 50%-50%, or lean by the Cook Political Report.
This means we are addressing an extremely challenging analysis and prediction problem, while disregarding the simpler cases (everyone is good at predicting the obvious!).
Our solution employs the Twitter Search API to query for tweets mentioning a national leader or party, posted form a limited geographic region (i.e., each specific congressional district). For example, the following query extracts tweets on Republicans:
TRUMP OR REPS OR Republicans OR Republican OR MCCCONNELL OR ‘MIKE PENCE’ OR ‘PAUL RYAN’ OR #Republicans OR #REPS OR @realDonaldTrumpOR @SpeakerRyan OR @senatemajldr OR @VP OR GOP OR @POTUS
To limit the search to each congressional district, we use the geocode field in the search query of the API, which queries a circular area based on the coordinates of the center and the radius. Because of the irregular shape of the congressional districts, multiple queries are needed for each of them, therefore we built a custom set of bubbles that approximate the district shape.
For the analysis of the tweets, we adopted a Recurrent Neural Network, namely a RNN-LSTM binary classifier trained on tweets.
To build training and testing data we collected tweets of users with clear political affiliation, including candidates, political activists, and also lesser know users, well versed in the political vernacular.
The accounts selected yielded around 280,000 tweets in 6 months before election day, labeled based on the author’s political affiliation.
Notice that the method is a general political-purpose language-independent analysis framework, that can be applied to any national or local context.
Further details and the results can be found on this Medium post.
This work has been published as a short scientific paper presented at IEEE Big Data Conference in Seattle, WA on December 2018 and on a previous Medium post by Antonio Lopardo.
You can also download a poster format reporting the work:
In case you want to cite the work, you can do it in this way:
A. Lopardo and M. Brambilla, “Analyzing and Predicting the US Midterm Elections on Twitter with Recurrent Neural Networks,” 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 5389–5391.
The online running prototype, the full description of the project, its results, and source code are available at http://www.twitterpoliticalsentiment.com/USA/.