Online social media are changing the news industry and revolutionizing the traditional role of journalists and newspapers. In this scenario, investigating how users share news is relevant: it helps us understand the impact of online news, how it propagates within social communities and shapes opinions, how to detect individual stance on specific news items or topics, and what role journalism plays today.
Our contribution is two-fold.
First, we build a robust pipeline for collecting datasets describing news sharing; the pipeline takes as input a list of news sources and generates a large collection of articles, of the accounts that provide them on the social media either directly or by retweeting, and of the social activities performed by these accounts.
The dataset is published on Harvard Dataverse:
Second, we provide a large-scale dataset that can be used to study the social behavior of Twitter users and their involvement in the dissemination of news items. Finally, we show an application of our data collection to political stance classification and suggest other potential uses of the presented resources.
The code is published on GitHub:
The details of our approach are published in a paper at ICWSM 2019, accessible online.
You can cite the paper as:
Giovanni Brena, Marco Brambilla, Stefano Ceri, Marco Di Giovanni, Francesco Pierri, Giorgia Ramponi. News Sharing User Behaviour on Twitter: A Comprehensive Data Collection of News Articles and Social Interactions. AAAI ICWSM 2019, pp. 592-597.
Slides are on Slideshare:
You can also download a summary poster.
Predicting the outcome of elections is a topic that has been extensively studied in political polls, which have generally provided reliable predictions by means of statistical models. In recent years, online social media platforms have become a potential alternative to traditional polls, since they provide large amounts of post and user data, including data on socio-political topics.
In this context, we designed a study aimed at defining a user modeling pipeline to analyze discussions and opinions shared on social media regarding polarized political events (such as a public poll or referendum).
The pipeline follows a four-step methodology.
- First, social media posts and user metadata are crawled.
- Second, a filtering mechanism is applied to remove spammers and bot users.
- Third, demographic information is extracted for the valid users, namely gender, age, ethnicity and location.
- Fourth, the political polarity of the users with respect to the analyzed event is predicted.
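The four steps above can be sketched as a chain of small functions. This is a minimal illustration, not the pipeline from the paper: the record format, the `User` class, the repeated-post spam heuristic, and the keyword-based stance rule are all assumptions made for this example, and only location is handled in the demographics step.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical container for one social media account."""
    user_id: str
    location: str = ""
    posts: list = field(default_factory=list)
    stance: str = "unknown"


def crawl(records):
    """Step 1: group raw post records by author (stands in for an API crawl)."""
    users = {}
    for rec in records:
        user = users.setdefault(
            rec["user"], User(rec["user"], rec.get("location", ""))
        )
        user.posts.append(rec["text"])
    return list(users.values())


def filter_bots(users, max_repeats=2):
    """Step 2: toy spam filter; drops users who repeat one post too often."""
    kept = []
    for user in users:
        most_repeated = Counter(user.posts).most_common(1)[0][1]
        if most_repeated <= max_repeats:
            kept.append(user)
    return kept


def extract_demographics(users):
    """Step 3: count users per location (gender, age, ethnicity omitted)."""
    return Counter(user.location or "unknown" for user in users)


def predict_stance(users, yes_terms, no_terms):
    """Step 4: toy keyword vote; a real system would use a trained classifier."""
    for user in users:
        text = " ".join(user.posts).lower()
        yes = sum(text.count(term) for term in yes_terms)
        no = sum(text.count(term) for term in no_terms)
        user.stance = "yes" if yes > no else "no" if no > yes else "unknown"
    return users


def run_pipeline(records, yes_terms, no_terms):
    """Run the four steps in order; return the users and location counts."""
    users = predict_stance(filter_bots(crawl(records)), yes_terms, no_terms)
    return users, extract_demographics(users)
```

Calling `run_pipeline(records, ["#yes"], ["#no"])` on a list of dictionaries with `user`, `text`, and optional `location` keys returns the surviving users with a `stance` label, plus a count of users per location. Each stage could be replaced independently without changing the others.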
In the scope of this work, our proposed pipeline is applied to two referendum scenarios:
- independence of Catalonia in Spain
- autonomy of Lombardy in Italy
We used these real-world examples to assess the performance of the approach with respect to the capability of collecting correct insights on the demographics of social media users and of predicting the poll results based on the opinions shared by the users.
Experiments show that the method was effective in predicting the political trends for the Catalonia case, but not for the Lombardy case. Among the possible reasons for this, we noticed that in general Twitter was more representative of the users opposing the referendum than of those in favor.
The work has been presented at the KDWEB workshop at the ICWE 2018 conference.
A preprint of the paper can be downloaded from arXiv and cited as reported here:
Roberto Napoli, Ali Mert Ertugrul, Alessandro Bozzon, Marco Brambilla. A User Modeling Pipeline for Studying Polarized Political Events in Social Media. KDWeb Workshop 2018, co-located with ICWE 2018, Caceres, Spain, June 2018. arXiv:1807.09459
Together with the Urbanscope team, we gave a TEDx talk on the topics and results of the project here at Politecnico di Milano. The talk was actually given by our junior researchers, as we wanted it to be a choral performance as opposed to the typical one-man show.
The message is that cities are not merely physical and organizational devices: they are informational landscapes where places are shaped more by streams of data and less by traditional physical evidence. We devise tools and analyses for understanding these streams and the phenomena they represent, in order to better understand our cities.
Two layers coexist: a thick and dynamic layer of digital traces – the informational membrane – grows every day on top of the material layer of the territory, the buildings and the infrastructures. The observation, analysis and representation of these two layers combined provide valuable insights into how the city is used and lived.
You can now find the video of the talk on the official TEDx YouTube channel:
Urbanscope is a research laboratory that experiments with the collection, organization, analysis, and visualization of cross-domain geo-referenced data.
The research team is based at Politecnico di Milano and encompasses researchers with competencies in Computing Engineering, Communication and Information Design, Management Engineering, and Mathematics.
The aim of Urbanscope is to systematically produce compelling views on urban systems to foster understanding and decision making. Views are like new lenses of a macroscope: they are designed to support the recognition of specific patterns, thus enabling new perspectives.
If you enjoyed the show, you can explore our beta application at:
and discover the other data science activities we are conducting at the Data Science Lab of Politecnico, DEIB.