News Sharing Behaviour on Twitter. A Dataset and a Pipeline

Online social media are changing the news industry and revolutionizing the traditional role of journalists and newspapers. In this scenario, investigating the news sharing behaviour of users is relevant: it provides a means for understanding the impact of online news, its propagation within social communities, and its influence on the formation of opinions, as well as for detecting individual stance on specific news or topics and for understanding the role of journalism today.

Our contribution is two-fold.

First, we build a robust pipeline for collecting datasets describing news sharing; the pipeline takes as input a list of news sources and generates a large collection of articles, the accounts that share them on social media, either directly or by retweeting, and the social activities performed by these accounts.

The dataset is published on Harvard Dataverse:

Second, we provide a large-scale dataset that can be used to study the social behaviour of Twitter users and their involvement in the dissemination of news items. Finally, we show an application of our data collection in the context of political stance classification and we suggest other potential uses of the presented resources.
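To make the shape of the collected data more concrete, here is a minimal Python sketch of the grouping step such a pipeline performs. It is not the published code: the tweet dictionaries, the field names (`user`, `urls`, `is_retweet`), the domain `example-news.com`, and the helper names `source_of` and `collect` are all assumptions made for illustration, and no Twitter API is called.

```python
from collections import defaultdict
from urllib.parse import urlparse


def source_of(url, sources):
    """Return the news source whose domain matches the URL host, or None."""
    host = (urlparse(url).hostname or "").lower()
    for domain in sources:
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def collect(sources, tweets):
    """Group shared article URLs by source and record which accounts shared them."""
    articles = defaultdict(set)  # source domain -> article URLs
    sharers = defaultdict(set)   # article URL -> accounts that tweeted or retweeted it
    for tweet in tweets:
        for url in tweet["urls"]:
            source = source_of(url, sources)
            if source is not None:
                articles[source].add(url)
                sharers[url].add(tweet["user"])
    return articles, sharers


# Hypothetical input: one news source and three already-downloaded tweets.
sources = ["example-news.com"]
tweets = [
    {"user": "alice", "urls": ["https://www.example-news.com/a1"], "is_retweet": False},
    {"user": "bob", "urls": ["https://www.example-news.com/a1"], "is_retweet": True},
    {"user": "carol", "urls": ["https://other.org/x"], "is_retweet": False},
]
articles, sharers = collect(sources, tweets)
```

In this toy input, the article from the listed source is attributed to both the account that posted it and the account that retweeted it, while the link to an unlisted domain is ignored.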

The code is published on GitHub:

The details of our approach are published in a paper at ICWSM 2019, accessible online.

You can cite the paper as:

Giovanni Brena, Marco Brambilla, Stefano Ceri, Marco Di Giovanni, Francesco Pierri, Giorgia Ramponi. News Sharing User Behaviour on Twitter: A Comprehensive Data Collection of News Articles and Social Interactions. AAAI ICWSM 2019, pp. 592-597.

Slides are on Slideshare:

Modeling and data science for citizens: multicultural diversity and environmental monitoring at ICWSM

This year we decided to be present at ICWSM 2016 in Cologne with two contributions that blend model-driven software engineering and big data analysis to provide value to users and citizens, both through high-quality software and through value-added information.

We contributed two papers:
Model Driven Development of Social Media Environmental Monitoring Applications, presented at SWEEM (Workshop on the Social Web for Environmental and Ecological Monitoring).

Slides here:


Studying Multicultural Diversity of Cities and Neighborhoods through Social Media Language Detection, presented at the CityLab workshop at ICWSM 2016. The focus of this work is to study cities as melting pots of people with different cultures, religions, and languages. Through multilingual analysis of Twitter content shared within a city, we identify the prevalent language in the different neighborhoods of the city and compare the results with census data, in order to highlight any parallels or discrepancies between the two data sources. We show that the officially identified neighborhoods actually represent significantly different communities, and that using social media as a data source helps to detect weak signals that are not captured by traditional data. Slides here:
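As a rough illustration of the comparison between tweet languages and census data described above, the following Python sketch computes per-neighborhood language shares and lists the cases where they diverge from census figures. It is not the method from the paper: the neighborhood names, the language labels (assumed to come from some language detector run beforehand), the census shares, and the 0.1 threshold are all made up for the example.

```python
from collections import Counter, defaultdict


def language_shares(tweets):
    """Return, per neighborhood, the fraction of tweets written in each language."""
    counts = defaultdict(Counter)
    for neighborhood, language in tweets:
        counts[neighborhood][language] += 1
    return {
        hood: {lang: n / sum(c.values()) for lang, n in c.items()}
        for hood, c in counts.items()
    }


def discrepancies(social, census, threshold=0.1):
    """List (neighborhood, language, social share, census share) tuples
    where the two shares differ by more than the threshold."""
    out = []
    for hood in social:
        census_hood = census.get(hood, {})
        for lang in sorted(set(social[hood]) | set(census_hood)):
            s = social[hood].get(lang, 0.0)
            c = census_hood.get(lang, 0.0)
            if abs(s - c) > threshold:
                out.append((hood, lang, s, c))
    return out


# Hypothetical input: (neighborhood, detected language) pairs and census shares.
tweets = [("north", "it"), ("north", "it"), ("north", "en"), ("north", "zh")]
census = {"north": {"it": 0.8, "en": 0.2}}

social = language_shares(tweets)
prevalent = max(social["north"], key=social["north"].get)
gaps = discrepancies(social, census)
```

With this toy input, the prevalent language in the neighborhood is the same one the census reports, but a language absent from the census figures shows up in the tweets, which is the kind of weak signal the comparison is meant to surface.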

We are continuously looking for new datasets and computational challenges. Feel free to ask or to propose ideas!

