Our lab is participating in the PERISCOPE H2020 project, a partnership in which 30+ top European universities and professional associations have worked together over the last two years to study data, policies, actions, and effects of pandemic management. Besides high-impact research results, the consortium has also developed educational materials and courses, including:
“One Health: pandemic preparedness, prevention, and response”, produced by the Karolinska Institutet and the Federation of European Academies of Medicine (FEAM).
“Strengthening territorial response for better health”, realised by the European Regional and Local Health Authorities (EUREGHA), Mental Health Europe (MHE), and Agència de Qualitat i Avaluació Sanitàries de Catalunya (AQuAS).
I’ve been invited to give a keynote talk at the WISE 2022 Conference. Thinking about it, I decided to focus on my idea of a bi-verse. To me, the bi-verse is the duality between the physical and digital worlds.
On one side, the Web and social media are the environments where people post their content, opinions, activities, and resources. As a result, a considerable amount of user-generated content is produced every day for a wide variety of purposes.
On the other side, people live their everyday life immersed in the physical world, where society, economy, politics, and personal relations continuously evolve. These two opposite and complementary environments are now fully integrated: they reflect each other and interact with each other ever more strongly.
Exploring and studying content and data coming from both environments offers a great opportunity to understand the ever-evolving modern society, in terms of topics of interest, events, relations, and behavior.
This slide deck summarizes my contribution:
In my speech, I discuss business cases and socio-political scenarios to show how we can extract insights and understand reality by combining and analyzing data from the digital and physical worlds, so as to obtain a better overall picture of reality itself. Along this path, we need to take into account that reality is complex and varies in time, space, and many other dimensions, including societal and economic variables. The speech highlights the main challenges that need to be addressed and outlines some data science strategies that can be applied to tackle them.
Machine learning and AI are facing a new challenge: making models more explainable.
This means developing new methodologies to describe the behaviour of widely adopted black-box models, i.e., high-performing models whose internal logic is difficult to describe, justify, and understand from a human perspective.
The final goal of an explainability method is to faithfully describe the behaviour of a (black-box) model to users, so that they can better understand its logic, thus increasing their trust in and acceptance of the system.
Unfortunately, state-of-the-art explainability approaches may not be enough to guarantee that explanations are fully understandable from a human perspective. For this reason, human-in-the-loop methods have been widely employed to enhance and/or evaluate explanations of machine learning models. These approaches either collect human knowledge that AI systems can then employ, or involve humans directly to achieve specific objectives (e.g., evaluating or improving the system).
Based on these assumptions and requirements, we published a review article that presents a literature overview on collecting and employing human knowledge to improve and evaluate the understandability of machine learning models through human-in-the-loop approaches. The paper also discusses the challenges, state of the art, and future trends in explainability.
The paper starts from the definition of the notion of “explanation” as an “interface between humans and a decision-maker that is, at the same time, both an accurate proxy of the decision-maker and comprehensible to humans”. Such a description highlights two fundamental features an explanation should have. It must be accurate, i.e., it must faithfully represent the model’s behaviour, and comprehensible, i.e., any human should be able to understand the meaning it conveys.
The Role of Human Knowledge in Explainable AI
The figure above summarizes the four main ways to use human knowledge in explainability, namely: knowledge collection for explainability (red), explainability evaluation (green), understanding humans’ perspective in explainability (blue), and improving model explainability (yellow). In the schema, the icons represent human actors.
Despite increasing restrictions for unvaccinated people, in many European countries a non-negligible fraction of individuals still refuses to get vaccinated against SARS-CoV-2, undermining governmental efforts to eradicate the virus.
Within the PERISCOPE project, we studied the role of online social media in influencing individuals’ opinions about getting vaccinated by designing a large-scale collection of Twitter messages in three different languages (French, German, and Italian) and providing public access to the collected data. This work was carried out in collaboration with the Observatory on Social Media at Indiana University, Bloomington, USA.
Focusing on the European context, we devised an open dataset called VaccinEU, that aims to help researchers to better understand the impact of online (mis)information about vaccines and design more accurate communication strategies to maximize vaccination coverage.
The dataset is openly accessible in a Dataverse repository and a GitHub repository.
Furthermore, a description has been published in a paper at ICWSM 2022 (open access), which can be cited as:
Di Giovanni, M., Pierri, F., Torres-Lugo, C., & Brambilla, M. (2022). VaccinEU: COVID-19 Vaccine Conversations on Twitter in French, German and Italian. Proceedings of the International AAAI Conference on Web and Social Media, 16(1), 1236-1244. https://ojs.aaai.org/index.php/ICWSM/article/view/19374
The spread of AI and black-box machine learning models makes it necessary to explain their behavior. Consequently, the research field of Explainable AI was born. The main objective of an Explainable AI system is to be understood by a human as the final beneficiary of the model.
In our research, just published in Frontiers in Artificial Intelligence, we frame the explainability problem from the crowd’s point of view and engage both users and AI researchers through a gamified crowdsourcing framework. We investigate whether it is possible to improve the crowd’s understanding of black-box models and the quality of the crowdsourced content by engaging users in gamified activities through a crowdsourcing framework called EXP-Crowd. While users engage in such activities, AI researchers organize and share AI- and explainability-related knowledge to educate users.
The next diagram shows the interaction flows of researchers (dashed cyan arrows) and users (orange plain arrows) with the activities devised within our framework. Researchers organize knowledge and set up activities to collect data. As users engage with such activities, they provide content to researchers. In turn, researchers give users feedback about the activities they performed. Such feedback aims to improve users’ understanding of the activity itself, the knowledge, and the context provided within it.
Interaction flows of researchers (dashed cyan arrows) and users (orange plain arrows) in the EXP-Crowd framework.
One of the crucial steps in the process is the question-and-annotation challenge, where Player 1 asks yes/no questions about the entity to be explained. Player 2 answers such questions and is then asked to complete a series of simple tasks to identify the guessed feature, by answering questions and potentially annotating the picture, as shown below.
Questioning and annotation steps within the explanation game.
If you are interested in more details, you can read the full EXP-Crowd paper on the journal site (full open access):
You can cite the paper as:
Tocchetti A., Corti L., Brambilla M., and Celino I. (2022). EXP-Crowd: A Gamified Crowdsourcing Framework for Explainability. Frontiers in Artificial Intelligence 5:826499. doi: 10.3389/frai.2022.826499
We will join and contribute to the final TRIGGER conference, scheduled for May 31st, 2022, in Brussels.
The theme is: “Rethinking the EU’s role in global governance”. In this context, the TRIGGER project will present the main research outcomes of this H2020 research program, which started in 2018 and set the stage for a collaboration among 14 international partners.
We will present our main contributions, namely PERSEUS and COCTEAU.
A quick intro to PERSEUS is available in this video:
Further details about the event are available here:
Online reviews have long represented a valuable source for data analysis in the tourism field, but these data sources have been mostly studied in terms of the numerical ratings offered by the review platforms.
In a recent article (available as full open access) and a companion blog post, we explored whether social media and online review platforms can be a good source for the quantitative evaluation of the service quality of cultural venues, such as museums and theaters. Our paper applies automatic analysis to online reviews, comparing two different automated approaches to evaluate which of the two is more adequate for assessing the quality dimensions. The analysis covers user-generated reviews of the top 100 Italian museums.
Specifically, we compare two approaches:
a ‘top-down’ approach based on a supervised classification built upon the strategic choices defined in policy makers’ guidelines at the national level;
a ‘bottom-up’ approach based on an unsupervised topic model of the reviewers’ own words.
The misalignment between the results of the ‘top-down’ strategic studies and the ‘bottom-up’ data-driven approaches highlights how data science can offer an important contribution to decision making in cultural tourism. Both analysis approaches have been applied to the same dataset of 14,250 Italian reviews.
We identified five quality dimensions that follow the ‘top-down’ perspective: Ticketing and Welcoming, Space, Comfort, Activities, and Communication. Each of these dimensions has been considered as a class in a classification problem over user reviews. The top-down approach allowed us to tag each review as descriptive of one of those five dimensions. Classification has been implemented both as a machine learning classification problem (using BERT, accuracy 88%) and as keyword-based tagging (accuracy 80%).
The ‘bottom-up’ approach has been implemented through unsupervised topic modelling, namely LDA (Latent Dirichlet Allocation), tuned over models with up to 30 topics. The best ‘bottom-up’ model we selected identifies 13 latent dimensions in review texts, which we further grouped into 3 main topics: Museum Cultural Heritage, Personal Experience, and Museum Services.
The ‘top-down’ approach (based on a set of keywords defined from the standards issued by the policy maker) resulted in 63% of online reviews not fitting into any of the predefined quality dimensions.
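The keyword-based variant of the top-down tagging can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the keyword lists are hypothetical English examples (the real ones were derived from the national standards and applied to Italian reviews), and a review is assigned the single dimension with the most keyword hits, or none at all when no keyword matches.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only: the study derived its
# keywords from national museum quality standards and tagged Italian reviews.
DIMENSION_KEYWORDS = {
    "Ticketing and Welcoming": {"ticket", "tickets", "queue", "entrance", "staff"},
    "Space": {"room", "rooms", "layout", "garden"},
    "Comfort": {"bench", "benches", "toilet", "toilets", "cafe"},
    "Activities": {"tour", "workshop", "guide", "exhibition"},
    "Communication": {"label", "labels", "audioguide", "website", "signage"},
}


def tokenize(text):
    """Lowercase the review and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_review(text, keywords=DIMENSION_KEYWORDS):
    """Return the dimension with the most keyword hits, or None if nothing matches."""
    tokens = tokenize(text)
    hits = Counter({dim: sum(t in words for t in tokens) for dim, words in keywords.items()})
    dimension, count = hits.most_common(1)[0]
    return dimension if count > 0 else None


def untagged_share(reviews):
    """Fraction of reviews that match none of the predefined dimensions."""
    tags = [tag_review(review) for review in reviews]
    return sum(tag is None for tag in tags) / len(tags)


reviews = [
    "Long queue at the ticket office",
    "Beautiful rooms and a lovely garden",
    "Amazing paintings by Caravaggio",
]
print([tag_review(r) for r in reviews])
print(untagged_share(reviews))
```

The share of untagged reviews is the kind of coverage figure the keyword-based approach yields: a review about the artworks themselves matches no service-oriented keyword.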
63% of the reviews could not be assessed against the official top-down service quality categories.
The ‘bottom-up’ data-driven approach overcomes this limitation by searching for the aspects of interest using reviewers’ own words. Indeed, museum reviews usually discuss a museum’s cultural heritage (46% average probability) and personal experiences (31% average probability) more than the services offered by the museum (23% average probability).
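The average probabilities above come from grouping LDA topics into macro topics. A minimal sketch of that aggregation, assuming each review already has a topic distribution from a fitted LDA model: a macro topic's probability in a review is the sum of its member topics' probabilities, averaged over all reviews. The topic-to-macro mapping and the distributions below are hypothetical toy values (5 topics instead of the study's 13).

```python
from collections import defaultdict

# Hypothetical mapping from latent LDA topics to the three macro topics.
# The study grouped 13 topics; this toy example uses only 5.
TOPIC_TO_MACRO = {
    0: "Museum Cultural Heritage",
    1: "Museum Cultural Heritage",
    2: "Personal Experience",
    3: "Personal Experience",
    4: "Museum Services",
}


def macro_topic_averages(doc_topics, topic_to_macro=TOPIC_TO_MACRO):
    """Average probability of each macro topic over all reviews.

    doc_topics holds one topic distribution per review (each sums to 1);
    a macro topic's probability in a review is the sum of its topics' probabilities.
    """
    totals = defaultdict(float)
    for distribution in doc_topics:
        for topic, probability in enumerate(distribution):
            totals[topic_to_macro[topic]] += probability
    return {macro: total / len(doc_topics) for macro, total in totals.items()}


# Toy per-review distributions, standing in for the output of a fitted LDA model.
doc_topics = [
    [0.40, 0.20, 0.20, 0.10, 0.10],
    [0.10, 0.10, 0.30, 0.30, 0.20],
]
print(macro_topic_averages(doc_topics))
```

Because every review distribution sums to 1, the macro-topic averages also sum to 1, which is why they can be read as shares of the overall discussion.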
Among the various quantitative findings of the study, I think the most important point is that the aspects considered as quality dimensions by the decision maker can be highly different from those aspects perceived as quality dimensions by museum visitors.
You can find out more about this analysis by reading the full article, published online as open access, or this longer blog post. The full reference to the paper is:
Agostino, D.; Brambilla, M.; Pavanetto, S.; Riva, P. The Contribution of Online Reviews for Quality Evaluation of Cultural Tourism Offers: The Experience of Italian Museums. Sustainability 2021, 13, 13340. https://doi.org/10.3390/su132313340
Frequent words and co-occurrences used by pro-vaccination and anti-vaccination communities.
In this study, we map the Twitter discourse around vaccinations in English over four years, in order to:
discover the volumes and trends of the conversation;
compare the discussion on Twitter with newspapers’ content; and
classify people as pro- or anti-vaccination and explore how their behavior differs.
Datasets. We collected four years of Twitter data (January 2016 – January 2020) about vaccination, before the advent of the COVID-19 pandemic, using three keywords: ‘vaccine’, ‘vaccination’, and ‘immunization’, obtaining around 6.5 million tweets. The collection has been analyzed across multiple dimensions and aspects.
General Analysis. The analysis shows that the number of tweets related to the topic increased through the years, peaking in 2019. Among other factors, we identified the 2019 measles outbreak as one of the main reasons for this growth, given the correlation of the tweet volume with CDC (Centers for Disease Control and Prevention) data on measles cases in the United States in 2019 and with the high number of newspaper articles on the topic, both of which increased significantly in 2019. We also performed demographic, space-time, and content analyses.
Subjects. Besides the general data analysis, we considered a number of specific topics often addressed within the vaccine conversation, such as the flu vaccine, HPV, polio, and others. We identified their temporal trends and performed specific analyses of these subjects, also in connection with their media coverage.
News Sources. We analyzed the news sources most cited in the tweets, which include YouTube, NaturalNews (generally considered a biased, fake-news website), and Facebook. Overall, among the most cited sources, 32% can be labeled as reliable and 25% as conspiracy/fake news sources. Furthermore, 32% of the references point to social networks (including YouTube). This analysis shows how social media and unreliable sources of information frequently drive vaccine-related conversation on Twitter.
User Stance. We applied stance analysis to the authors of the tweets, to determine each user’s orientation toward a given (pre-chosen) target of interest. Our initial content analysis revealed that a large amount of the content is satirical or derisive in nature, causing a number of classification techniques to perform poorly on the dataset. Given that other studies considered the presence of stance-indicative hashtags an effective way to discover polarized tweets and users, we applied a rule-based classification based on a selection of 100+ hashtags, which allowed us to automatically classify a tweet as pro-vaccination or vaccination-skeptic, obtaining a total of 250,000+ classified tweets over the 4 years.
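One plausible way to compute such source shares is to map each cited link to its domain and look the domain up in a label table. This is a sketch under assumptions: only the YouTube/Facebook (social network) and NaturalNews (conspiracy/fake news) labels reflect the text; the other entries are hypothetical examples, and shares are weighted by number of citations.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain labels. Only YouTube/Facebook (social networks) and
# NaturalNews (conspiracy/fake news) reflect the text; the rest are examples.
SOURCE_LABELS = {
    "youtube.com": "social network",
    "facebook.com": "social network",
    "naturalnews.com": "conspiracy/fake news",
    "cdc.gov": "reliable",
    "who.int": "reliable",
}


def domain_of(url):
    """Extract the host name from a URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def source_shares(urls, labels=SOURCE_LABELS):
    """Share of cited links falling into each source category."""
    counts = Counter(labels.get(domain_of(url), "unlabeled") for url in urls)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


cited = [
    "https://www.youtube.com/watch?v=abc",
    "https://www.naturalnews.com/article.html",
    "https://www.cdc.gov/measles/",
    "https://example.org/post",
]
print(source_shares(cited))
```

In practice, shortened links (e.g. URL shorteners) would need to be expanded before the domain lookup.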
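The hashtag-based rule can be sketched as follows, assuming a tweet is labeled only when its hashtags point to a single stance. The hashtag sets are hypothetical placeholders for the curated list of 100+ hashtags used in the study; tweets with no stance hashtag, or with conflicting ones, stay unclassified.

```python
import re

# Hypothetical stance-indicative hashtags; the study used a curated list of 100+.
PRO_HASHTAGS = {"#vaccineswork", "#vaccinessavelives", "#getvaccinated"}
ANTI_HASHTAGS = {"#novaccines", "#vaccineinjury", "#vaccinesharm"}

HASHTAG_PATTERN = re.compile(r"#\w+")


def classify_tweet(text):
    """Label a tweet 'pro' or 'anti' from its hashtags; None if absent or conflicting."""
    hashtags = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    is_pro = bool(hashtags & PRO_HASHTAGS)
    is_anti = bool(hashtags & ANTI_HASHTAGS)
    if is_pro and not is_anti:
        return "pro"
    if is_anti and not is_pro:
        return "anti"
    return None  # no stance hashtag, or both stances present


tweets = [
    "Measles cases are rising again. #VaccinesWork",
    "Not putting that in my kids. #NoVaccines",
    "Flu shot appointment tomorrow",
]
print([classify_tweet(t) for t in tweets])
```

Leaving ambiguous tweets unlabeled trades coverage for precision, which fits a dataset where satire makes text-based classifiers unreliable.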
Share of pro- and anti-vaccine discourse over time. Pro-vaccine tweet volumes appear to be larger than anti-vaccine ones and to increase over time.
The words used by the two groups of users to discuss vaccine-related topics are profoundly different, as are the sources of information they refer to. Anti-vaccine users mostly cited fake news websites and very few reliable sources, which are instead widely cited by pro-vaccine users. Social media (primarily YouTube) represent a large portion of linked content in both cases.
Additionally, we performed demographic (age, gender, ethnicity) and spatial analyses over the two categories of users, with the aim of understanding the features of the two communities. Our analysis also shows to what extent the different U.S. states are polarized for or against vaccination on Twitter.
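A simple way to summarize per-state polarization is sketched below. The index (pro − anti) / (pro + anti) is an assumption for illustration, not necessarily the measure used in the study, and users are assumed to have already been geolocated to a state and assigned a stance.

```python
from collections import defaultdict


def state_polarization(users):
    """Per-state index (pro - anti) / (pro + anti), ranging from -1 to 1.

    users: iterable of (state, stance) pairs; stances other than
    'pro' or 'anti' (e.g. None for unclassified users) are ignored.
    """
    counts = defaultdict(lambda: {"pro": 0, "anti": 0})
    for state, stance in users:
        if stance in ("pro", "anti"):
            counts[state][stance] += 1
    return {
        state: (c["pro"] - c["anti"]) / (c["pro"] + c["anti"])
        for state, c in counts.items()
    }


users = [("CA", "pro"), ("CA", "pro"), ("CA", "anti"), ("TX", "anti"), ("NY", None)]
print(state_polarization(users))
```

A value near 1 marks a predominantly pro-vaccine state and a value near -1 a predominantly anti-vaccine one; states with no classified users are simply absent from the result.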
Stance of US states towards vaccination.
A video presenting our research is available on YouTube:
This work has been presented at the IC2S2 conference.
PERISCOPE (“Pan-European Response to the Impacts of COVID-19 and future Pandemics and Epidemics”) is a large-scale project that aims at mapping and analysing the impacts of the COVID-19 pandemic, developing solutions and guidance for policymakers and health authorities on how to mitigate the impact of the pandemic, and enhancing Europe’s preparedness for future similar events. We plan to promote science-based policies for the post-pandemic society, in a way that orients future recovery towards enhanced resilience and sustainability. PERISCOPE is funded by the European Union Horizon 2020 programme for research and innovation, for the period November 2020-October 2023.
In our three-year journey, we plan to continuously collect good practices and innovative solutions that have proven effective in the containment of the pandemic, in the protection of the economy and society, in the management and organisation of healthcare facilities, or in the mitigation of indirect effects of the restrictions adopted throughout Europe, including mental health and inequalities. From the reorganisation of hospitals to the use of technology in social distancing and contact tracing, to innovative modes of disbursing funds to citizens and businesses, we commit to keeping our eyes open to all successful applications or solutions that could potentially be emulated in other parts of Europe, or inspire socially beneficial innovation.
We are launching a call for good practices directed at public authorities, businesses, civil society, academics from all over Europe and beyond, in order to identify solutions implemented during 2020, which proved useful and effective in achieving their intended objectives. We only ask respondents to provide us with a very short description, help us classify the good practices according to the categories specified below, and possibly be available for further clarifications in case we need important information. We at PERISCOPE will do the rest. We will analyse the proposed practice and evaluate its transferability to other parts of the European territory, and identify good practices to be promoted throughout Europe.
The areas of interest in our collection of good practices include: education and training (for example, modes of distance learning, organising student rotations at school, training teachers on online tools, training healthcare professionals, etc.); use of digital technologies (e.g., contact-tracing apps, use of data from mobile operators or tech platforms, crowdsourcing solutions, use of Artificial Intelligence in testing and tracing, etc.); financial aid to citizens and businesses (direct payments, access to subsidies, rating the resilience or sustainability of recipients of funds); reorganisation of hospital and intensive care facilities; transportation and logistics; and more.
The first cut-off date for submitting good practices is December 31, 2020. After that date, we will compile a first report and publish it on our website, in the press, and in scientific articles. By contributing valuable experience, you can help us learn and transfer practices that can save lives and improve individual well-being in Europe and beyond.
Starting today, our team at the Data Science Lab Polimi will participate in the PERISCOPE European project.
PERISCOPE will investigate the broad socio-economic and behavioral impacts of the COVID-19 pandemic, to make Europe more resilient and prepared for future large-scale risks.
The European Commission approved PERISCOPE (PAN-EUROPEAN RESPONSE TO THE IMPACTS OF COVID-19 AND FUTURE PANDEMICS AND EPIDEMICS), a large-scale research project that brings together 32 European institutions and is coordinated by the University of Pavia. PERISCOPE is a Horizon 2020 research project funded with almost 10 million Euros under the Coronavirus Global Response initiative launched in May 2020 by European Commission President Ursula von der Leyen. The goal of PERISCOPE is to shed light on the broad socio-economic and behavioral impacts of COVID-19. A multidisciplinary consortium will bring together experts in all aspects of the current outbreak: clinical and epidemiological; socio-economic and political; statistical and technological.
The partners of the consortium will carry out theoretical and experimental research to contribute to a deeper understanding of the short- and long-term impacts of the pandemic and the measures adopted to contain it. Such research-intensive activities will allow the consortium to propose measures to prepare Europe for future pandemics and epidemics in a relatively short timeline.
The main goals of PERISCOPE are:
to gather data on the broad impacts of COVID-19 in order to develop a comprehensive, user-friendly, openly accessible COVID Atlas, which should become a reference tool for researchers and policymakers, and a dynamic source of information to disseminate to the general public;
to perform innovative statistical analysis on the collected data, with the help of various methods including machine learning tools;
to identify successful practices and approaches adopted at the local level, which could be scaled up at the pan-European level for a better containment of the pandemic and its related socio-economic impacts; and
to develop guidance for policymakers at all levels of government, in order to enhance Europe’s preparedness for future similar events, and to propose reforms in the multi-level governance of health.
PERISCOPE started on 1 November 2020 and will last until 31 October 2023. You can reach the project members and follow our activities through these social media profiles: