Blog

Analysis of Online Reviews for Evaluating the Quality of Cultural Tourism

Online reviews have long represented a valuable source for data analysis in the tourism field, but these data sources have been mostly studied in terms of the numerical ratings offered by the review platforms.

In a recent article (available as full open access) and an accompanying blog post, we explored whether social media and online review platforms can be a good source for the quantitative evaluation of service quality at cultural venues, such as museums and theaters. Our paper applies automatic analysis of online reviews, comparing two automated approaches to evaluate which of the two is more adequate for assessing quality dimensions. The analysis covers user-generated reviews of the top 100 Italian museums.

Specifically, we compare two approaches:

  • a ‘top-down’ approach, based on supervised classification driven by strategic choices defined in policy makers’ guidelines at the national level; 
  • a ‘bottom-up’ approach, based on an unsupervised topic model of the reviewers’ own words.

Both approaches have been applied to the same dataset of 14,250 Italian reviews. The misalignment between the results of the ‘top-down’ strategic approach and the ‘bottom-up’ data-driven approach highlights how data science can offer an important contribution to decision making in cultural tourism.

We identified five quality dimensions following the ‘top-down’ perspective: Ticketing and Welcoming, Space, Comfort, Activities, and Communication. Each of these dimensions has been treated as a class in a classification problem over user reviews, so the top-down approach allowed us to tag each review as descriptive of one of those 5 dimensions. Classification has been implemented both as a machine learning classification problem (using BERT, accuracy 88%) and as keyword-based tagging (accuracy 80%).
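As an illustration, here is a minimal sketch of how such keyword-based tagging could work; the keyword lists below are hypothetical examples, not the actual lists derived from the policy guidelines:

```python
# Minimal sketch of keyword-based review tagging.
# NOTE: hypothetical keywords, not the actual lists from the guidelines.
KEYWORDS = {
    "Ticketing and Welcoming": ["ticket", "entrance", "queue", "staff"],
    "Space": ["room", "building", "layout", "garden"],
    "Comfort": ["seat", "toilet", "air conditioning", "cloakroom"],
    "Activities": ["tour", "workshop", "exhibition", "event"],
    "Communication": ["sign", "label", "audio guide", "website"],
}

def tag_review(text):
    """Return the first quality dimension whose keywords match, else None."""
    lowered = text.lower()
    for dimension, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return dimension
    return None  # the review does not fit any predefined dimension

print(tag_review("Long queue at the entrance, but the staff was kind."))
# -> "Ticketing and Welcoming"
```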

The ‘bottom-up’ approach has been implemented through unsupervised topic modelling, namely LDA (Latent Dirichlet Allocation), implemented and tuned over a range of up to 30 topics. The best ‘bottom-up’ model we selected identifies 13 latent dimensions in review texts, which we further aggregated into 3 main topics: Museum Cultural Heritage, Personal Experience, and Museum Services.
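For reference, a minimal sketch of how such an LDA model can be fitted with gensim (toy corpus for illustration only; the actual preprocessing and tuning are described in the paper):

```python
# Minimal LDA sketch with gensim on a toy corpus of tokenized reviews.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

reviews = [
    ["painting", "collection", "renaissance", "masterpiece"],
    ["ticket", "queue", "staff", "entrance"],
    ["audio", "guide", "kids", "tour", "fun"],
]
dictionary = Dictionary(reviews)
corpus = [dictionary.doc2bow(doc) for doc in reviews]

# In the study, models were tuned over a range of up to 30 topics;
# the selected model identified 13 latent dimensions.
lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=3, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```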

The ‘top-down’ approach (based on a set of keywords derived from the standards issued by the policy maker) left 63% of online reviews unmatched to any of the predefined quality dimensions.

63% of the reviews could not be assessed against the official top-down service quality categories.

The ‘bottom-up’ data-driven approach overcomes this limitation by searching for the aspects of interest using reviewers’ own words. Indeed, museum reviews usually discuss a museum’s cultural heritage (46% average probability) and personal experiences (31% average probability) more than the services offered by the museum (23% average probability).

Among the various quantitative findings of the study, I think the most important point is that the aspects considered as quality dimensions by the decision maker can differ substantially from the aspects perceived as quality dimensions by museum visitors.

You can find out more about this analysis by reading the full article, published online as open access, or the longer blog post. The full reference to the paper is:

Agostino, D.; Brambilla, M.; Pavanetto, S.; Riva, P. The Contribution of Online Reviews for Quality Evaluation of Cultural Tourism Offers: The Experience of Italian Museums. Sustainability 2021, 13, 13340. https://doi.org/10.3390/su132313340

A Model-Driven Approach for Multi-experience Development Platforms

Modern User Interfaces (UIs) are becoming complex software artifacts themselves, through the integration of AI-enhanced software components that enable even more natural interactions, including the possibility to use Natural Language Processing (NLP) via chatbots or voicebots (a.k.a. Conversational User Interfaces, or CUIs).

Sometimes, several types of UIs are combined in the same application (e.g., a chatbot in a web page), in what is known as a Multiexperience User Interface. These multiexperience UIs can be built together using a Multiexperience Development Platform (MXDP).

“Multiexperience development involves ensuring a consistent user experience across web, mobile, wearable, conversational and immersive touchpoints”. [Gartner]

A typical scenario of multiexperience user interaction could unfold as follows (see the image below). Suppose a customer wants to buy a new tech product (a cell phone or a home theater system) on a Sunday morning. He first interacts with his home assistant (like Alexa or Google Assistant) and asks it to find the best nearby tech store open on Sunday. With this information in mind, he looks at the store web site on his PC and, being satisfied with the kind of store, asks the web site chatbot to find the type of product he is looking for. After browsing the various alternatives, he finds an item he likes and sets the place and the product as preferences on his mobile phone. He reads the details of the product on the phone while walking to his car. When he reaches the car, he transfers the information about the place to the car navigation system and drives there. Finally, in the store he looks around, tries various items, reads reviews about them on a dedicated mobile app, and finally picks up the product and pays for it.

This kind of dynamic and seamless interaction demands a variety of complex design and implementation mechanisms. These CUIs also raise critical integration, evolution, and maintenance challenges: developers need to coordinate the cognitive services that make up multiexperience UIs, integrate them with external services, and worry about extensibility, scalability, and maintenance.

We believe a model-driven approach for MXDP could be an important first step towards facilitating the specification of rich UIs able to coordinate and collaborate to provide the best experience for end-users. Indeed, most non-trivial systems adhere to some kind of model-based philosophy, where software design models (including GUI models) are transformed into the production code the system executes at run-time. This transformation can be (semi)automated in some cases.

Our recent research tackles the application of model-driven techniques to the development of software applications embedding a multiexperience UI.

The research has been published in our paper Towards a Model-Driven Approach for Multiexperience AI-based User Interfaces, co-authored by Elena Planas, Gwendal Daniel, Marco Brambilla, and Jordi Cabot, recently published in the International Journal on Software and Systems Modeling (SoSyM) and available online here (open access).

The paper’s contribution is twofold:

  • we raise the abstraction level used in the definition of this new kind of conversational and smart interfaces.
  • we show how these CUI models can be used in conjunction with more “traditional” GUI models to combine the benefits of all these different types of interfaces in a multiexperience development project.

In practice, we propose a new Domain Specific Language (DSL) that generalizes the one defined by the Xatkit model to cover all types of CUIs, and we show how it seamlessly integrates with appropriate extensions of the IFML model to design comprehensive multiexperience interfaces.
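Purely as an illustration of the idea (this is not the actual Xatkit or IFML syntax), the CUI side of such a DSL can be thought of as declaring intents and transitions whose targets may be bot states or GUI views:

```python
# Hypothetical, simplified rendering of a CUI model: intents with training
# sentences, plus transitions that can target either bot states or GUI views.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Intent:
    name: str
    training_sentences: List[str]

@dataclass
class Transition:
    on_intent: str
    target: str  # a bot state or a GUI view (e.g., an IFML ViewContainer)

@dataclass
class ChatbotModel:
    intents: List[Intent] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)

bot = ChatbotModel(
    intents=[Intent("FindProduct",
                    ["I am looking for a phone", "show me home theaters"])],
    transitions=[Transition(on_intent="FindProduct", target="ProductListPage")],
)
```

In a model-driven setting, a model like this would be the input of generators that produce the running chatbot and its hooks into the GUI.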

IFML model integrating traditional navigation of a web interface and a chatbot component.

You can refer to the full paper for all the details. The paper reference is:

Planas, E., Daniel, G., Brambilla, M., Cabot, J. Towards a model-driven approach for multiexperience AI-based user interfaces. Software and System Modeling (SoSyM) 20, 997–1009 (2021). https://doi.org/10.1007/s10270-021-00904-y

(open access, CC-BY license)

Large-Scale Analysis of On-line Conversation about Vaccines before COVID-19

Frequent words and co-occurrences used by pro-vaccination and anti-vaccination communities.

In this study, we map the English-language Twitter discourse around vaccinations over four years, in order to:

  • discover the volumes and trends of the conversation;
  • compare the discussion on Twitter with newspapers’ content; and
  • classify people as pro- or anti-vaccination and explore how their behavior differs.

Datasets. We collected four years of Twitter data (January 2016 – January 2020) about vaccination, before the advent of the Covid-19 pandemic, using three keywords: ‘vaccine’, ‘vaccination’, and ‘immunization’, obtaining around 6.5 million tweets. The collection has been analyzed across multiple dimensions and aspects.
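A minimal sketch of the kind of keyword filter behind this collection (illustrative only; the actual data were gathered through the Twitter API):

```python
# Illustrative keyword filter for selecting vaccine-related tweets.
KEYWORDS = ("vaccine", "vaccination", "immunization")

def is_vaccine_related(text):
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

print(is_vaccine_related("Flu vaccination campaigns start this week"))  # True
print(is_vaccine_related("New phone released today"))                   # False
```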

General Analysis. The analysis shows that the number of tweets related to the topic increased through the years, peaking in 2019. Among other factors, we identified the 2019 measles outbreak as one of the main reasons for the growth, given the correlation of the tweet volume with CDC (Centers for Disease Control and Prevention) data on measles cases in the United States in 2019 and with the high number of newspaper articles on the topic, both of which significantly increased in 2019. Further demographic, spatio-temporal, and content analyses have been performed too.
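As a hint of the kind of check behind the correlation claim, monthly tweet volumes can be correlated with monthly case counts in a few lines (the numbers below are made up for illustration; the study used actual CDC data for 2019):

```python
# Illustrative correlation between monthly tweet volumes and measles cases.
# NOTE: made-up numbers; the study used actual CDC data for the US in 2019.
import numpy as np

tweets_per_month = np.array([80, 95, 140, 210, 260, 180])
measles_cases = np.array([10, 15, 60, 120, 150, 90])

r = np.corrcoef(tweets_per_month, measles_cases)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```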

Subjects. Besides the general data analysis, we considered a number of specific subjects often addressed within the vaccine conversation, such as the flu vaccine, HPV, and polio. We identified their temporal trends and performed dedicated analyses for these subjects, also in connection with the respective media coverage.

News Sources. We analyzed the news sources most cited in the tweets, which include YouTube, NaturalNews (generally considered a biased and fake-news website), and Facebook. Overall, among the most cited sources, 32% can be labeled as reliable and 25% as conspiracy/fake-news sources. Furthermore, 32% of the references point to social networks (including YouTube). This analysis shows how social media and non-reliable sources of information frequently drive the vaccine-related conversation on Twitter.

User Stance. We applied stance analysis to the authors of the tweets, to determine each user’s orientation toward a given (pre-chosen) target of interest. Our initial content analysis revealed that a large amount of the content is satirical or derisive in nature, causing a number of classification techniques to perform poorly on the dataset. Given that other studies considered the presence of stance-indicative hashtags an effective way to discover polarized tweets and users, we applied a rule-based classification based on a selection of 100+ hashtags, which allowed us to automatically classify a tweet as pro-vaccination or vaccination-skeptic, obtaining a total of 250,000+ classified tweets over the 4 years.
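A minimal sketch of such a rule-based classifier follows; the hashtag lists here are short hypothetical examples, whereas the study curated a selection of 100+ hashtags:

```python
# Rule-based stance tagging via stance-indicative hashtags.
# NOTE: hypothetical short lists; the study used 100+ curated hashtags.
PRO_TAGS = {"#vaccineswork", "#vaccinessavelives"}
ANTI_TAGS = {"#vaccineinjury", "#noforcedvaccines"}

def classify_stance(hashtags):
    tags = {h.lower() for h in hashtags}
    pro, anti = bool(tags & PRO_TAGS), bool(tags & ANTI_TAGS)
    if pro and not anti:
        return "pro-vaccination"
    if anti and not pro:
        return "vaccination-skeptic"
    return None  # no (or conflicting) stance-indicative hashtags

print(classify_stance({"#VaccinesWork"}))  # pro-vaccination
```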

Share of pro- and anti-vaccine discourse over time. Pro-vaccine tweet volumes appear to be larger than anti-vaccine ones and to increase over time.

The words used by the two groups of users to discuss vaccine-related topics are profoundly different, as are the sources of information they refer to. Anti-vaccine users mostly cited fake-news websites and very few reliable sources, which are instead largely cited by pro-vaccine users. Social media (primarily YouTube) represent a large portion of linked content in both cases.

Additionally, we performed demographic (age, gender, ethnicity) and spatial analyses over the two categories of users, with the aim of understanding the features of the two communities. Our analysis also shows the extent to which different U.S. states are polarized for or against vaccination on Twitter.

Stance of US states towards vaccination.

A video presenting our research is available on YouTube:

This work has been presented at the IC2S2 conference.

The cover image by NIAID is licensed under CC BY 2.0.

Call for good practices proven effective in the management and containment of the COVID-19 pandemic effects on economy, society, and healthcare

PERISCOPE (“Pan-European Response to the Impacts of COVID-19 and future Pandemics and Epidemics”) is a large-scale project that aims at mapping and analysing the impacts of the COVID-19 pandemic, developing solutions and guidance for policymakers and health authorities on how to mitigate the impact of the pandemic, and enhancing Europe’s preparedness for future similar events. We plan to promote science-based policies for the post-pandemic society, in a way that orients future recovery towards enhanced resilience and sustainability. PERISCOPE is funded by the European Union Horizon 2020 programme for research and innovation, for the period November 2020-October 2023.

In our three-year journey, we plan to continuously collect good practices and innovative solutions that have proven effective in the containment of the pandemic, in the protection of the economy and society, in the management and organisation of healthcare facilities, or in the mitigation of indirect effects of the restrictions adopted throughout Europe, including mental health and inequalities. From the reorganisation of hospitals to the use of technology in social distancing and contact tracing, to innovative modes of disbursing funds to citizens and businesses, we commit to keeping our eyes open to all successful applications or solutions that could potentially be emulated in other parts of Europe, or inspire socially beneficial innovation.

Give us a hint. We’ll do the rest

We are launching a call for good practices directed at public authorities, businesses, civil society, and academics from all over Europe and beyond, in order to identify solutions implemented during 2020 that proved useful and effective in achieving their intended objectives. We only ask respondents to provide us with a very short description, help us classify the good practices according to the categories specified below, and possibly be available for further clarification in case we need additional information. We at PERISCOPE will do the rest: we will analyse each proposed practice, evaluate its transferability to other parts of the European territory, and identify good practices to be promoted throughout Europe.

The areas of interest in our collection of good practices include:

  • education and training (for example, modes of distance learning, organising student rotations at school, training teachers on online tools, training healthcare professionals, etc.);
  • use of digital technologies (e.g. contact-tracing apps; use of data from mobile operators or tech platforms; crowdsourcing solutions; use of Artificial Intelligence in testing and tracing; etc.);
  • financial aid to citizens and businesses (direct payments, access to subsidies, rating resilience or sustainability of recipients of funds);
  • reorganisation of hospital and intensive care facilities;
  • transportation and logistics;
  • and more.

The link to the online form is:

https://ec.europa.eu/eusurvey/runner/PERISCOPEgoodpractices

The first cut-off date for submitting good practices is December 31, 2020. After that date, we will compile a first report and publish it on our website, in the press, and in scientific articles. By contributing valuable experience, you can help us learn and transfer practices that can save lives and improve individual well-being in Europe and beyond.

PERISCOPE: the EU project on socio-economic and behavioral impacts of the COVID-19 pandemic

Starting today, our team at the Data Science Lab Polimi will participate in the PERISCOPE European project.

PERISCOPE will investigate the broad socio-economic and behavioral impacts of the COVID-19 pandemic, to make Europe more resilient and prepared for future large-scale risks.

The European Commission approved PERISCOPE (PAN-EUROPEAN RESPONSE TO THE IMPACTS OF COVID-19 AND FUTURE PANDEMICS AND EPIDEMICS), a large-scale research project that brings together 32 European institutions and is coordinated by the University of Pavia. PERISCOPE is a Horizon 2020 research project that was funded with almost 10 million Euros under the Coronavirus Global Response initiative launched in May 2020 by the European Commission President Ursula von der Leyen.
The goal of PERISCOPE is to shed light on the broad socio-economic and behavioral impacts of COVID-19. A multidisciplinary consortium will bring together experts in all aspects of the current outbreak: clinical and epidemiological; socio-economic and political; statistical and technological.

The partners of the consortium will carry out theoretical and experimental research to contribute to a deeper understanding of the short- and long-term impacts of the pandemic and the measures adopted to contain it. Such research-intensive activities will allow the consortium to propose measures to prepare Europe for future pandemics and epidemics within a relatively short time frame.

The main goals of PERISCOPE are:

  • to gather data on the broad impacts of COVID-19 in order to develop a comprehensive, user-friendly, openly accessible COVID Atlas, which should become a reference tool for researchers and policymakers, and a dynamic source of information to disseminate to the general public;
  • to perform innovative statistical analysis on the collected data, with the help of various methods including machine learning tools;
  • to identify successful practices and approaches adopted at the local level, which could be scaled up at the pan-European level for a better containment of the pandemic and its related socio-economic impacts; and
  • to develop guidance for policymakers at all levels of government, in order to enhance Europe’s preparedness for future similar events, and to propose reforms in the multi-level governance of health.

PERISCOPE started on 1 November 2020 and will last until 31 October 2023. You can reach the project members and follow our activities through these social media profiles:

Twitter: @PER1SCOPE_EU

LinkedIn: http://www.linkedin.com/company/periscopeproject/

Instagram: @periscope_project

Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

Weblogs represent the navigation activity generated by a given set of users on a website. This type of data is fundamental because it contains information on the behaviour of users and how they interact with the company’s product itself (website or application). If a company could have a realistic weblog before the release of its product, it would have a significant advantage: it could identify the least navigated web pages, or those to put in the foreground.

A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making their collection an expensive task.

To address this limitation, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., weblogs.

Our approach is extremely relevant because it tackles the lack of publicly available web navigation logs and, at the same time, can be adopted in industry for the AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS for Web sites yet to be deployed.

The generation has been implemented using two deep learning methods for producing realistic navigation activities, namely:

  • Recurrent Neural Networks (RNNs), which are very well suited to temporally evolving data (a minimal sketch follows the list);
  • Generative Adversarial Networks (GANs): neural networks aimed at generating new data, such as images or text, very similar to the original ones and sometimes indistinguishable from them, which have become increasingly popular in recent years.
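As a rough illustration of the RNN side, a next-page prediction model over page-ID sequences might look like the following minimal PyTorch sketch (not the exact architecture used in the paper):

```python
# Minimal PyTorch sketch of an RNN for navigation paths: each page is a
# token, and the model learns to predict the next page ID in a session.
import torch
import torch.nn as nn

class PathModel(nn.Module):
    def __init__(self, n_pages, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_pages, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_pages)

    def forward(self, paths):
        out, _ = self.rnn(self.embed(paths))
        return self.head(out)  # logits over the next page at each step

model = PathModel(n_pages=100)
sessions = torch.randint(0, 100, (8, 12))  # 8 sessions of 12 page views
logits = model(sessions)                   # shape: (8, 12, 100)
next_page = logits[:, -1].argmax(dim=-1)   # greedy next-page prediction
```

Generating a full synthetic path then amounts to sampling from these logits step by step and feeding each sampled page back into the model.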

We ran experiments using open datasets of weblogs for training, and ran tests to assess the performance of the methods. Results in generating new weblog data are quite good, as reported in the summary table below, with respect to the two evaluation metrics adopted (BLEU and human evaluation).


Comparison of the performance of a baseline statistical approach, RNN, and GAN for generating realistic weblogs. Evaluation is done using human assessments and the BLEU metric.
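For the BLEU side of this evaluation, a generated path can be scored against real ones using NLTK's BLEU implementation; here is a sketch (the exact evaluation protocol is described in the paper):

```python
# Scoring a generated navigation path against real paths with BLEU (NLTK).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

real_paths = [["home", "catalog", "product", "cart"],
              ["home", "search", "product", "cart"]]
generated = ["home", "catalog", "product", "checkout"]

# Smoothing avoids zero scores on short sequences with missing n-grams.
score = sentence_bleu(real_paths, generated,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.2f}")
```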


Our study is described in detail in the paper published at ICWE 2020 – International Conference on Web Engineering, with DOI 10.1007/978-3-030-50578-3. It is available online on the Springer Web site and can be cited as:

Pavanetto S., Brambilla M. (2020) Generation of Realistic Navigation Paths for Web Site Testing Using Recurrent Neural Networks and Generative Adversarial Neural Networks. In: Bielikova M., Mikkonen T., Pautasso C. (eds) Web Engineering. ICWE 2020. Lecture Notes in Computer Science, vol 12128. Springer, Cham

The slides are online too:

Together with a short presentation video:


Coronavirus stories and data

Coronavirus COVID-19 is an extreme challenge for our society, economy, and individual lives. Governments should have learnt from each other: the impact has been spreading slowly across countries, and there has been plenty of time to take action. But apparently people and governments can’t grasp the risk until it’s upon them. And the way European and American governments are acting is too slow and incremental.

I live in Italy, which ranks second in the world for healthcare quality. The mindset of “this won’t happen here” was the attitude at the beginning of this challenge, and look at what happened. I’m reporting here two links to articles that mention a data-driven vision, but also the human, psychological, and behavioural aspects involved. They are two simple stories that report the Italian perspective on the virus.

Coronavirus Stories From Italy

And why now it’s the time for YOU to worry, fellow Europeans and Americans

#Coronavirus: Updates from the Italian Front

A preview of what will happen in a week in the rest of the world. Things have dramatically changed in our society

Data Science for Business Innovation. Live courses for executives and managers in Italy and The Netherlands

Starting October 2019, we open a new opportunity for companies:

a 2-day hands-on course on Data-driven innovation for executives and managers.

The course is specially designed for executives, managers, and decision-makers who need to grasp the foundations of data analysis to take informed decisions on data-driven business, innovation paths, and strategies within the enterprise. It consists of keynotes, success stories, and quick introductory lectures spanning big data, machine learning, data valorization, and communication. The course covers terminology and concepts, tools and methods, use cases and success stories of data science applications.

The course explains what value Data Science can create, what problems Data Science can solve, the difference between descriptive, predictive, and prescriptive analytics, and the roles of machine learning and artificial intelligence.

The teaching style is very practical, with use cases, hands-on sessions, workgroup activities, and networking sessions for applying what you learn directly to real projects.

The live events will be:

If you are interested, you can visit the pages for the Italian [ITA] and English [ENG] editions respectively, and/or download the detailed brochures:

You can always get in touch to ask for more details.

Similar initiatives we held in the past include the Urban Data Science Bootcamp, delivered in Milano and Amsterdam in 2017 (see a Medium story on the event here to understand the style and activities, although the ones reported there focus on the specific sector of smart cities).

The event is also integrated with an online mini MOOC available on Coursera.

The course is offered by Politecnico di Milano in collaboration with Cefriel and EIT Digital.


Are open source projects governed by rich clubs?

The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success.

In our recent joint work with the Open University of Catalunya / ICREA, we analyze open source projects to determine whether they exhibit a rich-club behavior, that is, a phenomenon where contributors with a high number of collaborations (i.e., strongly connected within the collaboration network) are likely to cooperate with other well-connected individuals.

The presence or absence of a rich club has an impact on the sustainability and robustness of the project. In fact, if a member of the rich club leaves the project, it is easier for the other members of the rich club to take over; with fewer collaborations, the takeover would require more effort from more users.

The work has been presented at OpenSym 2019, the 15th International Symposium on Open Collaboration, in Skövde (Sweden), on August 20-22, 2019.

The full paper is available on the conference Web Site (or locally here), and the slides presenting our results are available on Slideshare:

For this analysis, we build and study a dataset with the 100 most popular projects on GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues, and pull requests. Results show that rich-club behavior is present in all the projects, but only a few of them have an evident club structure.
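For reference, networkx ships a rich-club coefficient that captures this notion; here is a toy sketch (the study computes coefficients on the actual collaboration graphs built from commits, issues, and pull requests):

```python
# Toy sketch: rich-club coefficient of a collaboration graph with networkx.
import networkx as nx

G = nx.karate_club_graph()  # stand-in for a project's collaboration graph

# rho[k] is the density of links among nodes of degree > k, normalized
# against degree-preserving random rewirings; values well above 1 for
# high k suggest a rich club of strongly connected contributors.
rho = nx.rich_club_coefficient(G, normalized=True, seed=42)
print({k: round(v, 2) for k, v in sorted(rho.items())[:5]})
```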

For instance, this network of contributors for the Materialize project seems to go against the open source paradigm. The project is “owned” by very few users:

Established in 2014 by a team of 4 developers, at the time of the analysis the project featured 3,853 commits and 252 contributors. Nevertheless, it only has two top contributors (with more than 1,000 commits), who belong to the original team, and no other frequent contributors.

For all the projects, we compute coefficients both for single-source graphs and for the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.

Data Science for Business Innovation. A new MOOC on Coursera

Breaking news!

We just published our new MOOC “Data Science for Business Innovation” on Coursera!

Our course is available for free on Coursera and is jointly offered by Politecnico di Milano and EIT Digital, as a compendium of the must-have data science expertise for non-technical people, including executives and middle managers, to foster data-driven innovation.

The course is an introductory, non-technical overview of the concepts of data science.

You can enrol in the first edition of the course starting today.

The course is completely free and you can enjoy content at any time, with professional English speakers and animated, engaging materials.

Here is a short intro to the course:

The course consists of introductory lectures spanning big data, machine learning, data valorization and communication.
All the remaining details can be found on Coursera:


Topics cover the essential concepts and intuitions on data needs, data analysis, machine learning methods, respective pros and cons, and practical applicability issues. The course covers terminology and concepts, tools and methods, use cases and success stories of data science applications.

The course explains what Data Science is and why it is so hyped. It discusses the value that Data Science can create, the main classes of problems that Data Science can solve, the difference between descriptive, predictive, and prescriptive analytics, and the roles of machine learning and artificial intelligence.

From a more technical perspective, the course covers supervised, unsupervised and semi-supervised methods, and explains what can be obtained with classification, clustering, and regression techniques. It discusses the role of NoSQL data models and technologies, and the role and impact of scalable cloud-based computation platforms.

All topics are covered with example-based lectures, discussing use cases, success stories and realistic examples.

If you are interested in these topics, feel free to look at it on Coursera.

We look forward to seeing you there!