The Final TRIGGER Conference

We will join and contribute to the final TRIGGER conference, which is scheduled for May 31st, 2022 in Brussels.

The theme is: “Rethinking the EU’s role in global governance”. In this context, the TRIGGER project is going to present the main research outcomes of the H2020 research program that started in 2018, setting the stage for the collaboration among 14 international partners. 

We will present our main contributions, namely PERSEUS and COCTEAU.

A quick intro to PERSEUS is available in this video:

Further details about the event are available here:

Generation of Realistic Navigation Paths for Web Site Testing using RNNs and GANs

Weblogs represent the navigation activity generated by a given number of users on a given website. This type of data is fundamental because it contains information on the behaviour of users and how they interact with the company’s product itself (website or application). If a company could have a realistic weblog before the release of its product, it would have a significant advantage, because it could apply log analysis techniques to identify the least visited web pages or those that should be placed in the foreground.

A large audience of users and typically a long time frame are needed to produce sensible and useful log data, making it an expensive task. 

To address this limitation, we propose a method that focuses on the generation of REALISTIC NAVIGATIONAL PATHS, i.e., web logs.

Our approach is extremely relevant because it can at the same time tackle the problem of lack of publicly available data about web navigation logs, and also be adopted in industry for AUTOMATIC GENERATION OF REALISTIC TEST SETTINGS of Web sites yet to be deployed.

The generation has been implemented using deep learning methods for generating more realistic navigation activities, namely

  • Recurrent Neural Networks (RNNs), which are very well suited to temporally evolving data
  • Generative Adversarial Networks (GANs): neural networks that have become increasingly popular in recent years and are aimed at generating new data, such as images or text, very similar to the original ones and sometimes indistinguishable from them.

We ran experiments using open data sets of weblogs for training, and we ran tests to assess the performance of the methods. Results in generating new weblog data are quite good, as reported in this summary table, with respect to the two evaluation metrics adopted (BLEU and human evaluation).


Comparison of performance of baseline statistical approach, RNN and GAN for generating realistic web logs. Evaluation is done using human assessments and BLEU metrics
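To give an idea of how a BLEU-style score can be applied to navigation paths, here is a minimal sketch in Python. It treats each session as a sequence of page names and compares a generated session with real ones through clipped n-gram precision and a brevity penalty. The page names, the choice of bigrams as the maximum n-gram order, and the function names are assumptions made for illustration only; this is not the evaluation code used in the paper.

```python
from collections import Counter
import math


def ngrams(path, n):
    """Return the list of n-grams (as tuples) of a page sequence."""
    return [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one generated path against real paths."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the clip is its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Geometric mean of clipped precisions, scaled by the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # The brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(log_mean)


# Hypothetical sessions: two real ones and one generated by a model.
real_sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "search", "products", "cart"],
]
generated = ["home", "products", "cart"]
score = bleu(generated, real_sessions)
```

In this toy case every unigram and bigram of the generated session appears in a real one, so the score is lowered only by the brevity penalty, because the generated session is shorter than the real ones.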

 

Our study is described in detail in the paper published at ICWE 2020 – International Conference on Web Engineering with DOI: 10.1007/978-3-030-50578-3. It’s available online on the Springer Web site and can be cited as:

Pavanetto S., Brambilla M. (2020) Generation of Realistic Navigation Paths for Web Site Testing Using Recurrent Neural Networks and Generative Adversarial Neural Networks. In: Bielikova M., Mikkonen T., Pautasso C. (eds) Web Engineering. ICWE 2020. Lecture Notes in Computer Science, vol 12128. Springer, Cham

The slides are online too:

Together with a short presentation video:

 

Coronavirus stories and data

Coronavirus COVID-19 is an extreme challenge for our society, economy, and individual life. However, governments should have learnt from each other. The impact has been spreading slowly across countries. There has been plenty of time to take action. But apparently people and governments can’t grasp the risk until it’s upon them. And the way European and American governments are acting is too slow and incremental.

I live in Italy, where we rank second in the world for healthcare quality. The mindset of “this won’t happen here” was the attitude at the beginning of this challenge, and look at what happened. I’m reporting here two links to articles that offer a data-driven vision, but also cover the human, psychological and behavioural aspects involved. They are two simple stories that report the Italian perspective on the virus.

Coronavirus Stories From Italy

And why now it’s the time for YOU to worry, fellow Europeans and Americans

#Coronavirus: Updates from the Italian Front

A preview of what will happen in a week in the rest of the world. Things have dramatically changed in our society

IEEE Big Data Conference 2017: take home messages from the keynote speakers

I collected here the list of my write-ups of the first three keynote speeches of the conference:

Driving Style and Behavior Analysis based on Trip Segmentation over GPS Information through Unsupervised Learning

Over one billion cars interact with each other on the road every day. Each driver has his own driving style, which could impact safety, fuel economy and road congestion. Knowledge about the driving style of the driver could be used to encourage “better” driving behaviour through immediate feedback while driving, or by scaling auto insurance rates based on the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour. This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance policies and car rentals. Here is the presentation I gave on this topic:

Driver behavioural characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problem on car trips, in order to detect different behaviours along each trip. Subsequently, drivers are clustered into similar profiles based on the detected behaviours, and the results are compared with a human-defined ground truth on driver classification.
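As an illustration of the first of these approaches, here is a minimal Python sketch of the general DP-means idea: it works like k-means, but it opens a new cluster whenever a point is farther than a threshold from every existing centroid, so the number of clusters does not need to be fixed in advance. The feature vectors (average speed and acceleration per trip segment) and the threshold value are hypothetical, and this is not the implementation used in the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dp_means(points, lam, max_iter=100):
    """Cluster points, opening a new cluster when the squared distance
    from a point to its nearest centroid exceeds lam."""
    centroids = [mean(points)]  # start with a single global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from every centroid: the point seeds a new cluster.
                centroids.append(tuple(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute centroids, dropping clusters that lost all their points.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[label] for label in labels]
        centroids = [
            mean([p for p, label in zip(points, labels) if label == k])
            for k in range(len(used))
        ]
        if not changed:
            break
    return labels, centroids


# Hypothetical trip segments: (average speed in km/h, average acceleration).
segments = [(20.0, 0.1), (22.0, 0.2), (21.0, 0.15),
            (90.0, 1.5), (95.0, 1.7), (92.0, 1.6)]
labels, centroids = dp_means(segments, lam=400.0)
```

The threshold `lam` controls the granularity: a larger value merges more segments into the same behaviour, while a smaller one produces more, finer-grained clusters.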

The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.

 


This work has been published at the 4th IEEE Big Data Conference, held in Boston in December 2017. The full paper can be cited as:

M. Brambilla, P. Mascetti and A. Mauri, “Comparison of different driving style analysis approaches based on trip segmentation over GPS information,” 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, 2017, pp. 3784-3791.
doi: 10.1109/BigData.2017.8258379

You can download the full paper PDF from the IEEE Xplore Library, at this URL:

https://ieeexplore.ieee.org/document/8258379/

If you are interested in further contributions at the conference, here you can find my summaries of the keynote speeches on human-in-the-loop machine learning and on increasing human perception through text mining.

Using Crowdsourcing for Domain-Specific Languages Specification

In the context of Domain-Specific Modeling Language (DSML) development, the involvement of end-users is crucial to assure that the resulting language satisfies their needs.

In our paper presented at SLE 2017 in Vancouver, Canada, on October 24th within the SPLASH Conference context, we discuss how crowdsourcing tasks can be exploited to assist in domain-specific language definition processes. This is in line with the vision towards cognification of model-driven engineering.

The slides are available on slideshare:

 

Indeed, crowdsourcing has emerged as a novel paradigm where humans are employed to perform computational and information collection tasks. In language design, by relying on the crowd, it is possible to show an early version of the language to a wider spectrum of users, thus increasing the validation scope and eventually promoting its acceptance and adoption.

Ready to accept improper use of your tools?

We propose a systematic (and automatic) method for creating crowdsourcing campaigns aimed at refining the graphical notation of DSMLs. The method defines a set of steps to identify, create and order the questions for the crowd. As a result, developers are provided with a set of notation choices that best fit end-users’ needs. We also report on an experiment validating the approach.

Improving the quality of the language notation may dramatically improve acceptance and adoption, as well as the way people use your notation and the associated tools.

Essentially, our idea is to spawn to the crowd a bunch of questions regarding the concrete syntax of visual modeling languages, and collect opinions. Based on different strategies, we generate an optimal notation and then we check how good it is.
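To make the idea more concrete, here is a minimal Python sketch of one possible strategy: a plain majority vote over the crowd answers for each language concept, followed by a simple agreement score that checks how well the chosen notation matches the collected opinions. The concept names, the symbols, and the majority-vote strategy itself are assumptions made for illustration; the strategies actually used are described in the paper.

```python
from collections import Counter


def choose_notation(votes):
    """Pick, for each language concept, the symbol most voted by the crowd."""
    notation = {}
    for concept, answers in votes.items():
        counts = Counter(answers)
        # Ties are broken alphabetically so that the result is deterministic.
        symbol, _ = min(counts.items(), key=lambda item: (-item[1], item[0]))
        notation[concept] = symbol
    return notation


def agreement(votes, notation):
    """Fraction of all crowd answers that match the chosen notation."""
    total = sum(len(answers) for answers in votes.values())
    matching = sum(answers.count(notation[concept])
                   for concept, answers in votes.items())
    return matching / total if total else 0.0


# Hypothetical crowd answers about the shape preferred for two concepts.
votes = {
    "task": ["rounded-rectangle", "rounded-rectangle", "circle"],
    "gateway": ["diamond", "diamond", "diamond", "hexagon"],
}
notation = choose_notation(votes)
score = agreement(votes, notation)
```

A low agreement score would suggest that the crowd is split on some concepts, which is a hint that those notation choices deserve further questions.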

In the paper we also validate the approach and apply it to a practical use case, namely studying some variations over the BPMN modeling language.

The full paper can be found here: https://dl.acm.org/citation.cfm?doid=3136014.3136033. The paper is titled: “Better Call the Crowd: Using Crowdsourcing to Shape the Notation of Domain-Specific Languages” and was co-authored by Marco Brambilla, Jordi Cabot, Javier Luis Cánovas Izquierdo, and Andrea Mauri.

You can also access the Web version on Jordi Cabot’s blog.

The artifacts described in this paper are also referenced on findresearch.org, namely referring to the following materials:

Myths and Challenges in Knowledge Extraction and Big Data Analysis

For centuries, science (in German “Wissenschaft”) has aimed to create (“schaften”) new knowledge (“Wissen”) from the observation of physical phenomena, their modelling, and empirical validation.

Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradicting, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted.

The challenge is once again to capture and create consolidated knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data).

The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice for the objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.

I gave a few keynote speeches on this matter (at ICEIS, KDWEB,…), and I also use this topic as a motivating lecture in academic courses, to let students understand how crucial it is to focus on the problems related to big data modeling and analysis. The talk, reported in the slides below, explores, through real industrial use cases, the mixed realistic-utopian domain of data analysis and knowledge extraction, and reports on some tools and cases where the digital and physical worlds have been brought together to better understand our society.

The presentation is available on SlideShare and is reported here below:

Urban Data Science Bootcamp

We organize a crash-course on how the science of urban data can be applied to solve metropolitan issues.


The course is a two-day face-to-face event with teaching sessions, workshops, case study discussions and hands-on activities for non-IT professionals in the field of city management. It is offered in two editions during the year:

  • in Milan, Italy, on November 8th-9th, 2017
  • in Amsterdam, The Netherlands, on November 30th-December 1st, 2017.

You can download the flyer and program of the Urban datascience bootcamp 2017.

Ideal participants include civil servants, professionals, students, urban planners, and managers of city utilities and services. No previous experience in data science or computer science is required. Attendees should have experience in areas such as economic affairs, urban development, management support, strategy & innovation, health & care, public order & safety.

Data is the catalyst needed to make the smart city vision a reality in a transparent and evidence-based (i.e. data-driven) manner. The skills required for data-driven urban analysis and design activities are diverse: they range from data collection (field work, crowdsensing, physical sensor processing, etc.) and data processing with established big data technology frameworks, to data exploration for finding patterns and outliers in spatio-temporal data streams, and data visualization that conveys the right information in the right manner.

The CrowdInsights professional school “Urban Data Science Bootcamp” provides a no-frills, hands-on introduction to the science of urban data. From data creation to data analysis, data visualization and sense-making, the bootcamp will introduce more than 10 real-world use cases that exemplify how urban data can be applied to solve metropolitan issues. Attendees will explore the challenges and opportunities that come from the adoption of novel types of urban data sources, including social media, mobile phone data, IoT networks, etc.

Analysis of user behaviour and social media content for art and culture events

In our most recent study, we analysed the user behaviour and profile, as well as the textual and visual content posted on social media for art and culture events.

The corresponding paper was presented at CD-MAKE 2017 in Reggio Calabria on August 31st, 2017.

Nowadays people share everything on online social networks, from daily life stories to the latest local and global news and events. In our paper, we address the specific problem of user behavioural profiling in the context of cultural and artistic events.

We propose a specific analysis pipeline that aims at examining the profile of online users, based on the textual content they published online. The pipeline covers the following aspects: data extraction and enrichment, topic modeling based on LDA, dimensionality reduction, user clustering, prediction of interest, content analysis including profiling of images and subjects.

We show our approach at work for the monitoring of participation in a large-scale artistic installation that attracted more than 1.5 million visitors in just two weeks (namely The Floating Piers, by Christo and Jeanne-Claude). In the paper we report our findings and discuss the pros and cons of the work.

The full paper is published by Springer in the LNCS series in volume 10410, pages 219-236.

The slides used for the presentation are available on SlideShare:

 

Urbanscope: Digital Whispers from the Urban Landscape. TedX Talk Video

Together with the Urbanscope team, we gave a TedX talk on the topics and results of the project here at Politecnico di Milano. The talk was actually given by our junior researchers, as we wanted it to be a choral performance as opposed to the typical one-man show.

The message is that cities are not merely physical and organizational devices: they are informational landscapes where places are shaped more by the streams of data and less by the traditional physical evidence. We devise tools and analyses for understanding these streams and the phenomena they represent, in order to better understand our cities.

Two layers coexist: a thick and dynamic layer of digital traces – the informational membrane – grows every day on top of the material layer of the territory, the buildings and the infrastructures. The observation, the analysis and the representation of these two layers combined provide valuable insights on how the city is used and lived.

You can now find the video of the talk on the official TedX YouTube channel:

Urbanscope is a research laboratory that experiments with the collection, organization, analysis, and visualization of cross-domain geo-referenced data.
The research team is based at Politecnico di Milano and encompasses researchers with competencies in Computer Engineering, Communication and Information Design, Management Engineering, and Mathematics.

The aim of Urbanscope is to systematically produce compelling views on urban systems to foster understanding and decision making. Views are like new lenses of a macroscope: they are designed to support the recognition of specific patterns thus enabling new perspectives.

If you enjoyed the show, you can explore our beta application at:

http://www.urbanscope.polimi.it

and discover the other data science activities we are conducting at the Data Science Lab of Politecnico, DEIB.