I collected here the list of my write-ups of the first three keynote speeches of the conference:
Over one billion cars interact with each other on the road every day. Each driver has his own driving style, which could impact safety, fuel economy and road congestion. Knowledge about the driving style of the driver could be used to encourage “better” driving behaviour through immediate feedback while driving, or by scaling auto insurance rates based on the aggressiveness of the driving style.
In this work we report on our study of driving behaviour profiling based on unsupervised data mining methods. The main goal is to detect the different driving behaviours, and thus to cluster drivers with similar behaviour. This paves the way to new business models related to the driving sector, such as Pay-How-You-Drive insurance policies and car rentals. Here is the presentation I gave on this topic:
Driver behavioral characteristics are studied by collecting information from GPS sensors on the cars and by applying three different analysis approaches (DP-means, Hidden Markov Models, and Behavioural Topic Extraction) to the contextual scene detection problems on car trips, in order to detect different behaviour along each trip. Subsequently, drivers are clustered in similar profiles based on that and the results are compared with a human-defined ground-truth on drivers classification.
The proposed framework is tested on a real dataset containing sampled car signals. While the different approaches show relevant differences in trip segment classification, the coherence of the final driver clustering results is surprisingly high.
This work has been published at the 4th IEEE Big Data Conference, held in Boston in December 2017. The full paper can be cited as:
M. Brambilla, P. Mascetti and A. Mauri, “Comparison of different driving style analysis approaches based on trip segmentation over GPS information,” 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, 2017, pp. 3784-3791.
You can download the full paper PDF from the IEEE Explore Library, at this url:
If you are interested in further contributions at the conference, here you can find my summaries of the keynote speeches on human-in-the-loop machine learning and on increasing human perception through text mining.
Within the context of our data science research track, we have been involved a lot in fashion industry problems recently.
We already showcased some studies in fashion, for instance related to the analysis of the Milano Fashion Week events and their social media impact.
Starting this year, we are also involved in a research and innovation project called FaST – Fashion Sensing Technology. FaST is a project meant to design, experiment with, and implement an ICT tool that could monitor and analyze the activity of Italian emerging Fashion brands on social media. FaST aims at providing SMEs in the Fashion industry with the ability to better understand and measure the behaviours and opinions of consumers on social media, through the study of the interactions between brands and their communities, as well as support a brand’s strategic business decisions.
Given the importance of Fashion as an economic and cultural resource for Lombardy Region and Italy as a whole, the project aims at leveraging on the opportunities given by the creation of an hybrid value chain fashion-digital, in order to design a tool that would allow the codification of new organizational models. Furthermore, the project wants to promote process innovation within the fashion industry but with a customer-centric approach, as well as the design of services that could update and innovate both creative processes and the retail channel which, as of today, represents the core to the sustainability and competitiveness of brands and companies on domestic and international markets.
Within the project, we study social presence and digital / communication strategies of brands, and we will look for space for optimization. We are already crunching a lot of data and running large scale analyses on the topic. We will share our exciting results as soon as available!
FaST – Fashion Sensing Technology is a project supported by Regione Lombardia through the European Regional Development Fund (grant: “Smart Fashion & Design”). The project is being developed by Politecnico di Milano – Design dept. and Electronics, Information and Bioengineering dept. – in collaboration with Wemanage Group, Studio 4SIGMA, and CGNAL.
In the context of Domain-Specific Modeling Language (DSML) development, the involvement of end-users is crucial to assure that the resulting language satisfies their needs.
In our paper presented at SLE 2017 in Vancouver, Canada, on October 24th within the SPLASH Conference context, we discuss how crowdsourcing tasks can exploited to assist in domain-specific language definition processes. This is in line with the vision towards cognification of model-driven engineering.
The slides are available on slideshare:
Indeed, crowdsourcing has emerged as a novel paradigm where humans are employed to perform computational and information collection tasks. In language design, by relying on the crowd, it is possible to show an early version of the language to a wider spectrum of users, thus increasing the validation scope and eventually promoting its acceptance and adoption.
We propose a systematic (and automatic) method for creating crowdsourcing campaigns aimed at refining the graphical notation of DSMLs. The method defines a set of steps to identify, create and order the questions for the crowd. As a result, developers are provided with a set of notation choices that best fit end-users’ needs. We also report on an experiment validating the approach.
Improving the quality of the language notation may improve dramatically acceptance and adoption, as well as the way people use your notation and the associated tools.
Essentially, our idea is to spawn to the crowd a bunch of questions regarding the concrete syntax of visual modeling languages, and collect opinions. Based on different strategies, we generate an optimal notation and then we check how good it is.
In the paper we also validate the approach and experiment it in a practical use case, namely studying some variations over the BPMN modeling language.
The full paper can be found here: https://dl.acm.org/citation.cfm?doid=3136014.3136033. The paper is titled: “Better Call the Crowd: Using Crowdsourcing to Shape the Notation of Domain-Specific Languages” and was co-authored by Marco Brambilla, Jordi Cabot, Javier Luis Cánovas Izquierdo, and Andrea Mauri.
You can also access the Web version on Jordi Cabot blog.
The artifacts described in this paper are also referenced on findresearch.org, namely referring to the following materials:
- Overview on crowdsearcher site: http://crowdsearcher.deib.polimi.it/casestudies/
- Code of the UI (including the configuration of the tasks and of the modeling alternatives analyized): https://github.com/janez87/tef
- Code of the crowdsourcing platform: https://github.com/janez87/crowdsearcher/tree/modeling-new
- Results summary: http://crowdsearcher.deib.polimi.it/casestudies/crowd-experiment-results-sle2017.xlsx
For centuries, science (in German “Wissenschaft”) has aimed to create (“schaften”) new knowledge (“Wissen”) from the observation of physical phenomena, their modelling, and empirical validation.
Recently, a new source of knowledge has emerged: not (only) the physical world any more, but the virtual world, namely the Web with its ever-growing stream of data materialized in the form of social network chattering, content produced on demand by crowds of people, messages exchanged among interlinked devices in the Internet of Things. The knowledge we may find there can be dispersed, informal, contradicting, unsubstantiated and ephemeral today, while already tomorrow it may be commonly accepted.
The challenge is once again to capture and create consolidated knowledge that is new, has not been formalized yet in existing knowledge bases, and is buried inside a big, moving target (the live stream of online data).
The myth is that existing tools (spanning fields like semantic web, machine learning, statistics, NLP, and so on) suffice to the objective. While this may still be far from true, some existing approaches are actually addressing the problem and provide preliminary insights into the possibilities that successful attempts may lead to.
I gave a few keynote speeches on this matter (at ICEIS, KDWEB,…), and I also use this argument as a motivating class in academic courses for letting students understand how crucial is to focus on the problems related to big data modeling and analysis. The talk, reported in the slides below, explores through real industrial use cases, the mixed realistic-utopian domain of data analysis and knowledge extraction and reports on some tools and cases where digital and physical world have brought together for better understanding our society.
The presentation is available on SlideShare and are reported here below:
We organize a crash-course on how the science of urban data can be applied to solve metropolitan issues.
The course is a 2 days face-to-face event with teaching sessions, workshops, case study discussions and hands-on activities for non-IT professionals in the field of city management. It is issued in two editions along the year:
- in Milan, Italy, on November 8th-9th, 2017
- in Amsterdam, The Netherlands, on November 30th-December 1st, 2017.
You can download the flyer and program of the Urban datascience bootcamp 2017.
Ideal participants include: Civil servants, Professionals, Students, Urban planners, and managers of city utilities and services. No previous experience in data science or computer science is required. Attendees should have experience in areas such as economic affairs, urban development, management support, strategy & innovation, health & care, public order & safety.
Data is the catalyst needed to make the smart city vision a reality in a transparent and evidence-based (i.e. data-driven) manner. The skills required for data-driven urban analysis and design activities are diverse, and range from data collection (field work, crowdsensing, physical sensor processing, etc.); data processing by employing established big data technology frameworks; data exploration to find patterns and outliers in spatio-temporal data streams; and data visualization conveying the right information in the right manner.
The CrowdInsights professional school “Urban Data Science Bootcamp” provides a no-frills, hands-on introduction to the science of urban data; from data creation, to data analysis, data visualization and sense-making, the bootcamp will introduce more than 10 real-world application uses cases that exemplifies how urban data can be applied to solve metropolitan issues. Attendees will explore the challenges and opportunities that come from the adoption of novel types of urban data source, including social media, mobile phone data, IoT networks, etc.
In our most recent study, we analysed the user behaviour and profile, as well as the textual and visual content posted on social media for art and culture events.
The corresponding paper has been presented at CD-MAKE 2017 in Reggio Calabria on August 31st, 2017.
Nowadays people share everything on online social networks, from daily life stories to the latest local and global news and events. In our paper, we address the specific problem of user behavioural profiling in the context of cultural and artistic events.
We propose a specific analysis pipeline that aims at examining the profile of online users, based on the textual content they published online. The pipeline covers the following aspects: data extraction and enrichment, topic modeling based on LDA, dimensionality reduction, user clustering, prediction of interest, content analysis including profiling of images and subjects.
We show our approach at work for the monitoring of participation to a large-scale artistic installation that collected more than 1.5 million visitors in just two weeks (namely The Floating Piers, by Christo and Jeanne-Claude). In the paper we report our findings and discuss the pros and cons of the work.
The full paper is published by Springer in the LNCS series in volume 10410, pages 219-236.
The slides used for the presentation are available on SlideShare:
Together with the Urbanscope team, we gave a TedX talk on the topics and results of the project here at Politecnico di Milano. The talk was actually given by our junior researchers, as we wanted it to be a choral performance as opposed to the typical one-man show.
The message is that cities are not mere physical and organizational devices only: they are informational landscapes where places are shaped more by the streams of data and less by the traditional physical evidences. We devise tools and analysis for understanding these streams and the phenomena they represent, in order to understand better our cities.
Two layers coexist: a thick and dynamic layer of digital traces – the informational membrane – grows everyday on top of the material layer of the territory, the buildings and the infrastructures. The observation, the analysis and the representation of these two layers combined provides valuable insights on how the city is used and lived.
Urbanscope is a research laboratory where collection, organization, analysis, and visualization of cross domain geo-referenced data are experimented.
The research team is based at Politecnico di Milano and encompasses researchers with competencies in Computing Engineering, Communication and Information Design, Management Engineering, and Mathematics.
The aim of Urbanscope is to systematically produce compelling views on urban systems to foster understanding and decision making. Views are like new lenses of a macroscope: they are designed to support the recognition of specific patterns thus enabling new perspectives.
If you enjoyed the show, you can explore our beta application at:
and discover the other data science activities we are conducting at the Data Science Lab of Politecnico, DEIB.
This is our perspective on the world: it’s all about modeling.
So, why is it that model-driven engineering is not taking over the whole technological and social eco-system?
Let me make the case that it is.
In the occasion of the 25th edition of the Italian Symposium of Database Systems (SEBD 2017) we (Stefano Ceri and I) have been asked to write a retrospective on the last years of database and systems research from our perspective, published in a dedicated volume by Springer. After some brainstorming, we agreed that it all boils down to this: modeling, modeling, modeling.
Long time ago, in the past century, the International DB Research Community used to meet for assessing new research directions, starting the meetings with 2-minutes gong shows to tell each one’s opinion and influencing follow-up discussion. Bruce Lindsay from IBM had just been quoted for his message:
There are 3 important things in data management: performance, performance, performance.
Stefano Ceri had a chance to speak out immediately after and to give a syntactically similar but semantically orthogonal message:
There are 3 important things in data management: modeling, modeling, modeling.
Data management is continuously evolving for serving the needs of an increasingly connected society. New challenges apply not only to systems and technology, but also to the models and abstractions for capturing new application requirements.
In our retrospective paper, we describe several models and abstractions which have been progressively designed to capture new forms of data-centered interactions in the last twenty five years – a period of huge changes due to the spreading of web-based applications and the increasingly relevant role of social interactions.
We initially focus on Web-based applications for individuals, then discuss applications among enterprises, and this is all about WebML and IFML; then we discuss how these applications may include rankings which are computed using services or using crowds, and this is related to our work on crowdsourcing (liquid query and crowdsearcher tool); we conclude with hints to a recent research discussing how social sources can be used for capturing emerging knowledge (the social knowledge extractor perspective and tooling).
It’s also true that model-driven engineering is not necessarily the tool of choice for this to happen. Why? As technician, we always tend to blame the customer for not understanding our product. But maybe we should look into ourselves and the kind of tools (conceptual and technical) the MDE community is offering. I’m pretty sure we could find plenty of space for improvement.
Any idea on how to do this?
We believe cognification could drastically improve the benefits and reduce the costs of adopting MDSE, and thus boost its adoption.
At the practical level, cognification comprises tools that go from artificial intelligence (machine learning, deep learning, as well as human cognitive capabilities, exploited through online activities, crowdsourcing, gamification and so on.
Opportunities (and challenges) for MDE
Here is a set of MDSE tasks and tools whose benefits can be especially boosted thanks to cognification.
- A modeling bot playing the role of virtual assistant in the modeling tasks
- A model inferencer able to deduce a common schema behind a set of unstructured data coming from the software process
- A code generator able to learn the style and best practices of a company
- A real-time model reviewer able to give continuous quality feedback
- A morphing modeling tool, able to adapt its interface at run-time
- A semantic reasoning platform able to map modeled concepts to existing ontologies
- A data fusion engine that is able to perform semantic integration and impact analysis of design-time models with runtime data
- A tool for collaboration between domain experts and modeling designers
Obviously, we are aware that some research initiatives aiming at cognifying specific tasks in Software Engineering exist (including some activities of ours). But what we claim here is a change in magnitude of their coverage, integration, and impact in the short-term future.
If you want to get a more detailed description, you can go through the detailed post by Jordi Cabot that reports the whole content of the paper.