Analysis of user behaviour and social media content for art and culture events

In our most recent study, we analysed the user behaviour and profile, as well as the textual and visual content posted on social media for art and culture events.

The corresponding paper was presented at CD-MAKE 2017 in Reggio Calabria on August 31st, 2017.

Nowadays people share everything on online social networks, from daily life stories to the latest local and global news and events. In our paper, we address the specific problem of user behavioural profiling in the context of cultural and artistic events.

We propose a specific analysis pipeline that examines the profile of online users, based on the textual content they publish online. The pipeline covers the following aspects: data extraction and enrichment, topic modeling based on LDA, dimensionality reduction, user clustering, prediction of interest, and content analysis, including profiling of images and subjects.
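
To give a concrete feel for the user clustering step alone, here is a minimal Python sketch. It is not the code used in the paper: it assumes each user has already been reduced to a short vector of topic proportions (the kind of output an LDA step would produce) and applies plain k-means with a deterministic seeding. The user names, the three topics and all the numbers are made up for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids are seeded with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical topic proportions per user: (art, travel, food).
user_topics = {
    "user_a": (0.80, 0.15, 0.05),
    "user_b": (0.10, 0.80, 0.10),
    "user_c": (0.75, 0.20, 0.05),
    "user_d": (0.05, 0.85, 0.10),
}
labels, centers = kmeans(list(user_topics.values()), k=2)
clusters = dict(zip(user_topics, labels))
```

With these toy vectors, the two art-oriented users end up in one cluster and the two travel-oriented users in the other. A real pipeline would of course start from the posted text and pick the number of clusters more carefully.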

We show our approach at work for the monitoring of participation in a large-scale artistic installation that attracted more than 1.5 million visitors in just two weeks (namely The Floating Piers, by Christo and Jeanne-Claude). In the paper we report our findings and discuss the pros and cons of the work.

The full paper is published by Springer in the LNCS series in volume 10410, pages 219-236.

The slides used for the presentation are available on SlideShare:


Pattern-Based Specification of Crowdsourcing Applications – ICWE 2014 best paper

I’m really proud to announce that our paper “Pattern-Based Specification of Crowdsourcing Applications” has received the BEST PAPER award at ICWE 2014 (International Conference on Web Engineering), held in Toulouse in July 2014. The paper was authored by Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Andrea Mauri, and Riccardo Volonterio.

The work addresses the fact that in many crowd-based applications, the interaction with performers is decomposed into several tasks that, collectively, produce the desired results.
A number of emerging crowd-based applications cover very different scenarios, including opinion mining, multimedia data annotation, localised information gathering, marketing campaigns, expert response gathering, and so on.
In most of these scenarios, applications can be decomposed into tasks that collectively produce their results; task interactions give rise to arbitrarily complex workflows.

In this paper we propose methods and tools for designing crowd-based workflows as interacting tasks.
We describe the modelling concepts that are useful in such a framework, including typical workflow patterns, whose function is to decompose a cognitively complex task into simple interacting tasks so that the complex task is co-operatively solved.
We then discuss how workflows and patterns are managed by CrowdSearcher, a system for designing, deploying and monitoring applications on top of crowd-based systems, including social networks and crowdsourcing platforms. Tasks performed by humans consist of simple operations which apply to homogeneous objects; the complexity of aggregating and interpreting task results is embodied within the framework. We show our approach at work on a validation scenario and we report quantitative findings, which highlight the effect of workflow design on the final results.
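
The idea of simple interacting tasks whose results are aggregated by the framework can be illustrated with a small Python sketch. This is not the CrowdSearcher API nor one of the patterns from the paper: the function names, the majority-vote aggregation and the hard-coded answers are all assumptions made just to keep the example runnable. A first task collects labels from several performers; objects on which they do not agree flow into a second task.

```python
from collections import Counter

def majority(answers, min_agreement):
    """Return the most common answer if enough performers agree, else None."""
    label, votes = Counter(answers).most_common(1)[0]
    return label if votes >= min_agreement else None

def run_workflow(objects, first_task, second_task, min_agreement=2):
    """Two interacting tasks: objects without agreement flow to the second task."""
    results = {}
    for obj in objects:
        decision = majority(first_task(obj), min_agreement)
        if decision is None:
            # No agreement in the first task: route the object to the second one.
            decision = majority(second_task(obj), min_agreement=1)
        results[obj] = decision
    return results

# Hypothetical performer answers, hard-coded to keep the sketch runnable.
crowd_answers = {
    "photo_1": ["landscape", "landscape", "portrait"],
    "photo_2": ["portrait", "landscape", "abstract"],
}
expert_answers = {"photo_2": ["abstract"]}

results = run_workflow(
    crowd_answers,
    first_task=lambda obj: crowd_answers[obj],
    second_task=lambda obj: expert_answers[obj],
)
```

Here photo_1 is settled by the crowd, while photo_2 gets no majority and is decided in the second task. The performers only ever see a simple labelling operation; the routing and aggregation logic lives outside the tasks.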

Here are the slides presented by Alessandro Bozzon during the ICWE conference:


Here is Alessandro Bozzon presenting:

and here is the picture of the actual award:

ICWE 2014 Best Paper Award Certificate to Pattern-Based Specification of Crowdsourcing Applications. Bozzon, Brambilla, Ceri, Mauri, Volonterio

Web Information Retrieval – the book

We are finally ready to announce our new book:

Web Information Retrieval

Publisher: Springer Verlag
Series: Data-Centric Systems and Applications
Authors: Stefano Ceri, Alessandro Bozzon, Marco Brambilla, Emanuele Della Valle, Piero Fraternali, Silvia Quarteroni.
282 pages, 120 illustrations.


The book will be launched at VLDB in August 2013, but you can already preorder it on the Springer Web site or on Amazon.

The book is intended as an introduction to information retrieval and its application to the Web context. It takes the readers from the foundations of modern information retrieval to the most advanced challenges of Web Information Retrieval (IR). To this end, the book is divided into three parts.

  • The first part addresses the principles of IR and provides a systematic and compact description of basic information retrieval techniques (including binary, vector space and probabilistic models as well as natural language search processing) before focusing on their application to the Web.
  • The second part addresses the foundational aspects of Web IR by discussing the general architecture of search engines (with a focus on the crawling and indexing processes), describing link analysis methods (specifically PageRank and HITS), addressing recommendation and diversification, and finally presenting advertising in search (the main source of revenue for search engines).
  • The third and final part describes advanced aspects of Web search, each chapter providing a self-contained, up-to-date survey on current Web research directions. Topics in this part include meta-search and multi-domain search, semantic search, search in the context of multimedia data, and crowd search.
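
As a taste of the link analysis methods mentioned in the second part, here is a minimal Python sketch of PageRank computed by power iteration. It is not taken from the book: the four-page graph is made up, and the damping factor of 0.85 and the fixed number of iterations are just commonly used defaults assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web graph (every link target is also a key).
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(graph)
```

In this toy graph "home" collects the highest score because every other page links to it, while "orphan", which nobody links to, keeps only the random-jump share.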

Success story paper: Large-scale Model-Driven Engineering of Web User Interaction with WebML and WebRatio

Our paper “Large-scale Model-Driven Engineering of Web User Interaction: The WebML and WebRatio experience” has been published online in Elsevier’s journal Science of Computer Programming, in the special issue Success Stories in Model Driven Engineering (edited by Davide Di Ruscio, Richard Paige, Alfonso Pierantonio).

The history we report spans a decade that has seen a dramatic change in the way software applications are built. The change can be summarized in three fundamental factors that impacted the evolution of WebML and WebRatio:
  • The progressive consolidation of the Web as an application development platform
  • At the front-end, the multiplication of access devices and usage scenarios
  • At the back-end, the emergence of Business Process Models as a uniform way of representing cross-organization functionality, and of Service Oriented Architecture as the technical vehicle for deploying process enactment on top of heterogeneous IT infrastructures
These change drivers put much strain on a DSL like WebML, born to capture the features of the Web, and produced the timeline shown below:

The paper reports on our experience with WebML and WebRatio and describes the perspective of the new IFML standard adopted by OMG. The report tells the story of our company in the MDE tool market, facing the challenges of deploying MDE solutions at large-scale industrial players, with a focus on the model-driven design of user interaction and on code generation across all the tiers of Web/SOA applications. We describe our decisions on the DSL (domain specific language) and on the features we decided to implement (or not) in the tool.
The paper includes an overview of WebRatio and of its accompanying DSL for Web application design (WebML); it describes the parallel evolution of the WebML language and of the WebRatio development environment; it reports on the lessons learnt from the joint design of the DSL and of its support tool; it presents a sample of customer histories and reports some quantitative measures on WebRatio usage, together with some statistics on WebML model size and development effort. Finally, we take the occasion to reflect on the success and failure factors for MDE that emerged from the WebRatio experience.

The paper is available from Elsevier and also here in our open-access preprint version.

Andrei Broder, Yahoo! VP on computational advertising: Seminar on Targeted Advertising.

Andrei Broder,
Yahoo! Research

Andrei Broder from Yahoo! Research gave two seminars at Politecnico di Milano: an introduction to internet monetization (in Como) and a seminar on targeted advertising (in Milano) as a branch of computational advertising.
Computational advertising is about finding the best match between a given user in a given context and a suitable advertisement.
The context can be a web search result page, or a content page provided by a portal. The ads should match these contents.
Ads aim to show the product, provide information, and induce direct action (e.g., direct marketing such as expiring coupons, which induce some urgency in the reader), but also to build a general and long-lasting image for a brand.
The core motivation for targeting is that sending the appropriate advertising to the most interested users is the best option for everybody: the advertiser (who reaches the users they are really looking for), the advertising company (which gets more clicks on the ads, i.e., a higher click-through rate, and therefore earns more money), and the users themselves (who finally get advertising that interests them).
Advertisers use targeting in several ways (geotargeting, demographics, advertising channel, and so on).
The problem addressed by this discipline stems from information overload: when a large amount of information is available, users’ interest becomes scarce. Nowadays one must put a huge effort into getting users’ attention.
Technically speaking, the goal of targeted advertising is to raise the Accuracy-Reach curve (i.e., precision-recall curve).
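
To make the accuracy-reach idea concrete, here is a small Python sketch (my own illustration, not material from the seminar). It assumes a targeting model has assigned each user a relevance score and that we know, after the fact, who was really interested; both the scores and the ground truth below are invented. Users are targeted in decreasing score order, and after each additional user we record reach (the fraction of interested users reached so far, i.e., recall) and accuracy (the fraction of targeted users who are interested, i.e., precision).

```python
def accuracy_reach_curve(scored_users):
    """scored_users: list of (score, is_interested) pairs with at least one
    interested user. Returns one (reach, accuracy) point per targeting depth."""
    ranked = sorted(scored_users, key=lambda pair: pair[0], reverse=True)
    total_interested = sum(1 for _, interested in ranked if interested)
    curve, hits = [], 0
    for targeted, (_, interested) in enumerate(ranked, start=1):
        hits += interested  # True counts as 1, False as 0
        curve.append((hits / total_interested, hits / targeted))
    return curve

# Hypothetical relevance scores and ground truth.
users = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
curve = accuracy_reach_curve(users)
```

A better targeting model ranks interested users higher, which keeps accuracy high for longer as reach grows: that is what raising the curve means.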

Targeted advertising can be achieved through two main approaches:

  • rule based: the advertiser provides a set of rules for targeting the right segments of users (e.g., based on age, geographical info, sex, and so on).
  • model based: the advertisement platform defines a user model that lets it better address the kind of users that might be interested in an ad.
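
The difference between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical: the User fields, the age and city rule, and the topic weights (which merely stand in for a model that a real platform would learn from data).

```python
from dataclasses import dataclass

@dataclass
class User:
    age: int
    city: str
    visited_topics: tuple

def rule_based(user):
    """Advertiser-supplied rule: target users aged 18-35 living in Milano."""
    return 18 <= user.age <= 35 and user.city == "Milano"

# Hypothetical topic weights standing in for a learned user model.
TOPIC_WEIGHTS = {"travel": 0.5, "art": 0.4, "sports": 0.1}

def model_based(user, threshold=0.5):
    """Platform-side model: score the user's interests, target above a threshold."""
    score = sum(TOPIC_WEIGHTS.get(topic, 0.0) for topic in set(user.visited_topics))
    return score >= threshold

users = [
    User(25, "Milano", ("sports",)),
    User(52, "Como", ("travel", "art")),
]
rule_targets = [rule_based(u) for u in users]
model_targets = [model_based(u) for u in users]
```

In this toy case the rule selects only the first user, who matches the demographic segment, while the model selects only the second, whose interests match the ad even though the demographics do not.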

Modern targeting is model-based and it relies on the concept of persona, which is essentially a relevant behaviour and profile pattern of a user. A persona is one facet of a personality, so a single user may embody several personas.
The other crucial aspect is the set of interests or topics that people like. These can change over time. To target well, you need to draw temporal pictures and consider combinations of topics that are relevant at the same time.
Demographic targeting is the classical approach.
One important technique is re-targeting, which is a particular case of behavioral targeting: you do something on a web site and later on, when doing searches or other online activities, you get advertisements from that web site.
That can be achieved simply through cookies added to your browser when you visit the original site. The question is how acceptable this is: do users feel their privacy has been violated?
One basic countermeasure is obviously to delete the cookies in the browser. Statistics show that people tend to become more careful over time and delete cookies more often.
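
The cookie mechanism behind re-targeting can be sketched with Python's standard http.cookies module. This is a deliberately simplified illustration, not how any specific ad network works: the cookie name last_viewed, the ad identifiers and the 30-day lifetime are all assumptions, and I pretend that a single ad network both sets the cookie (through a tag embedded in the shop page) and reads it back later on another site.

```python
from http.cookies import SimpleCookie

def tag_visitor(product_id):
    """Build the Set-Cookie value a (hypothetical) ad network returns
    when its tag is loaded from a product page."""
    cookie = SimpleCookie()
    cookie["last_viewed"] = product_id
    cookie["last_viewed"]["max-age"] = 30 * 24 * 3600  # remember for 30 days
    return cookie["last_viewed"].OutputString()

def pick_ad(cookie_header, default_ad="generic-banner"):
    """Choose an ad from the Cookie header the browser sends back later."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "last_viewed" in cookie:
        return "ad-for-" + cookie["last_viewed"].value
    return default_ad

set_cookie_value = tag_visitor("sofa-42")
ad_with_cookie = pick_ad("last_viewed=sofa-42")
ad_without_cookie = pick_ad("")  # the user deleted the cookies
```

Once the user deletes the cookies, the browser sends nothing back and the ad network falls back to the generic banner, which is exactly why cookie deletion works as a countermeasure.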

On mobile application development: native, web or both?

I was prompted to write this short post by the interesting article “Mobile Application Development: Web vs. Native” by Andre Charland and Brian Leroux, which recently appeared in Communications of the ACM, Vol. 54 No. 5.

Let me first clarify the definitions:

  • Native mobile application: application developed for a specific mobile platform (e.g., iPhone, iPad, Android, …)
  • Mobile web application: web site that has been designed appropriately to fit size, performance, and appearance of a mobile device

I appreciated that someone finally raised explicitly the question of whether it is more convenient to develop one or the other. I think that the article mainly tackles technical problems (access to device input/output, performance, usability, and so on), which is fine. But I think that some other issues should be considered. In particular, also taking into account some input gathered at the keynote speech that Tim Berners-Lee recently gave at WWW 2011 in Hyderabad, India, I’m inclined to opt for the mobile web option for a few reasons:

  1. The data would remain on the web, instead of being canned into some apps.
  2. The pages would be accessible and indexed by standard search engines.
  3. The development would be (almost) platform independent (certainly much more so than developing alternative native applications).
  4. Developing domain specific languages and code generators would also be much easier (see experiences such as Mobl-lang).
  5. From a business perspective, except for a few success stories, most native applications exploit a well-known brand to raise the number of downloads. For the rest of the world, the web model of linking and connecting resources is still the best one for gathering reasonable traffic.

As for the possible downsides, I think most of them can be easily solved:

  1. graphical coherence with the platform of choice can be obtained reasonably easily
  2. integration with the input/output devices and sensors should be granted at the API level by the browser and should not require ad hoc coding

WebRatio also did some industrial implementations of mobile web applications. See for instance the B&B site for iPad.

    Thesis and Project proposals for students from Politecnico

    Here are some recent proposals for theses and projects for students at Politecnico di Milano:

    • implementation, using AJAX and other Rich Internet Application technologies, of online CASE tools for web engineering. The tools will allow users to draw diagrams of web sites and web applications, and will dynamically generate and deploy/update the corresponding web application
    • integration of web application model transformations (following the MDD/MDA approach) with existing formal verification frameworks, for checking the correctness of the designs