Data Science for Good City Life

On March 10, 2017 we hosted a seminar by Daniele Quercia in the Como Campus of Politecnico di Milano, on the topic:

Good City Life

Daniele Quercia

Daniele Quercia leads the Social Dynamics group at Bell Labs in Cambridge (UK). He has been named one of Fortune magazine’s 2014 Data All-Stars, and spoke about “happy maps” at TED. His research focuses on urban informatics and has received best paper awards from UbiComp 2014 and ICWSM 2015, and an honourable mention from ICWSM 2013. He was a Research Scientist at Yahoo Labs, a Horizon senior researcher at the University of Cambridge, and a Postdoctoral Associate at the Department of Urban Studies and Planning at MIT. He received his PhD from University College London. His thesis was sponsored by Microsoft Research and was nominated for the BCS Best British PhD Dissertation in Computer Science.

His presentation contrasts the corporate smart-city rhetoric about efficiency, predictability, and security with a different perspective on cities, which I find very inspiring and visionary.

“You’ll get to work on time; no queue when you go shopping, and you are safe because of CCTV cameras around you”. Well, all these things make a city acceptable, but they don’t make a city great.


Daniele is launching goodcitylife.org – a global group of like-minded people who are passionate about building technologies whose focus is not necessarily to create a smart city but to give a good life to city dwellers. The future of the city is, first and foremost, about people, and those people are increasingly networked. We will see how a creative use of network-generated data can tackle hitherto unanswered research questions. Can we rethink existing mapping tools [happy-maps]? Is it possible to capture smellscapes of entire cities and celebrate good odors [smelly-maps]? And soundscapes [chatty-maps]?

The complete video of the seminar was streamed live on YouTube and is now available online at https://www.youtube.com/watch?v=Z0IprrZ7phc and embedded here:

The seminar was open to the public and hosted at the Polo Regionale di Como headquarters of Politecnico di Milano, located in Via Anzani 42, III floor, Como.

You can also download the Good City Life flyer.

When a Smart City gets Personal

When people talk about smart cities, the tendency is to think about them in a technology-oriented or sociology-oriented manner.

However, smart cities are the places where we live and work every day now.

Here is a very broad perspective (in Italian) on the experience of big data analysis and smart-city instrumentation for the town of Como, Italy: an account of how phone calls, mobility data, social media, and people counters can contribute to making and evaluating decisions.


You can read it on my Medium channel.


The role of Big Data in Banks

I was listening to R. Martin Chavez, Goldman Sachs deputy CFO, last month at Harvard during the ComputeFest 2017 event, more precisely at the Symposium on the Future of Computation in Science and Engineering on “Data, Dollars, and Algorithms: The Computational Economy”, held on Thursday, January 19, 2017.

His claim was that

Banks are essentially API providers.

The entire structure and infrastructure of Goldman Sachs is being restructured for that. His point is that you should not compare a bank with a shop or store; you should compare it with Google. Just imagine that every time you wanted to search on Google you had to get in touch with some Google employee (i.e., make a phone call or submit a request), who at some point would come back to you with the result. Nonsense, right? Well, this is what actually happens with banks. It was happening with consumer-oriented banks before online banking, and it is still largely happening for business banks.

But this is going to change. The amount of data and the speed and volume of financial transactions do not allow that any more.
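To make the “APIs, not clerks” idea concrete, here is a purely hypothetical sketch (the endpoint, account identifier, and token are all invented for illustration) of what programmatic access to a bank could look like:

```python
import requests

# Purely hypothetical example: a bank exposing account data through a REST API,
# so a balance check is a programmatic call rather than a request to an employee.
API_BASE = "https://api.examplebank.com/v1"  # invented endpoint
TOKEN = "..."                                # placeholder OAuth bearer token

response = requests.get(
    f"{API_BASE}/accounts/12345/balance",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g. {"currency": "EUR", "available": 1234.56}
```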

Banks are actually among the richest players, not [just] in terms of money, but in terms of data ownership. Yet they are also craving further, “less official” big data sources.

Juri Marcucci: Importance of Big Data for Central (National) Banks.

Today at the ISTAT National Big Data Committee meeting in Rome, Juri Marcucci from Bank of Italy discussed their research on integrating Google Trends information into their financial predictive analytics.

Google Trends provides insights into user interests, expressed as the probability that a random user will search for a particular keyword (normalized and scaled, with geographical detail down to the city level).

Bank of Italy is using Google Trends data to complement its predictions of unemployment rates in the short and mid term. It is definitely a big challenge, but preliminary results are promising in terms of the confidence of the obtained models. More details are available in this paper.
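As a hint of how such a signal can be pulled programmatically, here is a minimal sketch using the third-party pytrends library; the keyword, timeframe, and setup are illustrative assumptions of mine, not necessarily what Bank of Italy uses.

```python
# Minimal sketch: download a Google Trends series to use as an extra regressor
# in an unemployment nowcasting model. Keyword and timeframe are illustrative.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="it-IT", tz=60)
pytrends.build_payload(["offerte di lavoro"], timeframe="2010-01-01 2016-12-31", geo="IT")
trends = pytrends.interest_over_time()  # search-interest index (0-100) over time

# Aggregate to monthly frequency, to align with official unemployment figures.
monthly = trends["offerte di lavoro"].resample("M").mean()
print(monthly.head())
```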

Paolo Giudici from the University of Pavia showed how one can correlate the risk of bank defaults with their exposure on Twitter:

Paolo Giudici: bank risk contagion based (also) on Twitter data.

Obviously, all this must take into account the bias of the sources and the quality of the data collected, as Paolo Giudici also pointed out. Assessing the “trustability” of online sources is crucial. In their research, they defined a T-index for Twitter accounts, in a way very similar to how academics define the h-index for the relevance of publications, as reported in the photographed slide below.

Paolo Giudici: T-index describing the quality of Twitter authors in finance.
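To give a feel for an h-index-style score, here is a tiny sketch that computes, for a Twitter account, the largest t such that the account has at least t tweets with at least t retweets each. This is only my reading of the analogy; the exact T-index definition used by Giudici’s group may differ.

```python
def t_index(retweet_counts):
    """Largest t such that at least t tweets have >= t retweets each
    (an h-index-style score; the actual T-index definition may differ)."""
    counts = sorted(retweet_counts, reverse=True)
    t = 0
    for rank, retweets in enumerate(counts, start=1):
        if retweets >= rank:
            t = rank
        else:
            break
    return t

print(t_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```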

It is very interesting to see how creative the use of (non-traditional, web-based) big data is becoming, in very diverse fields, including very traditional ones like macroeconomics and finance.

And once again, I think the biggest challenges and opportunities come from the fusion of multiple data sources together: mobile phones, financial tracks, web searches, online news, social networks, and official statistics.

This is also the path that ISTAT (the official institute for Italian statistics) is pursuing. For instance, web scraping techniques for e-commerce prices, covering more than 40,000 product prices, are integrated into the calculation of official national inflation rates.
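As an illustration of the kind of collection step involved (the URL and CSS selector below are invented, and a production pipeline such as ISTAT’s is far more robust), here is a minimal price-scraping sketch:

```python
import requests
from bs4 import BeautifulSoup

def fetch_price(url, selector):
    """Scrape one product price; assumes Italian formatting like '€ 1.234,56'."""
    html = requests.get(url, timeout=10).text
    tag = BeautifulSoup(html, "html.parser").select_one(selector)
    if tag is None:
        return None
    text = tag.get_text(strip=True).replace("€", "").replace(".", "").replace(",", ".")
    return float(text)

# Hypothetical shop URL and selector, for illustration only.
print(fetch_price("https://shop.example.it/product/123", "span.price"))
```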

 

 

The Dawn of a new Digital Renaissance in Cultural Heritage

Fluxedo joined forces with the Observatory of Digital Innovation in Arts & Cultural Heritage (Osservatorio per l’innovazione digitale nei beni e attività culturali) of the School of Management (MIP) of Politecnico di Milano, to cover the social media analytics of Italian and international museums.

The results of the work were presented during a very successful event hosted by Piccolo Teatro di Milano on January 19, 2017.

The live dashboard of the SocialOmeters analysis on the museums is available here:

www.socialometers.com/osservatoriomusei/


A summary of the event, generated from its social media content via Storify, is available here.


The official hashtag of the event, #OBAC17, became a Twitter trend in Italy, with 579 tweets, 187 users, around 600 likes and retweets, and a potential audience of 2.2 million users.

The event had huge visibility in the national media, as reported in this press review:

1. La rivoluzione dei musei online. Il primato di Triennale e Pinacoteca – read Il Corriere della Sera Milano
2. L’innovazione prolifera (ma fatica) – read Il Sole 24 Ore Nòva
3. I musei italiani e la digitalizzazione: il punto del Politecnico di Milano – read Advertiser
4. Osservatorio Politecnico, musei social ma con pochi servizi digitali – read Arte Magazine
5. Arte & Innovazione. Musei italiani sempre più social (52%) e virtuali (20%) – read Corriere del Web
6. Boom di visitatori nei musei ma è flop dei servizi digitali – read Il Sole 24 Ore Blog
7. Capitolini, comunali e Maxxi di Roma tra i musei più popolari sui social network – read La Repubblica Roma
8. La pagina Facebook della Reggia di Venaria è la più apprezzata d’Italia con oltre 166 mila “like” – read La Stampa Torino
9. Il 52% dei musei italiani è social ma i servizi digitali per la fruizione delle opere sono limitati. Un’analisi dell’Osservatorio Innovazione Digitale nei Beni e Attività Culturali – read Lombard Street
10. Musei sempre più social, ecco i più cliccati – read TTG Italia
11. Tanta cultura, poco digitale: solo il 52% dei musei italiani è sui social e il 43% non ha ancora un sito – read Vodafone News
12. Tra Twitter e Instagram, 52% musei italiani punta sui social media – read ADNKronos
13. Musei strizzano occhi a social, ma strada è lunga – read ANSA ViaggiArt
14. Oltre la metà dei musei italiani è online e sui social, ma i servizi digitali evoluti e quelli on site sono ancora scarsi – read Brand News
15. Il 52% dei musei italiani è social ma i servizi digitali per la fruizione delle opere sono limitati – read DailyNet
16. Musei italiani sempre più social, ma i servizi digitali sono limitati – read Diario Innovazione
17. Musei e social network – read Inside Art
18. Musei Vaticani e Maxxi tra i più social d’Italia – read Il Messaggero
19. I musei si fanno spazio sui social – read Italia Oggi
20. Musei italiani social, ma non troppo – read La Repubblica
21. Capitolini e Maxxi da record sui social – read La Repubblica Roma
22. Musei lucani poco social e poco visitabili in web – read La Siritide
23. Venaria regina dei social – read La Stampa Torino
24. In calo nel 2016 il numero degli ingressi nei luoghi di cultura in Basilicata – read Oltre
25. Musei sempre più social, ma poco interattivi – read QN ILGIORNO – il Resto del Carlino – LA NAZIONE
26. Dal marketing alle guide per disabili. Cultura, boom dell’industria digitale – read QN ILGIORNO – il Resto del Carlino – LA NAZIONE
27. Musei romani sempre più social – read Radio Colonna
28. Social network: il Maxxi tra i musei più popolari – read Roma2Oggi
29. Musei italiani sempre più social, ma la strada è ancora lunga – read Travel No Stop
30. Musei italiani sempre più social e virtuali – read Uomini & Donne della Comunicazione
31. Tra Twitter e Instagram, un museo su due in Italia scommette sui … – read Italia per Me
32. Beni culturali, musei lucani poco social – read La Nuova del Sud
33. Fb, Instagram e Twitter: i musei italiani puntano sui social ma non basta – read La Repubblica
34. Tra Twitter e Instagram, un museo su due in Italia scommette sui social media – read La Stampa
35. Il 52% dei musei italiani è social ma i servizi digitali per la fruizione delle opere sono limitati – read Sesto Potere
36. Il 52% dei musei italiani è social, ma la fruizione delle opere digital è limitata – read Il Sole 24 Ore
The Harvard-Politecnico Joint Program on Data Science in full bloom

After months of preparation, here we are.

This week we kicked off the second edition of the DataShack program on Data Science that brings together interdisciplinary teams of data science, software engineering & computer science, and design students from Harvard (Institute of Applied Computational Science) and Politecnico di Milano (faculties of Engineering and Design).

The students will address big data extraction, analysis, and visualization problems provided by two real-world stakeholders in Italy: the Como city municipality and Moleskine.

The Moleskine DataShack project will explore the popularity and success of different Moleskine products co-branded with other famous brands (also known as special editions) and launched in specific periods in time. The main field of analysis is the impact that different products have on social media channels. The social media analysis will then be correlated with product distribution and sales performance data, along multiple dimensions (temporal, geographical, etc.) and product features.
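As a rough illustration of the kind of analysis involved (this is not the project’s code, and all numbers below are made up), one could start from a simple correlation between weekly social mentions and weekly sales, including a lag to check whether buzz anticipates purchases:

```python
import pandas as pd

# Made-up weekly data for one hypothetical special edition.
data = pd.DataFrame(
    {
        "week": pd.date_range("2016-09-05", periods=6, freq="W-MON"),
        "mentions": [120, 340, 290, 180, 150, 90],     # social media posts
        "units_sold": [450, 980, 870, 600, 520, 400],  # sales figures
    }
).set_index("week")

# Same-week correlation, plus a one-week lag of the social signal.
print(data["mentions"].corr(data["units_sold"]))
print(data["mentions"].shift(1).corr(data["units_sold"]))
```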

The Como DataShack project consists of collecting and analyzing data about the city and the way people live and move within it, by integrating multiple and diverse data sources. The problems to be addressed may include providing estimates of human density and movements within the city, predicting the impact of hypothetical future events, determining the best allocation of sensors in the streets, and defining the optimal user experience and interaction for exploring the city data.

The kickoff meeting of the DataShack 2017 projects, at Harvard. Faculty members Pavlos Protopapas, Stefano Ceri, Paola Bertola, Paolo Ciuccarelli, and myself (Marco Brambilla) are involved in the program.

The teams have been formed, and the problems assigned. I really look forward to advising the groups in the coming months and seeing the results that will come out. The students have already shown commitment and engagement. I’m confident that they will be excellent and innovative this year!

For further activities on data science within our group you can refer to the DataScience Lab site, Socialometers, and Urbanscope.

 

Modeling and Analyzing Engagement in Social Network Challenges

Within a completely new line of research, we are exploring the power of modeling for human behaviour analysis, especially within social networks and/or on the occasion of large-scale live events. Participation in challenges within social networks is a very effective instrument for promoting a brand or event, and it is therefore regarded as an excellent marketing tool.
Our first research on this topic was published in November 2016 at the WISE Conference, covering the analysis of user engagement within social network challenges.
In this paper, we take the challenge organizer’s perspective, and we study how to raise the engagement of players in challenges where the players are stimulated to create and evaluate content, thereby indirectly raising awareness about the brand or event itself. Slides are available on SlideShare:

We illustrate a comprehensive model of the actions and strategies that can be exploited for progressively boosting social engagement during the challenge evolution. The model studies the organizer-driven management of interactions among players, and evaluates the effectiveness of each action in light of several other factors (time, repetition, third-party actions, interplay between different social networks, and so on).
We evaluate the model through a set of experiments on a real case, the YourExpo2015 challenge. Overall, our experiments lasted 9 weeks and engaged around 800,000 users on two different social platforms; our quantitative analysis assesses the validity of the model.
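For intuition only, here is a minimal sketch (timestamps, window size, and data are invented, and this is not the paper’s actual methodology) of one way to quantify the effect of an organizer action by comparing engagement in a window before and after it:

```python
import pandas as pd

# Invented challenge activity (posts, likes, votes) and organizer actions.
events = pd.Series(1, index=pd.to_datetime([
    "2015-05-10 17:30", "2015-05-10 18:20", "2015-05-10 19:05",
    "2015-05-17 11:00", "2015-05-17 13:40", "2015-05-17 14:10",
]))
actions = pd.to_datetime(["2015-05-10 18:00", "2015-05-17 12:00"])

window = pd.Timedelta("6h")
for action in actions:
    before = events[(events.index >= action - window) & (events.index < action)].sum()
    after = events[(events.index > action) & (events.index <= action + window)].sum()
    print(action, "engagement uplift:", after - before)
```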


Business Process Management & Enterprise Architecture track of ACM SAC 2017

This year I’m co-organizing, together with Davide Rossi and a group of experts in Business Process Management and Enterprise Architecture, a new event called BPM-EA, which aims to bring together the broad topics of business processes, modeling, and enterprise architecture.

These disciplines are quickly evolving and intertwining with each other, and are often referred to with the broad term of business modeling.
I believe there is a strong need to explore new paths of improvement, integration, and consolidation of these disciplines.
If you are interested in participating and contributing, we seek contributions in the areas of enterprise and systems architecture and modeling, multilevel model tracing and alignment, model transformation, and IT & business alignment (both in terms of modeling and goals), tackling both technical (languages, systems, patterns, tools) and social (collaboration, human-in-the-loop) issues.
The deadline for submitting a paper is September 15, 2016.
You can find the complete call and further details on the event website:
BPMEA track at SAC 2017

Feel free to share your ideas, opinions and criticisms here or as a submission to the event.

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

Keynote from Google Research on Building Knowledge Bases at #ICWE2016

I report here some highlights of the keynote speech by Xin Luna Dong at the 16th International Conference on Web Engineering (ICWE 2016). Incidentally, she is now moving to Amazon to start a new project on building an Amazon knowledge base.
Building knowledge bases still remains a challenging task.
First, one has to decide how to build the knowledge: automatically or manually?
A survey in 2014 reported the following list of large efforts in knowledge building: the top 4 approaches are manually curated, the bottom 3 are automatic.
Google’s Knowledge Vault and Knowledge Graph are the big winners in terms of volume.
When you move to long tail content, curation does not scale. Automation must be viable and precise.
This is in line with the research line we are starting on Extracting Changing Knowledge (we presented a short paper at a Web Science 2016 workshop last month). Here is a summary of our approach:
Where can knowledge be extracted from? In Knowledge Vault:
  • largest share of the content comes from DOM structured documents
  • then textual content
  • then annotated content
  • and a small share from web tables

Knowledge Vault is a matrix-based approach to knowledge building, with rows = entities and columns = attributes.

It assumes the entities to be available (e.g. in Freebase), and builds a training set over that.
One can build KBs by grouping triples into buckets with similar probability of being correct; it is important to estimate this correctness probability precisely.
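A minimal sketch of this bucketing idea (the triples, probabilities, and thresholds below are invented for illustration, not taken from Knowledge Vault):

```python
# Group extracted triples into buckets by estimated probability of correctness.
triples = [
    ("Milan", "located_in", "Italy", 0.99),
    ("Como", "located_in", "Lombardy", 0.85),
    ("Como", "located_in", "Piedmont", 0.05),
]

buckets = {"high": [], "medium": [], "low": []}
for subject, predicate, obj, prob in triples:
    if prob >= 0.9:
        buckets["high"].append((subject, predicate, obj))
    elif prob >= 0.5:
        buckets["medium"].append((subject, predicate, obj))
    else:
        buckets["low"].append((subject, predicate, obj))

for name, items in buckets.items():
    print(name, items)
```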
Errors can include mistakes on:
  • triple identification
  • entity linkage
  • predicate linkage
  • source data

Besides general purpose KBs, Google built lightweight vertical knowledge bases (more than 100 available now).

When extracting knowledge, the ingredients are: the data source, the extractor approach, the data items themselves, and the facts with their probability of truth.

Several models can be used for extracting knowledge. Two extremes of the spectrum are:

  1. Single-truth model. Every fact has only one truth. We trust the value claimed by the highest number of data sources (a toy sketch follows this list).
  2. Multi-layer model. It separates source quality from extractor quality, and data errors from extraction errors. One can build a knowledge-based trust model, defining the trustworthiness of web pages, and compare this measure with the PageRank of web pages:
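Here is the toy sketch of the single-truth model mentioned in point 1 (sources and claims are invented): for each (entity, attribute) pair we keep the value asserted by the largest number of sources.

```python
from collections import Counter, defaultdict

# Invented claims: (source, entity, attribute, value).
claims = [
    ("src1", "Como", "region", "Lombardy"),
    ("src2", "Como", "region", "Lombardy"),
    ("src3", "Como", "region", "Piedmont"),
]

votes = defaultdict(Counter)
for source, entity, attribute, value in claims:
    votes[(entity, attribute)][value] += 1

for (entity, attribute), counter in votes.items():
    value, support = counter.most_common(1)[0]
    print(entity, attribute, "->", value, f"({support} sources)")
```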

In general, the challenge is to move from individual information and data points, to integrated and connected knowledge. Building the right edges is really hard though.
Overall, a lot of ingredients influence the correctness of knowledge: temporal aspects, data source correctness, capability of extraction and validation, and so on.

In summary: plenty of research challenges to be addressed, both by the data science and modeling communities!

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

Modeling and data science for citizens: multicultural diversity and environmental monitoring at ICWSM

This year we decided to be present at ICWSM 2016 in Cologne, with two contributions that blend model-driven software engineering and big data analysis, to provide value to users and citizens both in terms of high-quality software and added-value information provision.

We contributed two papers, respectively:
Model Driven Development of Social Media Environmental Monitoring Applications, presented at the SWEEM workshop (Workshop on the Social Web for Environmental and Ecological Monitoring).

Slides here:

and:

Studying Multicultural Diversity of Cities and Neighborhoods through Social Media Language Detection, presented at the CityLab workshop at ICWSM 2016. The focus of this work is to study cities as melting pots of people with different cultures, religions, and languages. Through multilingual analysis of Twitter content shared within a city, we analyze the prevalent language in the different neighborhoods and compare the results with census data, in order to highlight any parallelisms or discrepancies between the two data sources. We show that the officially identified neighborhoods actually represent significantly different communities, and that using social media as a data source helps to detect weak signals that are not captured by traditional data. Slides here:
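For illustration only (this is not the pipeline used in the paper), per-neighborhood language prevalence can be sketched with the third-party langdetect library applied to geolocated tweet texts; the tweets below are invented:

```python
from collections import Counter, defaultdict
from langdetect import detect

# Invented geolocated tweets: (neighborhood, text).
tweets = [
    ("Navigli", "Stasera aperitivo sui Navigli!"),
    ("Navigli", "Great sunset by the canal tonight"),
    ("Chinatown", "Una passeggiata in via Paolo Sarpi"),
]

languages = defaultdict(Counter)
for neighborhood, text in tweets:
    languages[neighborhood][detect(text)] += 1

for neighborhood, counter in languages.items():
    print(neighborhood, counter.most_common(1))
```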

We continuously look for new datasets and computational challenges. Feel free to ask or to propose ideas!

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

Ready to crowdsource your modeling language notation?

As model-driven engineering practitioners, we sometimes encounter weird modelling notations for the languages we use… and this is also definitely true for modelling language adopters!

We always end up wondering who could ever think of this or that terrible syntax for a language, even for very well-established notations (including, for instance, some pieces of UML or BPMN). I take it for granted this is a common experience (raise your hand if not).

This led to the idea that syntax definition should also be a more collaborative task. Therefore, we decided to give it a try and test whether crowdsourcing techniques can be used to create and validate language constructs, in particular their concrete syntax (i.e., notation).
As part of our research work in this area, together with Jordi Cabot’s group, we have set up as an experiment a crowdsourcing campaign using our tool CrowdSearcher.
This boils down to a very simple case: we are asking anyone on the web to look into a very small subset of BPMN and to participate in 3 simple tasks, including questions for selecting the best notation for some of the BPMN concepts (it won’t take more than 3 minutes).
Please help us by responding to these 3 quick questions!
(and feel free to share the link with anyone else)
You can access the campaign at the following link:
Some disclaimers:
1. we don’t care whether you are the world’s expert in BPMN or you have never heard of it. We want you!
2. we ask you to register before taking the task (just click on the Register button once you enter the task), simply to make sure we have only one response per person. All the analysis will run on anonymous data.
3. The results of the survey will be made publicly available in the following months. 
To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).