The role of Big Data in Banks

I was listening to R. Martin Chavez, Goldman Sachs' deputy CFO, just last month at Harvard at the ComputeFest 2017 event: more precisely, at the Symposium on the Future of Computation in Science and Engineering on "Data, Dollars, and Algorithms: The Computational Economy", held on Thursday, January 19, 2017.

His claim was that

Banks are essentially API providers.

The entire structure and infrastructure of Goldman Sachs is being reorganized around that idea. His case is that you should not compare a bank with a shop or store; you should compare it with Google. Just imagine that every time you wanted to search on Google you had to get in touch with some Google employee (i.e., make a phone call or submit a request), who at some point would come back to you with the result. Nonsense, right? Well, that is what actually happens with banks. It was happening with consumer-oriented banks before online banking, and it's still largely happening for business banks.

But this is going to change. The amount of data and the speed and volume of financial transactions no longer allow it.

Banks are actually among the richest organizations, not just in terms of money but in terms of data ownership. Yet they are also craving further, "less official" big data sources.

Juri Marcucci: Importance of Big Data for Central (National) Banks.

Today, at the ISTAT National Big Data Committee meeting in Rome, Juri Marcucci from the Bank of Italy discussed the bank's research on integrating Google Trends information into its financial predictive analytics.

Google Trends provides insight into user interests in general, expressed as the probability that a random user will search for a particular keyword (normalized and scaled, with geographical detail down to the city level).

The Bank of Italy is using Google Trends data to complement its short- and mid-term predictions of unemployment rates. It's definitely a big challenge, but preliminary results are promising in terms of the confidence of the obtained models. More details are available in this paper.
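As a rough illustration of the idea (not the Bank of Italy's actual methodology; the monthly figures below are invented), one can regress an official indicator on a Google Trends interest series and check how much of its variance the search data explains:

```python
import numpy as np

# Hypothetical monthly unemployment rates (%) and a Google Trends-style
# search-interest series (0-100) for a job-related keyword. Both invented.
unemployment = np.array([11.2, 11.4, 11.5, 11.9, 12.1, 12.0, 12.3, 12.5])
trends = np.array([42.0, 45.0, 47.0, 55.0, 58.0, 57.0, 63.0, 66.0])

# Fit unemployment[t] ~ a + b * trends[t] by ordinary least squares.
X = np.column_stack([np.ones_like(trends), trends])
coef, *_ = np.linalg.lstsq(X, unemployment, rcond=None)
a, b = coef
pred = X @ coef

# R^2 as a rough measure of how much the trends series explains.
ss_res = np.sum((unemployment - pred) ** 2)
ss_tot = np.sum((unemployment - unemployment.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"intercept={a:.3f} slope={b:.4f} R^2={r2:.3f}")
```

In practice one would of course use many keywords, lags, and a proper time-series model rather than a single contemporaneous regressor.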

Paolo Giudici from the University of Pavia showed how one can correlate the risk of bank defaults with the banks' exposure on Twitter:

Paolo Giudici: bank risk contagion based (also) on Twitter data.

Obviously, all this must take into account the bias of the sources and the quality of the data collected, as Giudici himself pointed out. Assessing the "trustability" of online sources is crucial. In their research, his group defined a T-index on Twitter accounts in much the same way academics define the h-index for the relevance of publications, as reported in the photographed slide below.

Paolo Giudici: T-index describing the quality of Twitter authors in finance.
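The slide does not spell out the formula, but if the T-index mirrors the h-index, a minimal sketch would be: the largest T such that the account has at least T tweets with at least T retweets each (the retweet counts below are invented, and the exact definition in Giudici's work may differ):

```python
def t_index(retweet_counts):
    """h-index-style score: largest t such that t tweets have >= t retweets."""
    counts = sorted(retweet_counts, reverse=True)
    t = 0
    for rank, retweets in enumerate(counts, start=1):
        if retweets >= rank:
            t = rank
        else:
            break
    return t

# An account with retweet counts [25, 8, 5, 3, 3, 1] has 3 tweets
# with at least 3 retweets each, but not 4 with at least 4.
print(t_index([25, 8, 5, 3, 3, 1]))  # → 3
```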

It’s very interesting to see how creative the use of (non-traditional, web-based) big data is becoming, in very diverse fields, including very traditional ones like macroeconomics and finance.

And once again, I think the biggest challenges and opportunities come from fusing multiple data sources together: mobile phones, financial transactions, web searches, online news, social networks, and official statistics.

This is also the path that ISTAT (the official Italian statistics institute) is pursuing. For instance, in the calculation of official national inflation rates, web scraping techniques (for e-commerce prices) covering more than 40,000 product prices are now integrated into the process.
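To make the idea concrete (a toy sketch, not ISTAT's actual methodology; products, prices, and weights are all invented), a basic price index over scraped e-commerce prices can be computed as a weighted average of price relatives between two snapshots:

```python
# Two snapshots of scraped prices (EUR) for the same basket of products.
base = {"pasta_500g": 0.89, "olive_oil_1l": 4.50, "coffee_250g": 2.30}
current = {"pasta_500g": 0.95, "olive_oil_1l": 4.70, "coffee_250g": 2.25}

# Expenditure weights for the basket (must sum to 1).
weights = {"pasta_500g": 0.5, "olive_oil_1l": 0.3, "coffee_250g": 0.2}

# Weighted average of price relatives; 100 means no change vs. the base period.
index = 100 * sum(weights[p] * current[p] / base[p] for p in base)
print(f"price index: {index:.2f}")
```

A real consumer price index uses far larger baskets, careful product matching across snapshots, and outlier filtering of the scraped data.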



GSE Academic Award for Excellence for correlating Twitter sentiment analysis and stock price variations

Ekaterina Shabunina recently graduated under my supervision as an M.Sc. student at the Como Campus of Politecnico di Milano with a thesis titled “Approach based on CRF to Sentiment Classification of Twitter Streams related to Companies”. Thanks to the innovation of her work, she won the Grand Prize 2013 of the GSE (Guide Share Europe) Academic Award for Excellence.

The work is based on the assumption that information produced and shared on social networks is becoming more and more interesting as a source for inferring trends and happenings in the real world. She applied sentiment classification to Twitter streams related to companies and performed a statistical correlation analysis against the variation of the companies’ securities prices. Tweets are labeled with a tailored classification model, which by itself exhibits solid performance indicators, and are then correlated to stock market values. The approach applies the Conditional Random Fields (CRF) probabilistic model to company-related Twitter data streams and shows a high correlation between the classified results and the stock market values, even when adopting a very simple feature model. In particular, in the best case it shows a near-perfect adherence between the accumulated number of net positive tweets and the stock’s closing price, with an ideal significance level of the regression and a 97.56% explanatory capacity of the fitted equation.
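The correlation step can be sketched as follows (with invented daily tweet counts and prices, not the data from the thesis): accumulate the daily net positive tweets and measure the linear correlation with the closing price:

```python
import numpy as np

# Invented daily counts of positive and negative tweets about a company,
# and the company's daily closing prices over the same week.
pos = np.array([12, 30, 25, 8, 40, 35, 20])
neg = np.array([5, 10, 12, 15, 8, 9, 11])
close = np.array([20.1, 20.6, 20.9, 20.7, 21.5, 22.1, 22.3])

# Accumulated number of net positive tweets up to each day.
net_cum = np.cumsum(pos - neg)

# Pearson correlation between the cumulative series and the closing price;
# r^2 corresponds to the "explanatory capacity" of a simple linear fit.
r = np.corrcoef(net_cum, close)[0, 1]
print(f"Pearson r = {r:.3f}, r^2 = {r**2:.3f}")
```

The thesis additionally checks the statistical significance of the regression, which matters: with only a handful of points, a high r alone proves little.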

The project will be presented at the GSE Management Summit in Barcelona on October 14th, 2013. Here is a short interview with Ekaterina.


GSE (Guide Share Europe), a non-profit association of companies, organizations, and individuals involved in Information and Communication Technology (ICT) solutions based on IBM architectures, established the GSE Academic Award for Excellence for students.

Further information about the awards is available on the GSE website.

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my Twitter account (@MarcoBrambi).