The role of Big Data in Banks

I was listening at R. Martin Chavez, Goldman Sachs deputy CFO just last month in Harvard at the ComputeFest 2017 event, more precisely, the SYMPOSIUM ON THE FUTURE OF COMPUTATION IN SCIENCE AND ENGINEERING on “Data, Dollars, and Algorithms: The Computational Economy” held in Harvard on Thursday, January 19, 2017.

His claim was that

Banks are essentially API providers.

The entire structure and infrastructure of Goldman Sachs is being restructured for that. His case is that you should not compare a bank with a shop or store, you should compare it with Google. Just imagine that every time you want to search on Google you need to get in touch (i.e., make a phone call or submit a request) to some Google employee, who at some points comes back to you with the result. Non sense, right?  Well, but this is what actually happens with banks. It was happening with consumer-oriented banks before online banking, and it’s still largely happening for business banks.

But this is going to change. Amount of data and speed and volume of financial transaction doesn’t allow that any more.

Banks are actually among the richest (not [just] in terms of money, but in data ownership). But they are also craving for further “less official” big data sources.

Juri Marcucci: Importance of Big Data for Central (National) Banks.

Today at the ISTAT National Big Data Committee meeting in Rome, Juri Marcucci from Bank of Italy discussed their research activity in integration of Google Trends information in their financial predictive analytics.

Google Trends provide insights of user interests in general, as the probability that a random user is going to search for a particular keyword (normalized and scaled, also with geographical detail down to city level).

Bank of Italy is using Google Trends data for complementing their prediction of unemployment rates in short and mid term. It’s definitely a big challenge, but preliminary results are promising in terms of confidence on the obtained models. More details are available in this paper.

Paolo Giudici from University of Pavia showed how one can correlate the risk of bank defaults with their exposition on Twitter:

Paolo Giudici: bank risk contagion based (also) on Twitter data.

Obviously, all this must take into account the bias of the sources and the quality of the data collected. This was pointed out also by Paolo Giudici from University of Pavia. Assessment of “trustability” of online sources is crucial. In their research, they defined the T-index on Twitter accounts in a very similar way academics define the h-index for relevance of publications, as reported in the photographed slide below.

Paolo Giudici: T-index describing the quality of Twitter authors in finance.

It’s very interesting to see how creative the use of (non-traditional, web based) big data is becoming, in very diverse fields, including very traditional ones like macroeconomy and finance.

And once again, I think the biggest challenges and opportunities come from the fusion of multiple data sources together: mobile phones, financial tracks, web searches, online news, social networks, and official statistics.

This is also the path that ISTAT (the official institute for Italian statistics) is pursuing. For instance, in the calculation of official national inflation rates, web scraping techniques (for ecommerce prices) upon more than 40.000 product prices are integrated in the process too.