Extracting Emerging Knowledge from Social Media

Today I presented our full paper titled “Extracting Emerging Knowledge from Social Media” at the WWW 2017 conference.

The work is based on a rather obvious assumption: knowledge in the world continuously evolves, and ontologies are largely incomplete with respect to low-frequency data belonging to the so-called long tail.

Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the changes in the world behind which emerging entities hide.

In the paper we propose a method and a tool for discovering emerging entities by extracting them from social media.

Once instrumented by experts through a very simple initialization, the method is capable of finding emerging entities by mixing syntactic and semantic techniques. It uses seeds, i.e., prototypes of emerging entities provided by experts, to generate candidates; it then associates each candidate with a feature vector built from the terms occurring in its social content, ranks the candidates by their distance from the centroid of the seeds, and returns the top candidates as the result.

The method can be continuously or periodically iterated, using the results as new seeds.
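To make the ranking and iteration steps more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the implementation from the paper: the bag-of-words features, the cosine distance, the function names, and the toy data are all hypothetical choices of mine, and candidate generation from the seeds is not shown (candidates are simply given as input).

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector (term -> count). Assumed feature model, for illustration."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Component-wise mean of a list of sparse term vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; an assumed metric, the post does not name one."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(seed_texts, candidate_texts, top_k):
    """Return the top_k candidate names closest to the centroid of the seeds."""
    center = centroid([term_vector(t) for t in seed_texts.values()])
    ranked = sorted(
        candidate_texts,
        key=lambda name: cosine_distance(term_vector(candidate_texts[name]), center),
    )
    return ranked[:top_k]


def iterate(seed_texts, candidate_texts, top_k, rounds):
    """Repeat the ranking, promoting each round's top candidates to seeds."""
    seeds = dict(seed_texts)
    pool = dict(candidate_texts)
    found = []
    for _ in range(rounds):
        if not pool:
            break
        for name in rank_candidates(seeds, pool, top_k):
            seeds[name] = pool.pop(name)  # the result becomes a new seed
            found.append(name)
    return found


# Hypothetical toy data: entity name -> text of its social content.
seeds = {
    "designer_a": "fashion show milan collection runway",
    "designer_b": "new collection fashion week runway",
}
candidates = {
    "brand_x": "runway fashion collection preview",
    "pizzeria_y": "pizza dinner friends",
    "label_z": "fashion week milan",
}

print(rank_candidates(seeds, candidates, top_k=2))
print(iterate(seeds, candidates, top_k=1, rounds=2))
```

In this sketch, feeding the results back as seeds shifts the centroid at every round, which is why the quality of the initial expert-provided seeds matters.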

The PDF of the full paper presented at WWW 2017 is available online (open access with a Creative Commons license).

You can also check out the slides of my presentation on Slideshare.

A demo version of the tool is available online for free use, thanks also to our partners Dandelion and Microsoft Azure.

You can TRY THE TOOL NOW if you want.

My interview on Social Media and Society: what I said (and what I didn’t)

My recent interview on the evolution of social media and its role in modern society is available on YouTube (in Italian only, sorry about that).

While the 3+ minutes of speech necessarily had to be a general overview of the role and recent changes of social media, I wish to summarise here some of its more technical aspects.

As I mentioned in the presentation:

  • social media have changed a lot since their early days: from being consumed on PCs to mobile devices, from general-purpose social networks connecting friends to digital stages where we “sell” our life to the entire world, from places for sharing personal information to platforms for publishing objective information coming from real-world experience as well.
  • social media are nowadays a valuable source of information for companies, which look for (and find) their customers through social media marketing and advertising, and for public institutions and researchers, who can leverage a large amount of data to provide benefits to our everyday life.
[Image: YourExpo2015 - the Instagram Photo Challenge of Expo2015 Milano]

What I didn’t say is how you can do that. Well, it’s pretty simple.
The ingredients of the recipe: 
  • A lot of users sharing their profile
  • A lot of content (photos, statuses, geotags, descriptions) shared by people
  • (which makes up a VERY big data problem)
  • crawlers (or stream-capturing systems) collecting this content, and storage as needed
  • MODELS of the context, the problem and the solution
  • and DATA ANALYSIS TOOLS for studying the data and extracting meaningful information
To me, the most valuable points are MODELS and ANALYSIS TOOLS. We are doing a lot of experiments on mixing model-driven techniques with semantic analysis, NLP, and social media monitoring. One example of our experiments is the YourExpo2015 Instagram Photo Challenge.
Have a look and participate if you like. More on this coming soon!

To stay updated on my activities, you can subscribe to the RSS feed of my blog or follow my Twitter account (@MarcoBrambi).