Improving Topic Modeling with Knowledge Graph Embeddings

Topic modeling techniques have been applied in many scenarios in recent years, spanning textual content, as well as many different data sources. The existing researches in this field continuously try to improve the accuracy and coherence of the results. Some recent works propose new methods that capture the semantic relations between words into the topic modeling process, by employing vector embeddings over knowledge bases.

In our recent paper presented at the AAAI-MAKE Spring Symposium 2019, held at Stanford University, we studied how knowledge graph embeddings affect topic modeling performance on textual content. In particular, the objective of the work is to determine which aspects of knowledge graph embedding have a significant and positive impact on the accuracy of the extracted topics.

We improve the state of the art by integrating some avanced graph embedding approaches (specifically designed for knowledge graphs) within the topic extraction process.
We also studied how the knowledge base could be expanded by using dataset-specific relations between the words.
We implemented the method and we validated it with a set of experiments with 2 variations of the knowledge base, 7 embedding methods, and 2 methods for incorporation of the embeddings into the topic modeling framework, also considering different parameterizations of topic number and embedding dimensionality.
Besides the specific technical results, the work has also aims at showing the potentials of integrating statistical methods with knowledge-centric methods. The full extent of the impact of these techniques shall be explored further in the future.
The details of the work are reported in the paper, which is available online here, and in the slides, also available online (on SlideShare and here below).

Trends in Web engineering: my 4 takes from ICWE 2012

At this point of the year, just before vacation time, it makes sense to me to think to Web Engineering practices at large and draw some trends and outlook for the field after this year.
As a PC chair of ICWE 2012 (International Conference on Web Engineering), this year I can claim I had a privileged view over the entire event and I think this was a good test for assessing the field.
Furthermore, being directly involved into the organization of MDWE workshop, I have been directly exposed to the specific aspects of the model-driven field for the Web.
I see the following trends in the field:

  • still limited attention to non-functional aspects in Web application development. Honestly, this doesn’t make any sense to me and makes me think that this is one of the reasons why Web Engineering is still seen as a niche sector, both in industry and academia. Actually, at least some awareness starts appearing (e.g., some works on assessing the productivity of model-driven approaches have been discussed), but actual results are still very preliminary. And the Web is ALL ABOUT NON-FUNCTIONAL ISSUES!
  • mashups are still getting a lot of attention, as demonstrated by the successful ComposableWeb workshop too. However, I think we need to rethink a little bit this field. My feeling is that traditional mashups are no longer interesting per se. Even very well known solutions like Yahoo Pipes! have reached a limited audience, and in general mashups have never reached the maturity level that let them be used for producing enterprise-class professional applications. So, am I claiming that mashups are a complete failure? Not at all! They actually represent an important step that enabled the current trends toward Web API integration, which are probably used in most of the existing Web sites. I think that the mashup community in the future should look at the broad problem of Web API integration at large.
  • the role of model-driven approaches seems to slowly move out of its original position, strictly related to modeling of Web user interfaces. The MDWE workshop raised a very interesting discussion on the role of MDE and Agile approaches (thanks also to the support of the nice Instant Community system provided by University of Trento). Some activities, including our work towards integration of WebML modeling with social networks or towards integration with Business Process Management, try to broaden the applicability of the approaches, but I think more effort is needed to revitalize the field.
  • Finally, content and social analysis (both syntactical/textual and semantic) is getting more and more interest in the community. This is demonstrated by the wide set of well-attended ICWE tutorials that addressed these issues (The Web of Data for E-Commerce, Epidemic Intelligence for the Crowd, SPARQL and Queries over Linked Data, Natural Language Processing for the Web).
If you see some other interesting trends in Web Engineering, feel free to share your thoughts!

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).