A Vision towards the Cognification of Model-driven Software Engineering

Jordi Cabot, Robert Clarisó, Marco Brambilla and Sébastien Gerard submitted a visionary paper on Cognifying Model-driven Software Development to the workshop GrandMDE (Grand Challenges in Modeling) co-located with STAF 2017 in Marburg (Germany) on July 17, 2017. The paper advocates for the cross-domain fertilization of disciplines such as machine learning and artificial intelligence, behavioural analytics, social studies, cognitive science, crowdsourcing and many more, in order to help model-driven software development.
But actually, what is cognification?

Cognification is the application of knowledge to boost the performance and impact of any process.

It is recognized as one of the 12 technological forces that will shape our future. We are flooded with data, ideas, people, activities, businesses, and goals. But this flood could even turn out to be helpful.

The thesis of our paper is that cognification will also revolutionize the way software is built. In particular, we discuss the opportunities and challenges of cognifying Model-Driven Software Engineering (MDSE or MDE) tasks.

MDE has seen limited adoption in the software development industry, probably because the perception from developers’ and managers’ perspective is that its benefits do not outweigh its costs.

We believe cognification could drastically improve the benefits and reduce the costs of adopting MDSE, and thus boost its adoption.

At the practical level, cognification comprises tools that range from artificial intelligence (machine learning, deep learning) to human cognitive capabilities, exploited through online activities, crowdsourcing, gamification and so on.

Opportunities (and challenges) for MDE

Here is a set of MDSE tasks and tools that could especially benefit from cognification.

  • A modeling bot playing the role of virtual assistant in the modeling tasks
  • A model inferencer able to deduce a common schema behind a set of unstructured data coming from the software process
  • A code generator able to learn the style and best practices of a company
  • A real-time model reviewer able to give continuous quality feedback
  • A morphing modeling tool, able to adapt its interface at run-time
  • A semantic reasoning platform able to map modeled concepts to existing ontologies
  • A data fusion engine that is able to perform semantic integration and impact analysis of design-time models with runtime data
  • A tool for collaboration between domain experts and modeling designers

A disclaimer

Obviously, we are aware that some research initiatives aiming at cognifying specific tasks in Software Engineering already exist (including some activities of ours). But what we claim here is a change in the magnitude of their coverage, integration, and impact in the short-term future.

If you want to get a more detailed description, you can go through the detailed post by Jordi Cabot that reports the whole content of the paper.

Extracting Emerging Knowledge from Social Media

Today I presented our full paper titled “Extracting Emerging Knowledge from Social Media” at the WWW 2017 conference.

The work is based on a rather obvious assumption, i.e., that knowledge in the world continuously evolves, while ontologies are largely incomplete for what concerns low-frequency data belonging to the so-called long tail.

Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the relevant changes in which emerging entities are hidden.

In the paper we propose a method and a tool for discovering emerging entities by extracting them from social media.

Once instrumented by experts through a very simple initialization, the method is capable of finding emerging entities; we propose a mixed syntactic + semantic method. The method uses seeds, i.e., prototypes of emerging entities provided by experts, to generate candidates; it then associates each candidate with a feature vector, built from the terms occurring in its social content, ranks the candidates by their distance from the centroid of the seeds, and returns the top candidates as the result.

The method can be continuously or periodically iterated, using the results as new seeds.
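To make the pipeline concrete, here is a minimal sketch of the seed-centroid ranking idea in Python. All names, the bag-of-words weighting, and the toy data are illustrative assumptions, not the paper's actual implementation:

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term-frequency vector for a candidate's social content.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    # Average the seed vectors term by term.
    c = Counter()
    for v in vectors:
        c.update(v)
    return Counter({t: f / len(vectors) for t, f in c.items()})

def rank_candidates(seed_texts, candidate_texts, top_k=3):
    # Rank candidates by similarity to the centroid of the seeds
    # (higher similarity = smaller distance from the centroid).
    seed_centroid = centroid([vectorize(t) for t in seed_texts])
    scored = [(name, cosine(vectorize(text), seed_centroid))
              for name, text in candidate_texts.items()]
    scored.sort(key=lambda p: p[1], reverse=True)
    return [name for name, _ in scored[:top_k]]
```

The iterative variant mentioned above simply feeds the top-ranked candidates back in as new seeds for the next round.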

The PDF of the full paper presented at WWW 2017 is available online (open access with a Creative Commons license).

You can also check out the slides of my presentation on Slideshare.

A version of the tool is available online for free use, thanks also to our partners Dandelion API and Microsoft Azure. The most recent version of the tool is available on GitHub here.

Trends in Web engineering: my 4 takes from ICWE 2012

At this point of the year, just before vacation time, it makes sense to me to think about Web Engineering practices at large and to draw some trends and an outlook for the field after this year.
As PC chair of ICWE 2012 (International Conference on Web Engineering), I can claim I had a privileged view over the entire event, and I think it was a good test bed for assessing the field.
Furthermore, being directly involved in the organization of the MDWE workshop, I have been directly exposed to the specific aspects of the model-driven field for the Web.
I see the following trends in the field:

  • still limited attention to non-functional aspects in Web application development. Honestly, this doesn’t make any sense to me and makes me think that this is one of the reasons why Web Engineering is still seen as a niche sector, both in industry and academia. Admittedly, at least some awareness is starting to appear (e.g., some works on assessing the productivity of model-driven approaches have been discussed), but actual results are still very preliminary. And the Web is ALL ABOUT NON-FUNCTIONAL ISSUES!
  • mashups are still getting a lot of attention, as demonstrated also by the successful ComposableWeb workshop. However, I think we need to rethink this field a little. My feeling is that traditional mashups are no longer interesting per se. Even very well-known solutions like Yahoo! Pipes have reached only a limited audience, and in general mashups have never reached the maturity level that would let them be used for producing enterprise-class professional applications. So, am I claiming that mashups are a complete failure? Not at all! They actually represent an important step that enabled the current trend toward Web API integration, which is probably at work in most existing Web sites. I think the mashup community should, in the future, look at the broad problem of Web API integration at large.
  • the role of model-driven approaches seems to be slowly moving away from its original position, strictly tied to the modeling of Web user interfaces. The MDWE workshop raised a very interesting discussion on the role of MDE and Agile approaches (thanks also to the support of the nice Instant Community system provided by the University of Trento). Some activities, including our work on integrating WebML modeling with social networks or with Business Process Management, try to broaden the applicability of the approaches, but I think more effort is needed to revitalize the field.
  • finally, content and social analysis (both syntactical/textual and semantic) is attracting more and more interest in the community, as demonstrated by the wide set of well-attended ICWE tutorials that addressed these issues (The Web of Data for E-Commerce, Epidemic Intelligence for the Crowd, SPARQL and Queries over Linked Data, Natural Language Processing for the Web).
If you see some other interesting trends in Web Engineering, feel free to share your thoughts!

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

ExploreWeb workshop on exploration of (Semantic) Web data at ICWE 2011

Together with Piero Fraternali and Daniel Schwabe, I organized a workshop at ICWE 2011 on Search, Exploration and Navigation of Web Data Sources, named ExploreWeb 2011.
It was a new challenge for us, since this was its first edition, but I can say it has been quite a successful event.
We received 12 submissions and accepted 7 of them; furthermore, we invited Soren Auer as a keynote speaker to open the day. Attendance was also very good, as we had 25+ people in the room for the whole day.
Here is a quick summary of the day.

Soren Auer: Exploration and other stages of the Linked Data Life Cycle

Soren Auer at exploreWeb 2011.

Soren Auer from the University of Leipzig gave a very nice keynote talk at the beginning of the ExploreWeb workshop on the entire lifecycle of Linked Data.
The talk centered on the requirements imposed by the continuous growth of the Linked Open Data (LOD) cloud and on the life cycle associated with LD contents. This life cycle comprises the following phases:

  1. Extraction of LOD: extracting linked data is a challenge per se. For instance, in the case of the DBpedia extraction from Wikipedia, the issues to be considered include keeping the semantic version aligned with the user-generated one, while at the same time coping with the messy and incoherent “schemas” offered by Wikipedia infoboxes. To cover this aspect, a “Mapping Wiki” based on a higher-level ontology has been created for defining the mapping of labels in Wikipedia.
  2. Storage and Querying of LOD: the critical issue here is that triple stores are still 5 to 50 times slower than RDBMSs. On the other hand, they obviously grant increased flexibility (especially at the schema-manipulation level). A new benchmark recently performed by Soren’s group provides fresh performance results for Virtuoso, Sesame, Jena, and BigOWLIM. The benchmark was run on 25 frequent DBpedia queries and shows that Virtuoso consistently delivers speeds two times higher than the competitors, while Jena is confirmed as the worst-performing platform.
  3. Authoring of LOD: different approaches can be adopted, including Semantic Wikis (e.g., OntoWiki), in which users do not edit text but semantic descriptions built with forms. We can identify two main classes of semantic wikis: semantic text wikis and semantic data wikis. A new approach is now adopted by the new RDFa Content Editor, which uses OpenCalais and other APIs to help annotate the text within a WYSIWYG environment.
  4. Linking LOD: approaches to linking can be automatic, semi-automatic (e.g., see the tools SILK and LIMES), or manual (e.g., see Sindice in UIs and Semantic Pingback). 
  5. Evolution of LOD: the evolution of linked data is a critical problem, not yet fully addressed. The EvoPat project is a first attempt to formalize the problem and its solution by defining a set of evolution patterns and anti-patterns. Some features are already integrated into OntoWiki.
  6. Exploration of LOD: challenging because of size, heterogeneity, and distributedness. Examples include the spatial and faceted exploration of LinkedGeoData, Freebase as a search assistant for linked data, Parallax, the Neofonie faceted browser, domain-specific exploration tools (e.g., a relationship finder over RDF), visual query builders, and more.
  7. Visualization of LOD: here Soren highlighted that, with the continuously growing size of the LOD cloud, (semantic) data visualization will become more and more important. He presented some preliminary approaches, but a lot of work still needs to be done in this field.

In the discussion and Q&A that followed the keynote, the hot topics were the performance benchmark and the authoring of LOD by end users versus expert/technical users.

Alessandro Bozzon: A Conceptual Framework for Linked Data Exploration

Alessandro Bozzon

Alessandro discussed some motivations for the problem of exploring and integrating linked data sources, and then described the Search Computing approach to linked data exploration, which applies the general-purpose SeCo framework to the specific needs of the LOD cloud.
More on this can be found on the Search Computing web site, including a demo and a video.

Daniel Schwabe: Support for reusable explorations of Linked Data in the Semantic Web
Daniel Schwabe started his talk with some strong motivation statements.
One of the main benefits of linked data should be that the data bring their own self-description.
However, if you actually work with it, you may end up doing really dirty work on the data to make it linked.

Daniel Schwabe

When you move to exploration interfaces, the expectations of end users may be very different from those of the tech-savvy users that current exploration tools target. That gap needs to be filled, and Rexplorator moves in that direction. Explorator was presented at the Linked Data on the Web workshop (LDOW) in Madrid in 2009; its extension Rexplorator was demonstrated at ISWC 2010 and has now been presented extensively at the ExploreWeb workshop.
With it, you can compose functions, parametrize operators, and store and reuse “use cases”, following a query-by-example approach. The UI gives the impression that you are dealing with resources and sets of resources, but the system actually deals only with triples and SPARQL queries.
It is a pretty interesting approach, which has something in common with the Search Computing one, and it also features a great UI and expressive power. It covers faceted search as well.
Rexplorator is an MVC-based application implemented in Ruby using the ActiveRDF DSL.

In-Young Ko

Han-Gyu Ko and In-Young Ko. Generation of Semantic Clouds based on Linked Data for Efficient Multimedia Semantic Annotations
The presentation started from the definition of the requirements for semantic cloud generation: the idea is to produce tag clouds that help people annotate multimedia content (e.g., IP-TV content).
The requirements include being able to:

  • identify the optimal number of tag clouds
  • balance the size of the different clouds shown to the users
  • check the coherence of the clouds and avoid ambiguity within each cloud.

The proposed lifecycle includes three phases:

  1. locating the spotting points: through a context-aware search of the linked data, starting from the more important and densely connected nodes; more general nodes are more likely to be selected
  2. selecting the relations to traverse: the aim here is to reduce the RDF graph to the set of relevant relations only
  3. identifying term similarity and clustering the tags.

Compared with simpler approaches to constructing clouds (e.g., based on rdf:type and SKOS parsing), this approach leads to better and more meaningful clouds of keywords.
The implemented system overlays the generated clouds on the IPTV screen and lets people select the tags.
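As a toy illustration of the three phases described above, here is a minimal sketch (hypothetical names and data; the actual system works on full RDF graphs and IP-TV content): spotting points are picked by node degree, the graph is reduced to relevant predicates, and tags are greedily clustered by a similarity predicate.

```python
from collections import defaultdict

# Toy RDF-like graph: a list of (subject, predicate, object) triples.

def spot_points(triples, k=2):
    # Phase 1: pick the most densely connected nodes as spotting points.
    degree = defaultdict(int)
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    return sorted(degree, key=degree.get, reverse=True)[:k]

def relevant_subgraph(triples, keep_predicates):
    # Phase 2: reduce the graph to the set of relevant relations only.
    return [t for t in triples if t[1] in keep_predicates]

def cluster_terms(terms, similar):
    # Phase 3: greedy clustering by a pairwise term-similarity predicate.
    clusters = []
    for term in terms:
        for c in clusters:
            if any(similar(term, m) for m in c):
                c.append(term)
                break
        else:
            clusters.append([term])
    return clusters
```

In the real system each cluster would then be rendered as one balanced, coherent tag cloud.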

Mamoun Abu Helou. Segmentation of Geo-Referenced Queries

Mamoun Abu Helou

This work aims at manipulating natural-language, multi-objective queries so as to split them into several simple, single-aim queries.
The focus of the work is limited to geographical queries. It exploits GeoWordNet, Yago, GeoNames, and the Google Geocoder API for identifying the important geographical concepts in the query. Both instances (e.g., Louvre) and classes (e.g., museum) can be identified.
A benchmark over 250 queries shows promising results for the approach.

Peter Dolog. SimSpectrum: A Similarity Based Spectral Clustering Approach to Generate a Tag Cloud
Peter’s work addressed the specific problem of clustering within tag clouds. Tag clouds suffer from several problems: recent tags are overlooked because of their lower frequency, while very frequent ones are often useless.
The presentation delved into the selection of the best algorithms for clustering tags.
The aim is to reduce the number of tags, pick the most relevant ones, and place semantically close terms at nearby locations in the cloud. The approach has been evaluated, in the medical field, in terms of coverage, overlap, and relevance between the queries and the generated clouds.
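For readers unfamiliar with the technique, here is a minimal sketch of spectral bipartitioning over a tag-similarity matrix: the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian) splits the tags into two clusters. This only illustrates the general spectral idea, not the actual SimSpectrum algorithm:

```python
import numpy as np

def spectral_bipartition(S):
    # S: symmetric tag-similarity matrix (higher value = more similar tags).
    d = S.sum(axis=1)
    L = np.diag(d) - S                 # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    fiedler = vecs[:, 1]               # eigenvector of 2nd-smallest eigenvalue
    return fiedler >= 0                # sign splits tags into two clusters
```

Recursive application (or using more eigenvectors plus k-means) generalizes this to more than two clusters.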

Matthias Keller. A Unified Approach for Modeling Navigation over Hierarchical, Linear and Networked Structures

Matthias Keller

This was a visionary presentation on the needs and possible directions for a navigational model over data structures.
Data structures are very diverse (trees, graphs, …), and extracting the hyperlink/access structure from the content structure is very difficult (basically, there is no automatic transformation between the two). CMSs enable some of this, but with limited expressive power and difficult configurability.
The idea is then to model:

  • the content organization supporting different graph-based content structures
  • the description of the access structures
  • the relation between the content and the access structure

The authors propose a graphical notation for covering these requirements and define some navigation patterns using this language.

Rober Morales-Chaparro. Data-driven and User-driven Multidimensional Data Visualization
This work aims at automatically extracting a set of optimal visualizations of complex data, covering the entire lifecycle:

  1. the data model 
  2. the data mining
  3. the information model
  4. the visualization proposal engine
  5. the visualization model
  6. the code generation
  7. and the final generated application for the end user

To conclude, here is a simple tag cloud generated for the content discussed during the workshop:


WebML online tutorials (audio and video)

Today I wish to point you to a set of tutorials I recorded some months ago.
They deal with WebML (Web Modeling Language) and the CASE tool WebRatio. They describe the WebML methodology for designing Web applications and its modeling primitives.