Textual and Content-Based Search in Repositories of Web Application Models – TWEB paper

The paper “Textual and Content-Based Search in Repositories of Web Application Models”, which I co-authored with Bojana Bislimovska, Alessandro Bozzon, and Piero Fraternali, has now been published in the ACM Transactions on the Web (TWEB).

The article examines two different techniques for indexing and searching model repositories, with a focus on Web development projects encoded in the domain-specific language WebML. Keyword-based and content-based search (also known as query-by-example) are contrasted with respect to the architecture of the system, the processing of models and queries, and the way in which metamodel knowledge can be exploited to improve search. A thorough experimental evaluation is conducted to examine which parameter configurations lead to better accuracy and to offer insight into which queries are best addressed by each system.
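
To make the contrast between the two query styles concrete, here is a minimal sketch. It is not the system described in the paper: the toy repository, the bag-of-terms representation of a model, and the function names are all assumptions made for illustration. A keyword query is a handful of terms, while a query-by-example submits a whole model fragment; in this simplified version both are ranked with the same TF-IDF cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy repository: each model is flattened to the terms of its
# element names. A real WebML repository carries far more structure.
REPOSITORY = {
    "shop": "product list page product detail page cart entry form",
    "blog": "post list page post detail page comment entry form",
    "forum": "thread list page message detail page login form",
}


def tokenize(text):
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one TF-IDF weighted term vector per document, plus the IDF table."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_text, docs):
    """Rank model names by similarity to the query text (keywords or a fragment)."""
    vectors, idf = tf_idf_vectors(docs)
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query_text)).items()}
    return sorted(docs, key=lambda name: cosine(query, vectors[name]), reverse=True)


# Keyword-based search: a few user-typed terms.
keyword_ranking = search("cart product", REPOSITORY)

# Query-by-example: the query is itself a (flattened) model fragment.
example_ranking = search("post list page comment entry form", REPOSITORY)
```

In this sketch the only difference between the two modes is the size and origin of the query; a content-based system can additionally exploit the structure of the example model, which a flat bag of terms discards.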

You can find the full text here:

You can download the full text for free even if you don’t have an ACM subscription, through this link:

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my twitter account (@MarcoBrambi).

A bottom-up, knowledge-aware approach to integrating and querying web data services – ACM Trans. on the Web

The October 2013 issue of the ACM Transactions on the Web includes an article of ours on bottom-up domain model design of connected web data sources. This problem is becoming more and more important as a wealth of data services becomes available on the Web. Indeed, building and querying Web applications that effectively integrate Web content is increasingly important. However, schema integration and ontology matching with the aim of registering data services often require a knowledge-intensive, tedious, and error-prone manual process. In the paper we tackle this issue as described below.

The paper has been authored by Stefano Ceri, Silvia Quarteroni and myself within the research project Search Computing.

The full paper is available for download on the ACM Digital Library (free of charge, courtesy of the ACM Author-izer service) through this URL:


This is the summary of the contribution:

We present a bottom-up, semi-automatic service registration process that refers to an external knowledge base and uses simple text processing techniques in order to minimize and possibly avoid the contribution of domain experts in the annotation of data services. The first by-product of this process is a representation of the domain of data services as an entity-relationship diagram, whose entities are named after concepts of the external knowledge base matching service terminology rather than being manually created to accommodate an application-specific ontology. Second, a three-layer annotation of service semantics (service interfaces, access patterns, service marts) describing how services “play” with such domain elements is also automatically constructed at registration time. When evaluated against heterogeneous existing data services and with a synthetic service dataset constructed using Google Fusion Tables, the approach yields good results in terms of data representation accuracy.

We subsequently demonstrate that natural language processing methods can be used to decompose and match simple queries to the data services represented in three layers according to the preceding methodology with satisfactory results. We show how semantic annotations are used at query time to convert the user’s request into an executable logical query. Globally, our findings show that the proposed registration method is effective in creating a uniform semantic representation of data services, suitable for building Web applications and answering search queries.

The bibtex reference is as follows:

author = {Quarteroni, Silvia and Brambilla, Marco and Ceri, Stefano},
title = {A bottom-up, knowledge-aware approach to integrating and querying web data services},
journal = {ACM Trans. Web},
issue_date = {October 2013},
volume = {7},
number = {4},
month = nov,
year = {2013},
issn = {1559-1131},
pages = {19:1--19:33},
articleno = {19},
numpages = {33},
url = {http://doi.acm.org/10.1145/2493536},
doi = {10.1145/2493536},
acmid = {2493536},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Web data integration, Web data services, Web services, natural language Web query, service querying, structured Web search},


Answering search queries with CrowdSearcher – WWW2012

Our paper “Answering search queries with CrowdSearcher” has been accepted and presented at WWW 2012 in Lyon.

Here is the abstract:
Web users are increasingly relying on social interaction to complete and validate the results of their search activities. While search systems are superior machines to get world-wide information, the opinions collected within friends and expert/local communities can ultimately determine our decisions: human curiosity and creativity is often capable of going much beyond the capabilities of search systems in scouting “interesting” results, or suggesting new, unexpected search directions. Such personalized interaction occurs in most times aside of the search systems and processes, possibly instrumented and mediated by a social network; when such interaction is completed and users resort to the use of search systems, they do it through new queries, loosely related to the previous search or to the social interaction. In this paper we propose CrowdSearcher, a novel search paradigm that embodies crowds as first-class sources for the information seeking process. CrowdSearcher aims at filling the gap between generalized search systems, which operate upon world-wide information – including facts and recommendations as crawled and indexed by computerized systems – with social systems, capable of interacting with real people, in real time, to capture their opinions, suggestions, emotions. The technical contribution of this paper is the discussion of a model and architecture for integrating computerized search with human interaction, by showing how search systems can drive and encapsulate social systems. In particular we show how social platforms, such as Facebook, LinkedIn and Twitter, can be used for crowdsourcing search-related tasks; we demonstrate our approach with several prototypes and we report on our experiment upon real user communities.

The full paper is available here:

The presentation I gave is this one:

The demo video can be found on YouTube:


CrowdSearch 2012: my experience at the First International Workshop On Crowdsourcing Web Search at WWW2012


ExploreWeb workshop on exploration of (Semantic) Web data at ICWE 2011

Together with Piero Fraternali and Daniel Schwabe I organized a workshop at ICWE 2011 on Search, Exploration and Navigation of Web Data Sources, named ExploreWeb 2011.
It was a new challenge for us, because it was the first edition, but I can say it was quite a successful event.
We got 12 submissions and accepted 7 of them; furthermore, we invited Soren Auer as keynote speaker to start the day. The attendance was also very good, with 25+ people in the room for the whole day.
Here is a quick summary of the day.

Soren Auer: Exploration and other stages of the Linked Data Life Cycle

Soren Auer at exploreWeb 2011.

Soren Auer from the University of Leipzig gave a very nice keynote talk at the beginning of the exploreWeb workshop on the entire life cycle of Linked Data.
The talk was centered on the requirements imposed by the continuous growth of the Linked Open Data cloud (LOD) and on the life cycle associated with LD contents. This life cycle comprises the following phases:

  1. Extraction of LOD: extracting linked data is a challenge per se. For instance, in the case of DBpedia extraction from Wikipedia, the issues to be considered include keeping the semantic version aligned with the user-generated one while coping with the messy and incoherent “schemas” offered by Wikipedia infoboxes. To cover this aspect, a “Mapping Wiki” based on a higher-level ontology has been created for defining the mapping between labels in Wikipedia.
  2. Storage and Querying of LOD: the critical issue here is that triple stores are still 5 to 50 times slower than RDBMSs. On the other hand, they obviously grant increased flexibility (especially at the schema manipulation level). A new benchmark recently performed by Soren’s group provides new performance results for Virtuoso, Sesame, Jena, and BigOWLIM. The benchmark was performed on 25 frequent DBpedia queries and shows that Virtuoso is consistently about twice as fast as its competitors, while Jena is confirmed as the poorest-performing platform.
  3. Authoring of LOD: different approaches can be adopted, including Semantic Wikis (e.g., OntoWiki), in which users do not edit text but semantic descriptions built with forms. We can identify two main classes of semantic wikis: semantic text wikis and semantic data wikis. A different approach is adopted by the new RDFa Content Editor, which uses OpenCalais and other APIs to help annotate the text within a WYSIWYG environment.
  4. Linking LOD: approaches to linking can be automatic, semi-automatic (e.g., see the tools SILK and LIMES), or manual (e.g., see Sindice in UIs and Semantic Pingback).
  5. Evolution of LOD: the evolution of linked data is a critical problem, not yet fully addressed. The EvoPat project is a first attempt to formalize the problem and the solution, by defining a set of evolution patterns and anti-patterns. Some features are already integrated into OntoWiki.
  6. Exploration of LOD: exploration is challenging because of the size, heterogeneity, and distributed nature of the data. The approaches mentioned include spatial and faceted exploration of LinkedGeoData; Freebase, presented as the best search assistant for linked data; the Parallax and neofonie faceted browsers; domain-specific exploration tools (e.g., a relationship finder on RDF); visual query builders; …
  7. Visualization of LOD: on this, Soren highlighted that with the continuously growing size of LOD, (semantic) data visualization will become more and more important. He presented some preliminary approaches, but a lot of work still needs to be done in this field.

In the discussion and Q&A that followed the keynote, the hot topics were the performance benchmark and the authoring of LOD by end users versus expert/technical users.

Alessandro Bozzon: A Conceptual Framework for Linked Data Exploration

Alessandro Bozzon

Alessandro discussed the motivation behind the problem of exploring and integrating linked data sources and then described the Search Computing approach to linked data exploration, which applies the general-purpose SeCo framework to the specific needs of LOD.
More on this can be found on the Search Computing web site, including also a demo and a video.

Daniel Schwabe: Support for reusable explorations of Linked Data in the Semantic Web
Daniel Schwabe started his talk with some strong motivation statements.
One of the main benefits of linked data should be that data bring their own self-description.
However, if you work with it you may end up doing really dirty work on the data to make them linked.

Daniel Schwabe

When it comes to exploration interfaces, the expectations of end users might be very different from what exploration tools built for tech-savvy users offer. That gap needs to be filled, and Rexplorator moves in that direction. Explorator was presented at the Linked Data workshop (LDOW) in Madrid in 2009. Its extension Rexplorator was demonstrated at ISWC 2010 and has now been presented extensively at the ExploreWeb workshop.
With it, you can do composition of functions, parametrization of operators, and storage and reuse of “use cases”, with a query-by-example approach. The UI lets you think that you are dealing with resources and sets of resources, but actually the system is dealing only with triples and SPARQL queries.
It is a pretty interesting approach, which has something in common with the Search Computing one, and also features a great UI and expressive power. It also covers faceted search.
Rexplorator is an MVC-based application implemented in Ruby using the ActiveRDF DSL.
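
The idea of presenting sets of resources on top of plain triples can be illustrated with a small sketch. This is not Rexplorator's code (which is written in Ruby): the sample triples, the `ex:` names, and the three helper functions are hypothetical, and the generated query string assumes the prefixes are declared elsewhere.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:Louvre", "rdf:type", "ex:Museum"),
    ("ex:Louvre", "ex:locatedIn", "ex:Paris"),
    ("ex:Prado", "rdf:type", "ex:Museum"),
    ("ex:Prado", "ex:locatedIn", "ex:Madrid"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
]


def subjects_of_type(triples, rdf_class):
    """Set of resources typed with the given class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == rdf_class}


def follow(triples, resources, predicate):
    """Set of objects reached from a set of resources through one predicate."""
    return {o for s, p, o in triples if s in resources and p == predicate}


def to_sparql(resources, predicate):
    """Text of a SPARQL 1.1 query equivalent to follow(); PREFIX lines are omitted."""
    values = " ".join(sorted(resources))
    return f"SELECT DISTINCT ?o WHERE {{ VALUES ?s {{ {values} }} ?s {predicate} ?o }}"


# The user thinks "museums, then their locations"; the system sees triples.
museums = subjects_of_type(TRIPLES, "ex:Museum")
cities = follow(TRIPLES, museums, "ex:locatedIn")
query = to_sparql(museums, "ex:locatedIn")
```

Because each operation takes a set and returns a set, operations compose naturally, which is one plausible way to support the composition and reuse of explorations mentioned above.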

In-Young Ko

Han-Gyu Ko and In-Young Ko. Generation of Semantic Clouds based on Linked Data for Efficient Multimedia Semantic Annotations
The presentation started from the definition of the requirements of semantic cloud generation: the idea is to produce tag clouds that help people annotate multimedia contents (e.g., IP-TV contents).
The requirements include being able to:

  • identify the optimal number of tag clouds
  • balance the size of the different clouds shown to the users
  • check the coherency between the clouds and avoid ambiguity of each cloud.

The proposed lifecycle includes three phases:

  1. locating the spotting points: with a context-aware search of linked data, starting from the more important and densely connected nodes. More general nodes are more likely to be selected.
  2. selecting the relations to traverse: the aim here is to reduce the RDF graph to the set of relevant relations only.
  3. identifying term similarity and clustering the tags.

Compared with simpler approaches for constructing clouds (e.g., based on rdf:type and SKOS parsing), this approach leads to better and more meaningful clouds of keywords.
The implemented system overlays the generated clouds on the IPTV screen and lets people select the tags.

Mamoun Abu Helou. Segmentation of Geo-Referenced Queries

Mamoun Abu Helou

This work aimed at manipulating natural language, multi-objective queries so as to split them into several simple single-aim queries.
The focus of the work was limited to geographical queries. It exploited GeoWordNet, Yago, GeoNames, and the Google GeoCoder API for identifying the important geographical concepts in the query. Both instances (e.g., Louvre) and classes (e.g., museum) can be identified.
A benchmark over 250 queries shows promising results for the approach.
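
As a rough illustration of the task, the following sketch splits a multi-objective query into simpler ones and tags geographical instances and classes. It is not the presented system: the two tiny dictionaries are hypothetical stand-ins for the external resources, and splitting on a few connectives is an assumed simplification.

```python
import re

# Hypothetical stand-ins for the external geographic resources.
GEO_INSTANCES = {"louvre", "paris", "milan"}
GEO_CLASSES = {"museum", "hotel", "restaurant"}


def segment(query):
    """Split a multi-objective query on commas and the connectives 'and'/'then'."""
    parts = re.split(r"\s*(?:,|\band\b|\bthen\b)\s*", query.lower())
    return [part.strip() for part in parts if part.strip()]


def annotate(sub_query):
    """Tag the known geographic instances and classes in a single-aim query."""
    words = re.findall(r"[a-z]+", sub_query)
    return {
        "text": sub_query,
        "instances": [w for w in words if w in GEO_INSTANCES],
        "classes": [w for w in words if w in GEO_CLASSES],
    }


sub_queries = segment("hotel near the Louvre and a restaurant in Milan")
annotations = [annotate(q) for q in sub_queries]
```

A real segmenter has to handle connectives that do not separate objectives (for example an "and" inside a place name), which is where the external knowledge sources become essential.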

Peter Dolog. SimSpectrum: A Similarity Based Spectral Clustering Approach to Generate a Tag Cloud
Peter’s work addressed the specific problem of clustering within tag clouds. There are some problems in clouds: recent tags are overlooked because they have lower frequency; frequent ones are often useless; …
The presentation delved into the selection of the best algorithms for clustering tags.
The aim was to reduce the number of tags, pick the most relevant ones, and place semantically close terms near each other in the cloud. The approach was evaluated in terms of coverage, overlap and relevance between the queries and the generated clouds, in the medical field.
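
To give a feeling for what grouping semantically close tags means, here is a deliberately simple sketch. It does not implement the spectral clustering of the paper: it swaps in a greedy threshold clustering over Jaccard similarity of tag co-occurrence, and the sample tags, document ids, and threshold are all made up.

```python
# Hypothetical data: tag -> ids of the documents annotated with that tag.
TAG_DOCS = {
    "diabetes": {1, 2, 3},
    "insulin": {1, 2},
    "glucose": {2, 3},
    "fracture": {4, 5},
    "cast": {4, 5},
}


def jaccard(a, b):
    """Share of documents two tags have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tag_docs, threshold=0.5):
    """Greedy clustering: a tag joins the first cluster holding a similar tag."""
    clusters = []
    for tag in sorted(tag_docs):
        for cluster in clusters:
            if any(jaccard(tag_docs[tag], tag_docs[other]) >= threshold for other in cluster):
                cluster.append(tag)
                break
        else:
            # No sufficiently similar cluster was found: start a new one.
            clusters.append([tag])
    return clusters


clusters = cluster_tags(TAG_DOCS)
```

Tags in the same cluster would then be laid out close to each other in the cloud. The greedy pass depends on the processing order, which is one reason to prefer a global method such as spectral clustering.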

Matthias Keller. A Unified Approach for Modeling Navigation over Hierarchical, Linear and Networked Structures

Matthias Keller

This was a visionary presentation on the needs and possible directions for a navigational model for data structures.
Data structures are very diverse (trees, graphs, …), and extracting the hyperlink/access structure from the content structure is very difficult (basically there is no automatic transformation between the two). CMSs enable some of this, but with limited expressive power and difficult configurability.
The idea is then to model:

  • the content organization supporting different graph-based content structures
  • the description of the access structures
  • the relation between the content and the access structure

The authors propose a graphical notation for covering these requirements and define some navigation patterns using this language.

Rober Morales-Chaparro. Data-driven and User-driven Multidimensional Data Visualization
This work aims at automatically extracting a set of optimal visualizations of complex data, covering the entire life cycle:

  1. the data model 
  2. the data mining
  3. the information model
  4. the visualization proposal engine
  5. the visualization model
  6. the code generation
  7. and the final generated application for the end user

To conclude, here is a simple tag cloud generated for the content discussed during the workshop:


Search Computing demonstration at WWW 2011, Hyderabad, India

Together with Alessandro Bozzon, I’ve presented a demonstration of the search computing exploratory search paradigm at WWW 2011.

The demonstrated scenario is in the real estate and job search field. Suppose that a user wants to find a new job matching a specific expertise in a certain city. Based on his findings, he also wants to search for housing opportunities in the nearby neighbourhoods. He then wants to check additional information on the quality of life in the area and on the availability of services (public transportation, schools for his children, and so on). The final decision will be based on a complex function of all these aspects. The figure below shows the graph of the actually existing and registered searchable concepts within this scenario. All these concepts are searched through third-party services.

Here is a short video with a summary of the demonstration:

Here you can see Alessandro at work, demonstrating the approach to a visitor (big prize if you guess who he is :) ):

Btw, if you are looking for some more exciting pictures I took in Hyderabad, India you can have a look at this Flickr set of pictures from Hyderabad (while at WWW 2011).


New Book on Search Computing

The new Springer LNCS book (volume 5950)

Search Computing – Challenges and Directions

edited by Stefano Ceri and Marco Brambilla is in print.

A preview of the book (with interactive table of contents) is now available. Publication of the paper version of the book is scheduled for March 2010.

Invited Challenge Talk on Search Computing at ESEC/ACM SIGSOFT FSE 2009

I was invited to present a Challenge Talk on Search Computing at ESEC/FSE 2009 in Amsterdam. I presented the following talk on 28 August 2009 at the Vrije Universiteit:

Title: Engineering Search Computing Applications: Vision and Challenges
Abstract: Search computing is a novel discipline whose goal is to answer complex, multi-domain queries. Such queries typically require combining in their results domain knowledge extracted from multiple Web resources; therefore, conventional crawling and indexing techniques, which look at individual Web pages, are not adequate for them. This talk will sketch the main characteristics of search computing and highlight how various classical computer science disciplines – including software engineering, Web engineering, service-oriented architectures, data management, and human-computer interaction – are challenged by the search computing approach.

Pharos Federation Day, June 23

On June 23rd, 2009 I chaired the Pharos Federation Day, in Como. The program was the following:

09:30 – 10:00: Welcome and introduction (Marco Brambilla, WBM)

10:00 – 12:00: Exploitation scenarios and possibilities. Open discussion (Tonina Scuderi, ENG)

12:00 – 12:30: Demonstration of the PHAROS platform – front end demo (Alessandro Bozzon, WBM) [30’]

14:00 – 14:30: Demonstration of the PHAROS platform – content provisioning demo (Kathrine Hammervold/ Eric Cai, FAST) [30’]

14:30 – 15:30: The PHAROS platform (Vincenzo Croce, ENG) [1h30]

16:00 – 17:30: Federation role, exploitation and feedbacks – open discussion (Tonina Scuderi, ENG; Marco Brambilla, WBM) [1h]

Here are some pictures.

Pharos Summer School, June 2009

I co-organized the Pharos Summer School in Como from June 22nd to June 26th.
You can have a look at the associated details here:

Some pictures: