Together with Piero Fraternali and Daniel Schwabe, I organized a workshop at ICWE 2011 on Search, Exploration and Navigation of Web Data Sources, named ExploreWeb 2011.
It was a new challenge for us, since this was the workshop’s first edition, but I can say it was quite a successful event.
We received 12 submissions and accepted 7 of them; in addition, we invited Soren Auer to open the day with a keynote. Attendance was also very good, with at least 25 people in the room for the whole day.
Here is a quick summary of the day.
[Image: Soren Auer at ExploreWeb 2011]
Soren Auer from the University of Leipzig opened the ExploreWeb workshop with a very nice keynote on the entire lifecycle of Linked Data.
The talk focused on the requirements imposed by the continuous growth of the Linked Open Data (LOD) cloud and on the lifecycle of Linked Data content, which comprises the following phases:
- Extraction of LOD: extracting Linked Data is a challenge in itself. For instance, when extracting DBpedia from Wikipedia, one must keep the semantic version aligned with the user-generated one while coping with the messy and incoherent “schemas” of Wikipedia infoboxes. To address this, a “Mapping Wiki” based on a higher-level ontology has been created to define the mappings from Wikipedia infobox labels to that ontology.
- Storage and Querying of LOD: the critical issue here is that RDF storage and querying is still 5 to 50 times slower than an RDBMS. On the other hand, it obviously grants increased flexibility (especially for schema manipulation). A recent benchmark by Soren’s group provides new performance results for Virtuoso, Sesame, Jena, and BigOWLIM. Run on 25 frequent DBpedia queries, it shows Virtuoso consistently about twice as fast as its competitors, while Jena is confirmed as the worst-performing platform.
- Authoring of LOD: different approaches can be adopted, including semantic wikis (e.g., OntoWiki), in which users do not edit text but semantic descriptions built with forms. Two main classes of semantic wikis can be identified: semantic text wikis and semantic data wikis. A newer approach is taken by the RDFa Content Editor, which uses OpenCalais and other APIs to help annotate text within a WYSIWYG environment.
- Linking LOD: linking approaches can be automatic, semi-automatic (e.g., the SILK and LIMES tools), or manual (e.g., Sindice integrated into user interfaces, and Semantic Pingback).
- Evolution of LOD: the evolution of Linked Data is a critical problem that has not yet been fully addressed. The EvoPat project is a first attempt to formalize both the problem and its solution by defining a set of evolution patterns and anti-patterns. Some of its features are already integrated into OntoWiki.
- Exploration of LOD: this is challenging because of the size, heterogeneity, and distributed nature of the data. Examples include spatial and faceted exploration of LinkedGeoData; Freebase, cited as the best search assistant for Linked Data; Parallax and the neofonie faceted browser; domain-specific exploration tools (e.g., a relationship finder on RDF); visual query builders; …
- Visualization of LOD: Soren highlighted that, as LOD keeps growing, (semantic) data visualization will become increasingly important. He presented some preliminary approaches, but a lot of work remains to be done in this field.
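To make the automatic linking phase a bit more concrete, here is a minimal sketch of threshold-based link discovery: compare the labels of resources from two datasets and propose an owl:sameAs link when they are similar enough. This is not how SILK or LIMES actually work internally; the single string metric (Python’s difflib), the 0.85 threshold, and all resource names are illustrative assumptions.

```python
from difflib import SequenceMatcher

def discover_links(source, target, threshold=0.85):
    """Hypothetical sketch: propose owl:sameAs links between resources whose
    labels are similar. source/target map resource URIs to labels."""
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            # Similarity in [0, 1] based on matching character blocks.
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri, round(score, 2)))
    return links

# Made-up example data (not taken from any real dataset).
SOURCE = {"ex:src/LeipzigUniversity": "Leipzig University", "ex:src/Louvre": "Louvre"}
TARGET = {
    "ex:tgt/1": "University of Leipzig",
    "ex:tgt/2": "Musée du Louvre",
    "ex:tgt/3": "Leipzig University",
}
links = discover_links(SOURCE, TARGET)
```

With a single metric, reordered or extended labels such as “University of Leipzig” or “Musée du Louvre” fall below the threshold, which is one reason link discovery frameworks let you combine several similarity measures.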
In the Q&A that followed the keynote, the hot topics were the performance benchmark and LOD authoring by end users versus expert/technical users.
Alessandro Bozzon: A Conceptual Framework for Linked Data Exploration
Alessandro motivated the problem of exploring and integrating Linked Data sources and then described the Search Computing approach to Linked Data exploration, which applies the general-purpose SeCo framework to the specific needs of LOD.
More details, including a demo and a video, are available on the Search Computing website.
Daniel Schwabe: Support for reusable explorations of Linked Data in the Semantic Web
Daniel Schwabe opened his talk with some strong motivating statements.
One of the main benefits of Linked Data should be that data carry their own self-description.
However, when you actually work with it, you may end up doing a lot of dirty work on the data to make them linked.
When it comes to exploration interfaces, end users’ expectations may differ greatly from what exploration tools for tech-savvy users offer. That gap needs to be filled, and Rexplorator moves in that direction. Its predecessor, Explorator, was presented at the Linked Data on the Web workshop (LDOW) in Madrid in 2009; its extension, Rexplorator, was demonstrated at ISWC 2010 and presented in depth at the ExploreWeb workshop.
Using a query-by-example approach, it supports function composition, operator parameterization, and the storage and reuse of “use cases”. The UI gives you the impression of working with resources and sets of resources, while the system actually deals only with triples and SPARQL queries.
It is a pretty interesting approach that has something in common with Search Computing, and it offers a great UI, strong expressive power, and faceted search.
Rexplorator is an MVC-based application implemented in Ruby using the ActiveRDF DSL.
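The idea of presenting “sets of resources” to the user while the system only manipulates triples can be illustrated with a small sketch of faceted filtering over an in-memory triple list. This is not Rexplorator’s implementation (which is in Ruby and issues SPARQL queries); the function names and the `ex:` data below are hypothetical.

```python
from collections import Counter

# Toy triple store of (subject, predicate, object); all data is made up.
TRIPLES = [
    ("ex:Louvre", "rdf:type", "ex:Museum"),
    ("ex:Louvre", "ex:city", "ex:Paris"),
    ("ex:Orsay", "rdf:type", "ex:Museum"),
    ("ex:Orsay", "ex:city", "ex:Paris"),
    ("ex:Uffizi", "rdf:type", "ex:Museum"),
    ("ex:Uffizi", "ex:city", "ex:Florence"),
    ("ex:Ritz", "rdf:type", "ex:Hotel"),
    ("ex:Ritz", "ex:city", "ex:Paris"),
]

def resources_with(triples, predicate, obj, within=None):
    """Set of subjects having (predicate, obj), optionally restricted to a
    previously selected set of resources."""
    found = {s for s, p, o in triples if p == predicate and o == obj}
    return found if within is None else found & within

def facet_counts(triples, resources, predicate):
    """For the current set, count how many resources take each facet value."""
    return Counter(o for s, p, o in triples if s in resources and p == predicate)

museums = resources_with(TRIPLES, "rdf:type", "ex:Museum")
city_facet = facet_counts(TRIPLES, museums, "ex:city")
paris_museums = resources_with(TRIPLES, "ex:city", "ex:Paris", within=museums)
```

Each user action (select a type, click a facet value) becomes a filter over triples that returns a new set, so selections compose naturally; a real system would push these filters into SPARQL instead of scanning a list.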
Han-Gyu Ko and In-Young Ko. Generation of Semantic Clouds based on Linked Data for Efficient Multimedia Semantic Annotations
The presentation started by defining the requirements for semantic cloud generation: the idea is to produce tag clouds that help people annotate multimedia content (e.g., IPTV content).
The requirements include being able to:
- identify the optimal number of tag clouds
- balance the size of the different clouds shown to the users
- check the coherence between clouds and avoid ambiguity within each cloud.
The proposed lifecycle includes three phases:
- locating the spotting points: through a context-aware search of Linked Data, starting from the more important and densely connected nodes; more general nodes are more likely to be selected
- selecting the relations to traverse: the aim here is to reduce the RDF graph to the relevant relations only
- identifying term similarity and clustering the tags.
Compared with simpler approaches to building clouds (e.g., based on rdf:type and SKOS parsing), this approach leads to better and more meaningful keyword clouds.
The implemented system overlays the generated clouds on the IPTV screen and lets people select the tags.
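The first two phases of that lifecycle can be sketched roughly as follows: rank nodes by how densely connected they are to pick spotting points, then collect candidate tags only through relations deemed relevant. This is a hypothetical simplification (plain degree stands in for the authors’ context-aware search, and the similarity/clustering phase is omitted); the function name, predicates, and data are all made up.

```python
from collections import Counter

def build_clouds(triples, relevant_predicates, k=2):
    """Hypothetical sketch: choose the k most densely connected nodes as
    spotting points, then gather the terms linked to each one through
    relevant relations only."""
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    spotting_points = [node for node, _ in degree.most_common(k)]
    clouds = {}
    for point in spotting_points:
        terms = set()
        for s, p, o in triples:
            if p not in relevant_predicates:
                continue  # prune relations that carry no useful tags
            if s == point:
                terms.add(o)
            elif o == point:
                terms.add(s)
        clouds[point] = terms
    return clouds

TRIPLES = [
    ("Miles_Davis", "genre", "Jazz"),
    ("John_Coltrane", "genre", "Jazz"),
    ("Bill_Evans", "genre", "Jazz"),
    ("Miles_Davis", "label", "Columbia"),
    ("Real_Madrid", "league", "La_Liga"),
    ("Barcelona", "league", "La_Liga"),
    ("Sevilla", "league", "La_Liga"),
    ("La_Liga", "wikiPageID", "12345"),
]
clouds = build_clouds(TRIPLES, {"genre", "league"}, k=2)
```

Filtering by relation is what keeps noise such as `wikiPageID` values out of the “La_Liga” cloud even though that triple still counts toward the node’s degree.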
Mamoun Abu Helou. Segmentation of Geo-Referenced Queries
[Image: Mamoun Abu Helou]
This work aimed at splitting multi-objective natural-language queries into several simple, single-aim queries.
The work focused on geographical queries only, exploiting GeoWordNet, YAGO, GeoNames, and the Google GeoCoder API to identify the important geographical concepts in a query. Both instances (e.g., Louvre) and classes (e.g., museum) can be identified.
A benchmark over 250 queries shows promising results for the approach.
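A deliberately naive sketch can show what segmentation means in practice: split the query at coordinating connectives and tag each segment’s geographic classes and instances. The tiny gazetteers below are hypothetical stand-ins for resources like GeoWordNet or GeoNames, and splitting on “and” and commas is an assumption, not the authors’ method.

```python
import re

# Hypothetical mini-gazetteers standing in for GeoWordNet, GeoNames, etc.
GEO_CLASSES = {"museum", "museums", "hotel", "hotels", "restaurant", "restaurants"}
GEO_INSTANCES = {"louvre", "paris", "rome", "colosseum"}

def segment_query(query):
    """Split a multi-aim query into single-aim segments and tag the
    geographic classes and instances found in each segment."""
    segments = []
    for part in re.split(r",|\band\b", query.lower()):
        part = part.strip()
        if not part:
            continue
        words = re.findall(r"[a-z]+", part)
        segments.append({
            "text": part,
            "classes": [w for w in words if w in GEO_CLASSES],
            "instances": [w for w in words if w in GEO_INSTANCES],
        })
    return segments

segments = segment_query("museums near the Louvre and cheap hotels in Paris")
```

This toy version would wrongly split a query like “hotels between Rome and Florence”, which is exactly where knowledge-based segmentation such as the presented approach is needed.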
Peter Dolog. SimSpectrum: A Similarity Based Spectral Clustering Approach to Generate a Tag Cloud
Peter’s work addressed the specific problem of clustering within tag clouds. Clouds suffer from several problems: recent tags are overlooked because of their lower frequency, frequent ones are often useless, …
The presentation delved into the selection of the best algorithms for clustering tags.
The aim was to reduce the number of tags, pick the most relevant ones, and place semantically close terms near each other in the cloud. The approach was evaluated in the medical domain, in terms of coverage, overlap, and relevance between the queries and the generated clouds.
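For readers unfamiliar with spectral clustering, here is a minimal sketch of its simplest form: a two-way split of tags by the sign of the Fiedler vector of the similarity graph’s Laplacian. It is not the SimSpectrum algorithm; the unnormalized Laplacian, the power-iteration solver, and the made-up medical tag similarities are all assumptions chosen to keep the example dependency-free.

```python
import math

def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of the Laplacian L = D - W by power
    iteration on M = c*I - L, projecting out the constant eigenvector."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    c = 2 * max(degrees)  # Gershgorin bound: every eigenvalue of L is <= c
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # stay orthogonal to the all-ones vector
        # M v = c*v - D v + W v
        w = [c * v[i] - degrees[i] * v[i]
             + sum(weights[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def spectral_bipartition(tags, weights):
    """Split tags into two clusters by the sign of the Fiedler vector."""
    vec = fiedler_vector(weights)
    return ({t for t, x in zip(tags, vec) if x >= 0},
            {t for t, x in zip(tags, vec) if x < 0})

# Hypothetical symmetric tag-similarity matrix (diagonal is zero).
TAGS = ["heart", "cardiac", "artery", "bone", "fracture", "skeleton"]
WEIGHTS = [
    [0.0, 0.9, 0.7, 0.1, 0.0, 0.1],
    [0.9, 0.0, 0.8, 0.0, 0.1, 0.0],
    [0.7, 0.8, 0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.8, 0.9],
    [0.0, 0.1, 0.0, 0.8, 0.0, 0.7],
    [0.1, 0.0, 0.0, 0.9, 0.7, 0.0],
]
left, right = spectral_bipartition(TAGS, WEIGHTS)
```

Applying the split recursively, or using several eigenvectors at once, yields more than two clusters; which clusters end up adjacent in the cloud is then a layout decision.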
The next talk was a visionary presentation on the needs and possible directions for a navigational model of data structures.
Data structures are very diverse (trees, graphs, …), and deriving the hyperlink/access structure from the content structure is very difficult (there is basically no automatic transformation between the two). CMSs support this to some extent, but with limited expressive power and cumbersome configuration.
The idea is then to model:
- the content organization supporting different graph-based content structures
- the description of the access structures
- the relation between the content and the access structure
The authors propose a graphical notation covering these requirements and use it to define some navigation patterns.
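The separation between content structure and access structure can be illustrated with a small sketch: the content is a tree, and two classic hypermedia navigation patterns, an index and a guided tour, are derived from it. This does not reproduce the authors’ graphical notation; the tree, function names, and traversal order are hypothetical.

```python
def flatten(tree, node):
    """Derive an ordered sequence from a content tree (depth-first)."""
    order = [node]
    for child in tree.get(node, []):
        order.extend(flatten(tree, child))
    return order

def index_pattern(tree, node):
    """Index pattern: an access page linking to every child of a content node."""
    return {"page": node, "links": list(tree.get(node, []))}

def guided_tour_pattern(items):
    """Guided tour pattern: previous/next links over an ordered sequence."""
    tour = {}
    for i, item in enumerate(items):
        tour[item] = {
            "prev": items[i - 1] if i > 0 else None,
            "next": items[i + 1] if i + 1 < len(items) else None,
        }
    return tour

# Hypothetical content structure: parent -> ordered children.
CONTENT = {"Workshop": ["Keynote", "Papers"], "Papers": ["Bozzon", "Schwabe"]}
order = flatten(CONTENT, "Workshop")
tour = guided_tour_pattern(order)
papers_index = index_pattern(CONTENT, "Papers")
```

The same content tree yields different access structures depending on which pattern, and which traversal, is applied; making that mapping explicit is the point of a dedicated navigational model.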
Rober Morales-Chaparro. Data-driven and User-driven Multidimensional Data Visualization
This work aims at automatically deriving a set of optimal visualizations of complex data, covering the entire lifecycle:
- the data model
- the data mining
- the information model
- the visualization proposal engine
- the visualization model
- the code generation
- and the final generated application for the end user
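To give a flavour of what a visualization proposal engine might do, here is a hypothetical rule-based sketch that maps the kinds of the selected data attributes to candidate chart types. The rules, attribute kinds, and class names are assumptions for illustration only, not the rules used in this work.

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str
    kind: str  # "quantitative", "categorical" or "temporal"

def propose_visualizations(attributes):
    """Hypothetical rule set: suggest chart types from attribute kinds,
    falling back to a plain table when no rule applies."""
    kinds = sorted(a.kind for a in attributes)
    proposals = []
    if kinds == ["quantitative", "temporal"]:
        proposals.append("line chart")
    if kinds == ["categorical", "quantitative"]:
        proposals.append("bar chart")
    if kinds == ["quantitative", "quantitative"]:
        proposals.append("scatter plot")
    if kinds.count("quantitative") >= 3:
        proposals.append("parallel coordinates")
    if not proposals:
        proposals.append("table")
    return proposals

trend = propose_visualizations([Attribute("year", "temporal"), Attribute("visits", "quantitative")])
```

A data-driven and user-driven engine would presumably rank such candidates using data mining results and user feedback instead of fixed rules.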
To conclude, here is a simple tag cloud generated for the content discussed during the workshop: