Research & Innovation

Owen Sacco

Creating video games that are competitive in the market is costly in time, effort and resources, which small and medium-sized enterprises, especially independent game development studios, often cannot afford. As most of the tasks involved in developing games are labour and creativity intensive, our vision is to reduce software development effort and enhance design creativity by automatically generating novel and semantically-enriched content for games from Web sources. In particular, this paper presents a vocabulary that defines detailed properties for describing video game character information extracted from sources such as fansites in order to create game character models. These character models could then be reused or merged to create new, unconventional game characters.

Alex Olieman, Kaspar Beelen, Jaap Kamps, Milan van Lange

We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically-enhanced search tool that leverages the strengths of (imperfect) entity linking (EL) without getting in the way of its expert users. We evaluate this tool in two historical case studies that aim to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II.
The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.

Wouter Beek, Javier D. Fernández, Ruben Verborgh

Many Data Scientists make use of Linked Open Data. However, most scientists restrict their analyses to one or two datasets (often DBpedia). One reason for this lack of variety in dataset use has been the complexity and cost of running large-scale triple stores, graph stores or property graphs. With Header Dictionary Triples (HDT) and Linked Data Fragments (LDF), the cost of Linked Data publishing has been significantly reduced. Still, Data Scientists who wish to run large-scale analyses need to query many LDF endpoints and integrate the results. Using recent innovations in data storage, compression and dissemination, we are able to compress (a large subset of) the LOD Cloud into a single file. We call this file LOD-a-lot. Because it is just one file, LOD-a-lot can be easily downloaded and shared. It can be queried locally or through an LDF endpoint. In this paper we identify several categories of use cases that previously required an expensive and complicated setup, but that can now be run over a cheap and simple LOD-a-lot file. LOD-a-lot does not expose the same functionality as a full-blown database suite, mainly offering Triple Pattern Fragments. Despite these limitations, this paper shows that there is a surprisingly wide collection of Data Science use cases that can be performed over a LOD-a-lot file. For these use cases LOD-a-lot significantly reduces the cost and complexity of doing Data Science. 
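The kind of access that a triple pattern interface offers can be illustrated with a minimal sketch. It is not the LOD-a-lot, HDT or LDF implementation: the `match_pattern` function, the in-memory `TRIPLES` list and the `ex:` identifiers are all hypothetical, and the sketch only assumes that a pattern fixes some of subject, predicate and object while leaving the others open.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match_pattern(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


# Hypothetical toy data standing in for a (much larger) compressed file.
TRIPLES = [
    ("ex:Amsterdam", "ex:capitalOf", "ex:Netherlands"),
    ("ex:Amsterdam", "rdf:type", "ex:City"),
    ("ex:Utrecht", "rdf:type", "ex:City"),
]

# All subjects typed as ex:City: pattern (?s, rdf:type, ex:City).
cities = sorted(t[0] for t in match_pattern(TRIPLES, p="rdf:type", o="ex:City"))
```

More complex queries, such as joins, would have to be assembled on the client side from several such pattern lookups, which is the trade-off the abstract refers to.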

Alessandro Adamou, Mathieu D'Aquin, Carlo Allocca, Enrico Motta

In virtual data integration, the data reside at their original sources without being copied and transformed onto a single platform, as is done in warehousing. Integration must be performed at query execution time and relies on transforming the original query for each of the many target endpoints.

Iker Esnaola-Gonzalez, Jesús Bermúdez, Izaskun Fernandez, Santiago Fernandez, Aitor Arnaiz

Outlier detection in the preprocessing phase of Knowledge Discovery in Databases (KDD) processes has been a widely researched topic for many years. However, identifying the potential cause of an outlier remains an unsolved challenge, even though it could be very helpful for determining what actions to take after detecting it. Moreover, conventional outlier detection methods might still overlook outliers in certain complex contexts. In this article, Semantic Technologies are used to help overcome these problems by proposing the SemOD (Semantic Outlier Detection) Framework. This framework guides the data scientist towards the detection of certain types of outliers in Wireless Sensor Networks (WSNs). The feasibility of the approach has been tested on outdoor temperature sensors, and results show that the proposed approach is generic enough to be applied to different sensors, while improving the accuracy of outlier detection and spotting the potential causes of outliers.

Henning Petzka, Claus Stadler, Georgios Katsimpras, Bastian Haarmann, Jens Lehmann

The increasing availability of large amounts of Linked Data creates a need for software that allows for its efficient exploration. Systems enabling Faceted Browsing constitute a user-friendly solution that needs to combine suitable choices for the front end and the back end. Since a generic solution must be adjustable with respect to the data set, the underlying ontology and the knowledge graph characteristics raise several challenges and heavily influence the browsing experience. As a consequence, an understanding of these challenges becomes an important matter of study. We present a benchmark on Faceted Browsing, which allows systems to test their performance on specific choke points on the back end. Further, we address additional issues in Faceted Browsing that may be caused by problematic modelling choices within the underlying ontology.

Harsh Thakkar, Yashwant Keswani, Mohnish Dubey, Jens Lehmann, Sören Auer

Knowledge graphs, usually modelled via RDF or property graphs, have gained importance over the past decade. Deciding which Data Management Solution (DMS) performs best for specific query loads over a knowledge graph requires benchmarking. Benchmarking is an extremely tedious task that demands repetitive manual effort; it is therefore advantageous to automate the whole process.

Ciro Baron Neto, Dimitris Kontokostas, Gustavo Publio, Diego Esteves, Amit Kirschenbaum, Sebastian Hellmann

Over the last decade, we have observed a steadily increasing number of RDF datasets made available on the web of data. The decentralized nature of the web, however, makes it hard to identify all these datasets. Moreover, even when downloadable data distributions are discovered, the available metadata is often insufficient to describe the datasets properly, which hinders their usefulness and reuse.

In this paper, we describe an attempt to exhaustively identify the whole linked open data cloud by harvesting metadata from multiple sources, providing insights about duplicated data and the general quality of the available metadata. This was only possible by using a probabilistic data structure called a Bloom filter. Finally, we enrich existing dataset metadata with our approach and republish it through a SPARQL endpoint.
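As an illustration of why a Bloom filter suits this kind of large-scale duplicate detection, the following is a minimal sketch rather than the authors' implementation. The class name, the parameters `size_bits` and `num_hashes`, the SHA-256-based hashing and the example URIs are all assumptions made for the example. A Bloom filter never reports a stored item as absent, but it may occasionally report an unseen item as present.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: compact set membership with possible false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical resource URIs seen in one dataset.
seen = BloomFilter()
seen.add("http://example.org/resource/A")
seen.add("http://example.org/resource/B")

# A URI from another dataset that is "possibly present" is a duplicate candidate.
is_candidate = "http://example.org/resource/A" in seen
```

Because the filter stores only a fixed-size bit array rather than the items themselves, overlap between very large datasets can be estimated without holding every identifier in memory, at the cost of a tunable false-positive rate.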

Elisa Margareth Sibarani, Simon Scerri, Camilo Morales, Sören Auer, Diego Collarana

Rapid changes in the job market and the dramatic growth in Web usage have triggered the need to analyze online job adverts. This paper presents a quantitative method to infer employers' skill demand using co-word analysis based on skill keywords. These keywords are extracted automatically by an Ontology-based Information Extraction (OBIE) method. An ontology called Skills and Recruitment Ontology (SARO) has been developed to represent job postings in the context of the skills and competencies needed to fill a job role. During the extraction and annotation of keywords, we focus on job posting attributes and job-specific skills (Tool, Product, Topic). We present our system, in which the cross-sectional study is decoupled into two phases: (1) a customized pipeline for extracting information, whose results are a matrix of co-occurrences and correlations; and (2) content analysis to visualize the keywords' structure and network. This method reveals the technical skills in demand together with their structure, exposing significant linkages between them. The evaluation of the OBIE method indicates promising results for automatic keyword indexing, with an overall strict F-measure of 79%. Using an ontology and reusing semantic categories enables other research groups to reproduce this method and its results.
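The co-occurrence matrix at the heart of co-word analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline described above: the `cooccurrence` function and the `POSTINGS` sample are hypothetical, and the skill keywords are assumed to have been extracted already.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence(postings: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of skills appears in the same posting."""
    counts: Counter = Counter()
    for skills in postings:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts


# Hypothetical skill keywords, one list per job posting.
POSTINGS = [
    ["python", "sql", "spark"],
    ["python", "sql"],
    ["java", "sql"],
]

pairs = cooccurrence(POSTINGS)
strongest: Tuple[Tuple[str, str], int] = pairs.most_common(1)[0]
```

Pairs with high counts indicate skills that employers tend to request together; such counts can then be normalised into correlations and drawn as a keyword network.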

