Research & Innovation

Auriol Degbelo

Ontologies are key to information retrieval, semantic integration of datasets, and semantic similarity analyzes. Evaluating ontologies (especially defining what constitutes a "good" or "better" ontology) is therefore of central importance for the Semantic Web community. Various criteria have been introduced in the literature to evaluate ontologies, and this article classifies them according to their relevance to the design or the implementation phase of ontology development. In addition, the article compiles strategies for ontology evaluation based on ontologies published between until 2017 in two outlets: the Semantic Web Journal, and the Journal of Web Semantics. Gaps and opportunities for future research on ontology evaluation are exposed towards the end of the paper.

Kolawole John AdebayoLuigi Di CaroGuido Boella

We propose a task independent neural networks model, based on a Siamese twin architecture. Our model specifically benefits from two forms of attention scheme which we use to extract high-level feature representation of the underlying texts, both at the word level (intra-attention) as well as at the sentence level (inter-attention). The inter-attention scheme uses one of the text to create a contextual interlock with the other text, thus paying attention to mutually important parts. We evaluate our system on three tasks, i.e. Textual Entailment, Paraphrase Detection and answer-sentence selection. We set a near state-of-the-art result on the textual entailment task with the SNLI corpus while obtaining strong performance across the other tasks that we evaluate our model on.

Despina-Athanasia PantaziGeorge PapadakisKonstantina BeretaThemis PalpanasManolis Koubarakis

Discovering matching entities in different Knowledge Bases constitutes a core task in the Linked Data paradigm. Due to its quadratic time complexity, Entity Resolution typically scales to large datasets through blocking, which restricts comparisons to similar entities. For Big Linked Data, Meta-blocking is also needed to restructure the blocks in a way that boosts precision, while maintaining high recall. Based on blocking and Meta-blocking, JedAI Toolkit implements an end-to-end ER workflow for both relational and RDF data. However, its bottleneck is the time-consuming procedure of Meta-blocking, which iterates over all comparisons in each block. To accelerate it, we present a suite of parallelization techniques that are suitable for multi-core processors. We present 2 categories of parallelization strategies, with each one comprising 4 different approaches that are orthogonal to Meta-blocking algorithms. We perform extensive experiments over a real dataset with 3.4 million entities and 13 billion comparisons, demonstrating that our methods can process it within few minutes, while achieving high speedup.

Roman ProkofyevDjellel DifallahMichael LuggenPhilippe Cudre-Mauroux

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long tail (e.g., unpopular) entities. In this work, we first analyze statistical properties linked to manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly-accurate (precision > 95\%) and broad annotated corpuses for machine learning tasks. Our results show that our best approach achieves maximal-precision at usable recall levels, and outperforms both state-of-the-art entity-linking systems and human annotators.

Kris McGlinnChristophe DebruyneLorraine McNerneyDeclan O’Sullivan
Isaiah Onando Mulang'Kuldeep SinghFabrizio Orlandi

Research has seen considerable achievements concerning the translation of natural language patterns into formal queries for Question Answering based on Knowledge Graphs (KG). The main challenge exists on how to identify which property within a Knowledge Graph matches the predicate found in a Natural Language (NL) relation. Current approaches for formal query generation attempt to resolve this problem mainly by first retrieving the named entity from the KG together with a list of its predicates, then filtering out one from all the predicates of the entity. We attempt an approach to directly match an NL predicate to KG properties that can be employed within QA pipelines. In this paper, we specify a systematic approach as well as providing a tool that can be employed to solve this task. Our approach models KB relations with their underlying parts of speech, we then enhance this with extra attributes obtained from Wordnet and Dependency parsing characteristics. From a question, we model a similar representation of query relations. We then define distance measurements between the query relation and the properties representations from the KG to identify which property is referred to by the relation within the query. We report substantive recall values and considerable accuracy from our evaluation.

Najmeh Mousavi NejadSimon ScerriSören Auer

With the omnipresent availability and use of cloud services, software tools, Web portals or services, legal contracts in the form of license agreements or terms and conditions regulating their use are of paramount importance. Often the textual documents describing these regulations comprise many pages and can not be reasonably assumed to be read and understood by humans. In this work, we describe a method for extracting and clustering relevant parts of such documents, including permissions, obligations, and prohibitions. The clustering is based on semantic similarity employing a distributional semantics approach on large word embeddings database. An evaluation shows that it can significantly improve human comprehension and that improved feature-based clustering has a potential to further reduce the time required for EULA digestion. Our implementation is available as a Web service, which can directly be used to process and prepare legal usage contracts.

Sebastian BaderJan Oevermann

Service technicians in the domain of industrial maintenance require extensive technical knowledge and experience to complete their tasks. Some of the needed knowledge is made available as document-based technical manuals or service reports from previous deployments. Unfortunately, due to the great amount of data, service technicians spend a considerable amount of working time searching for the correct information. Another challenge is posed by the fact that valuable insights from operation reports are not yet considered due to insufficient textual quality and content-wise ambiguity. In this work we propose a framework to annotate and integrate these heterogeneous data sources to make them available as information units with Linked Data technologies. We use machine learning to modularize and classify information from technical manuals together with ontology-based autocompletion to enrich service reports with clearly defined concepts. By combining these two approaches we can provide an unified and structured interface for both manual and automated querying. We verify our approach by measuring precision and recall of information for typical retrieval tasks for service technicians, and show that our framework can provide substantial improvements for service and maintenance processes.

Jan Voskuil

Linked Data and the Semantic Web have generated interest in the Netherlands from the very beginning. Sporting several renowned research centers and some widely published early application projects, the Netherlands is home to Platform Linked Data Nederland, a grass-roots movement promoting Linked Data technologies which functions as a marketplace for exchanging ideas and experiences.

Georgios SantipantakisGeorge VourosChristos DoulkeridisAkrivi VlachouGennady AndrienkoNatalia AndrienkoJose Manuel CorderoMiguel Garcia Martinez

Motivated by real-life emerging needs in critical domains, this paper proposes a coherent and generic ontology for the representation of semantic trajectories, in association with related events and contextual information. The main contribution of the proposed ontology is the representation of semantic trajectories at varying, interlinked levels of spatio-temporal analysis. The paper presents the ontology in detail, also in connection to other well-known ontologies, and demonstrates how exploiting data at varying levels of granularity supports data transformations that can support visual analytics tasks in the air-traffic management domain.


Subscribe to RSS - Research & Innovation