Supporting virtual integration of Linked Data with just-in-time query recompilation

Research & Innovation

In virtual data integration, the data remain at their original sources rather than being copied and transformed onto a single platform, as in warehousing. Integration must therefore be performed at query execution time, and it relies on transforming the original query into queries for many target endpoints. In systems that integrate many data sources, this means maintaining many mappings, queries and query templates, and possibly issuing separate queries for linking entities across the datasets and for retrieving their data. We propose a practical approach to keeping this complexity under control by manipulating the translation from one client query to many target queries. The method performs just-in-time recompilation of the client query into elements that are combined with a query template to produce the target queries for multiple sources. In a setting with a custom conjunctive query language as the client API and SPARQL endpoints as sources, this has been shown to reduce both the number of target queries to issue and the number of query templates to maintain, using a number of compiler functions that scales with the complexity of the data source.
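
The recompilation step can be illustrated with a minimal sketch. It is not the implementation described in this work: the client query representation (a list of field/value pairs read as a conjunction), the source names, the predicate URIs, and the functions `compile_term`, `recompile` and `recompile_all` are all assumptions made for illustration. It shows one client query being compiled, per source, into graph-pattern elements that are inserted into a single shared SPARQL template.

```python
from string import Template

# Hypothetical per-source mappings from client fields to RDF predicates.
# Source names and predicate choices are illustrative assumptions.
SOURCES = {
    "source_a": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "topic": "http://purl.org/dc/terms/subject",
    },
    "source_b": {
        "name": "http://www.w3.org/2000/01/rdf-schema#label",
        "topic": "http://schema.org/about",
    },
}

# One query template shared by all sources (assumed shape).
QUERY_TEMPLATE = Template("SELECT ?entity WHERE {\n$patterns\n}")


def compile_term(source, field, value):
    """Compiler function: turn one client term into a SPARQL triple pattern."""
    predicate = SOURCES[source][field]
    # Escape backslashes and double quotes so the value is a valid literal.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'  ?entity <{predicate}> "{literal}" .'


def recompile(client_query, source):
    """Compile a conjunctive client query into one target query for a source."""
    patterns = [compile_term(source, field, value) for field, value in client_query]
    return QUERY_TEMPLATE.substitute(patterns="\n".join(patterns))


def recompile_all(client_query):
    """Produce one target query per configured source at request time."""
    return {source: recompile(client_query, source) for source in SOURCES}


if __name__ == "__main__":
    # Client query: name = "Ada" AND topic = "logic"
    for source, query in recompile_all([("name", "Ada"), ("topic", "logic")]).items():
        print(f"# {source}\n{query}\n")
```

In this sketch, supporting a new source only requires a new mapping entry, and a source with more complex structure would need additional compiler functions rather than additional templates.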