With Javier David Fernández García we have won a distinguished expert in Linked Data research as keynote speaker for this year’s DBpedia Day at SEMANTiCS. Javier is a postdoctoral research fellow at the WU (Vienna University of Economics and Business). He holds a PhD in Computer Science from the University of Valladolid (Spain) and the University of Chile (Chile). His current research focuses on efficient Linked Data access, RDF streaming, and archiving and querying dynamic Linked Data. In this interview Javier talks about practical and visionary topics around Linked Data, its strengths and limitations, DBpedia's underrepresented domains, and the directions he would like DBpedia to take in the future.
You have been at the forefront of Linked Data research for years now. Looking at the trends and fashions, where do you see the hot spots in the field? What will shape the future of LD research the most?
There are many interesting topics, and any prediction could be mere speculation. I would maybe distinguish between practical and visionary topics. With practical topics, I refer to those topics that are increasingly important when using Linked Data (LD) in a practical scenario. LD is not a toy anymore (if it ever was); it is used in real-world environments and large-scale companies, sometimes under different names (e.g. Knowledge Graphs). Hot spots in this regard include scalability (efficiency and robustness to cope with current Big Data scenarios), decentralization (how to foster and leverage the inherently distributed nature of LD), streaming (a common trend in related areas such as IoT and WoT), quality, trust, provenance and preservation. I would like to refer here to a joint study with several colleagues where we applied a mixed-methods methodology to provide a broader picture of Semantic Web topics in the last decade.
As for visionary topics, there is obvious hype around the relation between Linked Data and Machine Learning/Deep Learning. Interestingly, Linked Data can benefit from these technologies to resolve existing challenges (e.g. ontology learning), but it can also become an important keystone by integrating and enriching information that can be fed into such systems. I expect to see many works in this field in the coming years.
LD research has its strengths and its limitations. Which technological innovation would you love to see happening?
It is hard to say. In fact, some of the limitations are social rather than technological in nature. For example, it is well known that some LD sources suffer from accessibility and availability problems. While this could be a common issue in many large-scale projects, stale data particularly jeopardizes the LOD initiative. The same applies to LD findability: in spite of existing tools (e.g. VoID), LD consumers lack guidelines to help them discover and reuse existing content. Any innovation in this regard will be very valuable in order to achieve more sustainable and (re-)usable LD.
Technologically speaking, there is much room for improvement in federated query processing. I would love to have a single API where I could find datasets according to my needs, enrich and integrate my private datasets, and efficiently run a structured query, with reasoning capabilities, on the fully distributed Linked Open Data. That is presumably too much to ask, but you always have to set your sights high.
The research agenda is determined by rising technological trends like Big Data (in the past) and AI and Machine Learning (at present). Where do you see the mutual benefit in the interchange between these hyped technologies and “good old” LD?
The connection between Big Data and Linked Data was always obvious. On the one hand, one of the challenges of Big Data is the variety of the data, which is natively addressed by LD technologies. The fact that Linked Data can be seen as a graph is not a mere coincidence. Integrating two concepts in a graph comes very naturally, as it could be as simple as adding an edge (with some meaning) between nodes. On the other hand, the massive volumes of Linked Data can only be addressed with Big Data technologies. Thus, Big Semantic Data and Semantic Big Data have been, and will remain, recurrent topics in the years to come.
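As an editorial illustration of the point about adding an edge, here is a minimal sketch in plain Python. It assumes a graph is simply a set of (subject, predicate, object) triples; every identifier in it (the `ex:` and `geo:` names, `integrate`, `neighbours`) is hypothetical and chosen only for this example, not taken from any real dataset or library.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = Tuple[str, str, str]

# Two independently published datasets (hypothetical identifiers).
music_graph: Set[Triple] = {
    ("ex:Artist1", "ex:name", "Example Artist"),
    ("ex:Artist1", "ex:genre", "ex:Jazz"),
}
places_graph: Set[Triple] = {
    ("geo:City7", "geo:label", "Example City"),
}


def integrate(a: Set[Triple], b: Set[Triple], link: Triple) -> Set[Triple]:
    """Merge two graphs and connect them with one meaningful edge."""
    return a | b | {link}


def neighbours(graph: Set[Triple], node: str) -> Set[str]:
    """Return every object reachable from `node` in one step."""
    return {o for s, _p, o in graph if s == node}


# The single added edge carries the meaning "was born in".
merged = integrate(
    music_graph,
    places_graph,
    ("ex:Artist1", "ex:bornIn", "geo:City7"),
)
```

After the merge, `neighbours(merged, "ex:Artist1")` includes `"geo:City7"`, so a query starting in the music data can now continue into the places data. Real deployments would use RDF with full URIs rather than Python tuples; the sketch only shows why integration can be a one-edge operation.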
When it comes to AI and Machine Learning (ML), it is worth mentioning that Linked Data has many “faces” (semantic web, linked data, web of data, web of things, etc.). Nowadays, we can say that its latest flavor is the concept of “Knowledge Graphs”, which is gaining increasing attention in combination with AI and ML technologies. For example, Knowledge Graphs have been used as the “fuel” of Deep Learning (DL) algorithms, which learn features that can help in recommendation systems and entity classifiers. Conversely, some LD tasks, such as ontology learning, ontology prediction and ontology alignment, have been tackled by ML/DL algorithms.
As an example of such mutual benefit and importance, machine learning is already a recurrent topic at the SEMANTiCS conference, and the Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies at ESWC'2018 was the most attended. As I discussed with a colleague in that workshop, if you do research in LD, it is time to learn a bit more about AI and machine learning... and vice versa.
How do you inspire your research colleagues to work within the LD domain?
Fortunately, most of my colleagues are already firm believers. When it comes to researchers or practitioners from other areas, the idea of Linked Data is so powerful that it sells itself. Linked Data has its roots in the long-awaited dream of organizing scattered data such that humans and machines can gain useful knowledge. At web scale, if we focus on open data, it is indeed one of the most noble acts: there is no more worthy cause than sharing knowledge with others, in such a way that novel and unexpected knowledge can be created. The implications of the idea are almost philosophical. Just as humans complemented oral communication with written systems, it is not unreasonable to think that a novel system is needed to represent and integrate distributed, world-scale knowledge for machine consumption. Linked Data can be a first step in this regard, probably as imperfect as the initial Sumerian writing, but a necessary building block for future generations.
Current challenges are broad and multidisciplinary (scalability, quality, availability, findability, etc.), and the application domains are infinite, hence any research is welcome.
You are a keynote speaker at the DBpedia Day. The last few years brought a huge amount of innovation and projects around the DBpedia knowledge hub. Give us some insight into where the development of DBpedia should be driven to keep DBpedia relevant for the wider LD community.
Whenever I present DBpedia to newbies, I always say that DBpedia is the main hub of the Linked Open Data cloud, and I continue to believe that this essence should remain a key concern. In this respect, while DBpedia accurately describes certain topics such as music, films, etc., other domains (and I'm thinking of the life sciences in particular) are clearly underrepresented. Efforts must intensify to get more communities on board. In addition, I should recall the ever-present quality issues, where I believe the necessary steps are being taken. The challenge here is two-fold. On the one hand, and more obviously, DBpedia must be our best presentation card, integrating all best practices and providing quality, trustworthy, “bullet-proof” data and services. On the other hand, DBpedia has to be at the cutting edge of innovation, showing and supporting potential new research avenues. This can only be achieved by the research community itself. Thus, I would turn the issue around: it is not what DBpedia can offer to the LD community in the future, but how DBpedia can support and engage the LD community to take an active part in novel, disruptive DBpedia projects. I hope to see this kind of discussion at the next DBpedia Day at SEMANTiCS 2018.
Discuss the many faces of Linked Data with Javier at SEMANTiCS 2018.
Many thanks to the DBpedia Association, in particular Julia Holze, who supported the preparation of this interview.
The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits, and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers, and researchers from organisations ranging from NPOs, universities, and public administrations to the largest companies in the world. http://www.semantics.cc