Linked data experience at Springer Nature


Building discovery services for scientific and scholarly content on top of a semantic data model

This talk summarizes and reflects on why we think Semantic Technologies are an effective way to do enterprise metadata management at web scale: essentially, a way to bring some order to the chaos that results when multiple applications work on similar data domains.

Springer Nature is a leading publisher in the Science & Scholarly area. Its portfolio includes flagship journals such as Nature and Scientific American, several titles under the Nature and Nature Reviews brands, and other products such as the Springer Book collections, Springer Journals and Springer Corporate Databases. It is a very diversified scenario, covering more than 3,000 journals plus many other publication types such as books, blogs and podcasts.

As a result of the digital revolution and the internet, new products are being created faster than ever before. In recent years this has led the company to recognize the need for an integration layer that can bring together data from any of the applications that power our individual products. In particular, we need interoperability at two levels: at the level of naming conventions, to facilitate communication within the enterprise when people talk about articles or subject areas; and at a more formal semantic level, via a shared metadata model implemented as a set of ontologies.

To this end, we have been using Semantic and Linked Data technologies since 2012, when we launched a prototype open platform called […]. Since then we have worked on various other projects, including the Nature ontologies portal (2015) and the Springer conferences portal (2015). More generally, our focus has increasingly shifted from publishing data externally to our internal systems. In particular, we have aimed to create an architecture in which RDF is as central to the publishing workflow as XML is. In this talk we give an overview of this exciting journey, which is now leading us to create a new platform that combines content, science and people data from across Springer Nature and that will be launched in late 2016.


The challenges and lessons learned we will touch on include:   

  • how to build knowledge models and data architectures that leverage semantic technologies within a traditionally XML-based publishing workflow. In other words, how can these new technologies be introduced so that they solve real problems without disrupting established workflows?
  • the importance that a coherent metadata management and semantic integration solution can have for any enterprise that wants to maintain its competitive advantage in the knowledge society.
  • the value of making open data available to the scientific community (and the wider public) as a way to promote innovation and make it easier for scientists to do research; and, at the same time, the difficulty of getting people who are not data specialists to engage with data encoded using Linked Data standards.
  • the challenges involved in identifying, indexing and transforming data from heterogeneous sources into a flexible yet coherent, web-scalable metadata management layer.