This talk presents our experience with the evolution of semantic technologies in scientific libraries. Starting with semantic document representations and the use of semantic technologies to cross-link digital libraries, the talk will focus mainly on the question of how deep web content can be brought to the surface.
It is widely known that much web content is hidden in the so-called deep web. That is, the content is stored in protected databases that web search engines cannot harvest. At the same time, scholarly communication has changed dramatically. The new attitude can best be described as “What I do not find with Google or Google Scholar does not exist”. This, however, puts scientific libraries under pressure, as much of their content (be it metadata or the publications themselves) is stored in deep web databases.
As of today, query interfaces (HTML forms) provide access to most content in the deep web. Several approaches try to improve access to deep web content by focusing on automatic query interface understanding. In contrast to these approaches, this talk suggests a change of perspective. In our view, it is in the interest of scientific libraries to provide controlled access to their deep web content through query interfaces. The talk therefore suggests providing semantic annotations for query interfaces. To realize such annotations, it is most important to find a generic model that is capable of formalizing the variety of dependencies and restrictions of related form fields as well as the properties of the output data. Based on schema.org, the talk will propose an RDF vocabulary that can meet these requirements. It ensures the necessary level of abstraction and serves as an intermediate vocabulary connecting various other RDF vocabularies. The application of the vocabulary will be illustrated in the context of EconBiz, which is maintained by ZBW and is one of the world’s largest digital libraries for economics.
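
To make the idea of annotating a query interface more concrete, the following is a minimal sketch and not the vocabulary proposed in the talk. It assumes that plain schema.org terms (`WebSite`, `SearchAction`, `EntryPoint`, `PropertyValueSpecification`) are used and serialized as JSON-LD. The function name `annotate_search_form`, the form fields `q` and `year`, and the `library.example.org` URLs are hypothetical placeholders, not the actual EconBiz interface.

```python
import json


def annotate_search_form(site_url, url_template, fields):
    """Describe an HTML search form as a schema.org SearchAction (JSON-LD).

    `fields` is a list of dicts with a "name" key and optional
    "required" and "pattern" keys (hypothetical input format).
    """
    query_inputs = []
    for field in fields:
        spec = {
            "@type": "PropertyValueSpecification",
            "valueName": field["name"],
            "valueRequired": field.get("required", False),
        }
        # A restriction on the field value, expressed as a regular expression.
        if "pattern" in field:
            spec["valuePattern"] = field["pattern"]
        query_inputs.append(spec)

    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            "target": {"@type": "EntryPoint", "urlTemplate": url_template},
            "query-input": query_inputs,
        },
    }


# Placeholder site and fields, chosen only for illustration.
annotation = annotate_search_form(
    "https://library.example.org/",
    "https://library.example.org/search?q={q}&year={year}",
    [
        {"name": "q", "required": True},
        {"name": "year", "pattern": "[0-9]{4}"},
    ],
)

print(json.dumps(annotation, indent=2))
```

This sketch covers only per-field restrictions (required fields and value patterns). Dependencies between related form fields and the properties of the output data, which the talk names as requirements, are not expressed here and would need additional vocabulary terms.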