What do knowledge graphs and statistical learning techniques like word2vec have in common? One could argue that both produce similar outcomes: models that identify and describe the semantics of real-world concepts and their context. However, the underlying technical approaches are fundamentally different. Knowledge graphs are typically curated manually or extracted from larger unstructured knowledge bases; concepts are identified via URIs and their context is defined via explicitly modelled relationships. Statistical models, in contrast, are trained on large text corpora and produce a vector space in which each concept is “identified” by a corresponding vector; concept relationships are represented implicitly and can be computed using vector arithmetic.
This year’s Data Science track will again target the intersection between Data Science and Semantics research and kick off with a keynote talk by Alan Hanbury, who will shed some light on how lexical and statistical semantics can be used to improve search results.
Contributions to the Data Science track also recognize that even the most sophisticated analytics or machine learning task still obeys the fundamental computing law of “garbage in, garbage out”: even the best model is limited by flaws or misinterpretations in its training dataset. Therefore, this track will also feature a number of papers highlighting the importance of well-defined structured vocabularies for data analytics and learning tasks.
We, the chairs of the Data Science track, are happy that the track attracted more than enough interest to be included in this year’s program of SEMANTiCS. We are looking forward to meeting and exchanging ideas with people who share similar interests, and we are convinced that this year’s SEMANTiCS will be a good place to start these discussions.
Bernhard Haslhofer, Laura Hollnik, and Alexander Schindler
Data Science Track Chairs