Get Your Hands Dirty: Evaluating Word2Vec Models for Patent Data

Poster & Demo

Patent search systems allow complex queries to be formulated by combining search terms with Boolean and other operators, such as proximity and wildcard operators, in order to find relevant patents.
This widely adopted approach is based on exact matching, which makes it difficult to identify and analyze relevant patents efficiently, because the search terms often do not match the terminology used by the inventors. Another problem is the large number of relevant hits that results from weekly and monthly updates of patent applications and grants. Although some semantic search systems for patents based on latent semantic analysis have been implemented as black-box systems in the past, word embeddings, which have been applied successfully to generate semantic representations of text, have rarely been employed and evaluated on a (large) patent corpus. The work described herein evaluates semantic representations for patent data by comparing a pre-trained general model with a word embedding model adapted from a patent corpus, in order to contribute to a multitude of semantic analysis tasks for patents, such as similarity search, content analysis, and entity linking.
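To make the comparison between a general model and a patent-adapted model more concrete, the following is a minimal sketch of one possible check: look up the nearest neighbours of a term in each model by cosine similarity and measure how much the two neighbour lists overlap. It is not the evaluation procedure used in this work. The vectors, the vocabulary, and the helper names (`cosine`, `nearest`, `neighbour_overlap`) are invented for illustration; real models would have hundreds of dimensions and would be loaded from trained Word2Vec files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(model, term, k=2):
    """Return the k terms closest to `term`, most similar first."""
    query = model[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in model.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


def neighbour_overlap(model_a, model_b, term, k=2):
    """Jaccard overlap of the top-k neighbours of `term` in two models."""
    a = set(nearest(model_a, term, k))
    b = set(nearest(model_b, term, k))
    return len(a & b) / len(a | b)


# Toy 3-dimensional vectors, made up for this sketch (assumption, not real data).
general_model = {
    "claim": [0.9, 0.1, 0.0],
    "argument": [0.8, 0.2, 0.1],
    "statement": [0.85, 0.15, 0.05],
    "embodiment": [0.1, 0.9, 0.2],
    "invention": [0.2, 0.8, 0.3],
}
patent_model = {
    "claim": [0.1, 0.9, 0.1],
    "argument": [0.9, 0.1, 0.0],
    "statement": [0.8, 0.1, 0.2],
    "embodiment": [0.2, 0.85, 0.15],
    "invention": [0.15, 0.9, 0.2],
}

print(nearest(general_model, "claim"))
print(nearest(patent_model, "claim"))
print(neighbour_overlap(general_model, patent_model, "claim"))
```

In this toy setup the two models place "claim" in different neighbourhoods, which is the kind of terminology shift a patent-adapted model is expected to capture. A low overlap alone does not say which model is better; that would need a task-based evaluation such as similarity search on labelled patent data.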

Speakers:

Hidir Aras

FIZ Karlsruhe
https://www.fiz-karlsruhe.de/

Rima Türker

PhD student at Karlsruhe Institute of Technology (KIT) & FIZ Karlsruhe

FIZ Karlsruhe
https://www.fiz-karlsruhe.de/

Dieter Geiss

FIZ Karlsruhe
https://www.fiz-karlsruhe.de/

Max Milbradt

FIZ Karlsruhe
https://www.fiz-karlsruhe.de/

Harald Sack

Senior Researcher

FIZ Karlsruhe
https://www.fiz-karlsruhe.de/

Harald Sack is Senior Researcher at the Hasso Plattner-Institute for IT-Systems Engineering (HPI) at the University of Potsdam and head of the research group 'Semantic Technologies and Multimedia Retrieval'.
