Essence of Patent Text Mining

To create support tools for patent experts, we need to understand both their daily work tasks and the linguistic character of the text genre. In the patent domain, every kind of issue is accentuated, from specific search requirements to the characteristics of the text itself. These include complex linguistic features such as paraphrasing, long sentences, domain-specific terminology and diverse acronyms. The aim of the talk, which is based on my PhD work, is to give a holistic view of the patent text domain and to demonstrate the diversity that text mining applications must handle if they are to be useful for different information needs. Consequently, the focus of my PhD thesis was not to develop a specific text summarisation algorithm, information retrieval system or classification system, but rather to give an overview of state-of-the-art text mining methods and the shortcomings that may occur when they are applied to patent texts. In this talk, I will address several text mining scenarios, from automatic terminology recognition and ontology population to domain-specific question answering solutions. In my concluding PhD experiment, I integrated several of my information extraction modules into an information retrieval setting addressing patent passage retrieval. By recognising the importance of Language Complexity, Domain Complexity and Task Complexity, I significantly improved retrieval performance over the state of the art for Patent Passage Retrieval on the CLEF-IP 2013 test collection (Conference and Labs of the Evaluation Forum, Intellectual Property domain).
 

Register for SEMANTiCS conference