Using Semantic Technology to Solve Sparse Training Material Problem in Machine Learning for Classification of Company Websites

Industry

Since 2015, ds9 has been developing a large SEARCHCORPUS of companies in the Bio Sciences market for Boehringer Ingelheim. This Biotech Companies SEARCHCORPUS is optimized on an ongoing basis to allow data scientists to quickly find licensing opportunities, acquisition targets and new technological developments of competitors.
The corpus comprises more than 10 million pages from approx. 50,000 corporate websites, so even highly specific expert searches result in hundreds of potential targets that need to be verified manually.
This presentation describes an approach that restricts search to the company types expected as targets among the search hits, by automatically classifying companies, based on their corporate websites, into classes such as Client Research Organizations, Electronic Health or Big Pharma.

The main problem in automatically classifying companies based on website content is that training data needs to be manually selected and qualified. Where machine learning usually expects thousands of training samples, semantic technology allowed us to use classic machine learning algorithms to build classifiers that converge with fewer than 100 records of training data.
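
The abstract does not spell out how the semantic layer is built, so the following is only a minimal sketch of one way the idea could work, not the method presented in the talk. It assumes a small hand-made concept vocabulary (the `CONCEPTS` table, its terms and its concept names are all hypothetical) that maps many surface terms onto a few shared concepts, and it substitutes a simple nearest-centroid classifier for whatever classic algorithm was actually used. Collapsing raw words into a few concepts shrinks the feature space, which is one plausible reason a classifier could converge on very few labeled websites.

```python
import math
import re
from collections import Counter

# Hypothetical concept vocabulary: surface term -> shared concept.
# A real system would draw this from an ontology or thesaurus.
CONCEPTS = {
    "clinical trial": "ClinicalStudy",
    "clinical trials": "ClinicalStudy",
    "phase ii": "ClinicalStudy",
    "contract research": "ContractService",
    "outsourcing": "ContractService",
    "telemedicine": "DigitalHealth",
    "mobile app": "DigitalHealth",
    "patient portal": "DigitalHealth",
    "wearable": "DigitalHealth",
}


def concept_vector(text):
    """Count concept occurrences in a page text (whole-term matches only)."""
    lowered = text.lower()
    vector = Counter()
    for term, concept in CONCEPTS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        vector[concept] += len(re.findall(pattern, lowered))
    return +vector  # unary plus drops concepts with a zero count


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples):
    """Build one averaged concept vector (centroid) per class label."""
    sums, counts = {}, Counter()
    for text, label in samples:
        sums.setdefault(label, Counter()).update(concept_vector(text))
        counts[label] += 1
    return {
        label: {c: n / counts[label] for c, n in total.items()}
        for label, total in sums.items()
    }


def classify(text, centroids):
    """Return the label of the most similar centroid, or None if no concept matches."""
    vector = concept_vector(text)
    scores = {label: cosine(vector, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Tiny illustrative training set; the texts are invented for this sketch.
training = [
    ("We run clinical trials and offer contract research outsourcing.",
     "Client Research Organization"),
    ("Our telemedicine mobile app connects to a patient portal.",
     "Electronic Health"),
]
model = train(training)
print(classify("Phase II outsourcing partner for clinical trial management", model))
```

Note that the test sentence shares almost no literal words with its training sample; the match happens at the concept level, which is the point of the sketch.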

SlideDeck:

S5.1 - Klaus Kater ds9_Using_Semantic_Technology_to_Solve_Sparse_Training_Material_Problem_in_Machine_Learning_for_Classification_of_Company_Websit.pdf

Speakers:

Klaus Kater

Dipl. Inform. (FH)

Deep SEARCH 9 GmbH
http://www.deepsearchnine.com/

Having worked in C-level positions at international companies in Germany, Switzerland, Hong Kong and the US, Klaus has directed large offshore development teams and gained strong leadership and management experience. Building on his academic background in Artificial Intelligence technologies and web-based product development, Klaus’ latest endeavor is Deep SEARCH 9.
