Semantic Processing for the Conversion of Unstructured Documents into Structured Information in the Enterprise Context

Research & Innovation

We present an on-going research project addressing the problem of massive amounts of unstructured data that is generated on a daily basis in most business organisations, regardless of size. Our motivation is to support in particular small and medium seized enterprises to gain a competitive advantage in the market. The goal is to improve their processes for extracting valuable business information from such disorganised data. To achieve this, we introduce a flexible and scalable data analysis framework capable of transforming various types of documents into semantically annotated structures. This includes emails, text files in various formats, slide presentations, blog entries, etc. Additionally, the solution provides a semantic search engine for structured retrieval of the analyzed information and a graphical layer to dynamically visualize the search results as an interactive graph. Throughout the paper, the architecture of two main engines that are responsible for data and text analysis and semantic search are described. We conclude that semantic processing of unstructured sources signi cantly improves data management and data integration within the enterprises.