FarsBase: A Cross-Domain Farsi Knowledge Graph

Poster & Demo

In the Semantic Web vision, the future internet is a large, complex global knowledge base, and knowledge graphs can play a significant role in developing this emerging technology. A knowledge graph is a collection of semantically connected entities that supports tasks in both academia and industry. Knowledge graphs are applied in search engines, Natural Language Processing (NLP), text mining, question answering and Information Retrieval (IR). In this study, a cross-domain knowledge graph in the Farsi language is presented, consisting of more than 500K entities and 7 million relations. Data were extracted from the Farsi edition of Wikipedia, together with its structured data such as infoboxes and tables. In keeping with Semantic Web standards, the RDF data model and an OWL 2 ontology were employed to implement the Farsi Knowledge Graph (FKG). Resources and their relations are stored as triples, and access to the knowledge graph is provided through a SPARQL endpoint. An ontology derived from the DBpedia ontology was developed based on the resources of Farsi Wikipedia. Moreover, more than 8000 Wikipedia templates and properties were mapped to the ontology, partly automatically and partly manually. Furthermore, a part of the ontology was mapped to FarsNet, the Persian WordNet, for research purposes. Following Linked Data principles, most entities in the FKG have been connected to DBpedia and Wikidata resources by owl:sameAs. The graph contains a large amount of information on a variety of topics, including famous people, important places, organizations and companies, literary and art works, physiology, biology, events, species, astronomy and so forth. To achieve high performance and a flexible data model, a two-level storage architecture was designed that separates data from metadata. This design plays a key role in update operations and version management.
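To make the triple format and the owl:sameAs linking concrete, here is a minimal Python sketch. It is not the FKG implementation: the `TripleStore` class, the `same_as_query` helper, the `http://example.org/fkg/` namespace and the sample resource are all hypothetical and chosen only for illustration. Only the `owl:sameAs` and `rdf:type` URIs are the standard W3C ones.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespaces for the Farsi knowledge graph (assumption, not the real ones).
FKG = "http://example.org/fkg/resource/"
FKG_ONT = "http://example.org/fkg/ontology/"
DBPEDIA = "http://dbpedia.org/resource/"


class TripleStore:
    """Tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return all objects for a subject/predicate pair, sorted."""
        return sorted(
            o for s, p, o in self._triples if s == subject and p == predicate
        )


def same_as_query(resource_uri):
    """Build a SPARQL query string that asks for a resource's owl:sameAs links."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?external WHERE {{ <{resource_uri}> owl:sameAs ?external }}"
    )


store = TripleStore()
tehran = FKG + "Tehran"  # illustrative entity
store.add(tehran, RDF_TYPE, FKG_ONT + "City")
store.add(tehran, OWL_SAME_AS, DBPEDIA + "Tehran")

print(store.objects(tehran, OWL_SAME_AS))
print(same_as_query(tehran))
```

The query string produced by `same_as_query` is the kind of request a client could send to a SPARQL endpoint. The in-memory store only stands in for whatever triple store sits behind that endpoint.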
For evaluation purposes, a small subset of triples was randomly sampled to build a test dataset for manual inspection. Experimental results demonstrate that more than 94% of the triples were correct after the full process of extraction, conversion, mapping, transformation and storage.
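The sampling-based evaluation described above can be sketched roughly as follows. This is an assumed procedure, not the authors' evaluation code: the function names, the fixed seed and the toy judgements are hypothetical, and the numbers are not results from the study.

```python
import random


def sample_triples(triples, k, seed=0):
    """Draw k triples uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(triples, k)


def accuracy(judgements):
    """Fraction of inspected triples that a human marked as correct."""
    if not judgements:
        raise ValueError("no judgements to score")
    return sum(judgements) / len(judgements)


# Toy data: ten placeholder triples, four of which are sampled for inspection.
all_triples = [(f"s{i}", "p", f"o{i}") for i in range(10)]
test_set = sample_triples(all_triples, k=4)

# Hypothetical manual judgements for the four sampled triples.
manual_judgements = [True, True, True, False]
print(accuracy(manual_judgements))
```

In practice the sample would need to be large enough for the accuracy estimate to be meaningful, and the judgements would come from human annotators rather than a hard-coded list.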

Register for SEMANTiCS conference