General overview

Sennierer

APIS focuses on semantically enriching the roughly 20,000 articles of the Austrian Biographical Lexicon (ÖBL). These articles have been accessible online, in a very simple form, since the early 2000s.

In 2015 the Austrian Centre for Digital Humanities (ACDH), in conjunction with ÖBL and the Institute for Urban and Regional Research (ISR), started APIS (Austrian Prosopographic Information System). Some results are already available, especially with regard to semantic technologies (e.g. a running web app). Within the project, a Virtual Research Environment (VRE) was developed that allows both researchers and automatic systems to interact with the data; it already includes simple visualizations. APIS also set up a Linked Open Data (LOD) hub based on Apache Stanbol that not only allows querying for LOD resources (e.g. the city of Vienna) but can also perform Named Entity Recognition (NER). This flexible and versatile open-source solution makes it possible to define and execute custom processing chains that use custom vocabularies for entity linking. The service provided by ACDH already integrates major reference resources (GND, GeoNames, DBpedia, etc.), and further resources can be included on demand.
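To give an idea of how such a hub is used for NER and entity linking: Stanbol's enhancer accepts plain text via an HTTP POST to `/enhancer` (or `/enhancer/chain/{name}` for a specific processing chain) and returns the enhancements as RDF. The following is a minimal Python sketch that builds that request; the base URL `http://localhost:8080` (a local Stanbol launcher) and the helper names are assumptions for illustration, not the address or client code of the ACDH service.

```python
import urllib.request
from typing import Optional

# Hypothetical base URL: a local Stanbol launcher. The actual ACDH endpoint
# is not given here.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text: str, chain: Optional[str] = None,
                           rdf_format: str = "text/turtle") -> urllib.request.Request:
    """Build a POST request that sends plain text to the Stanbol enhancer.

    If `chain` is given, that named enhancement chain (e.g. a custom
    NER + entity-linking chain) is used; otherwise the default chain.
    """
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        STANBOL_BASE + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": rdf_format,  # RDF serialization of the enhancement results
        },
        method="POST",
    )


def enhance(text: str, chain: Optional[str] = None) -> str:
    """Send the request and return the RDF response body (needs a running Stanbol)."""
    with urllib.request.urlopen(build_enhancer_request(text, chain)) as resp:
        return resp.read().decode("utf-8")
```

For example, `enhance("Sigmund Freud wurde in Příbor geboren.", chain="my-chain")` would return Turtle describing the detected entities and their links, provided a Stanbol instance with a chain of that (hypothetical) name is running.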

In APIS we followed a very pragmatic data modeling approach. On the one hand, we decided to use a relational database with a web development framework on top of it instead of a triple store and/or an XML database, in order to allow for faster and easier development of the VRE. On the other hand, we opted for a very simple, custom-made data model (again, the rationale was to ease development of the web application while still allowing for highly structured data).
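As an illustration of what a very simple, custom-made relational model can look like, here is a hypothetical sketch (not the actual APIS schema): entity tables plus one table of typed relations between them, queried with plain SQL through Python's built-in `sqlite3`. All table and column names are made up for the example.

```python
import sqlite3
from typing import List

# Hypothetical, heavily simplified schema: entities plus typed relations.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    birth_date TEXT                  -- ISO 8601 date string
);
CREATE TABLE place (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE person_place (
    person_id INTEGER NOT NULL REFERENCES person(id),
    place_id INTEGER NOT NULL REFERENCES place(id),
    relation_type TEXT NOT NULL      -- e.g. 'born in', 'died in', 'lived in'
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def places_for(conn: sqlite3.Connection, person_name: str,
               relation_type: str) -> List[str]:
    """Return the names of places linked to a person by a given relation type."""
    rows = conn.execute(
        """SELECT pl.name
           FROM person p
           JOIN person_place pp ON pp.person_id = p.id
           JOIN place pl ON pl.id = pp.place_id
           WHERE p.name = ? AND pp.relation_type = ?
           ORDER BY pl.name""",
        (person_name, relation_type),
    )
    return [row[0] for row in rows]
```

Keeping relation types as plain values in one table is what makes such a model easy to build a web application on; the price is that the semantics of each relation live outside the database, which is why a separate remodeling step is needed for LOD.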

Nonetheless, we want not only to include the APIS data in the Linked Open Data cloud but also to provide compatibility with quasi-standards such as CIDOC CRM. We evaluated several tools and strategies for remodeling our internal data into RDF and into various data models such as CIDOC CRM.
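To make the remodeling step concrete, the sketch below turns one relational record (a person with birth date and birthplace) into CIDOC CRM triples in N-Triples syntax, using only the standard library. The base URI `https://example.org/apis/` and the URI patterns are hypothetical, and attaching an `xsd:date` literal to the E52_Time-Span via P82 is one common simplification rather than the only valid CRM modeling.

```python
# CIDOC CRM and W3C namespaces; BASE is a hypothetical placeholder.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
BASE = "https://example.org/apis/"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value: str, datatype: str = "") -> str:
    """Escape a string as an N-Triples literal, optionally typed."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def birth_to_triples(person_id: int, name: str, birth_date: str,
                     place_id: int, place_name: str) -> str:
    """Model a birth as an E67_Birth event linking person, place and time-span."""
    person = iri(f"{BASE}person/{person_id}")
    birth = iri(f"{BASE}birth/{person_id}")
    span = iri(f"{BASE}timespan/birth/{person_id}")
    place = iri(f"{BASE}place/{place_id}")
    triples = [
        (person, iri(RDF_TYPE), iri(CRM + "E21_Person")),
        (person, iri(RDFS_LABEL), literal(name)),
        (birth, iri(RDF_TYPE), iri(CRM + "E67_Birth")),
        (birth, iri(CRM + "P98_brought_into_life"), person),
        (birth, iri(CRM + "P7_took_place_at"), place),
        (birth, iri(CRM + "P4_has_time-span"), span),
        (span, iri(RDF_TYPE), iri(CRM + "E52_Time-Span")),
        (span, iri(CRM + "P82_at_some_time_within"), literal(birth_date, XSD_DATE)),
        (place, iri(RDF_TYPE), iri(CRM + "E53_Place")),
        (place, iri(RDFS_LABEL), literal(place_name)),
    ]
    # One triple per line, each terminated by " ."
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

The point of the event-centric CRM model is visible here: a single `birth_date` column becomes an E67_Birth node with its own time-span and place, which is why a mapping tool is needed rather than a one-to-one column export.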
Two of the most promising tools were Ontop and Karma. Both have their advantages. Ontop is integrated into Protégé and lets you run SPARQL queries directly on the relational database. However, the mapping is somewhat complicated (you need to write SQL), and the performance of the SPARQL queries was, at least in our test cases, not very good. Karma, on the other hand, accepts not only relational databases but also CSV files, XML files, etc. It offers a web-based GUI for doing the mapping and imports the resulting RDF into a built-in OpenRDF Sesame triple store. One big advantage for people with Python skills is that Karma lets you use small Python scripts to process the data during the mapping.
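As an example of the kind of small Python script that can run during a Karma mapping, the function below normalizes German-style dates such as `6. 5. 1856` to ISO 8601 so they can be typed as `xsd:date`. The input format is an assumption for illustration; inside Karma the value would come from the mapped column rather than a function argument, so this is written as a standalone function that can be tested outside Karma.

```python
import re
from datetime import date
from typing import Optional

# Assumed source format: "D.M.YYYY", optionally with spaces after the dots.
_DATE_RE = re.compile(r"^\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s*$")


def to_iso_date(raw: Optional[str]) -> Optional[str]:
    """Convert 'D.M.YYYY' to 'YYYY-MM-DD'; return None if it cannot be parsed."""
    if not raw:
        return None
    match = _DATE_RE.match(raw)
    if match is None:
        return None  # leave unparseable values empty rather than guessing
    day, month, year = (int(g) for g in match.groups())
    try:
        # date() rejects impossible dates such as 31 February
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

Returning `None` for anything unparseable keeps dubious values out of the RDF instead of emitting wrongly typed literals; such cases can then be reviewed separately.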
To make a long story short, we decided to go with Karma for the moment. I will start mapping a subset of our data in May and will keep you posted on the pros and cons of Karma.