This panel will introduce the vision and work behind the Data for History (DfH) consortium (dataforhistory.org). DfH is an international consortium holding the aim of improving geo-historical data interoperability in the semantic web. This aim entails establishing common methods for modelling, curating and managing data in historical research. Such methods would provide foundational support to research projects that adopt a framework of collaborative, cumulative and interoperable production and investigation of scientific data. DfH aims to develop and then maintain a common ontological model that would allow for domain-specific, semantically robust data integration and interoperability. The model will be built up in the conceptual framework of the CIDOC CRM, drawing on the experience of symogih.org and other participating projects, in order to connect with the wider research community.
In contemporary research, an essential part of a historian’s effort consists of laying the groundwork for scholarly argumentation by investing significant time in the production of complex structured data. This structured data encodes the unique facts the scholar has discerned in documents and makes them potentially accessible, either individually or in aggregate. Such structured data sources are an invaluable tool for the contemporary historian, allowing historical arguments to be tested more granularly and more repeatably than unstructured data permit. While nothing can replace the monograph or article as the basic means of delivering complex argumentation, the advent of readily available primary facts encoded in a discrete way offers new means to confirm or disconfirm the arguments proposed therein, by checking them against individual and aggregate facts in a digital environment.
To maximize the utility of such structured facts, historians, like researchers in other disciplines, face the task not only of creating such data but of doing so in a standardized way, so that data regarding comparable facts and referents are transparently recorded and comparable. A means of creating commonly accepted schemas for recording data, and commonly accepted referents for individual real-world entities (places, people, things), must be developed. The particular challenge for historical research, however, is the necessarily broad horizon of interest of the domain and the consequent intellectual difficulty of deriving compatible schemas and referent systems. The remit of historical research is so broad that there is some justified scepticism about the possibility of reaching agreement on such questions. Without such agreement, however, the valuable integrated primary data that would allow a more granular investigation of historical argumentation will remain only locally useful, severely limiting its potential use and impact for historical research.
DfH proposes to meet these challenges by adopting semantic data encoding with formal ontologies. Such ontologies allow historians, working together with computer scientists, to co-design common conceptual models and reference data sources, establishing a sort of interlingua for exchanging data relevant to specific areas of historical research. This panel will give researchers from DfH an opportunity to lay out a vision and specific strategies for approaching this question.
Dear George,
Thank you for this. Great abstract. I have two minor suggestions for changes.
1) DfH is an international consortium holding the aim of improving geo-historical data interoperability in the semantic web.
I would change this to historical data interoperability. Not all cases in the consortium have a geographical dimension, and not all even have a spatial one. I would just leave out "geo".
2) The model will be built up in the conceptual framework of the CIDOC CRM
That, in my view, is too strong. We have been talking about extensions to it. Parts of the model might lie outside the conceptual framework of CIDOC-CRM, though of course still in dialogue with CIDOC-CRM. I would say: The historical model(s) will be built up as interoperable extensions to the CIDOC-CRM etc.
Thanks again,
Charles
P.S. I have just contacted Veruska to ask whether we can submit an abstract. In the coming days we will be at the DH-Benelux conference in Amsterdam.
Lab In Virtuo: An Intelligent Virtual Environment Dedicated to the History and Heritage of Industrial Landscapes
Sylvain Laubé1, Ronan Querrec2, Serge Garlatti2, Marie-Morgane Abiven1, Bruno Rohou1
1Centre F. Viète (EA 1161), 2LabSTICC (UMR 6285)
This research project lies at the intersection of knowledge engineering, Virtual Reality (VR) and Digital Humanities (DH), and addresses the history of industrial cultural landscapes. Our main objectives are to:
a) Develop, validate and disseminate virtual laboratories and interdisciplinary research methods based on case studies from industrial history and heritage (studies in progress at the Centre F. Viète and the LIA CNRS "Mines" in Atacama[1]), i.e. demonstrators selected as "use cases" of strong historical and heritage interest;
b) Develop and validate collaborative and participatory science methods involving local non-institutional actors, and transfer these methods to heritage, mediation and economic actors.
We consider that 3D renderings (produced, for example, by reverse engineering from archives or by LIDAR 3D capture) constitute innovative collaborative work environments (in virtuo laboratories) if we associate them with:
1) A digital corpus, i.e. an incremental knowledge dataset based on the semantic web and a domain ontology (as a CIDOC-CRM extension), developed from the existing ANY-ARTEFACT SHS activity metamodel [Laubé, 2017] (upper part of Figure 1);
2) An Intelligent Virtual Environment (IVE), developed from the existing MASCARET metamodel [Chevaillier et al., 2011] and shared across distant sites, in which it is possible to collect data and knowledge and to work, exchange and document in virtuo (bottom part of Figure 1) in a collaborative and participatory science approach. Finally, the IVE makes it possible to present the research work in an innovative way through interactive mediation for different types of audience;
3) Two types of interfaces for integrating new knowledge into the incremental digital corpus: the intelligent virtual environment and a dedicated web interface.
The Lab In Virtuo project is based on various user games that will allow users to acquire and share more detailed, complete and enriched knowledge through collaboration and in virtuo simulation. In other words, the aim is to greatly improve the elicitation and dissemination of knowledge.
Our proposal therefore focuses on the acquisition and restitution of knowledge in its various forms (knowledge, know-how, technical gestures) in the virtual space constituted by the 3D model of an artefact of historical and heritage interest (Lab in Virtuo), involving several user games. A user game is considered as a series of interactions involving two or more actors and one or more autonomous virtual agents in the virtual 3D environment (Figure 1), in a collaborative and/or participatory science approach. The actors are: i) researchers from different fields (historians, archaeologists, anthropologists, ethnologists, etc.); ii) heritage experts/researchers; iii) local non-academic actors/researchers, for the acquisition of information on the material environment and the acquisition/capture of technical gestures (know-how) and their explanation; iv) mediation experts/researchers/local actors.
[1] Venise: https://brestvenise.hypotheses.org/100; Tocopilla: https://liamines.hypotheses.org/1183; Caracoles: https://liamines.hypotheses.org/957
Building a Domain-Specific Research Ontology from External Databases of Academic History
Thomas Riechert[1], Edgard Marx[1], Jennifer Blanke[2]
The collaborative project “Early Modern Professorial Career Patterns - Methodological Research on Online Databases of Academic History (PCP-on-Web)” (HTWK Leipzig, HAB Wolfenbüttel), funded by the DFG [3], focuses on domain-specific research ontologies [4]. As befits a Digital Humanities project, it is interdisciplinary in nature and innovatively combines classic historiographical research methods with Semantic Web technologies in order to investigate scholarly career patterns, a classic prosopographical research question that addresses a significant gap in the field.
In this talk, we will discuss the results of our current research and demonstrate how a domain-specific research ontology that gathers information from several online databases is being built. As we will show, PCP-on-Web uses RDF standards for collecting facts (RDF triples) and OWL for formally describing the vocabulary. Furthermore, alignment with Data-for-History (DfH) [5] vocabularies allows the research ontology to be extended with facts from prosopographical databases and makes the resulting research ontology reusable.
The process for building the research ontology uses historical expertise and knowledge engineering methods in parallel. It covers the database layer, the application layer and the research interface layer of the Heloise Common Research Model (HCRM) [6]. Researchers start by exploring available external databases by querying them. These queries are formally defined in SPARQL [7] and can be run against SPARQL endpoints available online or explored with local tools such as KBox [8]. Being formally defined, these SPARQL queries represent parts of the external databases. They extract relevant concepts and properties for the research vocabulary [9], and can also be used to automatically transform relevant data into the envisaged research ontology. This enables researchers to rebuild the research ontology at any time in the future, as long as the syntax and semantics of the sources do not change. The use of a common vocabulary, such as the one to be developed and evolved by the DfH consortium, can avoid this problem of inconsistent data. Additionally, the effort of exploring new databases can be minimised, as SPARQL queries can be defined against a common vocabulary, so that external data can be gathered without manual exploration.
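The idea that a formally defined query stands for the relevant part of an external database, and can be re-run to rebuild the research ontology, can be illustrated with a small sketch. To stay self-contained it does not use a real SPARQL engine: it evaluates one SPARQL-style basic graph pattern (the `WHERE` part) over an in-memory list of triples and instantiates a `CONSTRUCT`-style template from the solutions. All prefixes, properties (`ex:holdsChair`, `res:hasCareerStep`) and data are hypothetical and are not taken from PCP-on-Web.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical extract of an external prosopographical database.
EXTERNAL: List[Triple] = [
    ("ex:p1", "rdf:type", "ex:Professor"),
    ("ex:p1", "ex:holdsChair", "ex:chair_law_leipzig"),
    ("ex:p2", "rdf:type", "ex:Professor"),
    ("ex:p2", "ex:holdsChair", "ex:chair_med_helmstedt"),
    ("ex:s1", "rdf:type", "ex:Student"),
]

def match(pattern: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern; terms starting with '?' are variables."""
    results: List[Binding] = [{}]
    for s, p, o in pattern:
        extended: List[Binding] = []
        for binding in results:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip((s, p, o), triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        results = extended
    return results

def construct(template: List[Triple], bindings: List[Binding]) -> List[Triple]:
    """Instantiate a CONSTRUCT-style template once per solution."""
    return [
        (b.get(s, s), b.get(p, p), b.get(o, o))
        for b in bindings
        for s, p, o in template
    ]

# Roughly: CONSTRUCT { ?person res:hasCareerStep ?chair }
#          WHERE { ?person a ex:Professor ; ex:holdsChair ?chair }
QUERY_WHERE = [("?person", "rdf:type", "ex:Professor"), ("?person", "ex:holdsChair", "?chair")]
QUERY_TEMPLATE = [("?person", "res:hasCareerStep", "?chair")]

research_graph = construct(QUERY_TEMPLATE, match(QUERY_WHERE, EXTERNAL))
```

Because the stored artefact is the query rather than a copy of the data, re-running it after the source has been updated regenerates the research graph; against a real endpoint one would send the equivalent SPARQL `CONSTRUCT` query instead.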
The project is presented by Thomas Riechert, Professor for Information Systems and Data Management at the Leipzig University of Applied Sciences (Hochschule für Technik, Wirtschaft und Kultur).
[1] Hochschule für Technik, Wirtschaft und Kultur Leipzig: http://htwk-leipzig.de
[2] Herzog August Bibliothek, Wolfenbüttel: http://hab.de
[3] Research Project: http://pcp-on-web.htwk-leipzig.de
[4] PCP-on-Web research ontology: http://pcp-on-web.htwk-leipzig.de/data/ontology/
[5] Data for History Consortium: http://data-for-history.org
[6] Collaborative Research on Academic History using Linked Open Data: A Proposal for the Heloise Common Research Model, Riechert, Thomas and Beretta, Francesco; In: CIAN-Revista de Historia de las Universidades, 19 (2016)
[7] SPARQL Query Language for RDF: https://www.w3.org/TR/rdf-sparql-query/
[8] KBox: Transparently Shifting Query Execution on Knowledge Graphs to the Edge, by Edgard Marx, Ciro Baron, Tommaso Soru and Sören Auer, in 11th IEEE International Conference on Semantic Computing, Jan 30-Feb 1, 2017, San Diego, California, USA
[9] PCP-on-Web research vocabulary: http://pcp-on-web.htwk-leipzig.de/data/vocabulary/
Lodewijk Petram, Huygens ING – lodewijk.petram@huygens.knaw.nl
Jelle van Lottum, Huygens ING
Rutger van Koert, KNAW Humanities Cluster
Best practices for making data generated in automated linkage procedures readily re-usable
Since people are central to almost all research in (art) history, and the same persons are often relevant for multiple projects, interoperable and readily re-usable person data are key to a well-integrated Linked Open Data (LOD) cloud of historical datasets. A common schema for describing person entities would have been invaluable, but the opportunity for proposing such a schema seems to have long passed, given the huge number of person observations already available online and the multitude of models used to describe them. This need not be an insurmountable obstacle to data interoperability, however, since variation between the models is relatively limited, which allows equivalent classes to be mapped.
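Mapping equivalent classes can be as simple as an explicit equivalence table applied when person datasets are merged. The sketch below is only an illustration: the table, which maps `foaf:Person` and `schema:Person` onto the CIDOC CRM class `crm:E21_Person`, and the record layout are assumptions, not a mapping proposed in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical equivalence table: person classes from different models
# mapped onto one shared target class.
EQUIVALENT_CLASSES: Dict[str, str] = {
    "foaf:Person": "crm:E21_Person",
    "schema:Person": "crm:E21_Person",
    "crm:E21_Person": "crm:E21_Person",
}

def normalise_type(class_iri: str) -> Optional[str]:
    """Return the shared class for a source class, or None if unmapped."""
    return EQUIVALENT_CLASSES.get(class_iri)

def merge_person_records(*datasets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep records whose class maps to the shared person class, retyped to it."""
    merged: List[Dict[str, str]] = []
    for dataset in datasets:
        for record in dataset:
            target = normalise_type(record["type"])
            if target is not None:
                merged.append({**record, "type": target})
    return merged

dataset_a = [{"id": "a:1", "type": "foaf:Person"}]
dataset_b = [{"id": "b:7", "type": "schema:Person"}, {"id": "b:8", "type": "schema:Place"}]
persons = merge_person_records(dataset_a, dataset_b)
```

Records of unmapped classes are simply dropped here; a real pipeline would more likely set them aside for review.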
However, there is one category of person data for which interoperability and re-usability pose more of a challenge: data generated in automated linkage procedures, a category that is set to explode in size in the coming years. Our paper will explore these issues, investigate and assess the ways they are currently being dealt with, distil best practices, and reflect on the possibilities for setting a domain standard.
We take a research project as the starting point for our paper and as a use case for the issues we wish to shed light on. In this project [1], we try to gain historical insight into the economic contribution of migrant workers to a recipient economy, in this case the economy of the Dutch Republic in the eighteenth century. To this end, we reconstruct the careers of maritime workers by applying (semi-)automatic record linkage to the biographical data observations contained in the muster rolls of the Dutch East India Company (almost 800,000 records), to be supplemented in the near future with observations from the transcripts of interrogations of crew members of the Dutch merchant marine by the English admiralty (c. 15,500 records).
Our algorithms produce suggested matches on the basis of name similarity measures and a set of additional linkage rules for dates and geographical locations in the data. From our research perspective, this linkset contains richer data than the dataset of original data observations. But, as with all automatic linkage procedures, our method involves a linkage selection bias: e.g. sailors with non-standard names are overrepresented in the mini-biographies we compose. For our matched records to be re-usable by other researchers, it is therefore critical that they know which source data we used, which linkage methods and rules we applied, etc., so that they can decide for themselves whether the records in the linkset are appropriate for answering their research questions.
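A minimal sketch of the kind of rule-based matching described above, assuming a name similarity measure combined with one geographical rule and one date rule. It uses the ratio from Python's `difflib.SequenceMatcher` as the similarity measure; the paper does not specify which measure the project uses, and the field names (`name`, `origin`, `birth_year`), thresholds and records are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple, Union

Record = Dict[str, Union[str, int]]

# Hypothetical thresholds; a real project would tune them on hand-checked samples.
NAME_THRESHOLD = 0.85
MAX_YEAR_GAP = 2

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_candidate_match(r1: Record, r2: Record) -> bool:
    """Apply the name measure, then a place rule and a date rule."""
    if name_similarity(str(r1["name"]), str(r2["name"])) < NAME_THRESHOLD:
        return False
    if r1["origin"] != r2["origin"]:  # geographical rule: same place of origin
        return False
    # Date rule: estimated birth years may differ by a small margin.
    return abs(int(r1["birth_year"]) - int(r2["birth_year"])) <= MAX_YEAR_GAP

def suggest_links(records: List[Record]) -> List[Tuple[str, str]]:
    """Compare every pair of records and return the ids of suggested matches."""
    return [
        (str(a["id"]), str(b["id"]))
        for a, b in combinations(records, 2)
        if is_candidate_match(a, b)
    ]

RECORDS: List[Record] = [
    {"id": "vocA", "name": "Jan Pietersz", "origin": "Bergen", "birth_year": 1720},
    {"id": "vocB", "name": "Jan Pieterse", "origin": "Bergen", "birth_year": 1721},
    {"id": "vocC", "name": "Jan Pietersz", "origin": "Hamburg", "birth_year": 1720},
]
```

Comparing all pairs is quadratic, so at the scale of the muster rolls one would normally restrict comparisons first (blocking, e.g. by place of origin). A fixed threshold is also one place where selection bias can enter, since it does not treat all spelling variants alike; this is why thresholds and rules belong in the linkset's provenance.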
We argue that all relevant linkage information should be made available as provenance data with each record in the linkset, explore ways of properly conveying this information (e.g. using the PROV ontology [2], the P-PLAN Ontology [3], extensions to CIDOC-CRM such as CRMsci [4] or the GRaSP model [5]) and finally propose best practices for integrating data generated in automated linkage procedures in the LOD cloud.
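As an illustration of attaching provenance to each record in a linkset, the sketch below serialises one suggested link as Turtle with PROV-O terms (`prov:wasGeneratedBy`, `prov:used`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith`, `prov:endedAtTime`). The `ex:` namespace, the `ex:linksRecord` and `ex:linkageRules` properties and all identifiers are hypothetical; P-PLAN, CRMsci or GRaSP could be used for the same purpose.

```python
from typing import Dict, List

PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/linkset/> .\n\n"
)

def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def link_with_provenance(link_id: str, records: List[str], run: Dict[str, str]) -> str:
    """Serialise one suggested link and the linkage run that produced it."""
    recs = ", ".join(f"ex:{r}" for r in records)
    return (
        f"ex:{link_id} a prov:Entity ;\n"
        f"    ex:linksRecord {recs} ;\n"
        f"    prov:wasDerivedFrom {recs} ;\n"
        f"    prov:wasGeneratedBy ex:{run['id']} .\n\n"
        f"ex:{run['id']} a prov:Activity ;\n"
        f"    prov:used ex:{run['source']} ;\n"
        f"    prov:wasAssociatedWith ex:{run['software']} ;\n"
        f"    ex:linkageRules {turtle_literal(run['rules'])} ;\n"
        f"    prov:endedAtTime {turtle_literal(run['ended'])}^^xsd:dateTime .\n\n"
        f"ex:{run['software']} a prov:SoftwareAgent .\n"
    )

# Hypothetical description of one linkage run.
RUN = {
    "id": "run1",
    "source": "voc_muster_rolls",
    "software": "linker_v0_1",
    "rules": "name ratio >= 0.85; same origin; birth year gap <= 2",
    "ended": "2018-03-01T12:00:00Z",
}
document = PREFIXES + link_with_provenance("link42", ["vocA", "vocB"], RUN)
```

Because the rules and the source are attached to the generating activity rather than repeated on every link, all links from the same run share one provenance description that re-users can inspect.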
[1] HUMIGEC: https://www.clariah.nl/projecten/research-pilots/humigec
[2] PROV-O: The PROV Ontology: https://www.w3.org/TR/prov-o/
[3] Garijo, D., Gil, Y. (2012). Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. Proceedings of the 2nd International Workshop on Linked Science: 12/11/2012, Boston, USA.
[4] CRMsci: http://www.ics.forth.gr/isl/index_main.php?l=e&c=663
[5] van Son, C., Caselli, T., Fokkens, A., Maks, I., Morante, R., Aroyo, L., Vossen, P. (2016) GRaSP: A Multilayered Annotation Scheme for Perspectives. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), pp. 1177-1184.
Francesco Beretta, CNRS - LARHRA http://larhra.ish-lyon.cnrs.fr/
Djamel Ferhod, CNRS - LARHRA
Vincent Alamercery, ENS de Lyon - LARHRA
OntoME: an ontology management environment for extending the CIDOC CRM to historical research sub-domains
In the domain of historical data interoperability [1], one of the major points under discussion is whether a common ontology can be shared for modelling the whole range of human activities in the past. Historians generally think that data produced according to their research agenda in a specific research sub-domain are not reusable in other contexts. To cope with this issue, the symogih.org project (Système modulaire de gestion de l’information historique), started in 2008 with the aim of producing a virtual research environment for collaborative data production, applied a basic distinction between the research agenda of the scholar and the design of a data model conceived as the most objective possible representation of « historical facts ». This made it possible to produce a shared vocabulary for specific sub-domains of historical research, providing data interoperability among the research projects hosted on the platform [2].
Since 2013 this perspective has been broadened with the aim of sharing data produced in the symogih.org virtual research environment with other resources available on the semantic web using RDF technologies [3]. For this purpose the project adopted the CIDOC CRM as a conceptual framework. The CRM is not only a standardized (ISO 21127:2006) « formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information » but also a generic data model designed with a view to « maintain and support a global knowledge network » [4]. Although specialists generally consider this model well suited for modelling data in the humanities and especially in historical research [5], its high degree of abstraction, an indispensable condition for genericity, poses difficulties when it is used for data production in historical sub-domains. The solution recommended by the CRM is to create project-specific extensions, but this operation is not easily accessible to non-specialists.
To support the management of CRM extensions, and to foster the coherence and interoperability of ontology development in the domain of historical research, an ontology management environment (OntoME) is currently under development. It is designed to facilitate the understanding of the CRM (and of other standardized ontologies and vocabularies) and the production of sub-domain-specific extensions, which are submitted to a validation process by the expert community. On the one hand, the platform will allow existing data models in the domain of historical research (or in an even wider spectrum) to be imported and mapped to CRM classes and properties, with the aim of providing interoperability for project data on the semantic web. On the other hand, the platform will support a controlled development process for CRM extensions specific to sub-domains of historical research, allowing explicit sub-classes and sub-properties of the existing, more abstract ones to be produced and bundled into application profiles that can be used for local data production. The paper will present the main components of OntoME and provide an example of the alignment with the CRM of some classes of the symogih.org ontology implemented in the SIPROJURIS project [6].
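The kind of check such an environment could support, verifying that every extension class in a sub-domain application profile specialises some CRM class, can be sketched with a transitive super-class lookup. The CRM part is a simplified fragment of the hierarchy that lists only one parent per class (E7 Activity, E5 Event, E4 Period, E2 Temporal Entity, E1 CRM Entity); the `ext:` classes and the profile are hypothetical, and this is not OntoME's implementation.

```python
from typing import Dict, List, Set

# Direct super-classes: a simplified CIDOC CRM fragment (one parent per class)
# plus hypothetical sub-domain extension classes with the "ext:" prefix.
SUPERCLASSES: Dict[str, List[str]] = {
    "crm:E2_Temporal_Entity": ["crm:E1_CRM_Entity"],
    "crm:E4_Period": ["crm:E2_Temporal_Entity"],
    "crm:E5_Event": ["crm:E4_Period"],
    "crm:E7_Activity": ["crm:E5_Event"],
    "ext:Taking_of_Vows": ["crm:E7_Activity"],
    "ext:Clothing": ["crm:E7_Activity"],
}

def ancestors(cls: str) -> Set[str]:
    """All direct and indirect super-classes of cls (transitive closure)."""
    seen: Set[str] = set()
    stack = list(SUPERCLASSES.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUPERCLASSES.get(parent, []))
    return seen

def unmapped_extensions(profile: List[str]) -> List[str]:
    """Extension classes in a profile that do not specialise any CRM class."""
    return [
        cls for cls in profile
        if cls.startswith("ext:")
        and not any(a.startswith("crm:") for a in ancestors(cls))
    ]

# Hypothetical application profile bundling classes for a monastic-careers project.
MONASTIC_PROFILE = ["ext:Taking_of_Vows", "ext:Clothing", "ext:Office_Holding"]
```

Here `ext:Office_Holding` would be flagged because it has not yet been attached to any CRM class, while data typed with `ext:Taking_of_Vows` remain queryable as `crm:E5_Event` through the closure.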
[1] Meroño-Peñuela Albert, Ashkpour Ashkan, van Erp Marieke, Mandemakers Kees, Breure Leen, Scharnhorst Andrea, Schlobach Stefan, van Harmelen Frank, « Semantic Technologies for Historical Research: A Survey », in Semantic Web – Interoperability, Usability, Applicability (IOS Press) 6(2015): 539-564.
[2] http://symogih.org/?q=documentation
[3] http://symogih.org/?q=rdf-publication – Cf. Beretta Francesco. L'interopérabilité des données historiques et la question du modèle : l'ontologie du projet SyMoGIH. Enjeux numériques pour les médiations scientifiques et culturelles du passé, Paris, Presses Universitaires de Paris Nanterre, 2017, 87-127.
[4] Martin Doerr et al., ‘The Dream of a Global Knowledge Network: A New Approach’, Journal on Computing and Cultural Heritage, vol. 1, no. 1 (2008).
[5] Courtin, A., Minel, J.-L. (2017). Propositions méthodologiques pour la conception et la réalisation d’entrepôts ancrés dans le Web des données. Enjeux numériques (cit.), 53-86:61-62.
[6] http://symogih.org/graph/siprojuris-sym – http://siprojuris.symogih.org/
Multi-paper panel abstract presented by Bernard Hours (UMR CNRS 519 LARHRA - University of Lyon) and Georg Vogeler (Austrian Center for Digital Humanities – University of Graz)
The prosopographical approach can renew the understanding of the history of religious orders by introducing the notion of curricula and careers into a historiography nourished either by a sociological approach, or by an analysis of political or economic governance, or by the history of spirituality. Many studies have accumulated more or less structured data, collected in a more or less systematic way and which remain more or less accessible to the scientific community. We are therefore confronted with data heterogeneity and dispersion.
Several projects are already under way to try to remedy this situation. In Austria, there is the digital prosopography of religious orders project led at the University of Vienna by Professor Thomas Wallnig in connection with the Austrian Centre for Digital Humanities, directed by Professor Georg Vogeler (Vienna workshop, February 2017 [1]). In France, within LARHRA (the CNRS research centre for early modern and modern history), Professor Bernard Hours leads the Monastica project, which will gather data about French Carmelite nuns [2] (Bernard Hours), about nuns of several orders living in a large area stretching from northern Italy to the Spanish and later Austrian Netherlands from the 16th to the 18th century (ANR project LODOCAT [3]), and finally about Cistercian monks in the early modern period (Bertrand Marceau [4]).
The career of a nun or a monk is broken down into compulsory stages (postulancy, clothing, novitiate, vows) and the exercise of various offices or functions for varying lengths of time and according to variable designation procedures. It takes place within the institution, but it can also unfold at the level of the province or of the order as a whole, and can therefore involve greater or lesser geographical mobility. Moreover, in some countries and at certain times, members of religious orders experienced exile.
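To make the modelling question concrete, the sketch below checks a recorded career against the compulsory stages listed above: it orders stage events by date and reports stages that appear out of canonical order or are not documented. The event format, the dates and the choice to report rather than reject gaps are assumptions for illustration only; offices and other events are simply ignored.

```python
from datetime import date
from typing import List, Tuple

# Compulsory stages, in the order given in the abstract.
STAGES = ["postulancy", "clothing", "novitiate", "vows"]

def check_career(events: List[Tuple[str, date]]) -> List[str]:
    """Return a list of problems; an empty list means the stages look consistent."""
    # Keep only compulsory stages; sort by date, breaking same-day ties canonically.
    known = sorted(
        (e for e in events if e[0] in STAGES),
        key=lambda e: (e[1], STAGES.index(e[0])),
    )
    ranks = [STAGES.index(stage) for stage, _ in known]
    problems: List[str] = []
    if ranks != sorted(ranks):
        problems.append("stages out of canonical order")
    present = {stage for stage, _ in known}
    # Gaps may reflect lost sources, so they are reported rather than rejected.
    problems.extend(f"no record of {s}" for s in STAGES if s not in present)
    return problems

# Hypothetical career records.
regular = [
    ("postulancy", date(1702, 3, 1)),
    ("clothing", date(1702, 9, 8)),
    ("novitiate", date(1702, 9, 8)),
    ("vows", date(1703, 9, 8)),
    ("office:prioress", date(1720, 5, 1)),
]
irregular = [
    ("postulancy", date(1702, 3, 1)),
    ("clothing", date(1703, 9, 8)),
    ("vows", date(1702, 9, 8)),
]
```

Calling `check_career(irregular)` reports that vows are dated before the clothing and that no novitiate is documented, while the regular record yields no problems.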
Reflection on how to model this information has already been developed within Symogih.org's own ontology. The challenge of integrating the project within the "Data for History" consortium is therefore to align the modelling with the CIDOC CRM standard in order to facilitate interoperability between projects dealing with the same historiographical problem. The use of CIDOC-CRM offers an opportunity for the approaches presented above to converge within the "Data for History" consortium and its interest group "Prosopography use case overview" (B. Hours, G. Vogeler). Indeed, the extension of the CIDOC CRM for the social world, in which the Digital History Pole is actively involved, will provide a standard ontology with which data on monastic and religious careers can be aligned.
[1] https://f-origin.hypotheses.org/wp-content/blogs.dir/971/files/2017/01/2...
[2] http://symogih.org/?q=type-of-information-record/82&lang=en: information or temporal entity « vows »
[3] https://lodocat.hypotheses.org/
[4] http://www.sudoc.abes.fr//DB=2.1/SET=1/TTL=1/SHW?FRST=1
Veruska and I have turned our panel proposal into a short paper proposal so that the prosopographical theme of the panel could be better secured. I hope that the panel and the short paper proposals will all be accepted so that we can meet in Galway. Best wishes, Charles
This is the text of our short paper proposal:
The Shape of Time and Storifying Data:
Modeling Historical Processes and their Temporal Dimension in Knowledge Graphs
With the advent of more advanced and efficient computer technologies, as well as large digitisation programs in art museums, archives and libraries that make digital assets increasingly freely available, it is now possible to study the history of works of art and the lives of artists collectively using digital methods and techniques. We now have the capacity to revisit and test the conclusions of previous generations of individual researchers who attempted to identify significant patterns in the study of the arts (such as the development of styles and genres) or of the art markets (such as the emergence of clusters of economic and creative activities) but only had access to relatively small data sets. Recognizing the immense potential of data-driven research, a number of institutions have embarked upon ambitious linked open data projects. The project Golden Agents: Creative Industries and the Making of the Dutch Golden Age, for example, which started on 1 January 2016, aims to develop over a period of five years a sustainable infrastructure to study relations and interactions 1) between the various branches of the cultural industries and 2) between producers and consumers of creative goods across the long Golden Age of the Dutch Republic. The project will link distributed, heterogeneous resources (both existing and new) on the production of the creative industries in the Dutch Golden Age from institutions such as the Rijksmuseum, the KB National Library of the Netherlands and the Netherlands Institute for Art History, so that researchers will be able to connect images, objects and texts from different sources in new and meaningful ways. Consumption remains an under-investigated topic with regard to the creative industries in the Dutch Golden Age.
The digitisation of the enormously rich collection of notarial acts in the Amsterdam City Archives, and more specifically of the probate inventories contained within these records, will provide detailed and socially diverse data on the possession of cultural goods by the inhabitants of Amsterdam, one of the most important cities of the 17th century. The Golden Agents research infrastructure enables analyses of interactions between various heterogeneous (un)structured datasets by using a combination of semantic web solutions and multi-agent technology supported by ontologies. One of the challenges is modelling ontologies for the historical processes of the interactions between the various branches, and between the production and consumption of the creative industries of the Dutch Golden Age. These processes are described as multiple narratives, for which we use the concept “storifying data”. These multiple stories develop over time in parallel orders: for instance, the order in the making of an object (from idea to final product), the order of an object in the artistic life or oeuvre of its maker, the order between the original object and its copies and transformations, and finally the order of the object within history. For that reason, the problem of representing time in linked data cannot be reduced to mapping a historical event in a given place to the right (Gregorian, Julian, Chinese, etc.) calendar. We need a model that can describe these multiple storylines of objects and ideas. “Like the astronomer, the historian is engaged upon the portrayal of time [...] both transpose, compose and color a facsimile which describes the shape of time,” George Kubler (1962, 19) wrote in The Shape of Time: Remarks on the History of Things. This history of things does not only represent the history of ‘material culture’, but reunites ideas and objects visually in temporal sequences (Kubler, p. 9).
Time and the History of Things are crucial concepts in the Golden Agents project, in which we create, so to speak, a ‘historical internet of things’ of (im)material cultural objects and events of the Dutch Golden Age. Kubler’s description of the history of things as parallel sequential orders is relevant not only for a better understanding of the life-cycles of (im)material objects of the Dutch Golden Age: his discussion of art history as a “system of formal relations” also enables critical reflection on the historiographies of the concept of time in other disciplines. Kubler’s morphological analysis of duration in series and sequence resonates with Braudel’s conceptions of plural time and serial history in human agency (micro time, i.e. events; meso time, i.e. cyclical processes; and the ‘longue durée’, i.e. structural change), which dominated the French historiography of the Annales School (Daley 2012). Recently, the theoretical physicist Carlo Rovelli (2018, 103) argued in The Order of Time that there is no need to choose a privileged variable and call it time: it would suffice to have a theory of dynamic relations that tells us how the things we see in the world vary with respect to each other. While Rovelli is in search of a physical theory to understand the dynamic relations between the earth and the universe, our perception of historical relations within the world might best be captured by graph theory. It is revealing that the only image in Kubler’s art-historical analysis, written long before the digital era, is a visualisation of a directed network. This paper explores the potential of Kubler’s vision of (analog) multidimensional directed networks, in which formal relations between ideas and objects build upon each other sequentially over time, within a Semantic Web framework. The ultimate practical goal of this exploration is to model and implement an ontology of parallel sequential orders of time that expresses historical processes.
This ontology will be part of, and partially extend, the vocabularies of the most commonly used standards in cultural heritage and digital humanities, such as CIDOC-CRM and FRBR.
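One way to picture Kubler's parallel sequential orders as a directed network is to label each edge with the order it belongs to and derive one sequence per order. The sketch below does this with the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the objects and the order labels (`making`, `copying`, `oeuvre`) are hypothetical, and the sketch is not the Golden Agents ontology.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Each edge: (earlier, later, order). Objects and order labels are hypothetical.
EDGES: List[Tuple[str, str, str]] = [
    ("idea", "sketch", "making"),
    ("sketch", "painting", "making"),
    ("painting", "copy", "copying"),
    ("copy", "engraving", "copying"),
    ("early_work", "painting", "oeuvre"),
]

def sequences(edges: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """One topologically sorted sequence per order (raises CycleError on cycles)."""
    graphs: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for earlier, later, order in edges:
        # TopologicalSorter expects a mapping from each node to its predecessors.
        graphs[order][later].add(earlier)
    return {order: list(TopologicalSorter(g).static_order()) for order, g in graphs.items()}

def orders_of(node: str, edges: List[Tuple[str, str, str]]) -> Set[str]:
    """All orders in which a node takes part."""
    return {order for earlier, later, order in edges if node in (earlier, later)}

SEQ = sequences(EDGES)
```

The same object ("painting") appears in three sequences at once, which is the point of modelling parallel orders rather than a single timeline. Where an order branches, the topological sort is not unique, and a cycle within one order raises `graphlib.CycleError`.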
Citations:
Charles van den Heuvel and Veruska Carretta Zamborlini
Congratulations to all: both the panel and the long papers have been accepted!
And as George wrote:
The panel review:
See you in Galway :-)
The programme is now available: https://eadh2018eadh.wordpress.com/programme/