This page describes a data management plan written for the University of Oslo (uio.no) using the DMPTool.
Training HTR-Models for a Bilingual Digital Edition of the Ethica Complementoria
Contributors to this project
- Annika Rockenberger: Data curation, University of Oslo (uio.no), https://orcid.org/0000-0001-9515-8262
- Håvard Loeng: Other, University of Oslo (uio.no)
Project details
- Research domain: Humanities
- Project Start: June 26, 2023
- Project End: October 31, 2023
- Created: June 21, 2023
- Modified: June 21, 2023
- Ethical issues related to data that this DMP describes? unknown
Citation
- When citing this DMP use:
Annika Rockenberger. (2023). "Training HTR-Models for a Bilingual Digital Edition of the Ethica Complementoria" [Data Management Plan]. DMPHub. https://doi.org/10.48321/D1RP93
- When connecting this DMP to related project outputs (such as datasets) use the ID:
https://doi.org/10.48321/D1RP93
Funding status and sources for this project
- Status: Granted
- Funder: University of Oslo, Teksthub+digital Humanities
- Grant: unspecified
Project description
We aim to create a dataset to be used as the basis for a bilingual (Danish/German) digital scholarly edition of one of the most popular books on ‘etiquette’ in early modern Germany and Northern Europe: the Ethica Complementoria.
Originally written in German, the book made its way to the Nordic region through the Danish translation from 1678. This first Danish print will be published in parallel with the German version used for the translation.
The transcription project, conducted by Håvard Loeng, is part of a larger project on the book and revision history of the Ethica Complementoria, led by Annika Rockenberger. An overview of all editions has been published digitally at the Herzog August Library: http://diglib.hab.de/ebooks/ed000738/start.htm.
Manually transcribing two texts of more than 300 pages each is no longer feasible, and traditional Optical Character Recognition (OCR) yields poor results for older printed books. Therefore, we aim to test, evaluate, improve, and build upon the NorFraktur model from the National Library of Norway. NorFraktur is a public Handwritten Text Recognition (HTR) model in Transkribus. It was trained with the HTR algorithm developed by READ Coop to recognise manuscripts and older prints automatically.
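The plan does not name an evaluation metric. One common choice for HTR output is the character error rate (CER): the edit distance between the model's transcription and a manually corrected reference, divided by the length of the reference. The following is a minimal sketch under that assumption; the function names and the sample strings are illustrative and not taken from the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Invented sample strings, for illustration only.
    reference = "Ethica Complementoria"
    hypothesis = "Ethlca Complementoria"
    print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

Comparing the CER of the unmodified public model with that of a retrained model on the same corrected pages would give a simple measure of whether retraining helped.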
The development project contributes both to an open-access digital scholarly edition (planned as part of the publications by the Norwegian Language and Literature Society at bokselskap.no) and to the improvement and expansion of an open HTR model that the scholarly community can reuse for early modern prints in Norwegian (including Danish and German).
Planned outputs
Ethica Complementoria Digital Scholarly Edition – Redux. Series of blog posts
Series of public blog posts on the website of the parent project "Georg Greflinger - Digitale Edition". These accompany the project: https://greflinger.hypotheses.org/.
In addition, microblogging will be done on the social network platform Mastodon via the project lead's personal account.
- Format: Text
- Release timeline: October 31, 2023
Ethica Complementoria. Danish print 1678
XML file of the automatic transcription of the 1678 print. Transcribed with READ Coop's Transkribus, using the NorFraktur_1600_PyLaia public HTR model. Transcription quality-checked and corrected by Håvard Loeng, research assistant.
- Format: Dataset
- Anticipated volume: 1 MB
- Release timeline: October 31, 2023
- License for reuse: Creative Commons Zero v1.0 Universal
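The plan only states that the dataset is an XML file and does not specify its schema. Assuming it follows the PAGE XML layout that Transkribus can export (text lines as `TextLine` elements with the transcription in `TextEquiv/Unicode`), a reuser could pull out the plain text roughly as sketched below. The function names, the sample document, and its namespace URI are placeholders for illustration, not the project's actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in a PAGE-like layout; the namespace is a placeholder.
SAMPLE = (
    '<PcGts xmlns="http://example.org/page">'
    "<Page><TextRegion>"
    "<TextLine>"
    "<Word><TextEquiv><Unicode>Ethica</Unicode></TextEquiv></Word>"
    "<TextEquiv><Unicode>Ethica Complementoria</Unicode></TextEquiv>"
    "</TextLine>"
    "<TextEquiv><Unicode>region-level text</Unicode></TextEquiv>"
    "</TextRegion></Page>"
    "</PcGts>"
)


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text: str) -> list[str]:
    """Return the line-level transcription of every TextLine element.

    Only TextEquiv elements that are direct children of a TextLine are
    read, so word-level and region-level text is not duplicated.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for grandchild in child:
                if local_name(grandchild.tag) == "Unicode":
                    lines.append(grandchild.text or "")
    return lines


if __name__ == "__main__":
    for line in extract_lines(SAMPLE):
        print(line)
```

Matching on local element names rather than a fixed namespace keeps the sketch independent of the schema version; if the published file uses a different format, such as TEI, the element names would need to change accordingly.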