ENCODING AND TRANSCODING OF TRANSCRIPTIONS OF THE HISTORICAL CORPUS “MANUSCRIPT”

Authors

  • V. A. Baranov, Kalashnikov Izhevsk State Technical University
  • R. M. Gnutikov, Udmurt State University
  • K. I. Zinatshin, Kalashnikov Izhevsk State Technical University

DOI:

https://doi.org/10.22213/2618-9763-2021-4-82-89

Keywords:

text corpus, Slavonic medieval manuscripts, transcription, encoding

Abstract

The article considers the possibilities of using the Cyrillic blocks of the Unicode Standard to create transcriptions that represent the graphics of medieval Slavonic manuscripts. Particular attention is paid to the fact that the Unicode Standard provides variant forms of Cyrillic letters, so the graphic features of manuscripts can be recorded with reasonable accuracy. However, variants of some letters are still missing. This makes it necessary to adopt additional character-encoding conventions whose code points lie in dedicated blocks within the Private Use Areas rather than in the standard ranges of Unicode. The historical corpus “Manuscript” is an example of a large machine-readable collection of medieval Slavonic manuscripts. It was built on the Oracle DBMS using a specialized system of codes and fonts. Transferring the corpus to other technological platforms, or analyzing its linguistic data (individual texts, parts of the corpus, selections) with external software, is possible only after the downloaded files have been recoded to the Unicode Standard. The authors compare the character blocks used in the corpus with those of the current version of the Unicode Standard, 14.0. The comparison shows that recoding either loses graphic features or requires a supplementary set of variant characters with code points in the Private Use Areas. The article analyzes cases in which two or more Unicode characters correspond to a single recoded character of the Manuscript. It also notes that numerous ligatures and certain individual graphemes are missing both from the standard blocks and from the Private Use Area blocks.
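The recoding problem described in the abstract, where one legacy character has several Unicode candidates, some glyphs need Private Use Area (PUA) code points, and some have no target at all, can be illustrated with a minimal Python sketch. It is not the authors' procedure. The legacy codes (`o`, `o_narrow`, `uk`, `lig_x`), the table `LEGACY_TO_UNICODE`, and the names `is_pua`, `Report` and `transcode` are hypothetical and do not reproduce the Manuscript's actual code system. The Unicode characters are real: U+1C82 CYRILLIC SMALL LETTER NARROW O and U+1C88 CYRILLIC SMALL LETTER UNBLENDED UK come from the Cyrillic Extended-C block, and U+A64B is CYRILLIC SMALL LETTER MONOGRAPH UK. U+E000 merely stands in for an arbitrary PUA assignment.

```python
from typing import Dict, List, NamedTuple, Tuple

# HYPOTHETICAL legacy codes: NOT the Manuscript corpus's real code system.
# Each legacy code maps to one or more candidate Unicode strings.
LEGACY_TO_UNICODE: Dict[str, List[str]] = {
    "o": ["\u043e"],                       # CYRILLIC SMALL LETTER O
    "o_narrow": ["\u1c82"],                # CYRILLIC SMALL LETTER NARROW O
    "uk": ["\u0479", "\ua64b", "\u1c88"],  # UK, MONOGRAPH UK, UNBLENDED UK
    "lig_x": ["\ue000"],                   # HYPOTHETICAL ligature placed in the PUA
}


def is_pua(ch: str) -> bool:
    """Return True if the code point lies in one of Unicode's Private Use Areas."""
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF             # BMP Private Use Area
        or 0xF0000 <= cp <= 0xFFFFD        # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD      # Supplementary Private Use Area-B
    )


class Report(NamedTuple):
    text: str                                # recoded Unicode text
    ambiguous: List[Tuple[int, List[str]]]   # positions with 2+ Unicode candidates
    pua: List[int]                           # positions encoded via the PUA
    missing: List[int]                       # positions with no mapping (a loss)


def transcode(tokens: List[str]) -> Report:
    """Recode a sequence of legacy codes, recording every compromise made."""
    out: List[str] = []
    ambiguous: List[Tuple[int, List[str]]] = []
    pua: List[int] = []
    missing: List[int] = []
    for i, tok in enumerate(tokens):
        candidates = LEGACY_TO_UNICODE.get(tok)
        if not candidates:
            missing.append(i)
            out.append("\ufffd")  # REPLACEMENT CHARACTER marks the lost glyph
            continue
        if len(candidates) > 1:
            ambiguous.append((i, candidates))
        chosen = candidates[0]  # naive policy: take the first candidate
        if any(is_pua(c) for c in chosen):
            pua.append(i)
        out.append(chosen)
    return Report("".join(out), ambiguous, pua, missing)


if __name__ == "__main__":
    report = transcode(["o", "uk", "lig_x", "unknown"])
    print([f"U+{ord(c):04X}" for c in report.text])
    print(report.ambiguous, report.pua, report.missing)
```

For the input `["o", "uk", "lig_x", "unknown"]` the recoded text consists of U+043E, U+0479, U+E000 and U+FFFD. Position 1 is flagged as ambiguous, position 2 as PUA-encoded, and position 3 as missing. Taking the first candidate is only a placeholder policy. A real transcoder would need an editorial decision for each one-to-many correspondence.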

Author Biographies

V. A. Baranov, Kalashnikov Izhevsk State Technical University

Doctor of Philology, Professor

R. M. Gnutikov, Udmurt State University

K. I. Zinatshin, Kalashnikov Izhevsk State Technical University

Published

18.01.2022

How to Cite

Baranov, V. A., Gnutikov, R. M., & Zinatshin, K. I. (2022). ENCODING AND TRANSCODING OF TRANSCRIPTIONS OF THE HISTORICAL CORPUS “MANUSCRIPT”. Social’no-Ekonomiceskoe Upravlenie: Teoria I Praktika, 17(4), 82–89. https://doi.org/10.22213/2618-9763-2021-4-82-89

Issue

Vol. 17, No. 4

Section

Articles