The Oldest Russian Manuscripts as an Object of Statistical Analysis
linguistic statistics, ancient Russian texts, XIth century, Kirill Turovsky
Abstract
This paper describes two statistical experiments that measure the correlational proximity of 12 texts surviving in Russian copies of the 11th century and compare them with the works of Kirill Turovsky, a 12th-century author (RNB, F.p.I. 39, 13th cent.; ff. 1–48). It presents a comparative analysis of (a) different ways of extracting linguistic units from the texts and (b) retrieved word lists of different sizes, together with a linguistic interpretation of the main patterns of manuscript grouping.
The degree of statistical-linguistic proximity of the manuscripts is computed in two stages. First, the lists of the most frequent words of each pair of texts are compared by computing Spearman's rank correlation coefficient. Second, the texts are grouped on the basis of the resulting correlation values, which are taken as distances between the manuscripts: cluster analysis is applied and a dendrogram is plotted.
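The two-stage procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the conversion of a correlation into a distance as 1 − ρ, the average-linkage rule, the function names, and the toy frequency vectors are all assumptions made for illustration, since the abstract specifies none of them.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values in descending order (1 = largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def cluster(freqs):
    """Average-linkage agglomerative clustering on the assumed distance 1 - rho.

    freqs maps a text name to its frequency vector over a shared word list.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    dist = {frozenset(p): 1 - spearman(freqs[p[0]], freqs[p[1]])
            for p in combinations(freqs, 2)}
    clusters = [(name,) for name in freqs]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            total = sum(dist[frozenset((s, t))] for s in a for t in b)
            return total / (len(a) * len(b))

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented toy data: counts of five shared word forms in four hypothetical texts.
toy = {
    "gospel_a": [50, 40, 30, 20, 10],
    "gospel_b": [48, 41, 29, 22, 9],
    "menaion_a": [10, 20, 30, 40, 50],
    "menaion_b": [12, 18, 33, 39, 52],
}
merges = cluster(toy)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The merge history records which groups join at which distance, which is the information a dendrogram displays.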
The statistics module of the historical corpus “Manuscript” is used to extract the most frequent words of each manuscript, build the ranked lists, and obtain the count (and hence the rank) of each form in the other codices. The correlation coefficients are computed and the texts clustered with the software package “Statistica” (TIBCO Software Inc.). The lists analyzed vary in length (from 50 to 300 word forms) and in how far their units are unified relative to the forms attested in the texts.
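As a rough illustration of the list-building step, the sketch below takes the most frequent forms of one text and looks up their counts in the other texts. It does not reproduce the statistics module of the “Manuscript” corpus: the function names, the choice of a single reference text, and the tiny transliterated token lists are hypothetical.

```python
from collections import Counter


def top_forms(tokens, n):
    """Return the n most frequent word forms of one text, most frequent first."""
    return [form for form, _ in Counter(tokens).most_common(n)]


def frequency_table(texts, reference, n):
    """For the top-n forms of `reference`, collect their counts in every text."""
    forms = top_forms(texts[reference], n)
    counts = {name: Counter(tokens) for name, tokens in texts.items()}
    table = {name: [counts[name][f] for f in forms] for name in texts}
    return forms, table


# Invented toy tokens; real lists in the study run from 50 to 300 word forms.
texts = {
    "text_a": "i se i reche i v to se i v v".split(),
    "text_b": "i v i na v se".split(),
}
forms, table = frequency_table(texts, reference="text_a", n=3)
print(forms)
print(table)
```

Each row of the resulting table is a frequency vector over the same word list, so any two rows can be passed directly to a rank correlation.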
The first experiment revealed three main stable clusters in the sub-corpus: a group of Gospels, a group of Menaia, and a group of miscellanies of varied content.
The second experiment showed how the proximity of Kirill Turovsky's sermons to the various clusters depends on the degree of unification of the forms in the retrieved lists and on the size of those lists.
Linguistic analysis of the results identified the lexical-grammatical and lexical-semantic factors that place the texts of Kirill Turovsky in different clusters under different initial retrieval conditions: in the group of Gospel copies (with lists of 50 or 100 words), in the sub-group of miscellanies (with a list of 200 words), and in the sub-group of the Izbornik of 1073 and the Pandects of Antiochus (with a list of 300 words).