Reducing the Text Document Volume Based on Analysis of Its Correlation Dependencies


  • S. V. Mochenov
  • R. R. Ahmetgaleev
  • S. A. Lazarev



analysis of textual information, multiple words, single words, correlation dependencies, priority sentence, semantic content


The paper addresses the analysis of textual information with the aim of reducing its volume and presenting the content of a text of arbitrary size in the form of an abstract. The text is treated as a set of sentences. The analysis is based on the frequency (weight) characteristics of words, in particular the nouns that the author uses in constructing sentences. The role of certain categories of words is determined. Based on their weight characteristics, all words are divided into those used repeatedly (multiple words) and those used only once (single words). Recommendations are formulated on the use of filter words to extract particular sentences from a text or a group of sentences and present them to the user. A technique for analyzing a text document has been developed. The analyzed text is divided into groups of sentences. Multiple words serve as base words in determining correlation dependencies between sentences in the text. Based on these correlation dependencies, one priority sentence is determined for each group; it reflects the semantic content of the section of text that the group covers. Splitting the text into groups achieves the reduction in volume, since the total number of priority sentences equals the number of groups. These sentences can be used to form an abstract and to provide the researcher (user) with adequate and concise information about the content of the analyzed document. The paper provides examples of the analysis and identifies areas for further research.
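
The pipeline outlined in the abstract (sentence groups, multiple words, one priority sentence per group) can be illustrated with a minimal Python sketch. The abstract does not give the authors' correlation measure, so the score used here, the number of multiple words a sentence shares with the other sentences of its group, is only an assumed stand-in. The function names, the fixed `group_size`, the `min_len` length cutoff used in place of noun selection, and the naive sentence splitter are likewise hypothetical choices made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words_of(sentence, min_len=4):
    # Lowercased alphabetic tokens; the length cutoff is a crude,
    # assumed substitute for selecting nouns.
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) >= min_len}


def priority_sentences(text, group_size=3):
    sentences = split_sentences(text)
    token_sets = [words_of(s) for s in sentences]

    # Weight of a word = number of sentences in which it occurs.
    weights = Counter(w for tokens in token_sets for w in tokens)
    # "Multiple words" are those that occur in at least two sentences.
    multiple = {w for w, count in weights.items() if count >= 2}

    result = []
    for start in range(0, len(sentences), group_size):
        group = range(start, min(start + group_size, len(sentences)))

        def score(i):
            # Assumed correlation score: multiple words shared with
            # every other sentence of the same group.
            mine = token_sets[i] & multiple
            return sum(len(mine & token_sets[j]) for j in group if j != i)

        # max() keeps the earliest sentence when scores are tied.
        best = max(group, key=score)
        result.append(sentences[best])
    return result


sample = (
    "The text is split into groups. Each group of the text has sentences. "
    "Sentences in a group share words. Birds fly south. "
    "Winter makes birds fly. Cats sleep."
)
summary = priority_sentences(sample, group_size=3)
```

With six sentences and `group_size=3`, the sketch returns two sentences, one per group, which mirrors the abstract's statement that the number of priority sentences equals the number of groups.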


