ruTS

Library for statistics extraction from texts in Russian

Overview

ruTS is a library for extracting statistics from Russian-language texts. It provides the following functionality:

  • Object extraction - tools for extracting sentences and words from a text, which can then be used to compute statistics
  • Basic statistics - extracting basic linguistic statistics from a text (the number of complex words, syllables, letters, etc.)
  • Readability metrics - computing readability metrics for a text (SMOG Index, Flesch-Kincaid Grade Level, etc.)
  • Lexical diversity metrics - computing lexical diversity metrics for a text (Hapax Legomena Index, Type-Token Ratio, etc.)
  • Morphological statistics - extracting morphological features from a text (part of speech, gender, transitivity, etc.)
  • Datasets - working with several preprocessed datasets (Soviet reading books for literature classes, the collected works of Stalin)
  • Visualization - visualizing text with the help of graphs (Zipf's law, Literature Fingerprinting, Word Tree)
  • Components - adding the library's classes to spaCy pipelines
  • API - using the library's functions via a RESTful interface
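
To make the lexical diversity item above more concrete, here is a minimal sketch of the Type-Token Ratio and of hapax legomena (words that occur exactly once) in plain Python. It is an illustration only and does not use ruTS or claim to match its implementation; the helper names `tokenize`, `type_token_ratio` and `hapax_legomena` are hypothetical, and the regex tokenizer is an assumption that is much simpler than a real word extractor.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters.
    # In Python 3 the pattern is Unicode-aware, so Cyrillic letters match.
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    # Number of distinct words (types) divided by the total number of words (tokens).
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_legomena(tokens):
    # Words that occur exactly once in the text, sorted alphabetically.
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if count == 1)


text = "Ног нет, а хожу, рта нет, а скажу: когда спать, когда вставать, когда работу начинать"
tokens = tokenize(text)

print(len(tokens))                   # total number of tokens
print(type_token_ratio(tokens))      # share of distinct words
print(hapax_legomena(tokens))        # words used only once
```

Note that this sketch only lists the hapax legomena; an index built on them, such as the Hapax Legomena Index named above, applies a further formula that is not reproduced here.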

Installation

ruTS requires Python 3.7 or higher. To install the latest stable version from PyPI:

$ pip install ruts

Usage

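from ruts import ReadabilityStats

# Sample text, a Russian riddle: "I have no legs, yet I walk; no mouth, yet I tell you
# when to sleep, when to get up, when to start work"
text = "Ног нет, а хожу, рта нет, а скажу: когда спать, когда вставать, когда работу начинать"
rs = ReadabilityStats(text)
# Return the readability metrics as a dictionary
rs.get_stats()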

    {'automated_readability_index': 0.2941666666666656,
    'coleman_liau_index': 0.2941666666666656,
    'flesch_kincaid_grade': 3.4133333333333304,
    'flesch_reading_easy': 83.16166666666666,
    'lix': 48.333333333333336,
    'smog_index': 0.05}

rs.print_stats()

                    Метрика                 | Значение 
    --------------------------------------------------
    Тест Флеша-Кинкайда                     |   3.41   
    Индекс удобочитаемости Флеша            |  83.16   
    Индекс Колман-Лиау                      |   0.29   
    Индекс SMOG                             |   0.05   
    Автоматический индекс удобочитаемости   |   0.29   
    Индекс удобочитаемости LIX              |  48.33  
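
The dictionary returned by `get_stats()` can be processed like any other Python mapping. The sketch below copies the values shown above into a literal so that it runs without ruTS installed, and rounds them to two decimals, which gives the same numbers as the printed table. The `format_stats` helper is hypothetical and not part of the library.

```python
# Values copied from the get_stats() output shown above.
stats = {
    "automated_readability_index": 0.2941666666666656,
    "coleman_liau_index": 0.2941666666666656,
    "flesch_kincaid_grade": 3.4133333333333304,
    "flesch_reading_easy": 83.16166666666666,
    "lix": 48.333333333333336,
    "smog_index": 0.05,
}


def format_stats(values):
    # Hypothetical helper: render every metric with two decimal places.
    return {name: f"{value:.2f}" for name, value in values.items()}


formatted = format_stats(stats)
for name in sorted(formatted):
    # Left-align the metric name and right-align the rounded value.
    print(f"{name:<30}| {formatted[name]:>6}")
```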
