A Python package to calculate statistics from text that help determine the readability, complexity and grade level of a particular corpus.
In short, this version greatly improves time efficiency, especially when the user needs several readability scores for the same text. As a consequence, the usage has changed.
Although there are various readability scores (also called textstat), they are all based on a limited set of measurements of the text, e.g. average sentence length, average syllable count per word and average number of characters per word. In the original version on the master branch, these measurements are recalculated every time the user asks for another readability score of the same text. This implementation calculates them all once and passes them to the readability score functions the user requests.
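A minimal sketch of the compute-once idea, assuming a simple regex tokenizer and a crude vowel-run syllable counter (both stand-ins, not this package's actual code). `text_statistics` and `count_syllables` are hypothetical helpers; the two Flesch formulas use their standard published coefficients.

```python
import re


def count_syllables(word):
    # Crude stand-in: one syllable per run of consecutive vowels (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_statistics(text):
    """Scan the text once and return the shared averages."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))
    n_sentences = max(1, len(sentences))
    n_syllables = sum(count_syllables(w) for w in words)
    return {
        "avg_sentence_length": float(n_words) / n_sentences,
        "avg_syllables_per_word": float(n_syllables) / n_words,
    }


def flesch_ease(stats):
    return (206.835
            - 1.015 * stats["avg_sentence_length"]
            - 84.6 * stats["avg_syllables_per_word"])


def flesch_grade(stats):
    return (0.39 * stats["avg_sentence_length"]
            + 11.8 * stats["avg_syllables_per_word"]
            - 15.59)


# The text is scanned once; each score only reads the shared numbers.
stats = text_statistics("The cat sat. The dog ran far away.")
scores = {name: fn(stats) for name, fn in
          [("flesch_ease", flesch_ease), ("flesch_grade", flesch_grade)]}
```

Adding another score then costs one small function over `stats` instead of another pass over the text.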
This version also removes some other repeated calculations and counts syllables differently. The master version derives a word's syllable count from its possible hyphenation points, but at least for the project I am working on, a vowel-based syllable count makes more sense, so I changed it.
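A minimal sketch of a vowel-based syllable counter. The exact rules used in this fork are not given here, so the silent-"e" and "-le" adjustments below are assumptions, and `vowel_syllable_count` is a hypothetical name.

```python
def vowel_syllable_count(word):
    """Estimate syllables by counting runs of consecutive vowels.

    Assumed heuristic: each maximal run of vowels is one syllable, a
    trailing silent 'e' is dropped (except in '-le' endings), and every
    word has at least one syllable.
    """
    vowels = "aeiouy"
    word = word.lower().strip(".,;:!?\"'()")
    count = 0
    prev_is_vowel = False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_is_vowel:
            count += 1  # start of a new vowel run
        prev_is_vowel = is_vowel
    # Silent final 'e' as in "make", but keep "table" at two syllables.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)
```

Heuristics like this are fast but imperfect: this version counts "games" as two syllables because the silent "e" is followed by "s".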
function name - char_count(txt, ignorespaces=True)
Returns the total number of characters in a text. Whitespace is ignored by default; pass the following parameter
ignorespaces = False
to include whitespace in the count.
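A minimal sketch of `char_count` matching the signature above. It assumes "spaces" means all whitespace (spaces, tabs and newlines); the fork may only strip the space character.

```python
def char_count(txt, ignorespaces=True):
    """Return the number of characters in txt.

    Assumption: with ignorespaces=True, all whitespace is excluded.
    """
    if ignorespaces:
        # Splitting on whitespace and re-joining drops every whitespace run.
        txt = "".join(txt.split())
    return len(txt)
```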
function name - syllable_count(text)
returns - the number of syllables present in the given text.
function name - lexicon_count(wordlist, True/False)
wordlist is expected to be the list of all words in the test text. It can be obtained by
wordlist = "".join(ch for ch in txt if ch not in exclude).split()
where exclude holds the punctuation characters to strip (for example, set(string.punctuation)).
Returns the number of words in the text. The True/False flag specifies whether punctuation symbols are removed before counting. The default, True, removes punctuation before counting.
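A minimal sketch of building the word list and counting it. It assumes `exclude` is ASCII punctuation, and `removepunct` is a hypothetical name for the True/False flag; neither is confirmed as this fork's code.

```python
import string

# Assumption: `exclude` holds the ASCII punctuation characters.
EXCLUDE = set(string.punctuation)


def build_wordlist(txt):
    """Strip punctuation characters, then split on whitespace into words."""
    return "".join(ch for ch in txt if ch not in EXCLUDE).split()


def lexicon_count(wordlist, removepunct=True):
    """Count words; `removepunct` is a hypothetical name for the flag."""
    if removepunct:
        # Drop tokens made only of punctuation, such as a stray "-".
        wordlist = [w for w in wordlist if w.strip(string.punctuation)]
    return len(wordlist)
```

Note that stripping apostrophes turns "It's" into "Its", which does not change the word count.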
function name - sentence_count(text)
returns the number of sentences present in the given text.
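A minimal sketch of a sentence counter, assuming sentences end at runs of ".", "!" or "?". This is a hypothetical heuristic, not necessarily the fork's rule, and it over-counts abbreviations such as "e.g.".

```python
import re


def sentence_count(text):
    # A sentence ends at one or more of . ! ?  Fragments with no letters
    # (e.g. the empty string after the final full stop) are ignored.
    parts = re.split(r"[.!?]+", text)
    # Return at least 1 so later divisions by the sentence count are safe.
    return max(1, sum(1 for p in parts if re.search(r"[A-Za-z]", p)))
```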
function name - readability_scores(text, defaultscores)
If defaultscores is not specified, it defaults to
defaultscores = ["flesch_ease", "smog", "flesch_grade", "coleman_liau", "automated",
"dale_chall", "linsear_write", "lix", "gunning_fog"]
Returns a dictionary of the requested scores, for example (the values are placeholders):
example_return_dict = {"flesch_ease": 10, "smog": 10, "flesch_grade": 10, "coleman_liau": 10, "automated": 10,
"dale_chall": 10, "linsear_write": 10, "lix": 10, "gunning_fog": 10}
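A minimal sketch of how a name-based request can be served from one set of precomputed counts. The registry, `scores_from_counts` and the `counts` keys are hypothetical; the formulas are the standard published Gunning fog, SMOG, LIX and Automated Readability Index, with polysyllabic words standing in for Gunning's "complex" words. The other default scores are left out of this sketch and are rejected with a `ValueError`.

```python
import math


def gunning_fog(c):
    # Approximation: polysyllabic words stand in for "complex" words.
    return 0.4 * (float(c["words"]) / c["sentences"]
                  + 100.0 * c["polysyllables"] / c["words"])


def smog(c):
    return 1.0430 * math.sqrt(c["polysyllables"] * 30.0 / c["sentences"]) + 3.1291


def lix(c):
    # Long words are those with more than six letters.
    return (float(c["words"]) / c["sentences"]
            + 100.0 * c["long_words"] / c["words"])


def automated(c):
    # Automated Readability Index.
    return (4.71 * c["letters"] / c["words"]
            + 0.5 * c["words"] / c["sentences"] - 21.43)


# Hypothetical registry mapping the README's score names to functions.
SCORE_FUNCTIONS = {"gunning_fog": gunning_fog, "smog": smog,
                   "lix": lix, "automated": automated}


def scores_from_counts(counts, wanted=None):
    """Evaluate only the requested scores against one set of counts."""
    if wanted is None:
        wanted = list(SCORE_FUNCTIONS)
    unknown = [name for name in wanted if name not in SCORE_FUNCTIONS]
    if unknown:
        raise ValueError("unsupported scores: " + ", ".join(unknown))
    return {name: SCORE_FUNCTIONS[name](counts) for name in wanted}
```

For example, `scores_from_counts({"words": 100, "sentences": 5, "letters": 450, "polysyllables": 10, "long_words": 20}, ["lix"])` evaluates only the LIX formula.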
function name - avg_sentence_length(lexicon_count, sentence_count)
function name - avg_syllables_per_word(syllable_count, lexicon_count)
function name - avg_letter_per_word(char_count, lexicon_count)
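A minimal sketch of the three averaging helpers, assuming each one divides its first count by its second. Returning 0.0 for a zero denominator is an assumption, and `float()` keeps the division exact under Python 2 as well as Python 3.

```python
def _safe_ratio(numerator, denominator):
    # Assumption: an empty text yields 0.0 rather than ZeroDivisionError.
    return float(numerator) / denominator if denominator else 0.0


def avg_sentence_length(lexicon_count, sentence_count):
    """Average number of words per sentence."""
    return _safe_ratio(lexicon_count, sentence_count)


def avg_syllables_per_word(syllable_count, lexicon_count):
    """Average number of syllables per word."""
    return _safe_ratio(syllable_count, lexicon_count)


def avg_letter_per_word(char_count, lexicon_count):
    """Average number of characters per word."""
    return _safe_ratio(char_count, lexicon_count)
```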
function name - polysyllabcount(wordlist)
returns the number of words that have more than two syllables
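A minimal sketch of `polysyllabcount`. The `count_syllables` parameter is a hypothetical extension so that any syllable counter can be plugged in; the default below is a crude vowel-run stand-in, not this fork's implementation.

```python
import re


def _syllables(word):
    # Stand-in counter: one syllable per run of vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def polysyllabcount(wordlist, count_syllables=_syllables):
    """Number of words with more than two (i.e. three or more) syllables."""
    return sum(1 for word in wordlist if count_syllables(word) > 2)
```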
```python
from textstat.textstat import textstat

if __name__ == '__main__':
    test_data = """Playing games has always been thought to be important to the development of well-balanced and creative children; however, what part, if any, they should play in the lives of adults has never been researched that deeply. I believe that playing games is every bit as important for adults as for children. Not only is taking time out to play games with our children and other adults valuable to building interpersonal relationships but is also a wonderful way to release built up tension."""
    list_of_scores_wanted = ["flesch_ease", "smog", "gunning_fog"]  # any subset of defaultscores
    # Assumes readability_scores is exposed on the imported textstat object.
    print(textstat.readability_scores(test_data, list_of_scores_wanted))
```