lorenzomarini96/textcounter

textcounter

Suppose we want to count how many times a given letter or word is used in a text file. The fastest way (at least compared to counting by hand) is to write a program that reads an input file, scans every line, and counts the occurrences of each letter or word in the text.

We would also like to compute the relative frequency of each letter or word analyzed and show the frequencies as a histogram, which is easier to read. It may seem like a useless and somewhat tedious operation (certainly to my friends :)), but it is a useful exercise for learning how to use Python for data analysis.

The textcounter package aims to read an input file and create:

  • the histogram of the occurrences of the letters
  • the histogram of the occurrences of the words
  • the histogram of the occurrences of the N most used words (N is an integer chosen by the user)
  • the histogram of only the words chosen by the user
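As a rough illustration of the counting and frequency steps described above, the sketch below counts letters, converts the counts to relative frequencies, and renders a text-only histogram. It is not the package's code: `letter_frequencies`, `text_histogram`, and the `width` parameter are hypothetical names, and folding letters to lowercase is an assumption made here.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: relative frequency} for the alphabetic characters in text."""
    # Assumption: case is folded, so 'A' and 'a' count as the same letter.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}


def text_histogram(freqs, width=40):
    """Return one bar line per item, most frequent first, ties in alphabetical order."""
    ordered = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    lines = []
    for item, freq in ordered:
        # A frequency of 1.0 fills the whole width.
        bar = "#" * round(freq * width)
        lines.append(f"{item} {bar} {freq:.2%}")
    return lines


for line in text_histogram(letter_frequencies("Abba abba"), width=10):
    print(line)
# a ##### 50.00%
# b ##### 50.00%
```

The same two functions work for words if the input to `Counter` is a list of words instead of a list of letters. A plotting library could replace the text bars with a graphical histogram.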

Table of Contents

Getting Started

Prerequisites

textcounter package

Count number of letters

Example

python3 count_letters.py -hist texts/infinito.

Output on command line:

Count number of words

Example

python count_words.py texts/test.txt

Output on command line:

-------------------------------------
Statistics of the text:

number of words:         13
number of lines:          3
number of letters:       52
number of characters:    75
-------------------------------------
Frequencies of words:

1) questo          -->  23.08%
2) testo           -->  23.08%
3) è               -->  15.38%
4) un              -->  15.38%
5) non             -->  7.69%
6) test            -->  7.69%
7) invece          -->  7.69%
-------------------------------------
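A report of this shape could be produced along the following lines. This is a minimal sketch and not the contents of `count_words.py`: the function names are hypothetical, and the definitions used here are assumptions (words are whitespace-separated tokens with surrounding ASCII punctuation stripped and case folded, letters are alphabetic characters, characters include spaces and newlines), so the numbers it yields may differ from the script's.

```python
import string
from collections import Counter


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, fold case."""
    tokens = (raw.strip(string.punctuation).lower() for raw in text.split())
    return [t for t in tokens if t]


def word_report(text):
    """Return (statistics dict, list of (word, percentage)) for text."""
    words = tokenize(text)
    counts = Counter(words)
    stats = {
        "words": len(words),
        "lines": len(text.splitlines()),
        "letters": sum(ch.isalpha() for ch in text),
        "characters": len(text),
    }
    # most_common() sorts by descending count; the list is empty for empty input.
    ranking = [(word, 100 * n / len(words)) for word, n in counts.most_common()]
    return stats, ranking


stats, ranking = word_report("one two two\nthree two one\n")
print(stats)
for i, (word, pct) in enumerate(ranking, start=1):
    # Layout loosely imitates the report above.
    print(f"{i}) {word:<15} -->  {pct:.2f}%")
```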

Count specific words

Example

python count_words_find.py texts/infinito.txt
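This README does not show how the words to search for are passed to the script, so the sketch below only illustrates the underlying idea: count every word once, then look up the ones of interest. `count_selected` and its `targets` argument are hypothetical names, and the case-insensitive matching is an assumption.

```python
import string
from collections import Counter


def count_selected(text, targets):
    """Return {target: occurrences in text} for each requested word."""
    tokens = (raw.strip(string.punctuation).lower() for raw in text.split())
    counts = Counter(t for t in tokens if t)
    # A Counter returns 0 for missing keys, so absent words are reported as 0.
    return {target: counts[target.lower()] for target in targets}


print(count_selected("The sea, the SKY and the sea.", ["sea", "sky", "moon"]))
# {'sea': 2, 'sky': 1, 'moon': 0}
```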

Count top N words

Example

python count_words_topN.py texts/infinito.txt
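Selecting the N most used words can be sketched with `collections.Counter.most_common`. The function name `top_n_words` is hypothetical, the tokenization is an assumption, and how the script itself receives N is not shown in this README.

```python
import string
from collections import Counter


def top_n_words(text, n):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    tokens = (raw.strip(string.punctuation).lower() for raw in text.split())
    counts = Counter(t for t in tokens if t)
    return counts.most_common(n)


print(top_n_words("a b b c c c", 2))
# [('c', 3), ('b', 2)]
```

Words with equal counts are returned in the order they first appear in the text, which is how `most_common` breaks ties.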

Repo structure

textcounter/
├── LICENSE
├── README.md
├── docs
├── tests
│   ├── README.md
│   ├── __init__.py
│   └── texts
│       ├── infinito.txt
│       ├── test.txt
│       └── yellow_submarine.txt
└── textcounter
    ├── README.md
    ├── __init__.py
    ├── count_letters.py
    ├── count_words.py
    ├── count_words_find.py
    ├── counts_words_topN.py
    ├── figures_count_letters
    ├── figures_count_words
    ├── figures_find_words
    ├── figures_topN_words
    ├── text_DantesInferno
    └── texts

Contributing

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Author

  • Lorenzo Marini - Initial work

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
