
📊 Counts letter usage in the provided texts and makes charts


SaveliyKolesnikov/LetterUsageAnalyzer


Letter usage analyzer

This program counts how often each letter is used in the provided texts and makes charts from the results.
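The core counting step can be sketched in a few lines of Python. This is an illustration, not the analyzer's .NET code. It assumes that a "letter" is any character for which `str.isalpha()` is true, and that counting is case-insensitive, with 'ё' kept distinct from 'е'.

```python
from collections import Counter


def count_letters(text: str) -> Counter:
    """Count letters case-insensitively.

    Assumption: a letter is any character where str.isalpha() is True.
    Upper- and lowercase forms are merged ('Ё' counts as 'ё'), but 'ё' is
    NOT folded into 'е'; digits, spaces and punctuation are ignored.
    """
    return Counter(ch.lower() for ch in text if ch.isalpha())


counts = count_letters("Ёлка и ель")
print(counts.most_common(3))
```

Keeping 'ё' separate from 'е' matters for this project, because the investigation below tracks 'ё' on its own.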

Input

Local data is read from the Data directory and grouped by subfolder. Don't forget to set the Copy to Output Directory property of the data files to Copy if newer.
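Grouping by subfolder can be pictured with the Python sketch below. It assumes plain UTF-8 `.txt` files in place of the EPUB books the analyzer actually reads. Each immediate subfolder of the data directory (for example, hypothetical `Classic` and `Modern` folders) becomes one group, and the letter counts of all files inside it are aggregated.

```python
from collections import Counter
from pathlib import Path


def count_by_group(data_dir: Path) -> dict[str, Counter]:
    """Aggregate letter counts per immediate subfolder of data_dir.

    Assumption: plain UTF-8 .txt files stand in for the real input (the
    analyzer itself reads EPUB). Each subfolder name becomes a group label.
    """
    groups: dict[str, Counter] = {}
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        total = Counter()
        # rglob also picks up files nested deeper inside the group folder
        for file in sorted(folder.rglob("*.txt")):
            text = file.read_text(encoding="utf-8")
            total.update(ch.lower() for ch in text if ch.isalpha())
        groups[folder.name] = total
    return groups
```

Usage would look like `count_by_group(Path("Data"))`, returning one `Counter` per subfolder.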

You can create your own input provider by implementing the IInputTextStreamProvider interface.
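The members of IInputTextStreamProvider are not described here, so the Python sketch below only mirrors the idea. A provider yields text streams tagged with a group name, and the analysis consumes any object that fits that shape. The method name `get_streams` and the in-memory provider are hypothetical.

```python
import io
from typing import Iterator, Protocol, TextIO


class TextStreamProvider(Protocol):
    """Hypothetical Python analogue of IInputTextStreamProvider.

    The method name and signature are illustrative assumptions,
    not the real interface's members.
    """

    def get_streams(self) -> Iterator[tuple[str, TextIO]]:
        """Yield (group name, text stream) pairs."""
        ...


class InMemoryProvider:
    """Hypothetical provider serving texts held in memory, e.g. for tests."""

    def __init__(self, texts: dict[str, list[str]]) -> None:
        self._texts = texts

    def get_streams(self) -> Iterator[tuple[str, TextIO]]:
        for group, items in self._texts.items():
            for text in items:
                yield group, io.StringIO(text)


def total_letters(provider: TextStreamProvider) -> dict[str, int]:
    """Sum the number of letters per group across all provided streams."""
    totals: dict[str, int] = {}
    for group, stream in provider.get_streams():
        letters = sum(1 for ch in stream.read() if ch.isalpha())
        totals[group] = totals.get(group, 0) + letters
    return totals
```

Because the analysis depends only on the provider's shape, a file-based, network-based or in-memory source can be swapped in without touching the counting code.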

Currently supported formats: EPUB. To add a new format, extend FileText and FileTextFactory to support it.
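One common way to structure this kind of extension point is a factory that picks a reader class by file extension. The Python sketch below illustrates that pattern only. The real FileText and FileTextFactory are C# types whose members are not shown here, so the attributes and methods used (`extensions`, `read_text`, `register`, `create`) and the `PlainText` format are hypothetical.

```python
from pathlib import Path


class FileText:
    """Hypothetical base class: wraps a file and exposes its plain text."""

    extensions: tuple[str, ...] = ()

    def __init__(self, path: Path) -> None:
        self.path = path

    def read_text(self) -> str:
        raise NotImplementedError


class PlainText(FileText):
    """Illustrative new format: UTF-8 .txt files."""

    extensions = (".txt",)

    def read_text(self) -> str:
        return self.path.read_text(encoding="utf-8")


class FileTextFactory:
    """Hypothetical factory choosing a FileText subclass by file extension."""

    def __init__(self) -> None:
        self._readers: dict[str, type[FileText]] = {}

    def register(self, reader: type[FileText]) -> None:
        for ext in reader.extensions:
            self._readers[ext.lower()] = reader

    def create(self, path: Path) -> FileText:
        try:
            reader = self._readers[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"Unsupported format: {path.suffix}") from None
        return reader(path)
```

Under this design, adding a format means writing one subclass and registering it. The callers that ask the factory for a reader stay unchanged.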

Charts

Charts are generated by ChartjsNodeCanvas invoked from .NET via Javascript.NodeJS.
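ChartjsNodeCanvas renders standard Chart.js configuration objects. The Python sketch below builds such a bar-chart configuration, with one dataset per group, and serializes it to JSON. This assumes a JSON configuration is what gets passed across to Node.js. The analyzer's actual payload and chart options are not documented here and may differ, and the helper name `bar_chart_config` is hypothetical.

```python
import json
from collections import Counter


def bar_chart_config(groups: dict[str, Counter], letters: str) -> dict:
    """Build a Chart.js bar-chart configuration (one dataset per group).

    Assumption: a config of this shape is what would be handed to
    ChartjsNodeCanvas for rendering; the analyzer's real payload may differ.
    """
    return {
        "type": "bar",
        "data": {
            "labels": list(letters),
            "datasets": [
                # Letters missing from a group are plotted as 0
                {"label": name, "data": [counts.get(ch, 0) for ch in letters]}
                for name, counts in groups.items()
            ],
        },
    }


config = bar_chart_config(
    {"Classic": Counter("аабв"), "Modern": Counter("абвв")}, "абв"
)
# ensure_ascii=False keeps Cyrillic labels readable in the JSON text
payload = json.dumps(config, ensure_ascii=False)
```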

Chart samples

By letter count

For classic literature

ClassicCountSample

For modern literature

ModernCountSample

By letter percentage

For classic literature

ClassicPercentageSample

For modern literature

ModernPercentageSample
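The percentage charts are read here as each letter's share of all letters in a group. That is an assumption, since the charts themselves are not reproduced in this text. Under that reading, the conversion from counts is a simple normalization. Normalizing lets groups whose total text length differs be compared directly, which raw counts do not allow.

```python
from collections import Counter


def to_percentages(counts: Counter) -> dict[str, float]:
    """Convert absolute letter counts into shares of all letters, in percent.

    Assumption: "percentage" means share of the group's total letter count.
    An empty Counter yields an empty result instead of dividing by zero.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: 100.0 * n / total for letter, n in counts.items()}


shares = to_percentages(Counter({"а": 3, "б": 1}))
print(shares)
```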

Investigation results

  • The investigation shows that usage of the letter 'ё' has increased significantly in modern literature. Although the letter was introduced in 1797, its use long remained marginal and writers tended to avoid it.
  • The letter 'ф' is used more frequently in modern literature due to the increased number of foreign words; in Russian, 'ф' occurs mostly in loanwords.
  • Modern literature is more concise than classic literature, as the letter count results show. All analyzed works have the same form (the novel), so the groups are directly comparable.
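The conciseness comparison can be made concrete by averaging total letters per book in each group, as in the Python sketch below. This is only a rough length proxy that relies on all books being novels, and it is not necessarily how the analyzer aggregates its counts. The helper name `average_letters_per_book` is hypothetical.

```python
from statistics import mean


def average_letters_per_book(books: dict[str, list[str]]) -> dict[str, float]:
    """Mean number of letters per book for each group.

    Assumption: a lower mean is read as shorter, more concise novels.
    This measures length only, not style. Empty groups are skipped.
    """
    return {
        # sum() over booleans counts the characters where isalpha() is True
        group: mean(sum(ch.isalpha() for ch in text) for text in texts)
        for group, texts in books.items()
        if texts
    }


averages = average_letters_per_book(
    {"Classic": ["аааа", "бб"], "Modern": ["в", "гг d"]}
)
```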

Performance

Analyzing 20 large books takes ~560 ms.
Rendering the charts for this data takes ~1500 ms because the rendering code runs in Node.js.
