VivekSagarSingh/U.S.A-Presidential-Speech-Analysis

U.S.A-Presidential-Speech-Analysis

Text Analytics project using Python's NLTK library.

Problem Statement:

This project analyses the inaugural corpus that ships with Python's NLTK library. It looks at the following inaugural addresses of Presidents of the United States of America:

  • President Franklin D. Roosevelt in 1941
  • President John F. Kennedy in 1961
  • President Richard Nixon in 1973

Code Snippet to extract the three speeches:

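    import nltk
    from nltk.corpus import inaugural

    nltk.download('inaugural')  # fetch the corpus on first use

    print(inaugural.fileids())  # list every speech available in the corpus

    # Raw text of the three speeches analysed here
    roosevelt = inaugural.raw('1941-Roosevelt.txt')
    kennedy = inaugural.raw('1961-Kennedy.txt')
    nixon = inaugural.raw('1973-Nixon.txt')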

Steps involved:

1. Finding the number of characters, words and sentences for the mentioned documents.

a) Roosevelt speech:

Screenshot 2024-01-01 at 8 16 24 PM

b) Kennedy speech:

Screenshot 2024-01-01 at 8 16 44 PM

c) Nixon speech:

Screenshot 2024-01-01 at 8 17 00 PM
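The counts in this step could be computed along the following lines. This is a minimal sketch rather than the project's actual code: `speech_stats` and `inaugural_stats` are hypothetical helper names, and it assumes that "words" means NLTK tokens (so punctuation marks are counted) and that the Punkt sentence tokenizer data is installed for `inaugural.sents`.

```python
from typing import Dict, Sequence


def speech_stats(raw: str, words: Sequence[str],
                 sents: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Return character, word and sentence counts for one speech."""
    return {
        "characters": len(raw),   # every character, including spaces and punctuation
        "words": len(words),      # tokens as given, so punctuation tokens are counted too
        "sentences": len(sents),
    }


def inaugural_stats(fileid: str) -> Dict[str, int]:
    """Hypothetical helper: counts for one file of the NLTK inaugural corpus."""
    # Imported lazily so speech_stats stays usable without NLTK.
    # Needs nltk.download('inaugural'); sents() also needs the Punkt tokenizer data.
    from nltk.corpus import inaugural
    return speech_stats(
        inaugural.raw(fileid),
        inaugural.words(fileid),
        inaugural.sents(fileid),
    )
```

Calling `inaugural_stats('1941-Roosevelt.txt')` would then return the three counts for the Roosevelt speech as a dictionary.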

2. Removing all the stopwords from the three speeches & showing the word count before and after the removal of stopwords.

a) Roosevelt speech:

Screenshot 2024-01-01 at 8 17 24 PM

A sample sentence after removal of stopwords:

Screenshot 2024-01-01 at 8 17 50 PM

b) Kennedy speech:

Screenshot 2024-01-01 at 8 19 57 PM

A sample sentence after removal of stopwords:

Screenshot 2024-01-01 at 8 19 03 PM

c) Nixon speech:

Screenshot 2024-01-01 at 8 19 39 PM

A sample sentence after removal of stopwords:

Screenshot 2024-01-01 at 8 19 21 PM
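One way to remove the stopwords and compare word counts is sketched below. The helper names are hypothetical, and two choices here are assumptions not confirmed by the screenshots: tokens are lowercased before comparison, and pure-punctuation tokens are dropped along with the stopwords.

```python
from typing import Iterable, List, Set


def remove_stopwords(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop stopwords and punctuation-only tokens, comparing in lowercase."""
    return [
        tok.lower()
        for tok in tokens
        # keep a token only if it is not a stopword and has a letter or digit
        if tok.lower() not in stop_words and any(ch.isalnum() for ch in tok)
    ]


def english_stopwords() -> Set[str]:
    """Hypothetical helper: NLTK's English stopword list as a set."""
    # Imported lazily; needs nltk.download('stopwords').
    from nltk.corpus import stopwords
    return set(stopwords.words("english"))
```

With the corpus loaded, `len(inaugural.words(fileid))` gives the word count before removal and `len(remove_stopwords(inaugural.words(fileid), english_stopwords()))` the count after.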

3. Most frequently used words in the inaugural address for each president (after removing the stopwords)

a) Roosevelt speech:

Screenshot 2024-01-01 at 8 20 29 PM

• The most frequent word in President Roosevelt's 1941 inaugural address is "nation".

• The top three words by frequency are 'nation' (17 times), 'know' (10 times) and 'peopl' (9 times). The words appear in stemmed form, so 'peopl' stands for "people".

• Note that 'spirit', 'life', 'democraci' and 'becaus' also occur 9 times each, the same as 'peopl'. Only the top three words were requested, so they are not listed.

• 'peopl' is included in the top three only because it comes first in the list among the words with a frequency of 9. Any of the other four words could equally take its place.

b) Kennedy speech:

Screenshot 2024-01-01 at 8 20 50 PM

• The most frequent word in President Kennedy's 1961 inaugural address is "let".

• The top three words by frequency are 'let' (16 times), 'us' (12 times) and 'power' (9 times).

c) Nixon speech:

Screenshot 2024-01-01 at 8 21 06 PM

• The most frequent word in President Nixon's 1973 inaugural address is "us".

• The top three words by frequency are 'us' (26 times), 'let' (22 times) and 'america' (21 times).
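The frequency ranking, including the tie at the third place noted for the Roosevelt speech, can be illustrated with the sketch below. It uses `collections.Counter` from the standard library, which is an assumption about the approach (NLTK's `FreqDist` behaves similarly), and `top_words_with_ties` is a hypothetical helper that returns every word tied at the cutoff instead of silently picking one.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words_with_ties(tokens: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent words plus any words tied with the n-th one."""
    counts = Counter(tokens)
    # Sorted by count, highest first; words with equal counts keep first-seen order.
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th ranked word
    return [(word, count) for word, count in ranked if count >= cutoff]
```

The input is expected to be the cleaned token list from step 2. If stemming is wanted first, each token could be passed through `nltk.stem.PorterStemmer().stem` before counting.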

4. Plotting the word cloud for each of the three speeches (after removal of stopwords).

a) Roosevelt speech:

Screenshot 2024-01-01 at 8 21 53 PM

b) Kennedy speech:

Screenshot 2024-01-01 at 8 22 24 PM

c) Nixon speech:

Screenshot 2024-01-01 at 8 23 12 PM
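A word cloud like the ones above could be drawn as follows. This is a sketch under assumptions: it relies on the third-party `wordcloud` and `matplotlib` packages, which the README does not name explicitly, and the helper names, image size and background colour are illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable


def cloud_frequencies(tokens: Iterable[str]) -> Dict[str, int]:
    """Map each token to its count; this is the input the cloud is built from."""
    return dict(Counter(tokens))


def plot_word_cloud(tokens: Iterable[str], title: str) -> None:
    """Hypothetical helper: render a word cloud for one cleaned speech."""
    # Imported lazily; needs `pip install wordcloud matplotlib`.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(cloud_frequencies(tokens))

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")  # the axes carry no meaning for a word cloud
    plt.title(title)
    plt.show()
```

Passing the stopword-free token list of a speech, for example `plot_word_cloud(cleaned_tokens, "Roosevelt 1941")`, draws larger words for higher counts.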
