This is a program that downloads the HTML pages of multiple URLs pointing to Reddit posts.
- Each URL is read from a file whose path the user provides.
- The raw HTML of each URL is then saved to its own file (a download sketch follows this list).
- From there the HTML is "scrubbed" to extract the post's comments (see the extraction sketch below).
- The cleaned comments are stored in files unique to each post.
- The comments files are scanned and sentiment analysis is performed on each comment (see the sentiment sketch below).
- Once the sentiments are gathered, they are also put into their own files, one per Reddit post.
- Each sentiment file is then rendered as its own bar graph.
- The user can see at a glance the number of positive, negative, and neutral comments.
- The graphs are displayed on screen and are also saved to a folder.
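For the download step, here is a minimal sketch assuming the `requests` library; the `html/` output folder and the function name are illustrative, not the program's actual code:

```python
import os
import sys

import requests

def download_pages(url_file):
    """Fetch the raw HTML of each URL listed (one per line) in url_file."""
    os.makedirs("html", exist_ok=True)
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for i, url in enumerate(urls):
        # Reddit tends to reject requests without a descriptive User-Agent.
        resp = requests.get(url, headers={"User-Agent": "comment-scraper/0.1"})
        resp.raise_for_status()
        with open(os.path.join("html", f"post_{i}.html"), "w", encoding="utf-8") as out:
            out.write(resp.text)

if __name__ == "__main__":
    download_pages(sys.argv[1])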
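The "scrub" step could look like the sketch below, assuming BeautifulSoup; the CSS selector is a placeholder, since Reddit's markup changes over time and the program's real selector will likely differ:

```python
from bs4 import BeautifulSoup

def extract_comments(html_path, out_path):
    """Pull comment text out of one saved page, one comment per line."""
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # Placeholder selector for comment bodies; adjust to the page's markup.
    comments = [node.get_text(" ", strip=True) for node in soup.select("div.comment")]
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(comments))
```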
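The sentiment and plotting steps might be sketched as follows, assuming NLTK's VADER analyzer and matplotlib (the actual program may use different libraries; the ±0.05 cutoffs are VADER's usual convention):

```python
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

def plot_sentiments(comments_path, plot_path):
    """Count positive/negative/neutral comments and draw a bar graph."""
    sia = SentimentIntensityAnalyzer()
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    with open(comments_path, encoding="utf-8") as f:
        for comment in f:
            compound = sia.polarity_scores(comment)["compound"]
            if compound >= 0.05:          # VADER's conventional cutoff
                counts["positive"] += 1
            elif compound <= -0.05:
                counts["negative"] += 1
            else:
                counts["neutral"] += 1
    plt.bar(list(counts.keys()), list(counts.values()))
    plt.ylabel("number of comments")
    plt.savefig(plot_path)  # the saved copy that ends up in the plots folder
    plt.show()              # the window the user closes when done
```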
-
Ensure your terminal is in the program's root directory (the one containing run.py) before running.
-
Make sure to create the environment from requirements.yaml so that you have the appropriate libraries (see the example command below).
- If you don't, no biggie; the program will just take a little longer while it installs the dependencies at runtime.
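Assuming requirements.yaml is a conda environment spec (a guess based on its format), you can create the environment with:
conda env create -f requirements.yaml
Then activate the environment it names before running the program.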
-
Run the program in the terminal with python, passing the file of URLs you wish to gather comments from as an argument, e.g.:
python run.py yourfilepath/yourfilename
-
Finally, when you are finished viewing the graphs, close all of the windows to end the program. Don't worry: all of the graphs are saved as pictures in the plots folder.