Rohan Agarwal / Jed Magracia / Hanif Fauzan / Bao Tran
The rampant spread of misinformation on platforms like Twitter caused much confusion and chaos in the face of an unprecedented pandemic, possibly leading to many preventable deaths. For the remainder of the COVID-19 pandemic and for any future pandemics, it is therefore important to be able to attack misinformation at its source. This is why we wanted to leverage machine learning to predict the spread of social media posts. If successful, platforms could use this technique to monitor and flag potential fake news before it goes viral and misleads thousands, not after.
ViralCast demonstrates a model that predicts the spread of a tweet: the number of favorites and the number of retweets it will receive, based on the properties of the tweet's text and the tweeter's follower count. The web app lets a user type in a tweet and a hypothetical follower count, then returns a prediction.
We used a dataset of around 500k tweets related to COVID-19. Using IBM Watson Natural Language Understanding, we enriched each entry with sentiment, emotion, entity, and other analyses. We trained a regression model on this enriched data using scikit-learn. Using React and a Flask server, we created an interactive demo showcasing the model and its predictions, and deployed the app on Heroku.
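To give a concrete picture of the serving side, here is a minimal sketch of what a Flask prediction endpoint for this setup could look like. The route name, payload fields, model filename, and the `extract_features` helper are illustrative stand-ins, not our exact code.

```python
# Minimal sketch of the prediction endpoint; names and payload fields are
# illustrative, not our exact code. Assumes the trained scikit-learn model
# was pickled to "model.pkl".
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:
    model = pickle.load(f)  # multi-output regressor: favorites and retweets


def extract_features(text, follower_count):
    # Placeholder: the real app derives 11 text features from Watson NLU
    # (sentiment, emotion, entities, ...) and appends the follower count.
    # Zeros here just keep the sketch runnable end to end.
    return [0.0] * 11 + [float(follower_count)]


@app.route("/predict", methods=["POST"])
def predict():
    body = request.get_json()
    features = extract_features(body["text"], body["follower_count"])
    favorites, retweets = model.predict([features])[0]
    return jsonify({"favorites": float(favorites), "retweets": float(retweets)})
```

The React frontend only needs to POST the tweet text and follower count to this endpoint and render the two numbers that come back.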
Our first major challenge was figuring out how to represent tweet text numerically with enough features to train an accurate model. After looking through numerous options, we settled on IBM Watson NLU for its easy-to-use API and its range of analyses: sentiment, emotion, entities, and per-entity sentiment and emotion.
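For illustration, a single call through IBM's `ibm-watson` Python SDK looks roughly like this; the API key, service URL, version date, and sample tweet are placeholders.

```python
# Sketch of one Watson NLU call via IBM's Python SDK (ibm-watson).
# The API key, service URL, and version date are placeholders.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import (
    EmotionOptions,
    EntitiesOptions,
    Features,
    SentimentOptions,
)

nlu = NaturalLanguageUnderstandingV1(
    version="2020-08-01",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("YOUR_SERVICE_URL")

result = nlu.analyze(
    text="Stay home, save lives. #COVID19",
    language="en",  # tweets are short, so skip automatic language detection
    features=Features(
        sentiment=SentimentOptions(),
        emotion=EmotionOptions(),
        # sentiment=True / emotion=True also score each detected entity
        entities=EntitiesOptions(sentiment=True, emotion=True),
    ),
).get_result()

doc_sentiment = result["sentiment"]["document"]["score"]  # -1 (negative) to 1
doc_emotions = result["emotion"]["document"]["emotion"]   # joy, anger, sadness, ...
```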
Once this was settled, another big challenge we ran into was efficiency: how could we preprocess thousands of rows of data quickly enough for a 24-hour project? Our solution was to multithread our data-processing script, as sketched below.
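The idea, shown here with Python's `concurrent.futures` (the worker count and `enrich` helper are illustrative), is that each Watson NLU request is network-bound, so many requests can wait on the API in parallel:

```python
# Sketch of the parallel enrichment; worker count and helper are illustrative.
from concurrent.futures import ThreadPoolExecutor


def enrich(tweet_text):
    # Placeholder for one Watson NLU round trip (see the sketch above);
    # returns the sentiment/emotion/entity features for a single tweet.
    return {}


tweets = ["tweet one", "tweet two"]  # in practice, thousands of rows

with ThreadPoolExecutor(max_workers=16) as pool:
    enriched = list(pool.map(enrich, tweets))  # preserves input order
```

Threads (rather than processes) fit this job because each task spends nearly all of its time waiting on the network, during which Python releases the GIL.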
Finally, we had 11 text features in total and 2 target variables to predict. We needed a model that could handle this multi-output setup, so we settled on regression in scikit-learn.
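As a hedged sketch of the shape of the problem (the estimator and the random placeholder data are assumptions, not our exact training code), scikit-learn regressors such as `LinearRegression` accept a two-column target natively, so favorites and retweets can be fit in a single call:

```python
# Hedged sketch of the multi-output fit; the estimator and the random
# placeholder data are assumptions, not our exact training code.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 12))  # 11 NLU text features + follower count
y = rng.random((1000, 2))   # two targets: favorites, retweets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression().fit(X_train, y_train)  # fits both targets at once
print(model.score(X_test, y_test))  # R^2, averaged uniformly over the targets
```

Estimators without native multi-output support can be wrapped in `sklearn.multioutput.MultiOutputRegressor`, which fits one copy of the model per target.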
On another note, the team was split across two hemispheres, making communication and time management another challenge we learned to overcome.
We are proud of the technologies and techniques we learned to use over the course of 24 hours, from IBM's products to machine learning and multithreading, from Flask servers to a React frontend and data visualization, and of how we brought them all together to address this issue. We're glad we were able to learn so much and demonstrate a potential pathway to tackling a large and current problem in the world.
We learned a lot about visualization, cloud-based natural language processing, program efficiency, machine learning with multiple target variables over large datasets, and tying together the frontend and backend through Flask. We also explored the capabilities of machine learning and data science in solving a real-world issue, which was a valuable experience.
Our first next step is to train on more data and potentially integrate more features into the model, such as word embeddings, to push its accuracy further. We could also explore ways to integrate the web app with a person's Twitter feed, perhaps through a browser extension, so that it can be put to use in daily life. Additionally, we can explore alternate datasets, such as labeled "fake news" sets or sets for other topics impacted by misinformation, and see how well the concept generalizes.
Check out our video demo, DevPost submission, and Heroku deployment. ViralCast won 2nd place in IBM's Pandemic and Public Health Response Challenge and was named one of the Top 15 Overall Teams by Wolfram.