
Project built for the graduation thesis of Bruno Gois and Matheus Nascimento at the University Graduate Program of Computer Engineering - UniCEUB


Fake News Detection with Multinomial Naive Bayes and K-Nearest Neighbors on Twitter focused on COVID in Brazil

Project goals

  • Extract tweets in Portuguese related to the COVID-19 pandemic for further inspection
  • Use a pre-determined dataset for training both algorithms
  • Clean the training dataset (a cleaning sketch follows this list)
  • Prepare a model for the dataset in each of the algorithms
  • Use the prepared model to inspect the previously extracted tweets
  • Determine accuracy and precision on both for research comparisons
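
As an illustration of the cleaning step, here is a minimal sketch using NLTK's Portuguese stopword list; the exact preprocessing in the notebooks may differ:

    import re
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords")
    stop_words = set(stopwords.words("portuguese"))

    def clean_tweet(text):
        # Drop URLs, @mentions, and non-letter characters, then remove stopwords
        text = re.sub(r"http\S+|@\w+|[^a-zA-ZÀ-ÿ\s]", " ", text.lower())
        return " ".join(word for word in text.split() if word not in stop_words)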

Dependencies

  • SciKit Learn
  • Numpy
  • Pandas
  • Seaborn
  • NLTK
  • Tweepy
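
All of these are available from PyPI, so something like the following should install them in one go (package names as published on PyPI):

    pip install scikit-learn numpy pandas seaborn nltk tweepy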

How to run

It is essential to have access to the Twitter API to execute this project. In tweets_retrieval we have set up all the code for retrieval, and you should create a .txt file with all the necessary keys, in the following order (a sketch of loading this file follows the key list):

    consumer_key
    consumer_secret
    access_token
    access_secret
    bearer_token
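
For reference, a minimal sketch of reading those keys back in; this assumes a file named keys.txt with exactly one key per line, in the order above, and the notebook's actual loading code may differ:

    # Assumption: keys.txt holds the five keys, one per line, in the order above
    keys = [line.strip() for line in open("keys.txt")]
    consumer_key, consumer_secret, access_token, access_secret, bearer_token = keys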

Run real_tweets.ipynb to retrieve live tweets about the COVID context. To change the language of the retrieved tweets, change 'pt' to the corresponding BCP 47 language identifier on this line:

    if json_response['data']['lang'] != 'pt':
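
For context, this is roughly how that check might sit inside the stream-processing loop. This is a hypothetical sketch, assuming the Twitter API v2 filtered stream is consumed with requests; it is not the notebook's exact code:

    import json
    import requests

    headers = {"Authorization": f"Bearer {bearer_token}"}
    with requests.get(
        "https://api.twitter.com/2/tweets/search/stream",
        headers=headers,
        params={"tweet.fields": "lang"},
        stream=True,
    ) as response:
        for line in response.iter_lines():
            if not line:
                continue
            json_response = json.loads(line)
            if json_response['data']['lang'] != 'pt':
                continue  # skip tweets that are not in Portuguese
            print(json_response['data']['text'])  # or append to a CSV for later inspection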

See more information about the tweet lang operator in the Twitter API documentation.

You can also change the context annotation in the tweets query to retrieve tweets about other subjects. You can stop the execution (CTRL + C in the terminal running the script) once you have enough tweets.
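As an illustration, a filtered-stream rule that uses a context annotation might look like the following; the domain and entity IDs here are placeholders, not the ones used in this project:

    import requests

    # Placeholder IDs: replace "123.1234567890" with the desired "domain_id.entity_id" pair
    rules = {"add": [{"value": "context:123.1234567890 lang:pt", "tag": "covid-pt"}]}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers={"Authorization": f"Bearer {bearer_token}"},
        json=rules,
    )
    print(response.json())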

Before you execute fake_news_MNB.ipynb, you need to run

    pip install pandas && pip install numpy

After that, before you can use the Tweepy object, you need to create an auth object, like this:

    auth = tweepy.OAuthHandler(your_consumer_key, your_consumer_secret)
    auth.set_access_token(your_access_token, your_access_secret)
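
With the auth object in place, the Tweepy client is typically built like this (standard Tweepy usage; the credential check is just a suggested sanity test, not part of the project):

    api = tweepy.API(auth)
    # Quick sanity check that the credentials are valid
    print(api.verify_credentials().screen_name)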

With this, the setup process is finished and you can execute the MNB and the KNN files to see and compare the results of the algorithms.

It is also important to note that the training dataframe is only assembled in the MNB algorithm file; in KNN, we used the final CSV with all of our necessary data.
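
For reference, here is a minimal sketch of how the two classifiers could be trained and compared on a labeled dataset. The file name, column names, and vectorizer are assumptions for illustration, not the project's exact pipeline:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score, precision_score

    # Hypothetical columns: 'text' holds the tweet, 'label' marks it fake or real
    df = pd.read_csv("training_data.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42
    )

    vectorizer = TfidfVectorizer()
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    for name, model in [("MNB", MultinomialNB()), ("KNN", KNeighborsClassifier())]:
        model.fit(X_train_vec, y_train)
        preds = model.predict(X_test_vec)
        # Macro precision avoids depending on which label counts as the "positive" class
        print(name,
              "accuracy:", accuracy_score(y_test, preds),
              "precision:", precision_score(y_test, preds, average="macro"))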

This work has been developed and published by Bruno Gois and Matheus Nascimento.
