
This project documents the data wrangling efforts (gathering, assessing and cleaning) of the WeRateDogs Twitter data. A report summarizes the important observations.


naqeeb-05/WeRatedDogs-Twitter-Data-Wrangling


Data Wrangling of WeRateDogs Twitter Data

Project Description:

The goal of this project was to carry out data wrangling of the WeRateDogs Twitter data. The data was gathered from three different sources and combined into a single file called twitter_archive_master.csv. It had many quality and tidiness issues, such as variables with wrong datatypes, missing values and repeated values. Some of these issues were addressed, and data visualization was then used to summarize the key insights.
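
As a rough illustration of the combining step, the sketch below merges three small, made-up tables on a shared tweet ID using pandas. The column names (tweet_id, rating_numerator, p1, retweet_count, favorite_count) and the choice of inner joins are assumptions made for the example; the actual logic lives in wrangle_act.html and may differ.

```python
import pandas as pd

# Tiny made-up stand-ins for the three sources (column names are assumptions).
archive = pd.DataFrame(
    {"tweet_id": [1, 2, 3], "rating_numerator": [12, 10, 13]}
)
predictions = pd.DataFrame(
    {"tweet_id": [1, 2], "p1": ["golden_retriever", "pembroke"]}
)
counts = pd.DataFrame(
    {
        "tweet_id": [1, 2, 3],
        "retweet_count": [10, 3, 8],
        "favorite_count": [50, 7, 40],
    }
)

# Inner joins keep only the tweets present in all three tables.
master = (
    archive.merge(predictions, on="tweet_id", how="inner")
           .merge(counts, on="tweet_id", how="inner")
)

# Writing the combined table would then be a single call, e.g.:
# master.to_csv("twitter_archive_master.csv", index=False)
```

An inner join drops tweets that lack an image prediction; a left join from the archive would keep them with missing values instead.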

Repository File Structure

There are seven files in the repository:

• act_report.pdf - this file contains a concise summary of the important observations.
• image-predictions.tsv - dog breed predictions produced by a neural network (file provided by Udacity)
• tweet_json.txt - this text file contains all the downloaded tweets
• twitter-archive-enhanced.csv - the raw, uncleaned data, containing fields such as tweet ID and timestamp
• twitter_archive_master.csv - this master file contains the final cleaned data
• wrangle_act.html - the Python code which performs the data wrangling process
• wrangle_report.html - documentation of the data wrangling efforts
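
For readers who want to work with tweet_json.txt, here is a minimal sketch of pulling the per-tweet counts out of it. It assumes the file stores one JSON object per line and that each object carries the id, retweet_count and favorite_count fields; neither assumption is confirmed by this README. The helper name read_tweet_counts is hypothetical, and an in-memory sample stands in for the real file.

```python
import io
import json


def read_tweet_counts(lines):
    """Collect id, retweet and favorite counts from JSON-lines text."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append(
            {
                "tweet_id": tweet["id"],
                # .get() returns None if a count is missing from the object
                "retweet_count": tweet.get("retweet_count"),
                "favorite_count": tweet.get("favorite_count"),
            }
        )
    return rows


# Made-up sample standing in for tweet_json.txt (assumed layout).
sample = io.StringIO(
    '{"id": 1, "retweet_count": 10, "favorite_count": 50}\n'
    "\n"
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}\n'
)
counts = read_tweet_counts(sample)
```

With the real file, the same function could be called as read_tweet_counts(open("tweet_json.txt", encoding="utf-8")), provided the layout assumption holds.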

Key Insights

• Most tweets had a rating between 10/10 and 12/10; in fact, around 70% of the tweets had a rating in this range
• Golden Retriever is the most tweeted dog breed, followed by Labrador Retriever and Pembroke
• Standard Poodle had the most retweets on average, followed by English Springer and Afghan Hound (both of which had similar retweet counts)
• Even though Standard Poodle had more retweets on average, it was the Saluki breed which had the highest favorite counts (more than 20k)
• The most highly rated dog was Atticus, who had the highest rating numerator of 1776. With a bow tie and sunglasses, he is truly a good boi
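
Summaries like the ones above can be computed with a few pandas calls. The sketch below uses a small made-up table, so its numbers do not reproduce the project's results, and the column names (rating_numerator, breed, retweet_count) are assumptions rather than the actual columns of twitter_archive_master.csv.

```python
import pandas as pd

# Made-up sample data; values are illustrative only.
master = pd.DataFrame(
    {
        "rating_numerator": [10, 11, 12, 13, 9, 12],
        "breed": [
            "golden_retriever", "golden_retriever", "pembroke",
            "saluki", "pembroke", "golden_retriever",
        ],
        "retweet_count": [100, 200, 50, 400, 30, 300],
    }
)

# Share of tweets whose numerator lies in [10, 12] (between() is inclusive).
share_10_to_12 = master["rating_numerator"].between(10, 12).mean()

# Number of tweets per breed, most frequent first.
breed_counts = master["breed"].value_counts()

# Average retweets per breed, highest first.
mean_retweets = (
    master.groupby("breed")["retweet_count"].mean().sort_values(ascending=False)
)
```

Taking the mean of a boolean Series gives the fraction of True values, which is a compact way to express "what share of tweets fall in this range".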
