Text Data Analysis • Conducted sentiment analysis on the comments of certain Youtube videos using TextBlob library. Discovering audience’s attitude towards the videos and youtubers; Visualized audience’s attitude with WordCloud
• Discovered most used emojis
• Computed audience engagement rate& like rate among different categories of videos and found trending channels
• Investigated relationship between number of views & number of likes, and relationship between video title naming & number of views using linear regression models.
It depends on 2 factors mainly:
1. Polarity
2. Subjectivity.
- Polarity carries a sentiment in it can be a positive or negative polarity with a range of [-1,1]
- Subjectivity does not carry any sentiment in it.
- textblob is one of package which we will be using. It is a NLP library.
The bigger the font is the higher the priority it gets added to it. We create 2 sets of comments
1.comments_positive (1)
2.comments_negative (-1)
TextBlob('trending 😉').sentiment
-
We use textblob to get the polarity and apply this our main dataframe passing comment_text to texblob such that we will get postive and negative comments
-
we will join all the postive comments as single string and pass it to the wordcloud
wc=WordCloud(stopwords=set(STOPWORDS)).generate(total_comments)
* stopwords are basically unimportant words like "the,is,of…." Etc. we will be ignoring this and concentrating mainly on the polarity either positive or negative w.r.t to the stopwords.
How many people use that emoji package. What is the count of each emoji either happy,sad,excited
- each emoji having their unicodeencoding character assinged to everthing irrespective platform,device,language.
- we create a dictionary with emoji count. in a comment there will be a emoji to get that emoji we use emoji.UNICODE_EMOJI_ENGLISH and append that emoji to a list and use Counter to count the emoji to get the most_common ones and use list comprehension to get the counts of emoji and frequencies.
- USING Plotly's Python graphing library can make interactive, publication-quality graphs. Examples of how to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts. So, to make barchart we use plotly.graph_obj and iplot to trace it
- For os we import os and make a list of those files files=os.listdir(path) and get the file from path current_df=pd.read_csv(path+'/'+file,encoding='iso-8859-1',error_bad_lines=False such that file is from we only get csv files
- Gettting country name using split and category name from path and converting to dictionary using to_dict. And mapping that category name with category_id
full_df['category_name']=full_df['category_id'].map(dict['category_name'])
3 Category With Most Likes: #using boxplot to visualize numerical values in quartile percentages min,25%,median,75%.so from this we can see music and entertainment has most likes.
It depends on 3 factors :
1. Like Rate
2. Dislike Rate
3. Comment Ratio
Calculating the percentage of all rates
1)Like Rate wrt Views: So, Entertainment has more likes and less median value compared with comedy
It depends on 3 factors :
1. Like Rate W.r.t views
2. Dislike Rate w.r.t views
3. Comment Ratio w.r.t views
To count of videos in a channel
cdf=full_df.groupby('channel_title')['video_id'].count().sort_values(ascending=False).reset_index().rename(columns={'video_id':'total_videos'}) AND plotting a graph chart for that.