This repository contains machine learning projects implemented in Python. The projects demonstrate application of ML algorithms for tasks like classification, regression, clustering, etc.

swaapnaa/MACHINE-LEARNING-PROJECTS

MACHINE LEARNING PROJECTS WITH PYTHON

Identifying Toxic Behaviour in Reddit gaming communities! 🎮

Project Description

Throughout my research, I explored online communities and the challenges of tackling toxic behavior. This project summarizes my dissertation, which addressed toxicity in Reddit gaming communities using Python (natural language processing and machine learning), Gephi, and Polinode.

🎯 Problem Statement

The rise of online gaming communities has brought both enjoyment and challenges. One significant challenge is the prevalence of toxic behavior, which can harm the gaming experience. My dissertation aimed to identify and understand this behavior in order to inform effective strategies for fostering healthier, more inclusive gaming communities.

🔍 Data Collection

To tackle this problem, I collected a large set of Reddit posts and comments from the Activision subreddit, creating a dataset that captures the diverse interactions and behaviors within the community.
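The dissertation does not say which tool was used for collection. As a minimal sketch, assuming the PRAW library as the Reddit client, posts and comments could be flattened into one row per item like this. The function name `collect_subreddit` and the row schema are hypothetical; the client is passed in so the logic can be exercised without network access.

```python
from typing import Any, Dict, List


def collect_subreddit(reddit: Any, name: str, limit: int = 100) -> List[Dict[str, Any]]:
    """Collect recent submissions and their comments from one subreddit.

    `reddit` is assumed to behave like an authenticated `praw.Reddit` client.
    Returns one flat dict per post or comment (hypothetical schema).
    """
    rows: List[Dict[str, Any]] = []
    for submission in reddit.subreddit(name).new(limit=limit):
        rows.append({
            "id": submission.id,
            "kind": "post",
            # Deleted accounts come back as None.
            "author": str(submission.author) if submission.author else None,
            "text": f"{submission.title}\n{submission.selftext}",
            "created_utc": submission.created_utc,
            "parent_id": None,
        })
        # Resolve every "load more comments" stub (can be slow on big threads),
        # then flatten the comment tree into a list.
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            rows.append({
                "id": comment.id,
                "kind": "comment",
                "author": str(comment.author) if comment.author else None,
                "text": comment.body,
                "created_utc": comment.created_utc,
                # "t1_..." = reply to a comment, "t3_..." = reply to the post.
                "parent_id": comment.parent_id,
            })
    return rows
```

With PRAW installed, a real client created from your own credentials via `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` could be passed as `collect_subreddit(reddit, "Activision", limit=500)`.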

📉 Identifying Negative Content and Users

Using Python, I analyzed the collected data to identify negative content and toxic users. I applied natural language processing (NLP) techniques and machine learning models to distinguish toxic behavior, such as hate speech, harassment, and disrespectful language, from regular interactions.
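Once individual comments carry a toxic/non-toxic flag (from whichever classifier is used), identifying toxic users can be sketched as a pandas aggregation. The column names (`author`, `is_toxic`) and the thresholds `min_comments` and `min_share` are hypothetical, not values taken from the dissertation.

```python
import pandas as pd


def find_toxic_users(comments: pd.DataFrame, min_comments: int = 5,
                     min_share: float = 0.3) -> pd.DataFrame:
    """Rank users by the share of their comments flagged as toxic.

    Expects columns `author` and boolean `is_toxic` (hypothetical names).
    """
    stats = (
        comments.dropna(subset=["author"])
        .groupby("author")["is_toxic"]
        .agg(n_comments="size", toxic_share="mean")
    )
    # Ignore users with too few comments for their share to be meaningful.
    flagged = stats[(stats["n_comments"] >= min_comments)
                    & (stats["toxic_share"] >= min_share)]
    return flagged.sort_values("toxic_share", ascending=False)
```

Requiring a minimum number of comments keeps a user with one unlucky flagged comment from topping the list.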

📊 Sentiment Analysis

To gain deeper insight into the emotional tone of the gaming communities, I ran sentiment analysis on the collected data. This revealed the prevailing sentiment within the community, from positive and neutral to negative, and shed light on its overall tone and atmosphere.

📈 Building a Predictive Model

A highlight of the research was a predictive model built in Python. Combining the insights from the previous stages, the model estimates the likelihood of toxic behavior within Reddit gaming communities.
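The dissertation does not state which features or algorithm the predictive model used. As one common baseline, and not necessarily the author's model, a text classifier could be sketched with scikit-learn using TF-IDF features and logistic regression, where `predict_proba` gives the estimated likelihood that a comment is toxic. The helper names and the 1 = toxic labeling are assumptions.

```python
from typing import List, Sequence

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_toxicity_model(texts: Sequence[str], labels: Sequence[int]):
    """Fit TF-IDF + logistic regression; labels are 1 = toxic, 0 = not toxic."""
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        # class_weight="balanced" compensates for toxic comments usually being rare.
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(list(texts), list(labels))
    return model


def toxicity_likelihood(model, texts: Sequence[str]) -> List[float]:
    """Probability of the toxic class (label 1) for each text."""
    # Look up which predict_proba column belongs to label 1.
    toxic_col = list(model.classes_).index(1)
    return model.predict_proba(list(texts))[:, toxic_col].tolist()
```

A threshold on these probabilities (for example 0.5, or a stricter value to reduce false accusations) would turn likelihoods into flags.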

1️⃣ Overall Sentiment Score Network Map

We analyzed the collected data and constructed an overall sentiment score network map. It shows how the sentiments expressed across different topics are interconnected and gives a holistic view of opinion in the community, including the relationships between positive, negative, and neutral sentiment.
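How the network was constructed is not described. One plausible sketch, assuming users are nodes, an edge links a commenter to the author they replied to, and each node carries that user's mean sentiment score, uses NetworkX. The column names (`author`, `parent_author`, `score`) are hypothetical.

```python
import networkx as nx
import pandas as pd


def build_sentiment_network(df: pd.DataFrame) -> nx.Graph:
    """Build an undirected user-interaction graph with mean sentiment per node.

    Expects columns: `author`, `parent_author` (author replied to, or None), `score`.
    """
    G = nx.Graph()
    # Node attribute: each user's average sentiment score across their comments.
    for user, mean_score in df.groupby("author")["score"].mean().items():
        G.add_node(user, sentiment=float(mean_score))
    replies = df.dropna(subset=["author", "parent_author"])
    replies = replies[replies["author"] != replies["parent_author"]]  # skip self-replies
    for a, b in zip(replies["author"], replies["parent_author"]):
        # Edge weight counts how many times the pair interacted.
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)
    return G
```

Users who appear only as reply targets get a node without a `sentiment` attribute, which downstream code should tolerate.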

2️⃣ Grouping Comments Based on Sentiment Score

To refine the analysis, we grouped comments by sentiment score, categorizing each as positive, negative, or neutral. The distribution and intensity of these sentiments show what users feel strongly about, which helps pinpoint areas for improvement or commendation.
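Grouping comments by score can be sketched as below, assuming a compound score in [-1, 1] (as produced by, for example, VADER) and the ±0.05 cut-offs commonly used with VADER. The thresholds actually used in the project are not stated.

```python
import numpy as np
import pandas as pd


def label_sentiment(scores, threshold: float = 0.05) -> pd.Series:
    """Map compound scores in [-1, 1] to positive / negative / neutral."""
    s = pd.Series(scores, dtype=float)
    # >= threshold is positive, <= -threshold is negative, anything else neutral.
    labels = np.select([s >= threshold, s <= -threshold],
                       ["positive", "negative"], default="neutral")
    return pd.Series(labels, index=s.index)


def sentiment_distribution(scores, threshold: float = 0.05) -> pd.Series:
    """Share of comments falling into each sentiment category."""
    return label_sentiment(scores, threshold).value_counts(normalize=True)
```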

3️⃣ Posting Activity by Hour of the Day

We compared the posting activity of the top 50 positive users and the top 50 negative users across the hours of the day. This reveals when users are most active and when positive or negative sentiment is most prevalent, which can help tailor engagement strategies to the times the audience is most active.
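A sketch of the hourly comparison, assuming a comment table with hypothetical columns `author`, `score` (sentiment), and `created_utc` (Unix seconds, as Reddit returns timestamps), and taking the "top 50" groups as the users with the highest and lowest mean scores. The project's exact ranking rule is not stated.

```python
import pandas as pd


def hourly_activity(df: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Count posts per UTC hour for the n most positive and n most negative users."""
    mean_score = df.groupby("author")["score"].mean()
    top_pos = mean_score.nlargest(n).index
    top_neg = mean_score.nsmallest(n).index
    hours = pd.to_datetime(df["created_utc"], unit="s", utc=True).dt.hour
    counts = {
        "positive": hours[df["author"].isin(top_pos)].value_counts(),
        "negative": hours[df["author"].isin(top_neg)].value_counts(),
    }
    # One row per hour 0-23, zero-filled where a group never posted.
    return pd.DataFrame(counts).reindex(range(24)).fillna(0).astype(int)
```

The result can be plotted directly, for example with `hourly_activity(df).plot(kind="bar")` when Matplotlib is available. Hours are in UTC, so local-time patterns need a timezone conversion first.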

4️⃣ Sentiment Score Trends of Posts for Top 50 Positive and Negative Users Over Time

We tracked the sentiment score trends of the top 50 positive and top 50 negative users over the study period. The temporal dynamics reveal shifts in sentiment patterns, the impact of specific events, and the overall sentiment trajectory of influential users. These insights help adapt strategies, cultivate a positive online environment, and address potential issues early.
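The trend lines can be sketched by averaging each group's scores per time period. This assumes a comment table with hypothetical columns `created_utc` (Unix seconds), `score`, and a `group` column already set to "positive" or "negative" for the selected users; the weekly default period is purely illustrative.

```python
import pandas as pd


def sentiment_trend(df: pd.DataFrame, freq: str = "W") -> pd.DataFrame:
    """Mean sentiment per period for each user group.

    Expects columns `created_utc` (Unix seconds), `score`, and `group`.
    """
    ts = df.assign(created=pd.to_datetime(df["created_utc"], unit="s", utc=True))
    return (
        ts.groupby(["group", pd.Grouper(key="created", freq=freq)])["score"]
        .mean()
        .unstack("group")  # one column per group, one row per period
    )
```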

Network Analysis

1️⃣ Network Map of Negative Users

From the sentiment data, we constructed a network map focused on negative users. It shows the connections and interactions between users expressing negative sentiment, highlighting influential users and potential clusters. Understanding these network dynamics helps identify where intervention is needed and address concerns promptly.

2️⃣ Network Map of Positive Users

Alongside the negative-user map, we built a network map of positive users. It shows the relationships and conversations among users who consistently express positive sentiment, revealing key influencers, supportive communities, and content that resonates positively. This knowledge helps nurture a culture of optimism and improve user experiences.
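Both maps can be sketched as induced subgraphs of a user-interaction graph, assuming NetworkX and a hypothetical `sentiment` node attribute holding each user's mean score. Degree centrality is used here as one simple measure of influence; the project does not say which measure it used.

```python
import networkx as nx


def sentiment_subnetwork(G: nx.Graph, positive: bool, n: int = 50) -> nx.Graph:
    """Induced subgraph of the n most positive (or most negative) users."""
    scored = [(u, d["sentiment"]) for u, d in G.nodes(data=True) if "sentiment" in d]
    # Sort descending for positive users, ascending for negative users.
    scored.sort(key=lambda item: item[1], reverse=positive)
    chosen = [u for u, _ in scored[:n]]
    return G.subgraph(chosen).copy()


def most_central(H: nx.Graph, k: int = 5) -> list:
    """The k users with the highest degree centrality in a subnetwork."""
    centrality = nx.degree_centrality(H)
    return sorted(centrality, key=centrality.get, reverse=True)[:k]
```

Using an induced subgraph keeps only interactions within each group; edges between positive and negative users are dropped, which is a design choice worth revisiting if cross-group replies matter.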

3️⃣ Network Map Visualizations using Gephi

To visualize these network maps, we used Gephi. The visualizations make it possible to see the connections between users, identify clusters, and detect central nodes; interactive exploration then helps in understanding the sentiment landscape and devising strategies that foster positive engagement.
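Gephi can open GEXF files and NetworkX can write them, so one way to move a graph built in Python into Gephi is sketched below. The helper name and file name are hypothetical; node attributes such as a `sentiment` score are written along with the graph and can then be used in Gephi for coloring or sizing nodes.

```python
import networkx as nx


def export_for_gephi(G: nx.Graph, path: str = "sentiment_network.gexf") -> str:
    """Write the graph, with node and edge attributes, to a GEXF file for Gephi."""
    nx.write_gexf(G, path)
    return path
```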

4️⃣ Network Map Visualizations using Polinode

In addition to Gephi, we used Polinode to visualize the sentiment networks. Its interface and interactive visualizations make it easy to identify influential users, analyze sentiment flows, and examine the dynamics of the positive and negative sentiment networks.

Formula 1 Race Analysis with Python

This project involved analyzing Formula 1 race data from the 2020 season using Pandas, Matplotlib, Seaborn, and Scikit-Learn in Python.

The key goals were to:

  • Explore and visualize race statistics such as lap times, points, and positions
  • Identify insights such as top performers, team-by-team comparisons, and lap time distributions
  • Build a regression model to predict race points based on grid position
  • Apply clustering to segment drivers based on performance metrics
  • Classify race finish positions using machine learning models
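The regression and clustering goals can be sketched with scikit-learn. This assumes a per-driver-per-race results table with hypothetical columns `grid` and `points`, and a per-driver table of aggregate features for clustering; the feature names and the choice of three clusters are illustrative assumptions, not the project's settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_points_model(results: pd.DataFrame) -> LinearRegression:
    """Linear regression of race points on starting grid position."""
    model = LinearRegression()
    model.fit(results[["grid"]], results["points"])
    return model


def cluster_drivers(driver_stats: pd.DataFrame, features, k: int = 3,
                    seed: int = 0) -> pd.Series:
    """Assign each driver (row) to one of k performance clusters."""
    # Scale first so a large-valued feature (e.g. points) does not dominate.
    pipe = make_pipeline(StandardScaler(),
                         KMeans(n_clusters=k, n_init=10, random_state=seed))
    labels = pipe.fit_predict(driver_stats[features])
    return pd.Series(labels, index=driver_stats.index, name="cluster")
```

A plain linear fit is a simplification, since F1 points are awarded on a non-linear scale and drop to zero outside the top ten.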

The analysis involved data preprocessing techniques like handling missing values, converting formats, and feature engineering. Visualizations included histograms, boxplots, scatterplots, and bar charts.
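The preprocessing steps can be sketched in pandas. The column names (`lap_time` stored as "m:ss.sss" strings, `position` with non-finishers marked by a non-numeric code, `grid`), the median fill, and the engineered `positions_gained` feature are hypothetical; the notebook's actual columns and rules may differ.

```python
import numpy as np
import pandas as pd


def lap_time_to_seconds(value) -> float:
    """Convert a lap time such as "1:27.452" to seconds; missing values become NaN."""
    if pd.isna(value):
        return np.nan
    text = str(value)
    if ":" in text:
        minutes, seconds = text.split(":", 1)
        return int(minutes) * 60 + float(seconds)
    return float(text)  # already in seconds, e.g. "58.3"


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a race table with hypothetical columns lap_time, position, grid."""
    out = df.copy()
    # Format conversion: lap-time strings to float seconds.
    out["lap_seconds"] = out["lap_time"].map(lap_time_to_seconds)
    # Missing values: fill absent lap times with the median lap time.
    out["lap_seconds"] = out["lap_seconds"].fillna(out["lap_seconds"].median())
    # Non-numeric finishing codes (e.g. "R" for retired) become NaN.
    out["position"] = pd.to_numeric(out["position"], errors="coerce")
    out["finished"] = out["position"].notna()
    # Feature engineering: places gained from grid to finish (positive = gained).
    out["positions_gained"] = out["grid"] - out["position"]
    return out
```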

Some key findings

  • The average points scored by drivers showed high variance due to retirements
  • Mercedes had the highest total points scored among all teams
  • Grid position had a strong negative correlation with final race position
  • Logistic regression achieved the best accuracy in classifying race finish categories
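The classification comparison can be sketched as ranking models by cross-validated accuracy. The category scheme (podium, points finish, outside the points), the single `grid` feature, and the two candidate models are assumptions for illustration; the project's actual features, categories, and model list are not given.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def finish_category(position: float) -> str:
    """Bucket a finishing position: 1-3 podium, 4-10 points, otherwise no points."""
    if position <= 3:
        return "podium"
    if position <= 10:
        return "points"
    return "no_points"  # includes NaN, i.e. non-finishers


def compare_classifiers(results: pd.DataFrame, features, cv: int = 5) -> pd.Series:
    """Mean cross-validated accuracy per model, best first."""
    X = results[features]
    y = results["position"].map(finish_category)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    scores = {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
              for name, model in models.items()}
    return pd.Series(scores).sort_values(ascending=False)
```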

Breast Cancer Classification with Machine Learning

This project involved building a machine learning model to classify breast cancer tumors as benign or malignant based on cell measurements. The dataset was obtained from the UCI Machine Learning Repository.

The key steps included

  • Importing and exploring the breast cancer dataset
  • Visualizing the feature distributions using histograms, pie charts, and boxplots
  • Identifying correlations between features using a heatmap
  • Preprocessing the data by handling missing values and encoding categorical labels
  • Splitting the data into training and test sets
  • Training and evaluating SVM and Random Forest classification models
  • Comparing the precision of the models to identify the better performer
  • Making predictions on a new sample data point

The Random Forest model achieved the higher precision, 95%. This means that 95% of the tumors the model predicted as malignant in the test data were actually malignant; the share of all malignant tumors it detected (recall) is a separate measure.
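The training and comparison steps can be sketched with scikit-learn, assuming its bundled copy of the Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`). That copy derives from the UCI repository but has no missing values and already-encoded labels, so it may not match the notebook's file or preprocessing. In it, the target is 0 for malignant and 1 for benign, so precision for malignant tumors needs `pos_label=0`. The split ratio and model settings are illustrative, and no particular scores are implied.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(test_size: float = 0.2, seed: int = 42) -> tuple:
    """Train SVM and Random Forest; return (malignant-class precision, fitted models)."""
    X, y = load_breast_cancer(return_X_y=True)
    # Stratify so both splits keep the malignant/benign ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    models = {
        # SVMs are sensitive to feature scale, so standardize first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    precision = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # In load_breast_cancer, label 0 = malignant.
        precision[name] = precision_score(y_test, model.predict(X_test), pos_label=0)
    return precision, models
```

A new sample with the same 30 measurements can then be classified with `models["random_forest"].predict(sample.reshape(1, -1))`, where a prediction of 0 means malignant.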

Usage

The Jupyter Notebook and Python files for each project are in their respective folders. To run them, install the required libraries and execute the scripts.

About Machine Learning

Machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.

This repository showcases my hands-on application of ML techniques to solve real-world problems using Python. The projects cover techniques such as supervised learning, unsupervised learning, dimensionality reduction, and neural networks.
