
Snigda0402/Factors-shaping-Data-Science


Factors shaping Data Science

Objective

The project aims to understand the underlying reasons for the success and failure of data science initiatives by analyzing a collection of news articles related to Data Science, Machine Learning, and Artificial Intelligence.

Skills/Tools Used:

  1. Python programming language
  2. Natural Language Processing
  3. Text cleaning
  4. Named Entity Recognition
  5. Topic Modelling
  6. Sentiment Analysis

Project Overview

1. Data Collection and Preprocessing:

  • A dataset of news articles on Data Science, Machine Learning, and AI was provided by the University of Chicago, where I am a student.
  • Noise Cleaning:
    • Converted text to lowercase
    • Removed HTML tags, URLs, and web-crawl remnants
    • Removed punctuation and digits
    • Removed symbols and non-printable characters
    • Removed newlines, tabs, and extra whitespace
  • Pre-processing:
    • Removed stopwords
    • Lemmatization using WordNetLemmatizer()
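A minimal sketch of the cleaning and pre-processing steps above, using only Python's standard library. The regular expressions, the order of the steps, and the tiny `STOPWORDS` set are illustrative assumptions, not the project's exact code; web-crawl remnants depend on the corpus and are not handled here. Lemmatization with NLTK's `WordNetLemmatizer` appears only as a comment because it needs the WordNet data to be downloaded first.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a full list (e.g. NLTK's English stopwords corpus) would be used in practice.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "its", "of", "on", "the", "to", "with"}


def clean_text(raw: str) -> str:
    """Apply the noise-cleaning steps, in order."""
    text = raw.lower()                                  # lowercase
    text = html.unescape(text)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)                # strip HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # strip URLs
    # Punctuation, symbols, digits, underscores and non-printable
    # (non-word, non-space) characters all become spaces.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return re.sub(r"\s+", " ", text).strip()            # newlines, tabs, extra spaces


def preprocess(raw: str) -> list[str]:
    """Clean, tokenize on whitespace, and drop stopwords."""
    tokens = [t for t in clean_text(raw).split() if t not in STOPWORDS]
    # Lemmatization would follow, e.g. with NLTK (needs nltk.download("wordnet")):
    #   from nltk.stem import WordNetLemmatizer
    #   lemmatizer = WordNetLemmatizer()
    #   tokens = [lemmatizer.lemmatize(t) for t in tokens]
    # It is left as a comment so this sketch runs on the standard library alone.
    return tokens
```

For example, `preprocess("<p>AI projects FAIL in 2023!</p>")` keeps `ai`, `projects`, and `fail`, dropping the tag, the digits, the punctuation, and the stopword `in`.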

2. Topic Detection:

  • Use topic modeling techniques to categorize articles into major themes or topics.
  • Assign each article to the appropriate topic for analysis.
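The README does not name the topic-modeling technique. The sketch below assumes Latent Dirichlet Allocation (LDA), fitted with collapsed Gibbs sampling in pure Python so that it is self-contained; in practice gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be much faster. The function names and default hyperparameters are hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word Counters after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]            # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]     # n(k, w)
    topic_total = [0] * num_topics                          # n(k)
    topics = range(num_topics)

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)                   # remove this token's topic
                # p(k) proportional to (n(d,k)+alpha) * (n(k,w)+beta) / (n(k)+V*beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)
    return doc_topic, topic_word


def dominant_topic(doc_topic_row):
    """Assign an article to the topic that holds most of its tokens."""
    return max(range(len(doc_topic_row)), key=lambda k: doc_topic_row[k])


def top_words(topic_word, n=5):
    """Most frequent words per topic, used to name the themes by hand."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]
```

Each article's row in `doc_topic` goes to `dominant_topic` to assign it to a single theme, and `top_words` lists the words used to label each theme. The values of `alpha`, `beta`, and the iteration count are illustrative defaults, not tuned values.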

3. Sentiment Analysis:

  • Perform sentiment analysis to determine the sentiment (positive, negative) expressed in the articles.
  • Customize sentiment analysis to fit the context of data science initiatives.
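The README does not say which sentiment method was used. A minimal sketch, assuming a lexicon-based approach: a small general lexicon is merged with domain-specific entries so that terms such as "overfitting" count as negative in data science coverage. Both lexicons and their weights are hypothetical placeholders; an off-the-shelf analyzer such as NLTK's VADER (`SentimentIntensityAnalyzer`) could supply the base scores instead.

```python
# Hypothetical general-purpose lexicon: word -> polarity weight.
BASE_LEXICON = {"good": 1.0, "great": 2.0, "improve": 1.0,
                "bad": -1.0, "poor": -1.0, "problem": -1.0}

# Hypothetical domain entries for data science news; these override the
# base lexicon when the same word appears in both.
DOMAIN_LEXICON = {"deployed": 1.0, "scaled": 1.0, "bias": -1.0,
                  "overfitting": -1.0, "failed": -2.0, "abandoned": -2.0}


def build_lexicon(base, domain):
    """Merge lexicons; domain weights win on conflicts."""
    return {**base, **domain}


def sentiment(tokens, lexicon):
    """Return (score, label) for a list of preprocessed tokens."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    # Only two labels, as in the README; a score of exactly 0 is
    # labelled positive here, which is an arbitrary tie-break.
    return score, "positive" if score >= 0 else "negative"
```

Keeping the domain entries in a separate dictionary makes the customization explicit and easy to revise as new domain terms turn up in the articles.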

4. Reasons for Failure:

  • Identify articles with negative sentiment discussing failures in data science projects.
  • Extract reasons for these failures, such as technology issues, data challenges, or project management problems.
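A minimal sketch of the extraction step, assuming each article is a dict that already carries a sentiment label and its preprocessed tokens. Reasons are found by matching tokens against a hand-built keyword map for the three categories named above; `REASON_KEYWORDS` and its words are hypothetical examples, not the project's list.

```python
from collections import Counter

# Hypothetical keyword-to-category map for the three kinds of reasons;
# a real analysis would derive or refine these terms from the corpus.
REASON_KEYWORDS = {
    "technology": {"infrastructure", "legacy", "integration", "scalability", "tooling"},
    "data": {"quality", "missing", "bias", "silo", "labeling"},
    "project management": {"scope", "budget", "deadline", "stakeholder", "communication"},
}


def failure_reasons(articles):
    """Count reason categories across negative articles.

    articles: iterable of dicts with keys "sentiment" ("positive"/"negative")
    and "tokens" (preprocessed token list). Each category is counted at most
    once per article.
    """
    counts = Counter()
    for article in articles:
        if article["sentiment"] != "negative":
            continue
        tokens = set(article["tokens"])
        for category, keywords in REASON_KEYWORDS.items():
            if tokens & keywords:          # any keyword from this category present
                counts[category] += 1
    return counts
```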

5. Reasons for Success:

  • Identify articles with positive sentiment discussing achievements in data science projects.
  • Extract the reasons behind these achievements.
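Instead of a fixed keyword map, candidate success factors can be surfaced from the text itself by counting frequent word pairs (bigrams) in positive articles. A minimal sketch under the same assumed article format (`sentiment` and `tokens` keys); the function name is hypothetical.

```python
from collections import Counter


def top_bigrams(articles, sentiment="positive", n=5):
    """Most frequent adjacent word pairs in articles with the given sentiment.

    Frequent bigrams such as ("executive", "support") are candidate
    factors to review by hand, not conclusions in themselves.
    """
    counts = Counter()
    for article in articles:
        if article["sentiment"] == sentiment:
            tokens = article["tokens"]
            counts.update(zip(tokens, tokens[1:]))   # adjacent pairs
    return counts.most_common(n)
```

Calling it with `sentiment="negative"` gives the equivalent view for failures. Bigrams only point to candidates; reading the underlying articles is still needed to confirm a reason.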

6. Sentiment Over Time Analysis:

  • Create a timeline to visualize how sentiment changes over different time periods.
  • Investigate whether sentiment patterns align with specific events or technological advancements.
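A minimal sketch of the data behind such a timeline, assuming each article has a publication date as an ISO `YYYY-MM-DD` string and a numeric sentiment score (for example, the lexicon score from the sentiment step). Monthly buckets are an illustrative choice; the README does not specify the time granularity.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_sentiment(articles):
    """Average sentiment score per calendar month, in chronological order.

    articles: iterable of dicts with a "date" ("YYYY-MM-DD") and a numeric "score".
    Returns a list of ("YYYY-MM", average score) pairs.
    """
    by_month = defaultdict(list)
    for article in articles:
        d = date.fromisoformat(article["date"])
        by_month[(d.year, d.month)].append(article["score"])
    return [(f"{y}-{m:02d}", mean(scores)) for (y, m), scores in sorted(by_month.items())]
```

The resulting (month, average) pairs can be plotted directly, for instance with `matplotlib.pyplot.plot`, and compared against the dates of known events.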

7. Entity Identification:

  • Use Named Entity Recognition to identify organizations, people, and locations mentioned in the articles.
  • Compile a list of these entities for further analysis.
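The README does not name the NER library. A minimal sketch assuming spaCy, whose `doc.ents` spans expose `text` and `label_`; the code only compiles the per-article `(text, label)` pairs, so it runs without spaCy or its models installed. NER should run on the original article text rather than the lowercased, cleaned version, because capitalization is a strong cue for recognizing names.

```python
from collections import Counter, defaultdict

# Entity labels kept for the analysis, using spaCy's label names
# (ORG, PERSON, GPE for countries/cities/states, LOC for other locations).
KEPT_LABELS = {"ORG", "PERSON", "GPE", "LOC"}


def compile_entities(entity_lists):
    """Merge per-article (text, label) pairs into mention counts per label.

    With spaCy, one article's pairs could be produced by
        [(ent.text, ent.label_) for ent in nlp(article_text).ents]
    where nlp = spacy.load("en_core_web_sm").
    Counts are per mention, not per article.
    """
    compiled = defaultdict(Counter)
    for pairs in entity_lists:
        for text, label in pairs:
            if label in KEPT_LABELS:
                compiled[label][text.strip()] += 1
    return compiled
```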

8. Targeted Sentiment Analysis:

  • Analyze the sentiment associated with specific entities mentioned in the articles.
  • Determine how organizations and people are portrayed in the context of data science projects.
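A minimal sketch of targeted (entity-level) sentiment: score only the sentences that mention a given entity and average those scores. The regex sentence splitter, the case-insensitive substring match on the entity name, and `LEXICON` are simplifying assumptions for illustration; substring matching would also catch a longer name that contains the entity's name.

```python
import re
from statistics import mean

# Hypothetical mini-lexicon; in practice the customized lexicon from the
# sentiment step would be reused here.
LEXICON = {"success": 1.0, "improved": 1.0, "praised": 1.0,
           "failed": -1.0, "lawsuit": -1.0, "delay": -1.0}


def entity_sentiment(text, entity, lexicon=LEXICON):
    """Average lexicon score of the sentences that mention `entity`.

    Returns None when the entity is never mentioned. Sentences are split
    naively on ., ! or ? followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    scores = []
    for sentence in sentences:
        if entity.lower() in sentence.lower():
            tokens = re.findall(r"[a-z]+", sentence.lower())
            scores.append(sum(lexicon.get(t, 0.0) for t in tokens))
    return mean(scores) if scores else None
```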

9. Insights and Recommendations:

  • Analyze the reasons for failure and success to extract insights.
  • Develop actionable recommendations to enhance the success rates of data science initiatives.
