This project involves advanced SQL queries and data analysis to investigate and solve real-world business problems using MySQL. The focus is on two main case studies: Job Data Analysis and Investigating Metric Spike. The project demonstrates how to analyze large datasets, derive meaningful insights, and optimize operations through efficient SQL querying techniques.
Operational analytics is vital for enhancing a company's operations by identifying areas for improvement. This project analyzes the provided datasets to answer questions posed by different departments within the company.
The project is divided into two main case studies:
- Job Data Analysis: Analyze job-related data to understand trends, throughput, and language distribution, and to identify duplicate records.
- Investigating Metric Spike: Analyze user engagement metrics to investigate sudden changes and understand user behavior across different dimensions.
- job_id: Unique identifier of jobs
- actor_id: Unique identifier of actor
- event: The type of event (decision/skip/transfer)
- language: The Language of the content
- time_spent: Time spent reviewing the job in seconds
- org: The Organization of the actor
- ds: The date in the format yyyy/mm/dd (stored as text)
- Jobs Reviewed Over Time: Calculation of the number of jobs reviewed per hour for each day in November 2020.
- Throughput Analysis: Calculation of the 7-day rolling average of throughput (number of events per second).
- Language Share Analysis: Calculation of the percentage share of each language over the last 30 days.
- Duplicate Rows Detection: Identification of duplicate rows in the data.
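The four analyses above can be sketched with queries like the following. This is a sketch assuming the `job_data` table uses exactly the column names listed earlier; since `ds` is stored as text in `yyyy/mm/dd` format, `STR_TO_DATE` is used where real date arithmetic is needed.

```sql
-- 1. Jobs reviewed per hour for each day in November 2020.
--    time_spent is in seconds, so hours reviewed = SUM(time_spent) / 3600.
SELECT ds,
       COUNT(job_id) / (SUM(time_spent) / 3600) AS jobs_reviewed_per_hour
FROM job_data
WHERE ds BETWEEN '2020/11/01' AND '2020/11/30'
GROUP BY ds
ORDER BY ds;

-- 2. 7-day rolling average of throughput (events per second).
SELECT ds,
       AVG(events_per_second)
           OVER (ORDER BY ds ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
           AS rolling_avg_throughput
FROM (SELECT ds,
             COUNT(event) / SUM(time_spent) AS events_per_second
      FROM job_data
      GROUP BY ds) AS daily;

-- 3. Percentage share of each language over the last 30 days.
SELECT language,
       ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_share
FROM job_data
WHERE STR_TO_DATE(ds, '%Y/%m/%d') >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY language;

-- 4. Duplicate rows: identical values in every column, occurring more than once.
SELECT job_id, actor_id, event, language, time_spent, org, ds,
       COUNT(*) AS occurrences
FROM job_data
GROUP BY job_id, actor_id, event, language, time_spent, org, ds
HAVING COUNT(*) > 1;
```

The window functions used here (`AVG(...) OVER`, `SUM(COUNT(*)) OVER ()`) require MySQL 8.0 or later, which matches the Workbench version listed under tools.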
- Language Share: The analysis showed that Persian had the highest share at 37.5%, with other languages having smaller shares.
- Duplicate Rows: Several duplicate rows were identified across different job IDs, which could indicate data quality issues that need to be addressed.
- users table: Contains one row per user, with descriptive information about the user’s account.
- events table: Contains one row per event, where an event is an action that a user has taken (e.g., login, messaging, search).
- email_events table: Contains events specific to the sending of emails.
- Weekly User Engagement: Measure the activeness of users on a weekly basis.
- User Growth Analysis: Calculation of the growth of users over time for a product.
- Weekly Retention Analysis: Calculation of the weekly retention based on sign-up cohort.
- Weekly Engagement Per Device: Calculation of the activeness of users on a weekly basis per device.
- Email Engagement Analysis: Analyze how users are engaging with the email service.
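The engagement metrics above can be sketched as follows. These are sketches, not the project's exact scripts: the column names (`user_id`, `occurred_at`, `event_type`, `device`, `created_at`, `action`) and the email `action` values are assumptions based on the table descriptions above and should be adjusted to match the actual dump.

```sql
-- Weekly user engagement: distinct active users per week.
SELECT EXTRACT(YEAR FROM occurred_at) AS yr,
       EXTRACT(WEEK FROM occurred_at) AS wk,
       COUNT(DISTINCT user_id)        AS weekly_active_users
FROM events
WHERE event_type = 'engagement'
GROUP BY yr, wk
ORDER BY yr, wk;

-- User growth: new sign-ups per week plus a cumulative running total.
SELECT YEAR(created_at) AS yr,
       WEEK(created_at) AS wk,
       COUNT(*)         AS new_users,
       SUM(COUNT(*)) OVER (ORDER BY YEAR(created_at), WEEK(created_at))
           AS cumulative_users
FROM users
GROUP BY yr, wk;

-- Weekly engagement per device.
SELECT EXTRACT(WEEK FROM occurred_at) AS wk,
       device,
       COUNT(DISTINCT user_id) AS weekly_active_users
FROM events
WHERE event_type = 'engagement'
GROUP BY wk, device;

-- Email engagement: open and click-through rates relative to emails sent
-- (boolean expressions evaluate to 0/1 in MySQL, so SUM counts matches).
SELECT 100.0 * SUM(action = 'email_open')
             / SUM(action = 'sent_weekly_digest') AS open_rate_pct,
       100.0 * SUM(action = 'email_clickthrough')
             / SUM(action = 'sent_weekly_digest') AS clickthrough_rate_pct
FROM email_events;
```

Weekly retention by sign-up cohort follows the same pattern but additionally joins each user's activity weeks back to their sign-up week, so it is omitted from this sketch.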
- User Engagement: Weekly engagement metrics provide insights into how active users are over time, which is critical for understanding user retention and behavior.
- Device Engagement: Understanding engagement across different devices can help tailor experiences and improve overall user satisfaction.
- Email Metrics: Analyzing email engagement can reveal how effective email communications are in driving user actions.
- Jobs Reviewed Over Time
- Throughput Analysis
- Language Share Analysis
- Duplicate Rows Detection
- Weekly User Engagement
- User Growth Analysis
- Weekly Retention Analysis
- Weekly Engagement Per Device
- Email Engagement Analysis
The approach to this project involved several key steps:
- Data Examination: Thoroughly examining the datasets and creating an ER Diagram to understand the relationships between tables.
- Data Profiling: Understanding the structure of the dataset, including the number of rows and columns, data types, and key relationships.
- SQL Queries: Writing advanced SQL queries to perform the analysis required for both case studies.
- Result Interpretation: Interpreting the results of the queries to provide actionable insights.
- Mainak Mukherjee
Email: subha.mainak@gmail.com
LinkedIn: www.linkedin.com/in/mainak8
- MySQL Workbench (Version 8.0.31): A graphical user interface that simplifies database management and querying.
Advanced SQL
- Download the Repository:
-> Navigate to the GitHub repository and click on the Code button.
-> Select Download ZIP to download the repository to your local machine.
- Import the SQL Files:
-> The SQL dump file is included in the repository; locate it in the extracted folder.
-> Open MySQL Workbench.
-> Navigate to File > Open SQL Script and select the SQL file from the repository.
-> Execute the script by clicking on the lightning bolt icon or pressing CTRL + Enter.
- Run SQL Scripts:
-> The repository also contains individual SQL scripts for each case study analysis in the scripts/ directory.
-> Open these scripts in MySQL Workbench to reproduce the results of the case studies.
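As an alternative to the Workbench GUI, the dump can also be loaded from the command line. The schema name `operational_analytics` and the file name `dump.sql` below are placeholders; substitute the actual names from the extracted folder.

```shell
# Create a target schema and load the dump into it.
# 'operational_analytics' and 'dump.sql' are placeholders for your setup.
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS operational_analytics;"
mysql -u root -p operational_analytics < dump.sql
```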
Throughout this project, I gained insights into operational metrics, user engagement, and language preferences. For instance, in the Language Share Analysis, I observed that Persian had the highest share at 37.5%, with other languages sharing the remaining 62.5%.
The project deepened my understanding of operational metrics, especially in identifying spikes and trends in user engagement. These analyses can guide decision-making and strategic planning across departments.