A collection of my practice DA projects, completed in Python for the most part. I've uploaded them as they are, complete with the feedback I received from professional data analysts who pointed out occasional mistakes, suggested improvements, or complimented the work done.

vrova3/data_analysis_practice_projects

Roman Varum - Data Analysis Portfolio

About

Hi, I'm Roman! I'm an aspiring data analyst passionate about tech, languages, and dataviz, currently seeking a junior or entry-level analytics role. I'm a former political and investigative reporter with a knack for finding patterns and artifacts buried in numbers. I love science fiction, blues music, and sports.

Below is a collection of my practice projects. I've uploaded them as they are, complete with the feedback and reviews I got from professional data analysts who pointed out occasional mistakes, suggested improvements, or complimented the work done.

Table of contents

Portfolio Projects

This section briefly describes each project and the kinds of problems I solved while completing it.

Data Cleaning: Analysing Borrowers' Risk of Default

Code: p1_data_cleaning_Borrowers_Risk_of_Defaulting.ipynb
Description: The dataset contains 21525 records on individual borrowers, described by 12 fields such as the number of days employed, number of children, family status, level of education, and income type. The project is centered on data cleaning and preprocessing, including fairly tricky cases of missing, duplicate, and incorrectly converted data.
Skills: data cleaning, data analysis, descriptive statistics.
Technology: Python, Pandas.
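
As a rough illustration of the three problem types named above (missing, duplicate, and incorrectly converted data), here is a minimal Pandas sketch. It is not taken from the notebook: the column names `education`, `days_employed`, and `total_income`, the sample rows, and the choice to treat negative durations as a sign error are all assumptions made for the example.

```python
import pandas as pd


def clean_borrowers(df: pd.DataFrame) -> pd.DataFrame:
    """Apply three illustrative cleaning steps to a borrowers table."""
    out = df.copy()
    # Normalise text so 'Secondary' and 'secondary ' count as one category.
    out["education"] = out["education"].str.strip().str.lower()
    # Assumption: negative durations are a sign-conversion artefact.
    out["days_employed"] = out["days_employed"].abs()
    # Fill missing income with the median income of the same education group.
    group_median = out.groupby("education")["total_income"].transform("median")
    out["total_income"] = out["total_income"].fillna(group_median)
    # Drop rows that are exact duplicates after normalisation.
    return out.drop_duplicates().reset_index(drop=True)


# Made-up sample data, for illustration only.
raw = pd.DataFrame({
    "education": ["Secondary", "secondary ", "higher", "higher", "higher"],
    "days_employed": [-1200.0, -1200.0, 300.0, -50.0, 300.0],
    "total_income": [20000.0, 20000.0, 40000.0, None, 40000.0],
})
cleaned = clean_borrowers(raw)
```

Normalising text before calling `drop_duplicates` matters here: the first two sample rows differ only in letter case and trailing whitespace, so they are only recognised as duplicates after the normalisation step.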

Exploratory Data Analysis: Car Sales Ads Research

Code: p2_EDA_Car_Sales_Ads.ipynb
Description: The dataset has 51525 entries for individual car sale ads posted on a classified ads website and contains 13 fields describing the car characteristics and the duration of the ad.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation.
Technology: Python, Pandas, NumPy, Matplotlib.

Statistical Analysis: Comparing Mobile Plans

Code: p3_statistical_analysis_Megaline_Mobile_Plans.ipynb
Description: The data for this project is scattered across 5 datasets: data on web sessions, calls, messages, and users, and a table describing the different plans.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, hypothesis testing (t-testing).
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.

Integrated Project 1: Video Game Sales Research

Code: p4_integrated_project_1_Video_Game_Sales.ipynb
Description: The dataset contains 16715 entries, each a record for a single video game, with fields describing its title, genre, year of release, platform, critic and user scores, and sales figures for North America, the EU, Japan, and all other markets.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, hypothesis testing (t-testing), statistical testing.
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.

Business Analytics: Calculating LTV, CAC, ROI for Online Afisha App

Code: p5_business_analytics_LTV_CAC_ROI_for_Online_Afisha_App.ipynb
Description: The data for this project includes 3 datasets: visits (359400 entries), orders (50415) and costs (2542). All of them are used to compare the sources of visits, calculate user conversion and retention, and analyse marketing expenses across channels.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, cohort analysis, business metrics.
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.
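
For readers unfamiliar with the three metrics in the project title, the sketch below shows one common way to define them per cohort: LTV as gross profit per buyer, CAC as marketing cost per buyer, and ROI (often called ROMI in marketing) as LTV divided by CAC. These definitions, the function names, and the numbers are assumptions for illustration; the notebook may define or compute the metrics differently.

```python
def ltv(revenue: float, margin_rate: float, n_buyers: int) -> float:
    """Lifetime value: gross profit generated per buyer in a cohort."""
    return revenue * margin_rate / n_buyers


def cac(marketing_cost: float, n_buyers: int) -> float:
    """Customer acquisition cost: marketing spend per acquired buyer."""
    return marketing_cost / n_buyers


def romi(ltv_value: float, cac_value: float) -> float:
    """Return on marketing investment: values above 1.0 mean the spend paid off."""
    return ltv_value / cac_value


# Made-up cohort figures, for illustration only.
cohort_revenue = 12000.0
cohort_costs = 8000.0
cohort_buyers = 1000

cohort_ltv = ltv(cohort_revenue, margin_rate=1.0, n_buyers=cohort_buyers)
cohort_cac = cac(cohort_costs, cohort_buyers)
cohort_romi = romi(cohort_ltv, cohort_cac)
```

Computing the ratio per acquisition source rather than for all buyers at once is what makes it possible to compare marketing channels against each other.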

Making Business Decisions Based on Data: A/B Testing for an Online Store

Code: p6_business_decisions_based_on_data_AB_Testing_for_Online_Store.ipynb
Description: Aside from the table containing the hypotheses, the data for this project comes from an orders dataset (just under 1200 entries) containing transactions for visitors from two experimental groups, and a small time-series table with the total daily number of visits per group over the two months of the testing period.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, business metrics, defining KPIs, A/B testing, hypothesis prioritisation, statistical testing (Shapiro, Mann-Whitney, Levene).
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.

Data Stories: Researching L.A. Restaurant Market

Code: p7_data_stories_Market_Research_Restaurants_in_LA.ipynb
Presentation: p7_slides_restaurants_in_LA.pdf
Description: This project's dataset contains 9561 entries on restaurants in L.A., each recording a single place with fields providing its name, address, number of seats, the type of establishment and whether it's part of a chain or a standalone place.
Skills: data cleaning, data analysis, descriptive statistics, regular expressions, data visualisation, market research, storytelling.
Technology: Python, Pandas, RegEx, Matplotlib, Seaborn.

Integrated Project 2: Analysing User Behaviour for a Foodtech Startup

Code: p8_integrated_project_2_Analysing_FoodTech_User_Behaviour.ipynb
Description: The data for this project contains 244126 entries recording time-series user logs of a food delivery app or website, with fields for user ID, timestamp, event screen (e.g. Main Screen, Cart, Payment Successful), and the user's experimental group following an A/B testing period.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, sales funnels, A/A and A/B testing, statistical testing (proportion tests).
Technology: Python, Pandas, SciPy Stats, Matplotlib, Seaborn, Plotly.
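
The proportion tests mentioned in the skills list are typically used to check whether the share of users reaching a funnel step differs between two groups. Below is a minimal two-sided, pooled two-proportion z-test written with the standard library only. It is a generic textbook sketch, not the notebook's code; the function name and the sample counts are hypothetical.

```python
import math


def two_proportion_ztest(success_a: int, total_a: int,
                         success_b: int, total_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: both proportions are equal."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (success_a + success_b) / (total_a + total_b)
    # Standard error of the difference; zero if p_pool is exactly 0 or 1.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: users who reached a funnel step, out of all users per group.
z_same, p_same = two_proportion_ztest(50, 100, 50, 100)
z_diff, p_diff = two_proportion_ztest(90, 100, 10, 100)
```

When the same test is repeated for every funnel step and every pair of groups, the significance level usually needs a multiple-comparison correction (for example Bonferroni), since each additional test raises the chance of a false positive.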

Data Visualisation Using Tableau: Researching YouTube Trends

Dashboard: Youtube_Trends_Dashboard
Presentation: p9_dashboard_and_presentation_with_tableau_YouTube_Trends.pdf
Description: The dataset for this project contains 12343 entries of time-series data, each a record of a trending category, with fields providing the record ID, category title or genre, the video count for that category, trending date, and region (one of five: the U.S.A., France, India, Japan, and Russia).
Skills: data analysis, descriptive statistics, dashboard design, storytelling.
Technology: Tableau.
