Hi, I'm Roman! I'm an aspiring data analyst passionate about tech, languages, and dataviz, currently seeking a junior or entry-level analytics role. Previously a political and investigative reporter, with a knack for finding patterns and artifacts buried in numbers. In love with science fiction, blues music, and sports.
Below is a collection of my practice projects. I've uploaded them as they are, complete with the feedback and reviews I got from professional data analysts, who pointed out occasional mistakes, suggested improvements, or complimented the work done.
- About
- Portfolio Projects
- Data Cleaning: Analysing Borrowers' Risk of Default
- Exploratory Data Analysis: Car Sales Ads Research
- Statistical Analysis: Comparing Mobile Plans
- Integrated Project 1: Video Game Sales Research
- Business Analytics: Calculating LTV, CAC, ROI for Online Afisha App
- Making Business Decisions Based on Data: A/B Testing for an Online Store
- Data Stories: Researching L.A. Restaurant Market
- Integrated Project 2: Analysing User Behaviour for a Foodtech Startup
- Data Visualisation Using Tableau: Researching YouTube Trends
In this section, I briefly describe each project and the problems I solved while working on it.
Code: p1_data_cleaning_Borrowers_Risk_of_Defaulting.ipynb
Description: The dataset contains 21525 records on individual borrowers, with 12 fields describing them, such as the number of days employed, number of children, family status, level of education, income type, and others. The project centres on data cleaning and preprocessing, including some fairly tricky cases of missing, duplicate, and incorrectly converted data.
Skills: data cleaning, data analysis, descriptive statistics.
Technology: Python, Pandas.
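To illustrate the kind of preprocessing this involves, here is a minimal plain-Python sketch, not the notebook's code. It assumes hypothetical fields `education`, `income_type` and `total_income`, lowercases the text field so that case variants count as the same value before dropping duplicates, and fills missing income with the median for the borrower's income type. In Pandas, the same steps would map roughly to `Series.str.lower()`, `DataFrame.drop_duplicates()` and `groupby(...).transform("median")`.

```python
from statistics import median


def clean_borrowers(rows):
    """Normalise text case, drop exact duplicates, and impute missing income.

    `rows` is a list of dicts with hypothetical keys
    'education', 'income_type' and 'total_income' (None when missing).
    """
    # 1. Normalise case so "Secondary Education" and "secondary education" match.
    normalised = [{**r, "education": r["education"].strip().lower()} for r in rows]

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in normalised:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Median income per income type, computed from non-missing values only.
    by_type = {}
    for r in unique:
        if r["total_income"] is not None:
            by_type.setdefault(r["income_type"], []).append(r["total_income"])
    medians = {t: median(v) for t, v in by_type.items()}

    # 4. Fill gaps with the group median (stays None if the group has no data).
    return [
        {**r, "total_income": medians.get(r["income_type"])} if r["total_income"] is None else r
        for r in unique
    ]


# Made-up sample: the first two rows differ only in letter case.
rows = [
    {"education": "Secondary Education", "income_type": "employee", "total_income": 100},
    {"education": "secondary education", "income_type": "employee", "total_income": 100},
    {"education": "Bachelor's Degree", "income_type": "employee", "total_income": 300},
    {"education": "secondary education", "income_type": "employee", "total_income": None},
]
cleaned = clean_borrowers(rows)
print(len(cleaned), cleaned[-1]["total_income"])  # 3 200.0
```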
Code: p2_EDA_Car_Sales_Ads.ipynb
Description: The dataset has 51525 entries for individual car sale ads posted on a classified ads website and contains 13 fields describing the car characteristics and the duration of the ad.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation.
Technology: Python, Pandas, NumPy, Matplotlib.
Code: p3_statistical_analysis_Megaline_Mobile_Plans.ipynb
Description: The data for this project is scattered across 5 datasets: data on web sessions, calls, messages, and users, and a table describing the different plans.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, hypothesis testing (t-testing).
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.
Code: p4_integrated_project_1_Video_Game_Sales.ipynb
Description: The dataset contains 16715 entries, each a record for a video game, with fields describing its title, genre, year of release, platform, critic and user scores, and sales figures for North America, the EU, Japan, and other markets.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, hypothesis testing (t-testing), statistical testing.
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.
Code: p5_business_analytics_LTV_CAC_ROI_for_Online_Afisha_App.ipynb
Description: The data for this project includes 3 datasets: visits (359400 entries), orders (50415) and costs (2542). All of them are used to compare the sources of visits, calculate user conversion and retention, and analyse marketing expenses across channels.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, cohort analysis, business metrics.
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.
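At their core, the three metrics in the project title are simple ratios. The sketch below makes some assumptions: LTV is measured as gross profit per buyer (revenue times an assumed margin), CAC as marketing spend per buyer, and return as LTV / CAC (often called ROMI; the alternative ROI = (LTV - CAC) / CAC is the same figure minus 1). The notebook computes these per cohort and per channel and may define them differently. The function names and figures below are made up for illustration.

```python
def ltv(revenue: float, buyers: int, margin: float = 1.0) -> float:
    """Lifetime value: gross profit generated per buyer (margin is an assumption)."""
    return revenue * margin / buyers


def cac(marketing_cost: float, buyers: int) -> float:
    """Customer acquisition cost: marketing spend per acquired buyer."""
    return marketing_cost / buyers


def romi(ltv_value: float, cac_value: float) -> float:
    """Return on marketing investment: above 1.0 means buyers paid back their acquisition."""
    return ltv_value / cac_value


# Hypothetical cohort: 1,000 buyers, 12,000 in revenue, 9,000 in ad spend, 50% margin.
cohort_ltv = ltv(revenue=12_000, buyers=1_000, margin=0.5)  # 6.0
cohort_cac = cac(marketing_cost=9_000, buyers=1_000)        # 9.0
print(round(romi(cohort_ltv, cohort_cac), 2))               # 0.67
```

A ROMI below 1.0 for a cohort means that, so far, it has not paid back its acquisition cost. Tracking the ratio month by month shows when (or whether) a channel breaks even.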
Code: p6_business_decisions_based_on_data_AB_Testing_for_Online_Store.ipynb
Description: Aside from a table of hypotheses, the data for this project comes from an orders dataset (just under 1200 entries) containing transactions by visitors from two experimental groups, and a small time-series table with the total number of daily visits per group, covering two months of the testing period.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, business metrics, defining KPIs, A/B testing, hypothesis prioritisation, statistical testing (Shapiro-Wilk, Mann-Whitney, Levene).
Technology: Python, Pandas, NumPy, SciPy Stats, Matplotlib, Seaborn.
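The Mann-Whitney test fits skewed order data well because it compares ranks instead of means. The plain-Python sketch below shows the mechanics using the normal approximation and does not apply a tie correction. In practice, `scipy.stats.mannwhitneyu` handles ties and small samples more carefully, so its p-values will differ somewhat. The sample values are made up and could stand for, say, order counts per visitor in groups A and B.

```python
from math import erf, sqrt


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    r1 = sum(ranks[:n1])            # rank sum of sample a
    u1 = r1 - n1 * (n1 + 1) / 2     # U statistic for sample a
    mu = n1 * n2 / 2                # expected U under the null hypothesis
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    # Two-sided p-value from the standard normal CDF: 2 * (1 - Phi(|z|)).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return u1, p_value


group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
u, p = mann_whitney_u(group_a, group_b)
print(u, round(p, 3))
```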
Code: p7_data_stories_Market_Research_Restaurants_in_LA.ipynb
Presentation: p7_slides_restaurants_in_LA.pdf
Description: This project's dataset contains 9561 entries on restaurants in L.A., each recording a single place with fields providing its name, address, number of seats, the type of establishment and whether it's part of a chain or a standalone place.
Skills: data cleaning, data analysis, descriptive statistics, regular expressions, data visualisation, market research, storytelling.
Technology: Python, Pandas, RegEx, Matplotlib, Seaborn.
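Regular expressions are useful for pulling structured pieces out of free-text fields such as addresses. Here is a minimal sketch that assumes a hypothetical address format like `1234 W SUNSET BLVD #101`. It strips the leading house number and an optional trailing `#unit` to leave the street name, which can then be counted, for example to find streets with many restaurants. Real addresses in the dataset may need a broader pattern.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical format: house number, street, optional "#unit" at the end.
ADDRESS_RE = re.compile(r"^\s*\d+\s+(?P<street>.+?)(?:\s+#\s*\S+)?\s*$")


def street_name(address: str) -> Optional[str]:
    """Return the upper-cased street part, or None if the address doesn't fit the format."""
    match = ADDRESS_RE.match(address)
    return match.group("street").upper() if match else None


addresses = ["1234 W SUNSET BLVD #101", "900 W Sunset Blvd", "2 N MAIN ST", "OLVERA ST E17"]
counts = Counter(s for s in map(street_name, addresses) if s is not None)
print(counts.most_common(1))  # [('W SUNSET BLVD', 2)]
```

Addresses that don't match (like the last one, which has no leading number) return `None`. This makes them easy to collect and inspect separately instead of silently mis-parsing them.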
Code: p8_integrated_project_2_Analysing_FoodTech_User_Behaviour.ipynb
Description: The data for this project contains 244126 entries of time-series user logs from a food delivery app or website, with fields for user ID, timestamp, event screen (e.g. Main Screen, Cart, Payment Successful, etc.), and the user's experimental group in an A/B test.
Skills: data cleaning, data analysis, descriptive statistics, data visualisation, sales funnels, A/A and A/B testing, statistical testing (proportion tests).
Technology: Python, Pandas, SciPy Stats, Matplotlib, Seaborn, Plotly.
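A sales funnel and a proportion test fit together naturally. The funnel gives the share of users who move from one event screen to the next, and the test checks whether a step's share differs between two groups. The same test can first confirm that the two control groups in an A/A split do not differ. This is a minimal plain-Python sketch of a pooled two-proportion z-test with made-up counts, not the notebook's implementation.

```python
from math import erf, sqrt


def funnel_conversion(users_per_step):
    """Share of users at each step relative to the previous step."""
    return [after / before for before, after in zip(users_per_step, users_per_step[1:])]


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p_a, p_b = successes_a / total_a, successes_b / total_b
    p_pool = (successes_a + successes_b) / (total_a + total_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical unique users per step: Main Screen -> Offers -> Cart -> Payment Successful.
print(funnel_conversion([1000, 600, 500, 470]))

# Hypothetical: 480 of 1,000 users in group A vs 520 of 1,000 in group B reached a step.
z, p = two_proportion_z_test(480, 1000, 520, 1000)
print(round(z, 2), round(p, 3))
```

With these counts, the 4-point gap is not significant at the 0.05 level, a reminder that fairly large samples are needed to detect small differences in funnel shares.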
Dashboard: Youtube_Trends_Dashboard
Presentation: p9_dashboard_and_presentation_with_tableau_YouTube_Trends.pdf
Description: The dataset for this project contains 12343 entries of time-series data, where each is a record of a trending category, with fields providing the record ID, category title or genre, the video count for that category, trending date, and region (one of 5: U.S.A., France, India, Japan and Russia).
Skills: data analysis, descriptive statistics, dashboard design, storytelling.
Technology: Tableau.