Skip to content
View anaclaudialemos's full-sized avatar

Block or report anaclaudialemos

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Please don't include any personal information such as legal names or email addresses. Maximum 100 characters, markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
anaclaudialemos/README.md

Hi there 👋,

I'm Ana Claudia, or just Ana :)

I'm motivated by getting answers, solving problems, and understanding how everything works. I like the idea of turning a heap of data into meaningful, accessible, and impactful information, which led me to develop myself as a data scientist 👩‍💻

Main Skills: Data Analysis, Statistics for Data Analysis, SQL, Tableau, Python and Main Data Science Libraries, Machine Learning (Classification, Regression, Clustering and Time Series).

🤝 Contact me:

Linkedin Badge Gmail Badge

DATA SCIENCE PROJECTS

Business Problem: A traditional health insurance company wants to expand its business, and the company has chosen to offer its customers car insurance. Through a survey, it has obtained feedback from 380,000 customers about whether or not they are interested in acquiring car insurance. The product team plans a campaign for 127,000 new customers who did not respond to the survey, in which they will be offered the auto insurance product, but realized that it would not be able to reach this entire customer base, as it will use telephone calls to offer the product, and the sales team have a capacity to make only 20,000 calls in the campaign period.

Solution: Develop a model to obtain the probability that a given customer in a list, acquire car insurance, and sort the list from the customer with the highest probability to the lowest probability of acquiring the car insurance.

Conclusion: Making 20,000 calls from a list of 127,000 customers, the model would be able to identify 44% of the total number of customers interested in acquiring car insurance, with the ML model being approximately 2.7 times better than the solution used before. Increased to 40,000 calls the model would be able to identify 75% of the total number of customers interested in purchasing auto insurance, meaning that the ML model is approximately 2.3 times better than the solution used before. To reach 90% of interested customers, with the ML model the sales team will only have to make 53,340 calls, which represents 42% of the entire customer list, a reduction of 48% of the number of calls compared to the solution used before.

Repository: Learning to Rank for Insurance Cross Sell

Rossmann is a pharmacy chain that operates over 3,000 stores in 7 European countries. The stores are going to be renovated and the CFO needs to know how much can be invested in each one of them.

Solution: Develop a prediction model for sales and a Telegram bot that returns sales predictions given a store id number.

Conclusion: The model developed predicts a gross income of 285,707,584.00 USD in the next 6 weeks for the stores available, where the best and worst case scenarios results on 286,423,764.87 USD and 284,991,409.31 USD, respectively.

Repository: Sales Forecasts for a Drugstore Chain

Business Problem: House Rocket business model consists of purchasing and reselling properties through a digital platform. The company is looking for new properties for its portfolio. The data scientist is in charge to help find the best business opportunities by answering the questions: which properties should the company buy? once the property is in the company's possession, how long wait before selling it, and what would ta the sale price be?

Solution: Develop a online dashboard with selected properties, a map view with properties distribution, a table with attributes filters, and the expected profit for each property.

Conclusion: Based on commercial criteria, 8,130 properties are recommended to be purchased by House Rocket resulting in a profit of 702,080,905.28 USD, which represents 20.33% of the total investment. This result already considering repairs or renovations expenses.

Repository: King County Housing Market Insights

Business Problem: Mariana and Laura are planning to enter the US fashion market as an e-commerce business model. The initial idea is to enter the market with only one product and for a specific public, i,n this case the product would be jeans for the female public. However, the partners have no experience in this market. So the partners hired a Data Science consultancy to answer the following questions: what would be the medium sales ticket of the products? what are the types of jeans and their colors for the initial products? what are the product's compositions? The main competitors in the business are H&M and Macy's.

Solution: Follow the steps: ETL architecture design, web scraping, data cleansing, saving data to database, data analysis, delivery of answers and insights via pdf report.

Conclusion: The medium price of the competitor's products is 29.99 USD, and 75% of the products in the dataset are between 17.99 USD and 34.99 USD. Of the 24 styles present in the dataset, 8 represent 80% of the products in the dataset. About the fit, 80% of the products have fit loose, regular, or slim. Of the 46 colors available 7 represent 80% of the products in the dataset. Jeans that are not 100% cotton are most often also made of spandex and polyester. Almost 80% of products have some percentage of material that is considered environmentally responsible.

Repository: Scraping and Analysis of Fashion Trends and Pricing

Pinned Loading

  1. learning_to_rank_for_cross_sell learning_to_rank_for_cross_sell Public

    This is a classification/ranking ML project for my portfolio. The goal is to detect health insurance customers who are more likely to buy a new insurance from the company, so that the company can o…

    Jupyter Notebook 4

  2. drugstore_sales_prediction drugstore_sales_prediction Public

    This repository contains codes for a drugstore chain sales forecasts using machine learning.

    Jupyter Notebook

  3. predicting_customer_satisfaction predicting_customer_satisfaction Public

    Predicting customer satisfaction for purchases made in Brazilian e-commerce.

    Jupyter Notebook 1 1

  4. housing_market_analysis housing_market_analysis Public

    Analysis of a portfolio of residences for sale in King County, USA, to answer the questions: which properties should the company buy, how long wait before selling them, and what would the selling p…

    Jupyter Notebook 1

  5. scraping_and_analysis_of_fashion_products scraping_and_analysis_of_fashion_products Public

    Analysis of products for entry into the fashion market by scraping, to answer the main questions: what would be the medium sales ticket of the products, what are the types of jeans and their colors…

    Jupyter Notebook 1

  6. extracting_ecommerce_information_in_sql extracting_ecommerce_information_in_sql Public

    Practices with the SQL language in the Brazilian E-Commerce Dataset by Olist, to retrieve data to fulfill some requests.

    Jupyter Notebook