vladimir-skvortsov/data-science-portfolio

Data Science Portfolio

Real Estate Price Prediction

This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.

It's a great dataset for evaluating simple regression models.

Tags: Python, Scikit Learn, CatBoost, Regression, Decision Tree, Clustering
Dataset: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

Customer Market Segmentation

This case requires developing a customer segmentation to define a marketing strategy. The sample dataset summarizes the usage behavior of about 9000 active credit card holders during the last 6 months. The file is at the customer level, with 18 behavioral variables.

Tags: Python, Scikit Learn, Clustering
Dataset: https://www.kaggle.com/datasets/arjunbhasin2013/ccdata

Titanic

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Tags: Python, Keras, Scikit Learn, CatBoost, Binary Classification
Dataset: https://www.kaggle.com/competitions/titanic/overview

Job Placement

Due to the growing need for educated and talented individuals, especially in developing countries, recruiting fresh graduates is a routine practice for organizations. Conventional recruiting methods and selection processes can be prone to errors, so innovative methods are needed to optimize the whole process.

Tags: Python, Keras, Binary Classification
Dataset: https://www.kaggle.com/datasets/ahsan81/job-placement-dataset

MNIST Dataset: Digit Recognizer

MNIST ("Modified National Institute of Standards and Technology") is the de facto “Hello World” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

Tags: Python, Keras, PyGame, Classification, CNN
Dataset: https://www.kaggle.com/code/ngbolin/mnist-dataset-digit-recognizer

Rossmann Store Sales

Rossmann operates over 3,000 drug stores in 7 European countries. Currently, Rossmann store managers are tasked with predicting their daily sales for up to six weeks in advance. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their unique circumstances, the accuracy of results can be quite varied.

GigaChain Wikipedia Q&A

Answers questions about Wikipedia articles.

Tags: Python, LangChain, GigaChain, GigaChat

GigaChain Wikipedia RNG Q&A

Generates questions from Wikipedia articles.

Tags: Python, LangChain, GigaChain, GigaChat

LangChain RNG Q&A Probability Theory Telegram Bot

This Telegram bot combines LangChain and ChatGPT to generate probability theory questions, evaluate user answers, and deliver feedback. A distinctive feature is that solutions are presented as LaTeX images, which makes them easier to follow and turns the bot into an interactive way to learn probability theory.

Conversation

A standard conversational bot whose conversation memory is capped at a token limit.

Tags: Python, LangChain, GigaChain, GigaChat
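
To illustrate what a token-limited memory does, here is a minimal, library-free sketch. It is not the project's implementation, which presumably relies on the memory classes of its framework. The class name `TokenLimitedMemory`, the helper `count_tokens`, and the idea of counting whitespace-separated words as tokens are all assumptions made for illustration; a real bot would use the model's own tokenizer.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


class TokenLimitedMemory:
    """Hypothetical buffer that keeps only the newest messages within a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages = deque()  # (role, text) pairs, oldest first
        self.total_tokens = 0

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        self.total_tokens += count_tokens(text)
        # Evict the oldest messages until the history fits the budget again.
        while self.messages and self.total_tokens > self.max_tokens:
            _, dropped = self.messages.popleft()
            self.total_tokens -= count_tokens(dropped)

    def as_prompt(self) -> str:
        # Render the surviving history as plain text for the next model call.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory = TokenLimitedMemory(max_tokens=5)
memory.add("user", "one two three")
memory.add("bot", "four five")
memory.add("user", "six")  # pushes the total over 5, so the oldest message is evicted
print(memory.as_prompt())
```

With this eviction rule, a single message that exceeds the budget on its own is dropped as well; truncating it instead would be an equally reasonable design choice.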