Hackathon solution for HackerEarth Data Science competition "Sigmathon 1.0" 2020. This codebase is built using the DataCo Supply Chain dataset.

SUKANTHEN/Sigmathon-1.0

Sigmathon-1.0 - Team PyFlow++

E-Commerce Goods Shipment Duration prediction and Estimating Late Delivery Risk

This project identifies late-delivery risk for e-commerce goods by predicting the fastest and normal shipping durations. Our Decision Tree model enables e-commerce and goods-delivery businesses to identify the risk of late delivery and to predict the fastest and normal shipment durations for their inland and foreign customers.

GOAL

i) Build a Multi-Output Decision Tree Regressor to determine the maximum range of shipping time by predicting the fastest and normal shipping durations for both inland and international customers.
ii) Build a binary classifier to flag orders with a high probability of late delivery (Late Delivery Risk analyser).
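
The two goals above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the notebook's actual pipeline: the feature matrix, the target formulas, the lateness threshold, and the tree depth are all assumptions, and a `DecisionTreeClassifier` stands in for the statistical models used for the risk classification.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for order features (hypothetical, not the DataCo columns).
n_orders = 500
X = rng.uniform(0, 1, size=(n_orders, 4))

# Two regression targets per order: fastest and normal shipping duration (days).
fastest_days = 1 + 3 * X[:, 0] + rng.normal(0, 0.1, n_orders)
normal_days = fastest_days + 1 + 2 * X[:, 1] + rng.normal(0, 0.1, n_orders)
Y = np.column_stack([fastest_days, normal_days])

# Hypothetical binary label: 1 if the order counts as late (threshold is assumed).
late = (normal_days > 4.5).astype(int)

X_train, X_test, Y_train, Y_test, late_train, late_test = train_test_split(
    X, Y, late, test_size=0.2, random_state=0
)

# (i) One tree fitted on a two-column target predicts both durations at once.
regressor = DecisionTreeRegressor(max_depth=5, random_state=0)
regressor.fit(X_train, Y_train)
duration_pred = regressor.predict(X_test)  # shape: (n_test, 2)

# (ii) A separate binary classifier estimates the late-delivery probability.
classifier = DecisionTreeClassifier(max_depth=5, random_state=0)
classifier.fit(X_train, late_train)
late_proba = classifier.predict_proba(X_test)[:, 1]
```

Passing a two-column `Y` to `DecisionTreeRegressor.fit` is what makes the model multi-output: each leaf stores one value per target, so column 0 of `duration_pred` is the fastest duration and column 1 the normal duration.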

DATA SOURCE


This is the supply chain dataset used by the company DataCo Global. It includes their products sold, financial details (profit, loss, total sales, etc.), shipping details, and customer details such as sales, demographics, and transactions. The data is about 91 MB in size, covering details of 180,520 customers across 53 columns related to clothing, sports, and electronic supplies.

BUILD

Kaggle Notebooks

  • Model trained and run using 13 GB of RAM and an NVIDIA Tesla 300 GPU

SUBMISSIONS AND FILES

  • Exploratory Data Analytics[EDA]-Sigmathon1.0.ipynb
    Contains basic EDA of the DataCo dataset: finding missing values, feature analysis, identifying duplicate values, statistical analysis, and other data quality checks.
  • Final_E-commerce Model.ipynb
    Contains Python code for feature selection, feature engineering, model building, hyper-parameter tuning, predictive analytics, and statistical modelling using the regressor output. It uses a Decision Tree Regressor for the multi-output regression model and statistical models for the late-delivery-risk binary classification.
  • BI_E-commerce_Visualization.pbix
    A Microsoft Power BI (business intelligence tool) file used for visualization and deriving insights. It contains numerous dashboards and visualizations.
  • hackathon ppt.pptx
    PowerPoint presentation of insights and graphs/visuals covering financial, demographic, sales, market, and profit analysis.
  • requirements.txt
    List of libraries used and their versions.
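
The data quality checks listed for the EDA notebook (missing values, duplicate rows, summary statistics) might look roughly like the sketch below. The small DataFrame and its column names are hypothetical stand-ins for the DataCo data, and the notebook itself may run these checks differently.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame standing in for the DataCo CSV; columns are illustrative.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "shipping_days": [3.0, 5.0, 5.0, np.nan, 2.0],
    "market": ["Europe", "LATAM", "LATAM", "Europe", None],
})

# Count missing values in each column.
missing_per_column = orders.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = int(orders.duplicated().sum())

# Basic statistics for a numeric column (NaN values are skipped).
summary = orders["shipping_days"].describe()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(summary)
```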
