Supply Chain Management (SCM) involves optimizing the flow of goods, from procurement to final delivery, to streamline operations, improve customer satisfaction, and reduce costs. Disruptions, particularly in last-mile delivery due to weather conditions, can lead to late or failed deliveries, negatively impacting revenue and customer trust. This project addresses these challenges by integrating predictive modeling and geospatial data to improve delivery success rates and enhance inventory management.
- Total Rows: 9,900
- Total Columns: 90 (filtered to 20 relevant columns)
- Shipment Details:
- Days for shipment (scheduled), Late_delivery_risk, Order Item Quantity
- Categorical Variables:
- Category Name, Customer City, Department Name
- weather_preciptype, weather_conditions, weather_description
- Weather Metrics:
- weather_tempmax, weather_tempmin, weather_temp
- weather_precip, weather_snow, weather_snowdepth
- weather_windspeed, weather_cloudcover, weather_visibility, weather_severerisk
- Date Information:
- shipping_date_day_month
- Deliveries are typically scheduled within 2–4 days.
- Late_delivery_risk is roughly balanced: 53% of deliveries are flagged as late.
- Weather metrics exhibit varied distributions, with notable outliers in precipitation and snow depth.
- Numerical Summaries: Mean, standard deviation, and percentiles calculated for key features.
- Visualizations: Histograms, boxplots, and heatmaps for univariate and multivariate analyses.
- Handling Missing Values: Imputed missing values in weather_preciptype with "Unknown".
- Encoding: Used label and binary encoding for categorical variables to reduce bias and high-dimensionality issues.
- Datetime Splitting: Extracted day and month from shipping_date_day_month.
- Statistical Tests:
- T-tests for numerical features to assess class differences.
- Chi-square tests for categorical features to evaluate independence.
- Class Balance: Verified balance between failed (53%) and successful (47%) deliveries.
- Standardization: Applied Z-score standardization to numerical features for consistent scaling.
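The preprocessing steps listed above (imputation, label encoding, datetime splitting, Z-score standardization) can be illustrated with a minimal sketch. It is not the project's own code, which the source does not show; it uses only the Python standard library, the sample values are made up, and the `%Y-%m-%d` date format is an assumption about how shipping_date_day_month is stored.

```python
from datetime import datetime
from statistics import mean, pstdev

# Toy rows: column names come from the dataset description, values are made up.
rows = [
    {"weather_preciptype": "rain", "Customer City": "Chicago",
     "shipping_date_day_month": "2017-01-15", "weather_temp": 2.0},
    {"weather_preciptype": None, "Customer City": "Caguas",
     "shipping_date_day_month": "2017-07-04", "weather_temp": 28.0},
    {"weather_preciptype": "snow", "Customer City": "Chicago",
     "shipping_date_day_month": "2017-12-24", "weather_temp": -6.0},
]

# 1. Impute missing precipitation types with the literal "Unknown".
for row in rows:
    if row["weather_preciptype"] is None:
        row["weather_preciptype"] = "Unknown"


# 2. Label-encode a categorical column: one integer per distinct value.
def label_encode(values):
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


city_codes, city_mapping = label_encode([r["Customer City"] for r in rows])

# 3. Split the shipping date into day and month (date format is assumed).
for row in rows:
    parsed = datetime.strptime(row["shipping_date_day_month"], "%Y-%m-%d")
    row["shipping_day"], row["shipping_month"] = parsed.day, parsed.month


# 4. Z-score standardization: subtract the mean, divide by the std. deviation.
def zscore(values):
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


temp_z = zscore([r["weather_temp"] for r in rows])
```

After standardization the feature has mean 0 and standard deviation 1, which keeps features measured on different scales (for example temperature and wind speed) comparable when they enter a model such as Logistic Regression.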
- Logistic Regression:
- Precision of 0.78 and recall of 0.59 for late deliveries, meaning the model missed 41% of late deliveries.
- Cross-validation accuracy: ~70%, with stable performance.
- Naïve Bayes Classifier:
- Higher recall for late deliveries (0.87) but lower precision (0.57), leading to misclassification of successful deliveries.
- Cross-validation accuracy: Gaussian (67%), Multinomial (52%).
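To make the Gaussian variant concrete, here is a minimal from-scratch sketch of a Gaussian Naïve Bayes classifier. It is an illustration only, not the project's implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the `var_floor` parameter, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Toy data: [weather_precip, weather_windspeed] -> Late_delivery_risk.
# Values are made up for illustration.
X = [[0.0, 5.0], [0.2, 8.0], [0.1, 6.0], [9.0, 30.0], [12.0, 35.0], [10.0, 28.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
calm_day = predict_gaussian_nb(model, [0.1, 7.0])
stormy_day = predict_gaussian_nb(model, [11.0, 32.0])
```

The Gaussian variant models each continuous feature with a per-class normal distribution, whereas the Multinomial variant is designed for count-like, non-negative features.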
- Accuracy, Precision, Recall, F1-Score
- Cross-validation accuracy with confidence intervals
- Learning curves for bias-variance tradeoff assessment
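The relationship between these metrics can be shown with a short sketch that derives them from true and predicted labels. The helper name `classification_metrics` and the toy labels are illustrative assumptions; `miss_rate` is simply 1 minus recall, the same relationship behind a recall of 0.59 corresponding to 41% of late deliveries missed.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1, and miss rate for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miss_rate": 1.0 - recall,  # share of actual positives not detected
    }


# Toy labels (made up): 1 = late delivery, 0 = on-time delivery.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case three of five late deliveries are caught (recall 0.6) and three of four late predictions are correct (precision 0.75), which shows how a model can be fairly precise while still missing a large share of late deliveries.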
- Confidence Interval (CI):
- CI (2.89–2.94 days) quantifies the precision of the average shipment days estimate.
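- Formula: $CI = \bar{x} \pm t \times \frac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t$ the critical value of Student's t-distribution for the chosen confidence level.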
- Bootstrapping:
- Resampling technique to estimate mean and confidence interval without assuming normality.
- Bootstrap CI (2.89–2.94 days) matches the traditional CI, which supports the robustness of the estimate.
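Both interval estimates can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's code: the data are synthetic stand-ins for the scheduled shipment days, the 95% confidence level is assumed, the normal quantile is used as a large-sample approximation to the t critical value, and the bootstrap uses a simple percentile interval.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_ci(sample, confidence=0.95):
    """CI = mean +/- z * s / sqrt(n); z approximates t when n is large."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


def bootstrap_ci(sample, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean: resample with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)
    )
    lower_idx = round((1 - confidence) / 2 * n_resamples)
    upper_idx = round((1 + confidence) / 2 * n_resamples) - 1
    return boot_means[lower_idx], boot_means[upper_idx]


# Synthetic stand-in for "Days for shipment (scheduled)"; values are made up.
data_rng = random.Random(42)
days = data_rng.choices([2, 3, 4], weights=[3, 4, 3], k=500)

t_low, t_high = mean_ci(days)
b_low, b_high = bootstrap_ci(days)
```

When the two intervals nearly coincide, as they should on data like this, the formula-based interval is not relying heavily on the normality assumption, which is the robustness check described above.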
- Shipment Trends:
- Deliveries are consistent, with most scheduled within 2–4 days.
- Late deliveries account for 53% of all deliveries.
- Weather Influence:
- Severe weather increases delivery risks.
- Geographic Impact:
- Logistical challenges vary by city.
- Model Comparisons:
- Logistic Regression outperforms Naïve Bayes in balancing recall and precision.
We used predictive modeling to flag likely delivery failures, with the goal of reducing operational costs. Logistic Regression gave the better overall balance (precision 0.78, cross-validation accuracy of ~70%), while Naïve Bayes captured more late deliveries (recall 0.87 versus 0.59) at the cost of lower precision, which points to recall on late deliveries as the main area for improvement. Geospatial data integration and weather analytics offer significant potential for enhancing supply chain strategies, and extending the model with more advanced techniques could further improve delivery success rates.
Video Recording of the Project - Optimizing Supply Chain Efficiency with Geospatial Data
- Correlation and Covariance
- Logistic Regression and Naïve Bayes Classification
- Bootstrapping and Confidence Intervals (CI)
- Data encoding and preprocessing techniques
- Statistical hypothesis testing
- Supply Chain Dataset: DataCo SMART SUPPLY CHAIN
- Weather Dataset: Visual Crossing
- Chi-squared test - Wikipedia
- NOAA: Weather and Climate Impacts
- The Impact of Weather on Supply Chain
- Supply Chain - Wikipedia