subhanjandas/subhanjandas

Social banner for Subhanjan

A Full-Stack Data Professional with experience as: 🛠️ Data Engineer | 👨🏻‍💻 Developer | 🕵🏻 Data Analyst | 🧬 Data Scientist & 🤖 AI+ML

As a creative tech enthusiast, I love working with and learning new software, tools, technologies & platforms: ChatGPT, ML Ops

👨  My Background

I am an MS CS student with over four years of professional experience in the eCommerce and Internet services industry.

I started in 2020 with Python, building simple data exploration projects and expanding my knowledge over time. From mid-to-late 2021, I learned Machine Learning and Deep Learning concepts, using Python libraries such as scikit-learn, Keras, and TensorFlow to create predictive models. During this time I also began my postgraduate Analytics program and learned Big Data tools like Apache Hadoop with Hive and Pig for web scraping, and Business Intelligence tools like Tableau, Power BI, and IBM Cognos. I am currently working at Tucows as a Customer Intelligence Researcher, building a strong foundation in data analytics and reporting.

Over the last year, both my interest in Business Intelligence tools and my experience with them have grown. I am proficient in using Tableau and Power BI alongside Python and SQL, as well as Google Cloud Platform. I also have a solid understanding of Mathematics and Statistics, and am able to work with large and complex datasets. My goal with data analytics, visualization, and reporting is to help others. I enjoy creating something that stakeholders can use to make their decisions easier and data-driven.

✨  My Portfolio
  • Data Visualization and Dashboarding: Tableau Power BI Google Analytics Looker Alteryx

    • E-Commerce Sales Analysis | Minimal Overview Dashboard
    • Modern Retail Sales Dashboard | Aesthetic Light and Dark Themes -
      This Tableau dashboard presents a modern and aesthetic analysis of retail sales, with light and dark themes for user preference. Key performance indicators (KPIs) are displayed with current and previous year sparklines and min-max indicators, and users can customize the dashboard with global filters. An interactive text summary of sales by region allows for a quick and easy view of performance by location.
    • A 100 Years of Earthquakes - Analysis of a century of Earthquakes | Story Book using Tableau -
      This Tableau dashboard provides a comprehensive analysis of 100 years of earthquakes, presenting a visual representation of the data by year and magnitude, as well as a distribution of the earthquakes by class and magnitude. The dashboard also features an interactive earthquake map with filters for magnitude, damages, injuries, number of houses destroyed, number of missing, and number of deaths, allowing users to gain deeper insights into the impact of earthquakes over the past century.
    • Bank and Credit Card Complaints Analysis using Tableau -
      Built a dashboard using Tableau that analyzes credit card complaints data. The dashboard allows for a comprehensive analysis of the data through the use of custom calculations and parameters. This enables users to identify patterns and trends in the data, and make data-driven decisions. The visualizations in the dashboard are interactive and visually appealing, making it easy to understand and interpret the data. The purpose of the project is to improve customer satisfaction and reduce complaints by gaining a better understanding of the complaints data.
    • Employee Attrition - What makes employees quit? | Futuristic Tableau and Power BI Dashboards -
      This is an in-depth project that utilizes Tableau, Power BI, Python, Pig Latin, and Hadoop to gain a deeper understanding of IBM's workforce. The project meticulously investigates the Key Risk Indicators (KRIs) that influence employee attrition by leveraging the power of big data analysis. The project's results, in the form of recommendations, aim to aid IBM in enhancing employee retention and minimizing turnover rates. The project exemplifies the capability of advanced big data tools and visualization techniques to unveil actionable insights from large datasets.
  • Predictive Analytics and Machine Learning: Python TensorFlow PyTorch Pandas SAS SKLearn Keras R

    • Artificial Neural Networks for Fraud Detection in Supply Chain Analytics: A Study on MLPClassifier and Keras -
      This study aimed to detect fraudulent activities in the supply chain through the use of neural networks. It focused on building two machine learning models: one using MLPClassifier from the scikit-learn library, and a custom neural network using the Keras library in Python. Both models were trained and tested on the DataCo Supply Chain dataset. The results showed that the custom neural network achieved an accuracy of 97.67% in detecting fraudulent transactions, demonstrating its potential to minimize financial losses for organizations.
    • US Flight Delays Prediction Models based on Naïve Bayes, Regression Tree, and Logistic Regression Algorithms -
      This project uses Python and the scikit-learn library to predict flight delays in the United States using three machine learning algorithms (Naive Bayes, Regression Tree, and Logistic Regression). The data was collected, preprocessed, and divided into training and test sets to train and evaluate the prediction models. The Logistic Regression algorithm achieved the highest accuracy, 85.14%, in predicting flight delays. The project serves as a valuable tool for airlines and airport management to improve flight schedules and reduce the number of flight delays for passengers.
    • Predicting Housing Prices Using Multiple Linear Regression and k-Nearest Neighbours (kNN) -
      The objective of this project was to predict housing prices using two modeling techniques, multiple linear regression and k-Nearest Neighbours (kNN). The project aimed to construct accurate models to estimate real estate values by identifying relevant factors and their impact on the property's price. The multiple linear regression model was deemed the more suitable for prediction, with low Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The kNN model with 10 nearest neighbors also performed well, with a low RMSE.
    • Supermarket Organic Product Purchase Prediction - Data Mining and Modeling with SAS -
      This project aimed to predict customer purchasing behavior for a supermarket's new line of organic products. Using data mining techniques, the customer loyalty program data was analyzed to identify factors affecting organic product purchases. The data was modeled using SAS Enterprise Miner to create accurate predictive models. The results of this study could assist the supermarket in understanding its customer base and effectively targeting marketing efforts.
  • Database Scripting, Querying and Analysis: SQL SQLite MariaDB Cassandra Neo4j NoSQL PostgreSQL

    • RDBMS to GraphDB - Big Data Analytics using Neo4j -
      This project involves migration from a traditional RDBMS to Neo4j for big data analytics. Using graph database technology, various business-critical questions are addressed, including identifying the employees who sold Tofu, the products sold with Tofu, the total number of products, the top 5 products by sales, and the category with the highest sales. Neo4j's efficiency and effectiveness in managing big data provide valuable insights for decision making.
    • Data Analysis for Digital Music Store using SQL -
      This project is a data analysis of the Chinook Digital Music Store using SQL queries and a PostgreSQL database. The project aimed to identify and optimize business opportunities by analyzing customer and sales data, answering questions such as top-selling genres, top-selling artists, and total value of sales by country. Data visualization techniques were used to present the results in an easy-to-understand format.
  • Big Data Analytics and Cloud: Azure AWS Docker Hadoop GCP

    • Worldwide Sales Data Analysis and Exploration using Zeppelin, HDFS and Spark -
      This project aimed to analyze and understand worldwide sales data through the use of Zeppelin and HDFS. The primary objective was to utilize Spark's basic Scala commands and SQL to query and manipulate the data, providing valuable insights and findings for the customer.
    • User, Occupation and Movies, Ratings Data Exploration using Apache Hive -
      In this project, the objective was to analyze the "User, Occupation, Movies, and Ratings" dataset using Apache Hive. The data was processed and analyzed using Hive's SQL-like query language and MapReduce framework, making it easier to handle large datasets. The focus of the analysis was to provide a comprehensive breakdown of the data and uncover key insights into user preferences and trends.
  • Advanced Excel, IBM SPSS Modeler, IBM Cognos Analytics and Others: Excel SPSS Cognos

    • MoneyBall: Sports Predictive Analytics | Advanced Excel and Data Analysis ToolPak -
      This project used advanced Excel tools such as Solver and Data Analysis ToolPak to optimize a baseball team's lineup and maximize the expected return-to-risk ratio while adhering to a set salary budget. Data on over 500 players was collected, cleaned, and analyzed to identify the best players and positions. Data visualization techniques were used to present the results in an easy-to-understand format. The project provided valuable insights into building a winning team within a budget constraint.
    • IBM SPSS - A Comprehensive Guide to Data Analysis and Data Modeling -
      IBM SPSS Modeler is a comprehensive data analysis and modeling tool. This repository is a compilation of exercises outlined in the "Introduction to IBM SPSS Modeler" document by IBM. It covers the essential steps of data import, preparation, visualization, and model building. The repository includes building decision trees and linear regression models, demonstrating the tool's modeling capabilities.
    • Telecom Customer Churn - Data Modeling and Finding Main Drivers with IBM Cognos Analytics -
      In this project, IBM Cognos Analytics was used to analyze telecom customer churn data to determine the main drivers affecting customer churn. By answering questions such as what the top three key drivers affecting churn were, insights were gained on customer tenure with fiber optic, payment method, and internet service type. The results showed that customers with a tenure of less than three months and fiber optic service, paying with electronic check, had the highest churn rate.

⏩   and many more

🛠️  My Stack  
  • 🛢 Databases || Db2, Redis, Dynamo, MongoDB, Postgres, Cassandra

  • 🧑🏻‍💻 Programming || Python, SQL, HiveQL, SAS, Scala, Shell/UNIX, R, C

  • 📶 BI Tools || Tableau, Power BI, Looker, Cognos, Alteryx, SAS BI, GA4

  • 🔢 Big Data || Spark, Hadoop, Hive, Sqoop, HBase, Kafka, Impala, Hue

  • 💭 Azure Stack || ADLS, Databricks, Visual Studio, Synapse, ADF, AKS

  • 💭 AWS Stack || Glue, EC2, S3, Athena, Redshift, Lambda, IAM, RDS

  • 💭 GCP Stack || BigQuery, Looker, Pub/Sub, Cloud Storage, Dataproc

  • 🔗 DevOps || Docker, Kubernetes, Jenkins, Git, Azure, YAML, JSON

  • 🤖 AI/ML || Sklearn, PyTorch, TF, Keras, AzureML, SageMaker, AutoML

  • 🎯 SDLC || SAFe® Agile, Kanban, Jira, Confluence, Scrum, Waterfall

  • 📝 Code Management || GitHub, Bitbucket, GitLab, AWS CodeCommit

  • 🧮 Mainframe || COBOL, JCL, VSAM, DB2, TSO/ISPF, TSYS TS2®, z/OS

🔏  My Certifications 
🔬  My Publications 

⏩   and many more

Tableau Power BI python mysql java Hadoop Hive Scala java sqlite PyTorch TensorFlow IBM Cloud

👨‍💻 All of my projects are available on GitHub, Tableau Public, and Kaggle


📄 To learn about my experience, have a look at my resume


🔗  Connect with me

subhanjansd subhanjan-das subhanjan33

Handy : Tableau Python SQL DataBricks C++ Apache Spark Hadoop Hive Azure Kafka DynamoDB DataBricks Kotlin Flask
