
I tried to summarize the first few chapters of the book "Practical Statistics for Data Scientists" with the help of Jupyter notebooks, using various datasets obtained from Kaggle and other data sources.


Krishnkumar542/Practical_Statistics_for_Data_Scientists


Practical Statistics for Data Scientists

Introduction

Statistics is the branch of science that deals with collecting, organizing, analyzing, interpreting, and presenting data. The field of statistics is broadly divided into two parts:

  1. Descriptive Statistics: It deals with collecting, organizing, and summarizing the data at hand.
  2. Inferential Statistics: It draws conclusions about the whole data (the population) by observing a smaller subset of it (a sample).
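To make the distinction concrete, here is a minimal Python sketch (not taken from the repository's notebooks). It assumes a made-up "population" of 10,000 normally distributed values, describes a random sample of 100 with its mean and standard deviation, and then infers the population mean from that sample with an approximate 95% confidence interval based on the normal approximation (z = 1.96).

```python
import random
import statistics

# Hypothetical "population": 10,000 simulated values (illustrative data only)
random.seed(0)
population = [random.gauss(40, 12) for _ in range(10_000)]

# In practice we only observe a sample, not the whole population
sample = random.sample(population, 100)

# Descriptive statistics: summarize the sample we actually have
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")

# Inferential statistics: estimate the population mean from the sample
# with an approximate 95% confidence interval (normal approximation, z = 1.96)
standard_error = sample_sd / len(sample) ** 0.5
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error
print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
print(f"actual population mean: {statistics.mean(population):.2f}")
```

Rerunning with a different seed gives a different sample and a different interval; that sample-to-sample variability is exactly what inferential statistics has to account for.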

Below, I tried to summarize the first few chapters of this book with the help of Jupyter notebooks, using various datasets obtained from Kaggle and other data sources.

Chapter 1. Exploratory Data Analysis

This chapter focuses on the first step in any data science project: exploring the data. Exploratory data analysis, or EDA, is a comparatively new area of statistics. In 1962, John W. Tukey called for a reformation of statistics in his seminal paper “The Future of Data Analysis” [Tukey-1962]. With the ready availability of computing power and expressive data analysis software, exploratory data analysis has evolved well beyond its original scope. Key drivers of this discipline have been the rapid development of new technology, access to more and bigger data, and the greater use of quantitative analysis in a variety of disciplines.
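As a small illustration of the kind of summary EDA starts with, the sketch below computes a few estimates of location and variability on an illustrative dataset containing one extreme value. The data and the helper functions `trimmed_mean` and `median_absolute_deviation` are hypothetical and written with Python's standard `statistics` module; the repository's notebooks may use pandas or other libraries instead.

```python
import statistics

# Illustrative data with one extreme value (made up for this example)
data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 100]

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k])

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (a robust spread measure)."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])

# Estimates of location
mean = statistics.mean(data)
median = statistics.median(data)
trimmed = trimmed_mean(data, 0.1)  # drops one value from each end here

# Estimates of variability
std_dev = statistics.stdev(data)
mad = median_absolute_deviation(data)

print(f"mean={mean}, median={median}, trimmed mean={trimmed}")
print(f"std dev={std_dev:.2f}, MAD={mad}")
```

The single outlier pulls the mean and standard deviation far from the bulk of the data, while the median, trimmed mean, and MAD stay close to it, which is why robust estimates are worth checking during exploration.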

Chapter 2. Data and Sampling Distributions

This chapter discusses data and sampling distributions. Traditional statistics focused heavily on theory based on strong assumptions about the population. Modern statistics has shifted its focus to sampling procedures, where such assumptions are not needed. In general, data scientists need not worry about the theoretical nature of the population and should instead focus on the sampling procedures and the data at hand. There are some notable exceptions. Sometimes data is generated from a physical process that can be modeled. The simplest example is flipping a coin: this follows a binomial distribution. Any real-life binomial situation (buy or don’t buy, fraud or no fraud, click or don’t click) can be modeled effectively by a coin (with a modified probability of landing heads, of course). In these cases, we can gain additional insight by using our understanding of the population.
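The coin analogy can be checked with a short simulation. In this sketch, the scenario of 10 visitors who each click with probability 0.2 is hypothetical; the theoretical binomial probability of exactly two clicks is compared with the frequency obtained by repeatedly "flipping" a biased coin. `binomial_pmf` is a helper written for this example using `math.comb` (Python 3.8+).

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' (e.g. clicks) in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical scenario: 10 visitors, each clicks with probability 0.2,
# i.e. a coin that lands "heads" 20% of the time
n, p = 10, 0.2
theoretical = binomial_pmf(2, n, p)

# Simulate the same process by flipping the biased coin many times
random.seed(1)
trials = 50_000
hits = 0
for _ in range(trials):
    clicks = sum(random.random() < p for _ in range(n))
    if clicks == 2:
        hits += 1
simulated = hits / trials

print(f"P(2 clicks): theory = {theoretical:.4f}, simulation = {simulated:.4f}")
```

The simulated frequency should land close to the theoretical value of about 0.302, and it tends to get closer as the number of trials grows.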

Chapter 3. Statistical Experiments and Significance Testing

Design of experiments is a cornerstone of the practice of statistics, with applications in virtually all areas of research. The goal is to design an experiment in order to confirm or reject a hypothesis. Data scientists often need to conduct continual experiments, particularly regarding user interface and product marketing. This chapter reviews traditional experimental design and discusses some common challenges in data science. It also covers some oft-cited concepts in statistical inference and explains their meaning and relevance (or lack of relevance) to data science.
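One common way to judge whether an experiment's result could be due to chance is a permutation (resampling) test. The sketch below assumes a hypothetical A/B test comparing session times for two page designs; the data, the `permutation_test` function, and its parameters are illustrative, not from the repository. It pools both groups, reshuffles them many times, and reports the fraction of reshuffles in which the difference in means is at least as large as the one observed, which serves as an estimated one-sided p-value.

```python
import random
import statistics

# Hypothetical A/B test: session times (seconds) for two page designs
page_a = [21, 25, 23, 30, 28, 24, 27, 26]
page_b = [31, 35, 29, 33, 36, 32, 34, 30]

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(b) - mean(a) > 0.

    Returns the observed difference and the fraction of random reshuffles
    whose difference is at least as large (an estimated p-value)."""
    rng = random.Random(seed)
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        # Reassign the pooled values to groups of the original sizes
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        if statistics.mean(perm_b) - statistics.mean(perm_a) >= observed:
            count += 1
    return observed, count / n_permutations

observed_diff, p_value = permutation_test(page_a, page_b)
print(f"observed difference = {observed_diff}, p-value ~ {p_value:.4f}")
```

A small p-value suggests the observed difference would rarely arise if the page design made no difference; a large one suggests chance alone could plausibly explain it.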

Sources Used For Coding

References

  1. Shashank Kalanithi: https://www.youtube.com/watch?v=wwsizzg6UjU&list=PL-u09-6gP5ZNd6AhULnQHr6ZsF15qy4D0
  2. Krish Naik: https://www.youtube.com/watch?v=y1y1ATTMpaw
  3. Derek Banas: https://youtu.be/tcusIOfI_GM
  4. Khan Academy: https://www.youtube.com/watch?v=uhxtUt_-GyM&list=PL1328115D3D8A2566
  5. Code Basics: https://www.youtube.com/watch?v=8ZI55Inh1_A&list=PLeo1K3hjS3uuKaU2nBDwr6zrSOTzNCs0l
  6. Code with Harry: https://www.youtube.com/watch?v=gfDE2a7MKjA
