Welcome to the course Introduction to Data Science in Python by Appsilon!
This course introduces people who already know how to code in Python to the world of data science. In particular, I show tips and tricks useful for STEM and economics students. A secondary goal is to show students how to use free, industry-standard tools instead of Matlab, Statistica, SAS, and so on.
- The course starts by introducing what a data scientist does at work and why this job is so important in the 21st century. Then we start the technical part of the course.
- numpy: numbers and vectors, the fundamentals of all calculations in Python
- pandas: data frames (SQL-like, in-memory data), the fundamentals of data processing in Python
- matplotlib and plotly: plots, the basics of data visualization
- scikit-learn: introduction to machine learning, with examples from the go-to library in Python
- streamlit, quarto, fastapi: simple, useful and creative ways to share your work in Python and to generate beautiful reports
Apart from those libraries, I present and benchmark the polars library, a high-performance replacement for pandas when you work with datasets of roughly 0.5 GB to 5 GB and pandas starts to be too slow.
All course materials are located either here or on Google Drive. Code and small datasets are in the repo, while large datasets are on Google Drive.
I suggest using the html files, generated from the qmd and ipynb sources with quarto.
A guide to setting up an environment is included in the introduction presentation.
tl;dr: you can try
conda create -n ds-course python=3.10
conda activate ds-course
pip install -r requirements.txt
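Once the environment is created, a quick sanity check can confirm that the libraries used in the course are importable. The snippet below is only a minimal sketch: the `COURSE_MODULES` list and the `missing_modules` helper are hypothetical names, and the module list is assumed from the libraries mentioned in this README, so the actual contents of requirements.txt may differ.

```python
import importlib.util
import shutil

# Import names assumed from the libraries this README mentions;
# the actual contents of requirements.txt may differ.
# Note that scikit-learn is imported as "sklearn".
COURSE_MODULES = [
    "numpy", "pandas", "matplotlib", "plotly",
    "sklearn", "streamlit", "fastapi", "polars",
]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(COURSE_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
    # Quarto is a command-line tool, so look for it on PATH instead of importing it.
    print("quarto CLI found:", shutil.which("quarto") is not None)
```

Run it inside the activated `ds-course` environment. Anything reported as missing can usually be installed with pip, while quarto itself is typically installed separately as a command-line tool.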
Each lecture also has a homework assignment. For every homework, a solution is provided in a separate directory. Note that the solutions are not necessarily the best possible, but they may present an interesting approach. Very often there are multiple ways to approach the same problem.
The course has been prepared by Piotr Pasza Storożenko from Appsilon. It is available under the CC BY 4.0 license. Feel free to use these materials for your own purposes; you just have to attribute the original author.
Some exercises were inspired by exercises the author had to solve while studying.