The 2021 edition of Introduction to Data Science at Dalhousie University's Faculty of Management.

cdconrad/ds-2021

ds-2021

Copyright 2021 Colin Conrad

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Introduction to Data Science: 2021 Edition

This is a public repository of my teaching materials from the 2021 iteration of INFO 6270: Introduction to Data Science at Dalhousie University. The course is designed to bring students with no programming experience to the point of having basic data science capabilities using Python technologies. In this repository you will find supporting lab materials and exercises for 12 labs (plus some bonus labs). All of them are in Jupyter Notebook format, except for the R labs, which are in R Markdown. I have not provided the answers to the exercises (for various reasons), but they are available upon request. Note also that although the exercises are provided under the MIT License, the supporting datasets are not; they will need to be downloaded from their respective websites. References to the datasets are given at the bottom of each lab document.

Credit must also be given to Al Sweigart for creating supporting material for the early labs. You can purchase his book at https://automatetheboringstuff.com/

I recommend cloning this repository as a whole and opening it in Jupyter using Anaconda. You can learn how to download and configure Anaconda here: https://www.anaconda.com/products/individual

Lessons:

  • Lab 1 - Hello Python world!
  • Lab 2 - A function for validating Elections Nova Scotia's records
  • Lab 3 - Basic data cleaning of Halifax's housing data
  • Lab 4 - Accessing data sources
  • Lab 5 - Making big(ger) Airbnb data eas(ier) with data frames
  • Lab 6 - Visually analyze iPhone app downloads
  • Lab 7 - Do people have different attitudes towards dating during Covid?
  • Lab 8 - Discover associations between e-commerce purchases
  • Lab 9 - Detect heart disease in anonymous patients
  • Lab 10 - Create and manage a digital bookstore collection
  • Lab 11 - Identify Halifax's Twitter influencers
  • Lab 12 - Getting started with R
  • Bonus lab - Second steps with R

About my teaching approach

I jokingly refer to this course as "scrappy data science for managers". It's "scrappy" in the sense that I take an as-needed approach to teaching rather than a theoretical one. We do not cover theoretical math or the like, and we employ statistics and machine learning only to solve practical problems. In fact, each week is structured around a concrete problem, which helps keep otherwise abstract concepts tangible and relevant to people in managerial professions.

These labs start out easy and then ramp up quickly. You can think of labs 1-4 as a sort of "introduction to Python for managers" module. Labs 5-9 cover substantial topics in data science with Pandas. Labs 10-12 cover additional subjects: SQL, social media APIs, and the R programming language.
