This repository is intended to centralize the collaborative development of educational materials for the 2019 Data Science Bootcamp by both instructors and teaching assistants.
Time | Monday | Tuesday | Wednesday | Thursday |
---|---|---|---|---|
9:30 - 9:40 Welcome | ||||
9:40 - 10:30 Lecture | Ryan Wade - Python & GitHub | Vetria Byrd - Python for Data Science | Edmond Chow - Regression, Discriminants, etc. | Xiaoming Huo - Clustering & Classification |
10:30 - 10:50 Break | ||||
10:50 - 11:40 Lecture | Ryan Wade - Python & GitHub | Vetria Byrd - Python for Data Science | Edmond Chow - Regression, Discriminants, etc. | Xiaoming Huo - Clustering & Classification |
11:40 - 1:15 Lunch | ||||
1:15 - 2:30 Lab | Dominic Sirianni - GitHub Planets | Benjamin Comer - Library Basics | Ray Lei - SciKitLearn Basics | Ray Lei - Clustering |
2:30 - 2:45 Break | ||||
2:45 - 4:00 Lab | Benjamin Comer - Project Euler | Dominic Sirianni - Ecology Data Carpentry | Derek Metcalf - TensorFlow Basics | Derek Metcalf - Classification |
Time | Friday |
---|---|
9:30 - 9:40 Welcome | |
9:40 - 10:20 Lecture | Vetria Byrd - The ubiquitous nature of data visualization |
10:20 - 11:00 Lecture | David Sherrill - Machine learning for predicting drug binding |
11:00 - 11:20 Break | |
11:20 - 11:50 Lecture | Chris DePree - The NASA exoplanet dataset |
11:50 Adjourn | |
Data science is revolutionizing how scientists and engineers go about their work, but most students have not had much exposure to it. This one-week bootcamp provides an opportunity to get introduced to data management and visualization, data modeling, deep learning, and scientific programming in Python. The bootcamp will consist of morning lectures, followed by hands-on sessions in the afternoon to try out and practice concepts and software tools.
The bootcamp is aimed at undergraduate and graduate students in science and engineering who have an introductory-level familiarity with any computer programming language, or MATLAB, or RStudio, etc. The bootcamp is free of charge, but enrollment is capped so students must apply by May 15, 2019. Students from Agnes Scott, Morehouse, Spelman, and Georgia Tech are particularly encouraged to apply.
- Topics:
  - Computer programming in Python for data science, clustering, numerical linear algebra, classification, regression, deep learning, and domain applications
- Tools:
  - Python, Jupyter notebooks, GitHub, NumPy, Pandas, Matplotlib, scikit-learn, and TensorFlow libraries
- Skills:
  - Python programming, version control, social coding, data handling and visualization, data analysis, data modeling and prediction, and scientific and engineering applications
- Instructors:
  - Ryan Wade (Blue Horseshoe Solutions), Vetria Byrd (Purdue University), Edmond Chow (Georgia Tech), Xiaoming Huo (Georgia Tech), Eva Dyer (Georgia Tech), Chris DePree (Agnes Scott), and David Sherrill (Georgia Tech)
- Location: Georgia Tech Campus (visitor parking is available in the W23 Parking Lot, located at 911 State St. NW)
  - Monday: Engineered Biosystems Building (EBB), Children's Healthcare Seminar Room (first floor by food kiosk), 950 Atlantic Dr., Atlanta, GA 30332
  - Tuesday–Friday: Molecular Science and Engineering Building (MoSE), Room G011 (ground floor behind elevators), 901 Atlantic Dr., Atlanta, GA 30332
This bootcamp is sponsored by a National Science Foundation TRIPODS+X: EDU grant to the Data-Driven Alliance (Agnes Scott, Georgia Tech, Morehouse, and Spelman) and the Institute for Data Engineering and Science (IDEaS) at Georgia Tech.
Once you have generated your data science virtual machine (DSVM), follow these steps to launch it:
Once the DSVM has finished deploying:
- Click "go to resource."
- From there, click "connect" at the top right of the resource window.
- Click "Download RDP File."
- Open this file and enter the username and password you chose when you created the virtual machine.
Once the DSVM has deployed:
- Launch the Microsoft remote desktop client (RDC).
- If this is the first time you are connecting to the DSVM, add a new connection by clicking the "New" button (big plus "+", upper left corner of the window). Then fill in the following fields in the pop-up window:
  - "Connection name": any name you like
  - "PC Name": the public IP address of the DSVM (available under the "Overview" section of the DSVM on the Azure dashboard)
  - "User name"/"Password": the username and password for the Azure account, set when the DSVM was created
- After filling in the above fields, close the pop-up window. The new connection should appear under "My Desktops" in the Microsoft remote desktop client.
- Launch your DSVM by double-clicking the item in the list.
- First, try deleting the virtual machine you made and starting over.
- Ensure that when you log in to the virtual machine, you use the username and password you chose when you created it.
- If these fail, you will need to work locally. The directions below explain how.
- Go to this address: https://www.anaconda.com/distribution/
- Choose a distribution appropriate for your operating system.
- Download the file and follow the installation instructions.
- If you are on Windows, this will give you the Anaconda Prompt, and you can follow along.
- If you are on a Mac, you can use the Terminal. To access it, open Finder and search for "terminal". This will allow you to follow along on your local machine, but you may need to install some packages.
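
If you are working locally, one quick way to see which packages still need installing is a short check like the one below. This is only a sketch: the list of import names is an assumption based on the tools listed above, and `find_missing` is a hypothetical helper, not part of the bootcamp materials.

```python
import importlib.util

# Import names assumed from the "Tools" list above. Note that
# scikit-learn is imported under the name "sklearn".
BOOTCAMP_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "tensorflow"]


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(BOOTCAMP_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All bootcamp packages were found.")
```

Save this as a `.py` file and run it with `python` from the Anaconda Prompt or terminal. Anything reported missing can usually be installed with `conda install`; keep in mind that the install name can differ from the import name (for example, `sklearn` is installed as `scikit-learn`).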