Thank you for volunteering to teach this one-hour session on using the pandas library to clean and visualize data. This teaching guide explains our setup and the material to cover.
The exercises live in this Jupyter notebook.
It's a good idea to take a spin through the notebook before teaching the session.
Now that you’ve got a handle on pandas, it’s time to move on to some more advanced topics. You know how to import a dataset, but what happens when you load the data and nothing looks right? We’ll walk through cleaning up a dirty dataset with pandas. Then we’ll get to the fun part: visualizing the data you’ve analyzed.
This session is good for: People who can load data and perform basic summary and grouping operations in pandas.
Attendees should leave with a basic understanding of:
- Grouping data
- Normalizing dirty data
- Renaming columns
- Creating new columns from existing columns
- Merging dataframes
- Visualizing data with bar charts and matplotlib
- Visualizing data with maps and geopandas
This session does not cover:
- Anything related to virtual environments
- Things already covered in Python I and II
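To preview several of the cleaning skills listed above (normalizing dirty data, renaming columns, creating a new column from existing ones and merging dataframes), here's a minimal sketch. The data, the column names (`Agency Name`, `amt`, `agency`, `amount`, `budget`, `pct_of_budget`) and the values are all made up for illustration; they are not the notebook's dataset.

```python
import pandas as pd

# Hypothetical "dirty" data: inconsistent capitalization, stray whitespace,
# a spelling variant, and numbers stored as text with thousands separators.
spending = pd.DataFrame({
    "Agency Name": ["  police", "POLICE ", "Fire", "fire dept"],
    "amt": ["1,200", "800", "2,500", "300"],
})

# Rename columns to short, consistent, lowercase names.
spending = spending.rename(columns={"Agency Name": "agency", "amt": "amount"})

# Normalize the text: trim whitespace, standardize case,
# then map a known variant onto its canonical spelling.
spending["agency"] = (
    spending["agency"]
    .str.strip()
    .str.title()
    .replace({"Fire Dept": "Fire"})
)

# Turn text numbers into real integers by removing the commas first.
spending["amount"] = spending["amount"].str.replace(",", "", regex=False).astype(int)

# A second hypothetical table to merge in, keyed on the same column.
budgets = pd.DataFrame({"agency": ["Police", "Fire"], "budget": [5000, 4000]})

# Left merge keeps every spending row; then derive a new column from two existing ones.
merged = spending.merge(budgets, on="agency", how="left")
merged["pct_of_budget"] = merged["amount"] * 100 / merged["budget"]

print(merged)
```

Doing the cleanup before the merge matters here: if `"  police"` and `"POLICE "` weren't normalized first, they wouldn't match `"Police"` in the lookup table and would come back with a missing budget.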
I Do, We Do, You Do: Demonstrate a concept, go through it together, then give attendees plenty of time to experiment on their own while you and your coach walk around and answer questions (see the sections marked ✍️ Try it yourself). The pace will be slower than you think, and that's OK! It's not the end of the world if you don't get through everything.
Many people who come to this class will have limited programming experience, so be empathetic and try to remember how frustrating it is to feel lost.
We'll have the latest version of Python 3 and pipenv to manage the virtual environment and dependencies (`jupyter` and `pandas`), which will already have been installed and tested before your session.
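If you'd like to double-check a machine yourself, a quick sanity check can be run inside `pipenv shell` or in a notebook cell. This is a sketch, not part of our setup: it only confirms the interpreter is Python 3 and that the listed packages import. The `REQUIRED` list and the `check_environment` helper are hypothetical; adjust the list to match the class Pipfile.

```python
import importlib
import sys

# Packages to verify; an assumed list, so edit it to match the class Pipfile.
REQUIRED = ["pandas"]


def check_environment(packages):
    """Map each package name to its version string, or None if it can't be imported."""
    results = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            results[name] = getattr(module, "__version__", "unknown")
        except ImportError:  # also catches ModuleNotFoundError
            results[name] = None
    return results


print("Python", sys.version.split()[0])
assert sys.version_info.major == 3, "This class needs Python 3"

for package, version in check_environment(REQUIRED).items():
    print(package, version if version else "MISSING")
```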
Begin the class by (slowly!) walking everyone through the process of activating their virtual environments and launching Jupyter:
- Open Terminal (or `cmd` or `cygwin` if you're on a PC)
- `cd` into your class directory
- `pipenv shell`
- `jupyter notebook`
It will take everyone a few minutes to get going. You'll also probably get some questions about what, exactly, you're doing at this step. Try to avoid a lengthy digression into virtual environments -- it's beyond the scope of this hourlong session, so maybe offer to talk to them after class, or send 'em our way: training@ire.org.
Once everyone is good to go, toggle back to the terminal and show them what's going on: A Jupyter server is running in the background, so don't close that terminal window!
Go over some notebook basics: Adding cells, writing code and running cells, etc. A common beginner gotcha: Writing code that other cells depend on but forgetting to first run it to make it available.
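If the run-order gotcha needs a concrete illustration, it can be simulated outside Jupyter: treat each "cell" as a string of code executed in one shared namespace, much like a notebook kernel keeps a single set of variables alive between cells. The cell contents and the `run_cells` helper are hypothetical, made up for this example.

```python
# Two hypothetical cells: the second depends on a name the first one defines.
cell_1 = "total = 2 + 3"
cell_2 = "doubled = total * 2"


def run_cells(cells):
    """Run code strings in order in one shared namespace, like a notebook kernel.

    Returns the namespace, or an error message from the first cell that fails.
    """
    namespace = {}
    for code in cells:
        try:
            exec(code, namespace)
        except NameError as err:
            return f"Error: {err}"
    return namespace


# Running the dependent cell first fails, because 'total' doesn't exist yet.
print(run_cells([cell_2, cell_1]))  # Error: name 'total' is not defined

# Running the defining cell first works.
print(run_cells([cell_1, cell_2])["doubled"])  # 10
```

In a real notebook the fix is the same: run the cell that defines the name, then rerun the one that failed.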
Start marching down the notebook: Importing pandas, loading data from file, sorting, filtering, grouping. Pause frequently to ask if anyone has questions.
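For reference, those first steps look roughly like this in code. It's a sketch with a small made-up CSV held in memory via `io.StringIO` so it runs anywhere; in class you'd pass a file path to `read_csv` instead. The city names, columns and numbers are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data).
csv_text = """city,state,population
Alpha,IL,300
Beta,MO,500
Gamma,IL,200
Delta,MO,100
"""

# Load the data.
df = pd.read_csv(io.StringIO(csv_text))

# Sort: largest population first.
by_size = df.sort_values("population", ascending=False)

# Filter: keep only the rows where the condition is True.
illinois = df[df["state"] == "IL"]

# Group: total population for each state.
totals = df.groupby("state")["population"].sum()

print(by_size)
print(illinois)
print(totals)
```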
Any time you see ✍️ Try it yourself, hit the brakes and give everyone time to play around with whatever concept you're discussing.
If you can, find an opportunity when someone has gotten an error and take 5 minutes to walk through basic debugging strategy: Reading the traceback error from bottom to top, strategic Googling, etc.
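If nobody hits an error on their own, a sketch like this one can stand in: it triggers a common pandas mistake (a misspelled column name) and captures the traceback text. Reading from the bottom up, the last line names the exception type and the offending key, which is usually the most useful thing to act on or search for. The column name and data are hypothetical.

```python
import traceback

import pandas as pd

df = pd.DataFrame({"population": [300, 500]})

tb_text = ""
try:
    df["popluation"]  # deliberate typo in the column name
except KeyError:
    tb_text = traceback.format_exc()

# Read from the bottom: the last non-empty line says what actually went wrong.
last_line = tb_text.strip().splitlines()[-1]
print(last_line)
```

Everything above that last line shows the chain of calls inside pandas that led there; beginners can usually skip most of it and look for the first line that points at their own code.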
Finishing early is unlikely! But if you have extra time, oversee some unstructured lab time: attendees can practice syntax, look up additional methods, etc.
- Have everyone close out of their notebook tabs
- In the terminal, press `Ctrl+C` to kill the server process
- Close the terminal window
To run this notebook on your own, you'll need the latest version of Python 3 and pipenv installed on your computer. Here's our install guide.
- Clone or download/unzip this repo onto your computer
- In your command-line interface, `cd` into the folder
- `pipenv install`
- `pipenv shell`
- `jupyter notebook`