This repository contains the material for a half-day workshop on data science with Python.
In this README you will find:
- Workshop Overview
- Intended Audience
- Installation and Setup Instructions
- Credits
This workshop focuses on Natural Language Processing (NLP), i.e. how we can make a computer "understand" human text and speech, and how we can get a computer to write (generate) text and speech.
There will be four sections, each exploring different visualizations, techniques and tools (a short code sketch follows this list):
- Word clouds to visualize the most frequently used words, i.e. what everyone is talking about
- Word clusters to understand which words "go with" other words
- Sentiment Analysis to understand the polarity of phrases, i.e. whether a phrase expresses a positive or negative sentiment
- Generating your own Shakespeare play
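As a taste of the first and third topics, here is a minimal sketch, assuming the widely used `wordcloud` and `nltk` packages rather than the workshop's exact code:

```python
# Minimal sketch (not the workshop's exact code): a word cloud plus a
# VADER sentiment score. Assumes `pip install wordcloud nltk matplotlib`.
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

text = "To be, or not to be, that is the question."

# Word cloud: words that occur more often are drawn larger.
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Sentiment analysis: VADER returns a compound polarity score in [-1, 1].
nltk.download("vader_lexicon", quiet=True)
scores = SentimentIntensityAnalyzer().polarity_scores("I love this workshop!")
print(scores["compound"])  # > 0 suggests positive, < 0 negative sentiment
```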
The main notebooks for this workshop are aimed at people who have some basic knowledge of Python and would like to learn more about, or start a career in, data science.
However, if you have not worked with Python or Jupyter notebooks before, we also have two introductory notebooks covering those basics.
To set up the environment and run the workshop material, you'll need to:
- Install Python and the relevant libraries on your machine. We recommend downloading and installing Anaconda.
- Download the workshop material to your machine. If you are familiar with Git, you can clone this repo. Alternatively, download a zip file with all the necessary content here.
Please try to have everything installed before you come to the workshop.
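Once Anaconda is installed, you can run a quick sanity check in a Python session or Jupyter notebook. This is only an illustrative check: the libraries below ship with the default Anaconda distribution, and the exact packages used in the workshop may differ.

```python
# Quick environment check: confirm Python and a few core data science
# libraries that ship with Anaconda can be imported.
import sys

import matplotlib
import numpy
import pandas

print("Python:", sys.version)
print("numpy:", numpy.__version__)
print("pandas:", pandas.__version__)
print("matplotlib:", matplotlib.__version__)
```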
The first version of this workshop was born in 2016 at a PyData London conference. Some of the content here was first created in 2019 as a joint effort between the PyData London and PyLadies London user groups, with volunteer contributions from the organisers. A big thank you to the people who have since contributed in some shape or form:
- Marco Bonzanini
- Conrad Ho
- Renee Ho
- kan2k