Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.
In this workshop, you will learn the following skills:
- How to use skills from the NLTK workshop to build features for a classification task
- How to build a text classification system that can predict whether sentences belong to one category ("news") or another ("romance")
- How to model the topics in a corpus based on the distributions of words across the documents
- How to group data and perform calculations on the aggregations
- How to prepare data for machine learning using pandas, a package for Python that helps to organize your data
- How to use the scikit-learn package for Python to perform different types of machine learning on the data
- How to evaluate the results of machine learning algorithms
- How to visualize observations, aggregations, and algorithmic results
This workshop will review key concepts for understanding how machine learning works, and walk participants through the process of analyzing data using statistical and machine learning methods.
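To make "learning from experience without being explicitly programmed" concrete, here is a minimal sketch of the news-versus-romance idea in plain Python. It is not the workshop's method: the workshop uses scikit-learn, and this toy stands in for it using only the standard library. The sentences in `TRAIN` and the `train` and `predict` helpers are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical toy training data, invented for illustration only.
TRAIN = [
    ("the senate voted on the budget bill", "news"),
    ("the mayor announced a new city policy", "news"),
    ("officials reported the election results", "news"),
    ("she kissed him under the moonlight", "romance"),
    ("his heart ached when she smiled", "romance"),
    ("they whispered love in the garden", "romance"),
]


def train(examples):
    """Count how often each word appears in each category."""
    counts = {}
    for sentence, label in examples:
        counts.setdefault(label, Counter()).update(sentence.lower().split())
    return counts


def predict(counts, sentence):
    """Return the category whose training vocabulary best matches the sentence."""
    words = sentence.lower().split()
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        # Sum the relative frequency of each sentence word within this category.
        scores[label] = sum(word_counts[w] / total for w in words)
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "the senate announced the results"))  # news
print(predict(model, "she smiled and whispered"))          # romance
```

No rule in the code says that "senate" signals news or that "whispered" signals romance. Those associations come entirely from the labeled examples, so changing the training data changes the predictions.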
- Introduction
- Installation
- What Is Classification?
- Getting Our Data
- Features
- Visualization
- Supervised Machine Learning
- Supervised Classification Algorithm with sklearn!
- Unsupervised Machine Learning
- Feature Extraction Using Bag of Words
- Topic Modeling with Latent Dirichlet Allocation
- Review
- Resources
- Appendix: Visualize the Decision Boundary
- Appendix: Create a Word Cloud
Session leaders: Rachel Rakov and Hannah Aizenman
Based on previous work by: Rachel Rakov and Hannah Aizenman
Digital Research Institute (DRI) Curriculum by Graduate Center Digital Initiatives is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/DHRI-Curriculum. When sharing this material or derivative works, preserve this paragraph, changing only the title of the derivative work, or provide comparable attribution.