01-introduction.md

<<< Previous | Next >>>

Introduction

In this workshop, we will walk through the process of doing machine learning on a set of data. We will download a corpus of text data, extract features from it, and apply supervised machine learning: using a mathematical algorithm, we will train a classifier that assigns previously unseen data to one of a set of predefined categories.

Machine learning is a research field that sits at the intersection of statistics, artificial intelligence, and computer science. It is also known as predictive analytics or statistical learning. [1]

Key terms

machine learning: an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed

corpus: a large collection of data. In our case, this will be text data (although a corpus can contain any type of data)

dataset: a collection of related information (such as a corpus)

  • variable: an attribute of the dataset (such as the type of text being analyzed)
  • observation: an entry in the dataset (a single text)
  • measurement: a single data point (e.g., one text's type)
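To make the dataset vocabulary concrete, here is a minimal sketch in Python. The three texts and their types are made up for illustration and are not the workshop's corpus.

```python
# A hypothetical three-text dataset; the texts and labels are invented for illustration.
dataset = [
    {"text": "The striker scored twice in the final.", "type": "sports"},
    {"text": "Parliament passed the budget bill.", "type": "politics"},
    {"text": "The goalkeeper saved a late penalty.", "type": "sports"},
]

variables = list(dataset[0].keys())   # attributes of the dataset: "text" and "type"
observation = dataset[1]              # one entry: a single text together with its type
measurement = observation["type"]     # one data point: that text's type

print(variables)
print(measurement)
```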

features: properties that describe data attributes for machine learning; these are often the variables

feature representation, feature vector: the set of features that describes a single observation, usually encoded as a list of numbers
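As one possible illustration of a feature vector, the sketch below turns a text into word counts (a simple "bag of words"). The function name `bag_of_words` and the hand-picked vocabulary are assumptions for this example only.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    # One number per vocabulary word, always in the same order.
    return [counts[word] for word in vocabulary]


vocabulary = ["goal", "vote", "the"]   # hypothetical, hand-picked vocabulary
vector = bag_of_words("The keeper stopped the goal", vocabulary)
print(vector)  # [1, 0, 2]
```

Each position in the vector is one feature; the whole list is the feature representation of that text.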

supervised machine learning: a machine learning task of learning a function that maps an input to an output based on example input-output pairs
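The idea of learning from input-output pairs can be sketched with a 1-nearest-neighbour model, chosen here only because it is short; it is not necessarily the algorithm used later in the workshop. The example pairs and the `train` and `predict` helpers are hypothetical.

```python
def train(examples):
    """'Training' a 1-nearest-neighbour model simply stores the labelled examples."""
    return list(examples)


def predict(model, features):
    """Return the label of the stored example closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(model, key=lambda pair: squared_distance(pair[0], features))
    return label


# Hypothetical input-output pairs: ([count of "goal", count of "vote"], label)
examples = [
    ([3, 0], "sports"),
    ([2, 1], "sports"),
    ([0, 4], "politics"),
    ([1, 3], "politics"),
]
model = train(examples)
print(predict(model, [2, 0]))  # sports
```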

unsupervised machine learning: a machine learning task used to draw inferences from datasets consisting of input data without labelled responses (lacks input-output pairs; only has input data)
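By contrast, an unsupervised method receives only inputs. The sketch below groups numbers into two clusters with a basic k-means loop (k = 2); the data, the function name `two_means`, and the choice of starting centers are assumptions made for illustration.

```python
def two_means(values, iterations=10):
    """Split numbers into two groups with a basic k-means loop (k = 2)."""
    centers = [min(values), max(values)]   # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical text lengths in words; no labels are provided.
lengths = [12, 15, 14, 210, 190, 205]
short_texts, long_texts = two_means(lengths)
print(short_texts, long_texts)
```

No one told the algorithm which texts are "short" or "long"; the grouping is inferred from the input data alone.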

algorithm: a process or set of rules to be followed in calculations (or other problem-solving operations), particularly by a computer

classification: a machine learning task used to predict a class label, which is a choice from a predefined list of possibilities
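A toy text classifier shows what "a choice from a predefined list of possibilities" means in practice. This is a crude word-count scorer under made-up training data, not the method the workshop prescribes; `train` and `classify` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def train(labelled_texts):
    """Count how often each word appears under each class label."""
    word_counts = defaultdict(Counter)
    for text, label in labelled_texts:
        word_counts[label].update(text.lower().split())
    return word_counts


def classify(word_counts, text):
    """Pick the class whose training texts share the most words with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[w] for w in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labelled texts; the class labels are the predefined categories.
training = [
    ("the team won the match", "sports"),
    ("a late goal decided the match", "sports"),
    ("the senate held a vote", "politics"),
    ("voters elected a new senate", "politics"),
]
model = train(training)
label = classify(model, "senate vote delayed")
print(label)  # politics
```

Whatever text is passed in, the prediction is always one of the labels seen during training.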


[1] Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.


<<< Previous | Next >>>