binary-classification

ML model that classifies movie reviews as positive or negative.

About the Project

This repository is meant as an introduction to machine learning. The model is trained on the IMDB dataset bundled with Keras, which contains reviews from the Internet Movie Database, and learns to classify the sentiment of a review as positive or negative. Much of the code is adapted from the book Deep Learning with Python by François Chollet. The project had two main goals: first, to set up a workspace in which I could actually train and run a model (which was surprisingly difficult without a machine running Linux); and second, to become familiar with a basic machine learning workflow by solving a simple problem and experimenting with how changing parameters affects the outcome of training.

About the Model

The model is built from 3 dense layers: two with 16 hidden units each that apply the relu activation, followed by a single output layer with one unit that applies the sigmoid activation. For anyone unfamiliar with these terms, each one is defined below.

A dense layer is a fully connected layer: every unit in the layer receives input from every unit of the previous layer. Stacking 3 dense layers therefore gives a network in which each layer is fully connected to the one before it.

The number of hidden units is the dimensionality of the layer's output, and it also sets one dimension of the layer's weight matrix (in the dot(W, input) convention used below, a layer with 16 units that receives n inputs has a 16 × n weight matrix). More hidden units give the network more freedom to learn, because it can form more complex representations of the input data.

Each of the first two layers computes output = relu(dot(W, input) + b), where W is the layer's weight tensor and b is its bias tensor. The layer takes the dot product of W and the input and adds b; relu, short for rectified linear unit, then returns, element-wise, whichever is larger: 0 or the value of dot(W, input) + b. In other words, negative values are set to 0 and positive values pass through unchanged.

Lastly, the sigmoid function squashes any real value into the range between 0 and 1. That makes it a good fit for the output of a binary classifier: the single output value can be read as the probability that the input belongs to one of the two classes (in our case, the probability that a review is positive).

After the input data has passed through the network, the binary crossentropy loss function measures how far the predicted probabilities are from the true labels, and the rmsprop optimizer uses the gradient of that loss to adjust the network's weights.

NicBonetto/binary-classification

Folders and files

Latest commit

History

Repository files navigation

binary-classification

About the Project

About the Model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages