This project generates scripts for a TV series using Recurrent Neural Networks (RNNs), trained on the Seinfeld dataset of scripts from nine seasons. The model generates fake dialogues, such as the one below, based on patterns learned from the training data.
```
george: what are you doing here?
jerry: what?
george: well, you know...
jerry:(to the phone) yeah, yeah, i got to get out of here!
jerry:(still talking) what?
jerry: i thought he was in the shower.(george laughs)
```
To generate the TV scripts, the project encompasses the following steps:
- Data preprocessing
- Create a look-up table to transform the vocabulary words to ids
- Tokenize punctuation so that symbols are treated as tokens separate from words (e.g., bye! becomes two ids, one for bye and one for the exclamation mark)
- Prepare feature and target tensors by batching words into data chunks of a given size
- Build the neural network
- Implement the RNN with PyTorch's Module class, using either a GRU or an LSTM
- Set hyperparameters
- sequence_length
- batch_size
- num_epochs
- learning_rate for an Adam optimizer
- vocab_size: the number of unique tokens in the vocabulary
- output_size: the size of the model output (here equal to vocab_size, since the model predicts the next word)
- embedding_dim: the embedding dimension
- hidden_dim: hidden dimension of the RNN
- n_layers: the number of layers/cells in the RNN
- Train the model until the training loss falls below 3.5
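The preprocessing steps above can be sketched as below. This is illustrative only: the placeholder punctuation tokens and function names are assumptions, not necessarily those used in the notebook.

```python
from collections import Counter

# Hypothetical placeholder names for punctuation symbols; the notebook may
# use different token strings.
PUNCT_TOKENS = {
    '.': '<PERIOD>', ',': '<COMMA>', '"': '<QUOTATION_MARK>',
    ';': '<SEMICOLON>', '!': '<EXCLAMATION_MARK>', '?': '<QUESTION_MARK>',
    '(': '<LEFT_PAREN>', ')': '<RIGHT_PAREN>', '-': '<DASH>', '\n': '<NEW_LINE>',
}

def tokenize(text):
    # Surround each punctuation symbol with spaces so it becomes its own token.
    for symbol, token in PUNCT_TOKENS.items():
        text = text.replace(symbol, f' {token} ')
    return text.lower().split()

def create_lookup_tables(words):
    # Most frequent words get the lowest ids.
    counts = Counter(words)
    vocab = sorted(counts, key=counts.get, reverse=True)
    vocab_to_int = {word: i for i, word in enumerate(vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab

def batch_data(word_ids, sequence_length):
    # Pair each chunk of `sequence_length` word ids (the features) with the
    # single word id that follows it (the target).
    features, targets = [], []
    for i in range(len(word_ids) - sequence_length):
        features.append(word_ids[i:i + sequence_length])
        targets.append(word_ids[i + sequence_length])
    return features, targets
```

For example, `tokenize("bye!")` yields two tokens, one for bye and one for the exclamation-mark placeholder, which then map to two separate ids through the look-up table.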
The implementation complies with Udacity's list of rubric points required to pass the project. The complete implementation can be found in the dlnd_tv_script_generation.ipynb Jupyter notebook or its dlnd_tv_script_generation.html export.
The model architecture consists of the following layers; its implementation can be found in the RNN class.
Layer | Description |
---|---|
Input | Feature tensor of batch_size x sequence_length (200x10) |
Embedding | Embedding of size vocab_size x embedding_dim (21,388x300) |
LSTM | Two-layer long short-term memory of size embedding_dim x hidden_dim (300x1024) and dropout of 0.5 |
Fully connected | Fully connected linear layer of size hidden_dim x output_size (1024x21,388) and dropout of 0.5 |
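A minimal sketch of an RNN class matching the table above; the notebook's actual implementation may differ in details such as weight initialization.

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    # Embedding -> two-layer LSTM -> dropout -> fully connected layer,
    # as in the architecture table (embedding_dim=300, hidden_dim=1024,
    # n_layers=2 in the project itself).
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim,
                 n_layers, dropout=0.5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x: (batch_size, sequence_length) tensor of word ids.
        out, hidden = self.lstm(self.embedding(x), hidden)
        out = self.fc(self.dropout(out))
        # Keep only the prediction for the last word of each sequence.
        return out[:, -1], hidden

    def init_hidden(self, batch_size):
        # Zero-initialized hidden and cell states for the LSTM.
        weight = next(self.parameters())
        return (weight.new_zeros(self.n_layers, batch_size, self.hidden_dim),
                weight.new_zeros(self.n_layers, batch_size, self.hidden_dim))
```

With `batch_size=200` and `sequence_length=10` as in the table, the forward pass maps a `200x10` feature tensor to a `200x21,388` tensor of next-word scores.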
This project contains my implementation of the "Generate TV Scripts" project from Udacity's Deep Learning program. The baseline code was taken from Udacity's Deep Learning repository.
Although the model generates dialogues in which multiple characters exchange complete sentences, questions, and answers, it is not perfect. In particular, the generated scripts lack coherence in some instances, which is expected given the limited training time and the size of the dataset.
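Generation itself typically works by repeatedly feeding the model its own output and sampling the next word from the most likely candidates. The sketch below uses a tiny stand-in model so it is self-contained; in the project, the trained RNN and the real vocabulary would be used instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in model for illustration only.
class TinyRNN(nn.Module):
    def __init__(self, vocab_size=10, embedding_dim=8, hidden_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embedding(x))
        return self.fc(out[:, -1])  # logits for the next word only

def generate(model, start_id, length, sequence_length=5, top_k=3):
    model.eval()
    tokens = [start_id]
    with torch.no_grad():
        for _ in range(length):
            # Feed the most recent `sequence_length` words back into the model.
            window = torch.tensor([tokens[-sequence_length:]])
            logits = model(window)
            # Sample the next word from the top-k most likely candidates,
            # which adds variety compared to always picking the argmax.
            probs, ids = F.softmax(logits, dim=-1).topk(top_k)
            next_id = ids[0][torch.multinomial(probs[0], 1)].item()
            tokens.append(next_id)
    return tokens
```

The resulting ids are then mapped back to words (and punctuation placeholders back to symbols) to produce a script like the sample dialogue shown earlier.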