The goal here was to fine-tune a pre-trained medium-sized GPT-2 model on a dataset of jokes to assess its ability to generate humorous content.
- Dataset Preparation: Used the Short Jokes dataset from Kaggle, appending a special end-of-text marker to each joke so the model learns where one joke ends and the next begins when several jokes are packed into a single input sequence (see the preprocessing sketch after this list).
- Hyperparameter Tuning: Explored combinations of batch size, number of epochs, learning rate, warmup steps, and maximum sequence length (an example configuration is sketched after this list).
- Model Training: Trained the GPT-2 model with the chosen hyperparameters, saving the model weights after every epoch so the checkpoints could be compared (see the training-loop sketch below).
- Joke Generation: Generated jokes with the trained model, using the checkpoint from the epoch that produced the best results (see the sampling sketch below).
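
A minimal sketch of the preprocessing step, assuming a PyTorch `Dataset` over the Kaggle CSV. The file name `shortjokes.csv` and the `Joke` column name are assumptions about the dataset layout, and GPT-2's own `<|endoftext|>` token stands in for the "special end-of-text marker".

```python
import csv

from torch.utils.data import Dataset


class JokesDataset(Dataset):
    """Short Jokes CSV wrapper that appends an end-of-text marker to every joke."""

    END_OF_TEXT = "<|endoftext|>"  # GPT-2's own end-of-text token

    def __init__(self, csv_path="shortjokes.csv"):
        self.jokes = []
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):  # columns assumed: ID, Joke
                # The marker tells the model where one joke ends and the next begins.
                self.jokes.append(row["Joke"] + self.END_OF_TEXT)

    def __len__(self):
        return len(self.jokes)

    def __getitem__(self, idx):
        return self.jokes[idx]
```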
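
The section names the hyperparameters that were tuned but not the values that were chosen, so the configuration below is purely illustrative:

```python
# Hypothetical settings -- the write-up lists the knobs, not the winning values.
HPARAMS = {
    "batch_size": 16,       # jokes per optimizer step (emulated via gradient accumulation below)
    "epochs": 4,            # full passes over the joke corpus
    "learning_rate": 3e-5,  # AdamW step size
    "warmup_steps": 500,    # linear warmup before the schedule decays
    "max_seq_len": 400,     # tokens per training sequence; longer jokes are truncated
}
```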
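
A training-loop sketch under the assumption that the Hugging Face transformers library and the `gpt2-medium` checkpoint were used. It feeds one joke per forward pass, emulates the batch size with gradient accumulation, and saves a checkpoint after every epoch for later comparison.

```python
import os

import torch
from torch.utils.data import DataLoader
from transformers import GPT2LMHeadModel, GPT2Tokenizer, get_linear_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").to(device)

dataset = JokesDataset("shortjokes.csv")                  # from the preprocessing sketch above
loader = DataLoader(dataset, batch_size=1, shuffle=True)  # one joke per forward pass

optimizer = torch.optim.AdamW(model.parameters(), lr=HPARAMS["learning_rate"])
total_steps = len(loader) * HPARAMS["epochs"] // HPARAMS["batch_size"]
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=HPARAMS["warmup_steps"],
    num_training_steps=total_steps,
)

os.makedirs("checkpoints", exist_ok=True)
model.train()
for epoch in range(HPARAMS["epochs"]):
    for step, (joke,) in enumerate(loader):
        ids = tokenizer.encode(joke, return_tensors="pt")[:, : HPARAMS["max_seq_len"]].to(device)
        loss = model(ids, labels=ids).loss  # causal LM loss against the same tokens
        loss.backward()                     # accumulate gradients across jokes
        if (step + 1) % HPARAMS["batch_size"] == 0:
            optimizer.step()                # one "batch" worth of jokes has accumulated
            scheduler.step()
            optimizer.zero_grad()
    # Keep per-epoch weights so the checkpoints can be compared afterwards.
    torch.save(model.state_dict(), f"checkpoints/gpt2_medium_jokes_epoch_{epoch}.pt")
```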
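
Finally, a sampling sketch for the generation step. The checkpoint path and the prompt are placeholders, and top-k/top-p sampling is one reasonable decoding choice rather than the method the write-up specifies.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
# Hypothetical path: the checkpoint from whichever epoch generated the best jokes.
model.load_state_dict(torch.load("checkpoints/gpt2_medium_jokes_epoch_3.pt", map_location=device))
model.to(device).eval()

prompt = tokenizer.encode("Why did", return_tensors="pt").to(device)  # placeholder prompt
with torch.no_grad():
    samples = model.generate(
        prompt,
        do_sample=True,   # sample rather than decode greedily, for more varied jokes
        top_k=50,
        top_p=0.95,
        max_length=64,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse <|endoftext|>
    )
for sample in samples:
    # Generation stops at <|endoftext|>, the marker each training joke ended with.
    print(tokenizer.decode(sample, skip_special_tokens=True))
```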