This is the implementation of our course project for CSCI-376 Natural Language Processing, taught by Prof. Yik-Cheung Tam at NYU Shanghai. Yuchen Wang (yw3642@nyu.edu) and Yichen Huang (yh2689@nyu.edu), May 2021.
- Report (PDF)
- Presentation slides
- Raw Data: Raw data scraped from memegenerator.net, including 3,000 images and about 300,000 captions.
- Cleaned Caption Data: Cleaned captions with non-English and noisy sentences removed.
- Data Scraper: The Python scraper script that fetches our data from memegenerator.net.
- Preprocessing: The data pre-processing pipeline, which mainly cleans the text data.
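The cleaning described above (removing non-English and noisy sentences) is not spelled out in this README; a minimal sketch of the kind of filtering involved, with hypothetical thresholds rather than the project's exact rules:

```python
import re

def clean_captions(captions):
    """Illustrative cleaning pass: collapse whitespace, drop very short
    captions, and drop captions whose letters are mostly non-ASCII
    (a rough non-English heuristic). Thresholds here are assumptions,
    not the project's actual settings."""
    cleaned = []
    for text in captions:
        text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
        if len(text) < 3:                         # too short to be a caption
            continue
        letters = [c for c in text if c.isalpha()]
        if not letters:                           # no letters at all -> noise
            continue
        ascii_ratio = sum(c.isascii() for c in letters) / len(letters)
        if ascii_ratio < 0.9:                     # mostly non-Latin script
            continue
        cleaned.append(text)
    return cleaned
```

The actual pipeline lives in the Preprocessing notebook; this only illustrates the shape of the text filters.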
Both of the following notebooks contain everything from data loading to evaluation and can be run independently. Note that the BERT caption splitter is not included in the main pipeline.
- Baseline: The baseline pipeline using a standard encoder-decoder.
- Proposed: The proposed pipeline with naive / MMI / CLIP score decoding.
- Finetuning CLIP: The pipeline for finetuning CLIP on our dataset, including both training and evaluation.
- Finetuning BERT for Caption Splitting: The pipeline for finetuning BERT on our dataset for caption splitting, including both training and evaluation.
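The MMI and CLIP-score variants both amount to reranking candidate captions by mixing the decoder's log-likelihood with a second score (an anti-LM term for MMI, an image-caption similarity for CLIP). A hedged sketch of that reranking step; the function name and weight are illustrative, not the notebooks' exact code:

```python
def rerank(candidates, forward_scores, aux_scores, weight=0.5):
    """Rerank beam candidates by a weighted combination of the
    encoder-decoder's log-likelihood (forward_scores) and an auxiliary
    score (aux_scores), e.g. -log p(caption) for MMI or a CLIP
    image-caption similarity. Hypothetical sketch of the decoding idea."""
    scored = [
        (forward_scores[c] + weight * aux_scores[c], c) for c in candidates
    ]
    scored.sort(reverse=True)          # highest combined score first
    return [c for _, c in scored]
```

With `weight=0` this reduces to naive (pure likelihood) decoding, which is how the three strategies relate.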
- Weights and dataloaders for the baseline encoder-decoder: The PyTorch weights of the baseline encoder-decoder, together with the validation and test loaders used in training and evaluation.
- Weights and dataloaders for the proposed encoder-decoder: The PyTorch weights of the proposed encoder-decoder, together with the validation and test loaders used in training and evaluation.
- Finetuned CLIP Weights for Image-Caption Matching: The PyTorch weights of the finetuned CLIP model, including a `best_model.pt` with the lowest validation loss and a `last_model.pt` that overfitted the training data.
- Finetuned BERT Weights for Caption Splitting: The PyTorch weights of the finetuned BERT model for caption splitting, including a `best_model.pt` with the lowest validation loss.