This is the implementation of our course project for CSCI-376 Natural Language Processing, taught by Prof. Yik-Cheung Tam at NYU Shanghai. Yuchen Wang (yw3642@nyu.edu) and Yichen Huang (yh2689@nyu.edu), May 2021.
- Report (PDF)
- Presentation slides
- Raw Data: Raw data scraped from memegenerator.net, including 3,000 images and about 300,000 captions.
- Cleaned Caption Data: Cleaned captions with non-English and noisy sentences removed.
- Data Scraper: The Python scraper script that fetches our data from memegenerator.net.
- Preprocessing: The data pre-processing pipeline, which mainly cleans the text data.
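The cleaning described above (removing non-English and noisy sentences) is not spelled out in this README; a minimal sketch of the kind of filtering involved, with hypothetical thresholds rather than the project's exact rules:

```python
import re

def clean_captions(captions):
    """Illustrative cleaning pass: collapse whitespace, drop very short
    captions, and drop captions whose letters are mostly non-ASCII
    (a rough non-English heuristic). Thresholds here are assumptions,
    not the project's actual settings."""
    cleaned = []
    for text in captions:
        text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
        if len(text) < 3:                         # too short to be a caption
            continue
        letters = [c for c in text if c.isalpha()]
        if not letters:                           # no letters at all -> noise
            continue
        ascii_ratio = sum(c.isascii() for c in letters) / len(letters)
        if ascii_ratio < 0.9:                     # mostly non-Latin script
            continue
        cleaned.append(text)
    return cleaned
```

The actual pipeline lives in the Preprocessing notebook; this only illustrates the shape of the text filters.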
Both of the following notebooks contain everything from data loading to evaluation and can be run independently. Note that the BERT caption splitter is not included in the main pipeline.
- Baseline: The baseline pipeline using a standard encoder-decoder.
- Proposed: The proposed pipeline with naive / MMI / CLIP score decoding.
- Finetuning CLIP: The pipeline for finetuning CLIP on our dataset, including both training and evaluation.
- Finetuning BERT for Caption Splitting: The pipeline for finetuning BERT on our dataset for caption splitting, including both training and evaluation.
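The MMI and CLIP-score variants both amount to reranking candidate captions by mixing the decoder's log-likelihood with a second score (an anti-LM term for MMI, an image-caption similarity for CLIP). A hedged sketch of that reranking step; the function name and weight are illustrative, not the notebooks' exact code:

```python
def rerank(candidates, forward_scores, aux_scores, weight=0.5):
    """Rerank beam candidates by a weighted combination of the
    encoder-decoder's log-likelihood (forward_scores) and an auxiliary
    score (aux_scores), e.g. -log p(caption) for MMI or a CLIP
    image-caption similarity. Hypothetical sketch of the decoding idea."""
    scored = [
        (forward_scores[c] + weight * aux_scores[c], c) for c in candidates
    ]
    scored.sort(reverse=True)          # highest combined score first
    return [c for _, c in scored]
```

With `weight=0` this reduces to naive (pure likelihood) decoding, which is how the three strategies relate.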
- Weights and dataloaders for the baseline encoder-decoder: The PyTorch weights of the baseline encoder-decoder, together with the validation and test loaders used in training and evaluation.
- Weights and dataloaders for the proposed encoder-decoder: The PyTorch weights of the proposed encoder-decoder, together with the validation and test loaders used in training and evaluation.
- Finetuned CLIP Weights for Image-Caption Matching: The PyTorch weights of the finetuned CLIP model, including a `best_model.pt` with the lowest validation loss and a `last_model.pt` that overfitted the training data.
- Finetuned BERT Weights for Caption Splitting: The PyTorch weights of the finetuned BERT model for caption splitting, including a `best_model.pt` with the lowest validation loss.