This project evaluates the summarization capabilities of three Large Language Models (T5-Small, T5-Large, and GPT2).
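As a rough illustration (not the exact notebook code), a T5 checkpoint can be loaded and asked for a summary as sketched below. The model identifiers are the standard Hugging Face Hub names; the input text and generation parameters are placeholder assumptions. GPT2, being a decoder-only model, would be loaded via AutoModelForCausalLM instead.

```python
# Minimal sketch, assuming the standard Hugging Face Hub checkpoints
# ("t5-small", "t5-large"); generation settings here are illustrative only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # or "t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "The quick brown fox jumps over the lazy dog. " * 20  # placeholder article
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```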
Dataset: CNN/DailyMail [https://huggingface.co/datasets/cnn_dailymail]
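A minimal sketch of loading the dataset with the Hugging Face datasets library; the "3.0.0" configuration and the test split are assumptions for illustration:

```python
# Minimal sketch: loading CNN/DailyMail from the Hugging Face Hub.
# The "3.0.0" config and the test split are illustrative assumptions.
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0", split="test")
print(dataset[0]["article"][:300])   # source news article
print(dataset[0]["highlights"])      # human-written reference summary
```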
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of evaluation metrics designed for comparing summaries (Lin, 2004). See Wikipedia for more info. Here, we use the Hugging Face Evaluator wrapper to call into the rouge_score package. This package provides 4 scores (see the usage sketch after the list):
- rouge1: ROUGE computed over unigrams (single words or tokens)
- rouge2: ROUGE computed over bigrams (pairs of consecutive words or tokens)
- rougeL: ROUGE based on the longest common subsequence shared by the summaries being compared
- rougeLsum: like rougeL, but at "summary level," i.e., ignoring sentence breaks (newlines)
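A minimal sketch of how these scores can be computed with the Evaluator wrapper; the prediction/reference strings below are placeholders, not project data:

```python
# Minimal sketch: computing ROUGE via the Hugging Face Evaluator wrapper,
# which calls into the rouge_score package under the hood.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the fox jumped over the dog"],                   # generated summary (placeholder)
    references=["the quick brown fox jumped over the lazy dog"],   # reference summary (placeholder)
)
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum
```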
- Hardware: a single GPU from the free tier of Google Colab.
After each model's evaluation, the model is deleted and GPU memory is cleared so that the GPU does not run out of memory:
import gc
import torch
del model                 # drop the reference to the evaluated model
torch.cuda.empty_cache()  # release cached GPU memory
gc.collect()              # run the Python garbage collector
Model-specific results and the generated summaries are available in the attached notebook.