ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage

📋 Introduction

ETHIC is a long-context benchmark designed to assess whether LLMs can fully utilize the information they are given. Its tasks have high Information Coverage (IC) scores (~91%), where IC is the proportion of the input context that is necessary for answering a query.
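To make the IC definition concrete, here is a minimal sketch that computes the covered fraction of a context. It assumes the context is measured in tokens and that the necessary parts are given as half-open `(start, end)` spans. Both the function name `information_coverage` and this span representation are illustrative assumptions, not the benchmark's actual annotation format.

```python
def information_coverage(context_len, needed_spans):
    """Return the fraction of context positions covered by at least one needed span.

    context_len  : total number of tokens in the input context (assumed unit)
    needed_spans : iterable of half-open (start, end) token ranges that are
                   required to answer the query (hypothetical representation)
    """
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    covered = set()
    for start, end in needed_spans:
        # Clip each span to the context and count overlapping positions once.
        covered.update(range(max(0, start), min(context_len, end)))
    return len(covered) / context_len


# Two overlapping spans that together cover 910 of 1000 tokens.
print(information_coverage(1000, [(0, 500), (450, 910)]))  # 0.91
```

A task with IC close to 1 cannot be answered from a small retrieved snippet, which is why such tasks test whether a model uses the whole input.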

⚒️ Setup

We recommend using the following versions for compatibility.

PyTorch 2.4.0
CUDA 12.1

# create a new environment
conda create -n ethic python==3.9.19
conda activate ethic

# install required packages
pip install -r requirements.txt

⏩ Quickstart

To use our dataset directly, simply download it using 🤗 Datasets:

from datasets import load_dataset

task = "Recalling" # Choose from "Recalling", "Summarizing", "Organizing", "Attributing"
dataset = load_dataset("dmis-lab/ETHIC", task)["test"]
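To check all four tasks at once, a small helper can loop over the task names and count the examples in each test split. This is only a sketch: `count_test_examples` is a hypothetical helper, and the loader is passed in as an argument so the function can be exercised without network access. It relies only on what the snippet above shows, namely that each task is a configuration of `dmis-lab/ETHIC` with a `"test"` split.

```python
TASKS = ("Recalling", "Summarizing", "Organizing", "Attributing")


def count_test_examples(load_fn, repo="dmis-lab/ETHIC", tasks=TASKS):
    """Return {task: number of examples in its test split}.

    load_fn is expected to behave like datasets.load_dataset(repo, task),
    i.e. return a mapping that contains a "test" split.
    """
    counts = {}
    for task in tasks:
        test_split = load_fn(repo, task)["test"]
        counts[task] = len(test_split)
    return counts
```

With 🤗 Datasets installed, calling `count_test_examples(load_dataset)` downloads each configuration in turn and returns the per-task sizes.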

For model inference and evaluation, put your OpenAI API key (or any other authorization keys) in api_config.py, since gpt-4o is used in the Summarizing task.

# run.sh

CUDA_VISIBLE_DEVICES=1

# arguments
task=Attributing # Recalling, Summarizing, Organizing, Attributing
model_name_or_path=meta-llama/Meta-Llama-3.1-8B-Instruct
cache_dir=""

cmd="CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES python inference.py \
    --task $task \
    --model_name_or_path $model_name_or_path"

if [ -n "$cache_dir" ]; then
    cmd="$cmd --cache_dir $cache_dir"
fi

eval $cmd
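The same command can be assembled from Python, which avoids `eval` and makes the optional `--cache_dir` handling explicit. This is a sketch rather than part of the repository: `build_inference_command` is a hypothetical helper, and it uses only the flags that appear in run.sh above.

```python
import shlex

VALID_TASKS = ("Recalling", "Summarizing", "Organizing", "Attributing")


def build_inference_command(task, model_name_or_path, cache_dir=""):
    """Build the argument list that run.sh passes to inference.py."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    cmd = [
        "python", "inference.py",
        "--task", task,
        "--model_name_or_path", model_name_or_path,
    ]
    # Mirror the shell script: only add --cache_dir when it is non-empty.
    if cache_dir:
        cmd += ["--cache_dir", cache_dir]
    return cmd


cmd = build_inference_command("Attributing", "meta-llama/Meta-Llama-3.1-8B-Instruct")
print(shlex.join(cmd))
# python inference.py --task Attributing --model_name_or_path meta-llama/Meta-Llama-3.1-8B-Instruct
```

The returned list can be handed to `subprocess.run(cmd, check=True, env={**os.environ, "CUDA_VISIBLE_DEVICES": "1"})` to select the GPU the same way the script does.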

Citation

@article{lee2024ethic,
  title={ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage},
  author={Lee, Taewhoo and Yoon, Chanwoong and Jang, Kyochul and Lee, Donghyeon and Song, Minju and Kim, Hyunjae and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2410.16848},
  year={2024}
}
