LabelSense

LabelSense: A Plug-and-play Centroid-based Label Semantic Enhanced Contrastive Learning for Text Classification

TL;DR

LabelSense is a versatile plug-and-play method that can be added to any encoder to enhance classification performance by having an awareness of label context.

Abstract

Understanding the relationship between a concept's intension (features) and extension (examples) is crucial in machine learning classification. Clear and precise definitions of each class (intension) help distinguish classes, while effective generalization (extension) ensures accurate predictions on unseen data. Balancing these aspects enhances model performance and generalization. Addressing the challenge of label semantics in text classification, we present LabelSense, a versatile "plug-and-play" approach that augments labels with rich explanations and illustrative examples, achieving robust label contextualization and enhanced label semantics. Leveraging Centroid-based Learning and contrastive learning techniques, our methodology significantly advances text classification performance across diverse datasets. Evaluation on various datasets, including legal and multi-label text datasets, demonstrates a notable enhancement of over 10 percentage points in Precision@1 compared to the baseline model, highlighting the effectiveness of LabelSense.

Usage

⚙️ Environment

Create a new environment LabelSense using conda

conda create -n LabelSense python=3.11

Activate LabelSense environment

conda activate LabelSense

Install requirements

pip install -r requirements.txt

📖 Dataset

The datasets used in this paper include:

Multi-class Dataset
- 20 Newsgroups
- Blurb Genre Collection (BGC)
- Web of Science (WOS)
Multi-label Dataset
- RCV1-v2
- Reuters-21578
- AAPD
- freecode
- EUR-Lex

You can download using data/download.sh

First you need to change your directory to data:

cd data

And download dataset using:

bash download.sh <Dataset Name>

For example, if you want to download RCV1-v2:

bash download.sh RCV1-v2

Of course, you can also download all datasets at once:

bash download.sh all

The downloaded dataset is placed in the origin folder of the data/<Dataset Name> folder

🔧 Preprocess

Go to the directory data/<Dataset Name> and there is a preprocess.ipynb.

By running this jupyter notebook will get the preprocessed files id2label.json, label2id.json, train.json, train-with-example.json, test.json, test-with-example.json

🎯 To Run

Modify the parameters in the run.sh file

And run in the project root directory:

bash run.sh

The result will be placed in the output_dir folder

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
assets		assets
data		data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
arguments.py		arguments.py
model.py		model.py
requirements.txt		requirements.txt
run.sh		run.sh
train.py		train.py
trainer.py		trainer.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LabelSense

TL;DR

Abstract

Usage

⚙️ Environment

📖 Dataset

🔧 Preprocess

🎯 To Run

About

Releases

Packages

Languages

License

CodingMonkey12/LabelSense

Folders and files

Latest commit

History

Repository files navigation

LabelSense

TL;DR

Abstract

Usage

⚙️ Environment

📖 Dataset

🔧 Preprocess

🎯 To Run

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages