NLP Crowdsourcing

📝 Paper • 📊 Poster • 👋 Visit our official website

This repository contains the code and data for the paper Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation, presented at CCL 2024.

If you are interested in our research, please visit our official website: ICALL Research Group at Beijing Language and Culture University.

What's New

2024/07/27: We presented our work as a poster at CCL 2024. The poster is included in this repository as nlp_crowdsourcing_poster.pdf.

2024/05/25: Our paper has been accepted by CCL 2024! 🎉🎉🎉

2023/05/11: We uploaded the initial version of our paper to arXiv.

Installation

Clone this repository:

git clone https://github.com/blcuicall/nlp-crowdsourcing.git
cd ./nlp-crowdsourcing

Create a conda environment:
```
conda create -n nlp-crowdsourcing python=3.11
conda activate nlp-crowdsourcing
```
This step is required because the experiments run in separate screen sessions, and each session activates this environment automatically.

Install the requirements:
```
pip install -r requirements.txt
```
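
The steps above rely on git, conda, and screen being available on your PATH. The following is a minimal preflight sketch, not part of the repository: the helper names require_cmd and preflight are hypothetical, and the list of tools is inferred from the commands in this section.

```shell
#!/bin/sh
# Hypothetical preflight helpers; NOT part of this repository.

# Return 0 if the named command is on PATH, 1 otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Check every command given as an argument.
# The return status is the number of missing commands (0 means all found).
preflight() {
    missing=0
    for cmd in "$@"; do
        require_cmd "$cmd" || missing=$((missing + 1))
    done
    return "$missing"
}

# Usage (assumed tool list): preflight git conda screen && echo "ready"
```

If preflight reports a missing command, install it before creating the environment or launching any experiment.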

Usage

To run the experiments from the paper, use the shell scripts provided in this repository. Each script takes a single argument, either oei or conll, for example:

./run_table_tests.sh [oei|conll]

We provide five scripts, one for each experiment.

The script run_table_tests.sh reproduces the numerical results in Tables 3 and 4 of the paper.

./run_table_tests.sh [oei|conll]

The script run_regret_tests.sh compares different CMAB (combinatorial multi-armed bandit) algorithms on the regret metric.

./run_regret_tests.sh [oei|conll]

The script run_epsilon_tests.sh searches for the best epsilon value in the Epsilon-Greedy algorithm.

./run_epsilon_tests.sh [oei|conll]

The script run_ucb_scale_tests.sh searches for the best UCB scale value in the CUCB algorithm.

./run_ucb_scale_tests.sh [oei|conll]

The script run_kappa_tests.sh searches for the best kappa threshold in the combined feedback mechanism.

./run_kappa_tests.sh [oei|conll]
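
To launch all five scripts for one argument value in a single step, a small wrapper along the following lines may help. This is a sketch, not part of the repository: the function names are hypothetical, the argument is assumed to select the dataset, and the wrapper assumes it is run from the repository root. Whether each script blocks until its experiments finish or returns right after starting its screen sessions is not stated here, so check resource usage before running everything at once.

```shell
#!/bin/sh
# Hypothetical wrapper; NOT part of this repository.
# Assumes the current directory is the repository root.

# Print the five experiment scripts named in this README, one per line.
list_experiment_scripts() {
    printf '%s\n' \
        run_table_tests.sh \
        run_regret_tests.sh \
        run_epsilon_tests.sh \
        run_ucb_scale_tests.sh \
        run_kappa_tests.sh
}

# Accept only the two documented argument values.
validate_dataset() {
    case "$1" in
        oei|conll) return 0 ;;
        *) echo "argument must be 'oei' or 'conll'" >&2; return 1 ;;
    esac
}

# Launch every experiment script with the given argument,
# stopping at the first script that exits with an error.
run_all_experiments() {
    validate_dataset "$1" || return 1
    for script in $(list_experiment_scripts); do
        echo "launching ./$script $1"
        "./$script" "$1" || return 1
    done
}

# Usage: run_all_experiments oei
```

Validating the argument before the loop avoids starting any experiment with an unsupported value.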

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Reference

If you find our work helpful, please consider citing the following paper.

@inproceedings{wang-etal-2024-crowdsourcing-span,
    title = {Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation},
    author = {Yujie Wang and Chao Huang and Liner Yang and Zhixuan Fang and Yaping Huang and Yang Liu and Jingsi Yu and Erhong Yang},
    booktitle = {CCL},
    month = {July},
    year = {2024},
}
