📝 Paper • 📊 Poster • 👋 Visit our official website
This repository contains the code and data for the paper Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation, presented at CCL 2024.
If you are interested in our research, please visit our official website: ICALL Research Group at Beijing Language and Culture University.
2024/07/27: We presented our work with a poster at CCL 2024. The poster can be found here.
2024/05/25: Our paper has been accepted by CCL 2024! 🎉🎉🎉
2023/05/11: We uploaded the initial version of our paper to arXiv.
- Clone this repository:
git clone https://github.com/blcuicall/nlp-crowdsourcing.git
cd ./nlp-crowdsourcing
- Create a conda environment:
conda create -n nlp-crowdsourcing python=3.11
conda activate nlp-crowdsourcing
This step is required: the experiments run in separate screen sessions, and each session activates this environment automatically.
- Install the requirements:
pip install -r requirements.txt
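Since the setup relies on git, conda, and screen being available, a quick pre-flight check can save a failed run. The function below is a minimal sketch and is not part of this repository: `check_prerequisites` is a hypothetical name, and the list of commands to check is an assumption drawn from the steps above.

```shell
# Hypothetical pre-flight check; not part of the repository.
# Reports each command that cannot be found on PATH and
# returns non-zero if at least one is missing.
check_prerequisites() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prerequisites git conda screen` prints nothing and succeeds when all three tools are installed.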
To run the experiments described in the paper, use the shell scripts provided in this repo, for example:
./run_table_tests.sh [oei|conll]
We provide five scripts, one for each experiment.
The script run_table_tests.sh reproduces the numerical results in Tables 3 and 4 of the paper.
./run_table_tests.sh [oei|conll]
The script run_regret_tests.sh compares different CMAB algorithms on the regret metric.
./run_regret_tests.sh [oei|conll]
The script run_epsilon_tests.sh searches for the best epsilon value in the Epsilon-Greedy algorithm.
./run_epsilon_tests.sh [oei|conll]
The script run_ucb_scale_tests.sh searches for the best UCB scale value in the CUCB algorithm.
./run_ucb_scale_tests.sh [oei|conll]
The script run_kappa_tests.sh searches for the best kappa threshold in the combined feedback mechanism.
./run_kappa_tests.sh [oei|conll]
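To launch all five experiments for one dataset, the scripts above can be wrapped in a small loop. This is a sketch under stated assumptions rather than something shipped with the repository: `run_all_experiments` and its `--dry-run` flag are hypothetical, and it assumes the scripts sit in the current directory and accept only the dataset name as shown above. Whether each script blocks until its screen sessions finish is not stated here, so the loop may start the experiments concurrently.

```shell
# Hypothetical wrapper; not part of the repository.
# Usage: run_all_experiments [oei|conll] [--dry-run]
run_all_experiments() {
  dataset="${1:-}"
  dry_run="${2:-}"
  # Accept only the two dataset names the scripts document.
  case "$dataset" in
    oei|conll) ;;
    *)
      echo "usage: run_all_experiments [oei|conll] [--dry-run]" >&2
      return 2
      ;;
  esac
  for script in run_table_tests.sh run_regret_tests.sh run_epsilon_tests.sh \
                run_ucb_scale_tests.sh run_kappa_tests.sh; do
    if [ "$dry_run" = "--dry-run" ]; then
      # Only print the command that would be executed.
      echo "./$script $dataset"
    else
      # Stop at the first script that exits with an error.
      "./$script" "$dataset" || return 1
    fi
  done
}
```

Running `run_all_experiments oei --dry-run` first lists the five commands without executing them, which is a cheap way to confirm the dataset name before starting long-running jobs.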
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you find our work helpful, please consider citing the following paper.
@inproceedings{wang-etal-2024-crowdsourcing-span,
title = {Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation},
author = {Yujie Wang and Chao Huang and Liner Yang and Zhixuan Fang and Yaping Huang and Yang Liu and Jingsi Yu and Erhong Yang},
booktitle = {CCL},
month = {July},
year = {2024},
}