The Gender Rewriting Shared Task will take place at the Seventh Arabic Natural Language Processing Workshop (WANLP 2022) at EMNLP 2022.
Please join our Slack workspace and Google Group to stay up to date. If you have any questions, feel free to email the organizers.
Click here to register for the shared task!
The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., a female speaker with a male listener, or a male speaker with a male listener). This requires changing the grammatical gender (masculine or feminine) of certain words referring to the users (speaker/1st person and listener/2nd person). In this task, we focus on Arabic, a gender-marking, morphologically rich language. The task of gender rewriting was introduced by Alhafni et al. (2022).
All participating teams will use the publicly available Arabic Parallel Gender Corpus v2.1 to train and test their systems. Participants are not allowed to use external manually labeled datasets, but they can leverage unlabeled data to create synthetic examples (i.e., data augmentation). A blind test set will be used to evaluate the outputs of participating teams. All teams are required to report results on both the development and test sets in their write-ups.
We will treat the task of gender rewriting as a user-aware grammatical error correction task and use the M2 Scorer as the evaluation metric. The M2 Scorer computes the Precision, Recall, and F0.5 of the word-level edits between the input and the rewritten output against the gold edits. We provide instructions on how to run the evaluation script below.
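As a rough illustration of how F0.5 relates to precision and recall, the sketch below applies the standard F-beta formula with beta = 0.5, which weights precision more heavily than recall. The function name `f_beta` is hypothetical and is not part of the M2 Scorer; the input values are the precision and recall from the dev set example output shown later in this README.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    Hypothetical helper for illustration; not taken from the M2 Scorer code.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was matched
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall from the dev set example output in this README.
print(f"F_0.5 : {f_beta(0.8846, 0.8674):.4f}")
```

Because the scorer computes its scores from edit counts rather than from rounded precision and recall, the last digit of a value recomputed this way may occasionally differ from the reported one.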
You will need to have conda installed. To set up the environment, run:
git clone https://github.com/CAMeL-Lab/gender-rewriting-shared-task.git
cd gender-rewriting-shared-task
bash scripts/create_envs.sh
conda activate gender_rewriting
pip install -r requirements.txt
To run the M2 Scorer evaluation, you will also need to download the M2 edit files we have prepared for the dev and test sets of the Arabic Parallel Gender Corpus. The M2 files are provided in this release.
Your system should generate four output files, one for each target gender context:
- Target MM: Masculine first person and masculine second person.
- Target FM: Feminine first person and masculine second person.
- Target MF: Masculine first person and feminine second person.
- Target FF: Feminine first person and feminine second person.
Once you have the four outputs, place them in a single directory and name them arin.to.MM, arin.to.FM, arin.to.MF, and arin.to.FF, respectively. Since the Arabic Parallel Gender Corpus v2.1 is balanced by design, all four files should have the same number of sentences. The output_example folder shows what the files should look like when you run your system on the dev set.
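Before running the evaluation, it can help to confirm that the output directory is laid out as described above. The following is a minimal sketch, not part of the repository: `check_outputs` is a hypothetical helper, and it assumes each output file contains one sentence per line.

```python
from pathlib import Path

# File names required by the task, one per target gender context.
EXPECTED_FILES = ["arin.to.MM", "arin.to.FM", "arin.to.MF", "arin.to.FF"]


def check_outputs(output_dir: str) -> int:
    """Return the shared sentence count, or raise ValueError on a problem.

    Hypothetical helper; assumes one sentence per line in every file.
    """
    root = Path(output_dir)
    counts = {}
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            raise ValueError(f"missing output file: {path}")
        with path.open(encoding="utf-8") as handle:
            counts[name] = sum(1 for _ in handle)
    # All four files must contain the same number of sentences.
    if len(set(counts.values())) != 1:
        raise ValueError(f"line counts differ: {counts}")
    return counts[EXPECTED_FILES[0]]
```

Calling `check_outputs("/path/to/outputs")` returns the number of sentences shared by the four files, or raises a `ValueError` that names the missing file or lists the mismatched counts.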
To run the M2 Scorer on your system's output, run:
bash scripts/eval.sh /path/to/outputs /path/to/m2_files [dev|test]
For example, to run the evaluation over the provided dev set example outputs, you would run:
bash scripts/eval.sh output_example m2_edits dev
The expected output should be:
Precision : 0.8846
Recall : 0.8674
F_0.5 : 0.8811
This repo is available under the MIT license. See the LICENSE for more info.
- User-Centric Gender Rewriting. Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, Washington.
- The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses. Alhafni, Bashar, Nizar Habash, Houda Bouamor. 2022. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), Marseille, France.