FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios
Link to the paper: https://ebooks.iospress.nl/doi/10.3233/FAIA240936 or https://arxiv.org/abs/2407.08035
Please cite the paper:
@incollection{tang2024fsponer,
title={FsPONER: Few-Shot Prompt Optimization for Named Entity Recognition in Domain-Specific Scenarios},
author={Tang, Yongjian and Hasan, Rakebul and Runkler, Thomas},
booktitle={ECAI 2024},
pages={3757--3764},
year={2024},
publisher={IOS Press}
}
data/assembly_dataset, data/fabNER, and data/thin-film-technology-dataset store the original data for the three industrial datasets
data/immutable_data_formal stores the corresponding few-shot examples for each input sentence in the test dataset
contain the generated completions from LLMs, based on the proposed few-shot prompting methods
the code to set up the OpenAI LLMs and construct the prompt with selected few-shot examples
For the few-shot selection methods (random, embedding-based, TF-IDF-based), please see the clean script at https://github.com/markustyj/FsPONER_ECAI2024/blob/main/few_shot_list_creation.py
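As a rough illustration of the TF-IDF-based selection idea, the sketch below ranks training sentences by cosine similarity to an input sentence and returns the closest ones as few-shot examples. It is not the code from few_shot_list_creation.py: the function names (tokenize, tfidf_vectors, cosine, select_few_shots), the smoothed IDF formula, the toy sentences, and the choice to fit IDF on the training sentences plus the query are all assumptions made for this example, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch): terms that occur in every
    # document still keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_few_shots(query, train_sentences, k=2):
    """Return the indices of the k training sentences most similar to the query."""
    # Simplification: IDF is fitted on the training sentences plus the query.
    vectors = tfidf_vectors(train_sentences + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical toy sentences, not taken from the datasets in this repository.
train = [
    "The laser cuts the steel sheet.",
    "Thin films are deposited by sputtering.",
    "The robot arm mounts the gearbox.",
]
selected = select_few_shots("Sputtering deposits a thin film.", train, k=1)
```

The selected indices can then be used to look up the annotated training sentences that go into the prompt. Random selection would replace the ranking with a random sample of indices, and embedding-based selection would replace the TF-IDF vectors with sentence embeddings while keeping the same cosine ranking.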
Requirements: please see requirements_finetune_llama2.txt and requirements_gpt_prompting_hf38.txt
notebooks with the eva_ prefix contain the evaluation results (F1 score, precision, recall)
notebooks with the formal_finetune_ prefix are the scripts used to fine-tune LLaMA 2
notebooks with the get_results_ prefix are the scripts used to get completions from LLaMA 2-chat, Vicuna...
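For readers unfamiliar with the metrics reported in the evaluation notebooks, the sketch below shows one common way to compute precision, recall, and F1 for NER: exact matching of predicted entities against gold entities. Whether the notebooks use exactly this matching scheme is an assumption; the function name precision_recall_f1, the (sentence id, entity text, entity type) tuple format, and the entity labels are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact matching.

    Each entity is a hashable tuple, here (sentence_id, text, entity_type).
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted entities; the labels are illustrative only.
gold = [
    (0, "laser", "MACHINE"),
    (0, "steel sheet", "MATERIAL"),
    (1, "sputtering", "PROCESS"),
]
pred = [
    (0, "laser", "MACHINE"),
    (1, "sputtering", "PROCESS"),
    (1, "thin film", "MATERIAL"),
]
p, r, f1 = precision_recall_f1(gold, pred)
```

In this toy case two of the three predicted entities match the gold annotations, and two of the three gold entities are found, so precision, recall, and F1 all equal 2/3.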