This repository contains the implementation for the CHIME paper.
A Conda environment is recommended for running the code. To create the environment, run the following command:
conda env create -f environment.yml
To activate the environment, run the following command:
conda activate chime
Download the SciSpacy model:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_sm-0.5.4.tar.gz
Add your API keys as environment variables:
export CLAUDE_API_KEY="YOUR KEY"
export OPENAI_API_KEY="YOUR KEY"
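Before starting a long run, it can help to confirm that both keys are actually visible to Python. The following is a minimal sketch, not part of the repository: the helper name `missing_api_keys` is hypothetical, and only the two variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("CLAUDE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset, empty, or still the placeholder.

    `missing_api_keys` is a hypothetical helper, not a function from the CHIME code.
    """
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_KEYS
        if env.get(name, "").strip() in ("", "YOUR KEY")
    ]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.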
The hierarchy generation pipeline is implemented in chime/src/hierarchical_category_construction.py.
Turn off the DEBUG flag in chime/src/hierarchical_category_construction.py to run the pipeline on the entire dataset.
cd chime/src
python hierarchical_category_construction.py
Fine-tuning and LLM prediction are implemented in chime/src/flanT5 and chime/src/llm_prediction, respectively.
The fine-tuned model and datasets are available on the Hugging Face Hub. You can find the datasets as follows:
- joe32140/chime-parent-child-relation for the parent-child relation dataset.
- joe32140/chime-sibling-coherence for the sibling coherence dataset.
- joe32140/chime-claim-category for the claim and category relevance dataset.
The fine-tuned model for claim and category relevance prediction is joe32140/flan-t5-large-claim-category.
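The Hub identifiers above can be loaded programmatically. The sketch below is not taken from the repository and rests on two assumptions: that each dataset loads with its default configuration through `datasets.load_dataset`, and that the checkpoint is a standard Flan-T5 sequence-to-sequence model loadable through the `transformers` auto classes. The helper names and dictionary keys are hypothetical, and the input format the model expects is not shown because it is not documented here.

```python
# Hub identifiers copied from the text above; the dictionary keys are hypothetical labels.
DATASET_IDS = {
    "parent_child_relation": "joe32140/chime-parent-child-relation",
    "sibling_coherence": "joe32140/chime-sibling-coherence",
    "claim_category": "joe32140/chime-claim-category",
}
MODEL_ID = "joe32140/flan-t5-large-claim-category"


def load_chime_dataset(name):
    """Download one dataset from the Hub (needs the `datasets` package and network access)."""
    # Imported lazily so the constants above are usable without the package installed.
    from datasets import load_dataset

    return load_dataset(DATASET_IDS[name])


def load_claim_category_model():
    """Load tokenizer and model (needs `transformers` and network access).

    ASSUMPTION: the checkpoint is a seq2seq Flan-T5 model compatible with the auto classes.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    return tokenizer, model
```

For example, `load_chime_dataset("claim_category")` would fetch the claim and category relevance data, provided the assumptions above hold.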
We also provide the model predictions for claim and category relevance on the hierarchies without human annotation. You can find the predictions here.
See chime/src/parse_generated_hierarchy.py to parse the generated hierarchies into a structured format. Note that 2 of the 474 generated hierarchies cannot be parsed because of their format, which leaves the 472 hierarchies reported in the paper.
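To illustrate what parsing into a structured format can involve, and why a malformed hierarchy may be rejected, here is a minimal sketch. It assumes, purely for illustration, that a hierarchy is an indented outline of `- label` lines; the actual format produced by Claude-2 and handled by parse_generated_hierarchy.py may differ. The function `parse_outline` and the example labels are hypothetical.

```python
def parse_outline(text, indent=2):
    """Parse an indented "- label" outline into nested dicts.

    ASSUMPTION: this outline format is hypothetical and chosen only for illustration.
    Raises ValueError for lines that do not fit, mirroring an unparseable hierarchy.
    """
    root = {"label": "ROOT", "children": []}
    stack = [(-1, root)]  # (depth, node) pairs along the current path
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip(" ")
        spaces = len(line) - len(stripped)
        if not stripped.startswith("- ") or spaces % indent:
            raise ValueError(f"cannot parse line: {line!r}")
        depth = spaces // indent
        # Pop back until the top of the stack is this line's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        if depth - stack[-1][0] > 1:
            raise ValueError(f"indentation skips a level: {line!r}")
        node = {"label": stripped[2:].strip(), "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root
```

A caller that wraps `parse_outline` in `try`/`except ValueError` can count the failures, which is one way a small number of generated hierarchies could end up excluded.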
- Paper
- resources/raw_generated_hierarchy.csv contains the raw generated hierarchies from Claude-2.
- resources/raw_source_data.csv contains the raw review and study data from the Cochrane Library. The generated claims are also included in this file.
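Since the column names of these CSV files are not listed here, a quick way to get oriented is to read the header and a few rows. The sketch below uses only the Python standard library; `peek_csv` is a hypothetical helper, and it assumes the files are UTF-8 encoded with a header row.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (column_names, first n_rows rows as dicts) without reading the whole file.

    ASSUMPTION: the file is UTF-8 encoded and its first row is a header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows
```

For example, `columns, rows = peek_csv("resources/raw_source_data.csv")` would return the column names together with the first three records.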
If you use this code or dataset, please cite the following:
@inproceedings{hsu-etal-2024-chime,
title = "{CHIME}: {LLM}-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support",
author = "Hsu, Chao-Chun and
Bransom, Erin and
Sparks, Jenna and
Kuehl, Bailey and
Tan, Chenhao and
Wadden, David and
Wang, Lucy and
Naik, Aakanksha",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.8",
pages = "118--132",
}