ChallengeClinicalQA

Repository for the paper "Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions".

Datasets

We do not publicly release the JAMA Clinical Challenge data due to license constraints. Instead, we provide URLs to the articles and a scraper that you can use to obtain the data if you hold the appropriate license. Before running the script, please confirm that your license grants access to JAMA articles (Full Text).

Install the required dependencies

pip install -r requirements.txt

Scrape the data

python jama_scraper.py

The data will be saved in jama_raw.csv and jama_raw.json files.
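
Once the scraper finishes, it can help to sanity-check the output before using it. The following is a minimal sketch and not part of the repository: it assumes only that jama_raw.json holds valid JSON and that jama_raw.csv has a header row. The helper names `summarize_json` and `summarize_csv_rows` are hypothetical, and no field names are assumed because the output schema is not documented here.

```python
import csv
import json
from pathlib import Path


def summarize_json(obj):
    """Describe a loaded JSON object without assuming any field names."""
    if isinstance(obj, list):
        # Collect keys seen across dict-shaped records, in first-seen order.
        keys = []
        for record in obj:
            if isinstance(record, dict):
                for key in record:
                    if key not in keys:
                        keys.append(key)
        return {"type": "list", "count": len(obj), "keys": keys}
    if isinstance(obj, dict):
        return {"type": "dict", "count": len(obj), "keys": list(obj)}
    return {"type": type(obj).__name__, "count": 1, "keys": []}


def summarize_csv_rows(rows):
    """Describe rows as produced by csv.DictReader (an iterable of dicts)."""
    rows = list(rows)
    keys = list(rows[0].keys()) if rows else []
    return {"type": "csv", "count": len(rows), "keys": keys}


if __name__ == "__main__":
    # File names come from the scraper's documented output; skip any that are missing.
    json_path = Path("jama_raw.json")
    csv_path = Path("jama_raw.csv")

    if json_path.exists():
        with json_path.open(encoding="utf-8") as f:
            print(summarize_json(json.load(f)))

    if csv_path.exists():
        with csv_path.open(newline="", encoding="utf-8") as f:
            print(summarize_csv_rows(csv.DictReader(f)))
```

If the two summaries report different record counts or different keys, that may indicate an interrupted scrape and is worth checking before running any evaluation.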

We thank awxlong for contributing fetch_jama_cases.py, which scrapes updated links for new data.

Scrape updated links

python fetch_jama_cases.py

The updated links will be saved in jama_links_updated.json.
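
To see which links are new, one option is to compare jama_links_updated.json against the jama_links.json file provided in the repository. The sketch below is an illustration under stated assumptions, not the authors' tooling: the internal structure of both files is not documented here, so it simply walks whatever JSON it finds and collects every string that starts with `http://` or `https://`. The function names `collect_urls` and `new_urls` are hypothetical.

```python
import json
from pathlib import Path


def collect_urls(obj):
    """Recursively gather every URL-like string from nested JSON data."""
    urls = set()
    if isinstance(obj, str):
        if obj.startswith(("http://", "https://")):
            urls.add(obj)
    elif isinstance(obj, dict):
        # URLs may appear as keys or as values, so inspect both.
        for key, value in obj.items():
            urls |= collect_urls(key)
            urls |= collect_urls(value)
    elif isinstance(obj, list):
        for item in obj:
            urls |= collect_urls(item)
    return urls


def new_urls(original, updated):
    """Return the sorted URLs present in `updated` but absent from `original`."""
    return sorted(collect_urls(updated) - collect_urls(original))


if __name__ == "__main__":
    old_path = Path("jama_links.json")
    new_path = Path("jama_links_updated.json")

    if old_path.exists() and new_path.exists():
        with old_path.open(encoding="utf-8") as f:
            original = json.load(f)
        with new_path.open(encoding="utf-8") as f:
            updated = json.load(f)

        added = new_urls(original, updated)
        print(f"{len(added)} link(s) not present in jama_links.json")
        for url in added:
            print(url)
```

Because the comparison is a plain set difference on exact strings, links that differ only in a trailing slash or query parameter would be reported as new.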

Reference

If you find this repository helpful, please cite our paper:

@article{chen2024benchmarking,
  title={Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions},
  author={Chen, Hanjie and Fang, Zhouxiang and Singla, Yash and Dredze, Mark},
  journal={arXiv preprint arXiv:2402.18060},
  year={2024}
}
