About BIRD
-About NovelQA
- BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents - a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL - parsing. - - BIRD contains over 12,751 - unique question-SQL pairs, 95 big databases with a total size of 33.4 GB. - It also covers more than 37 professional domains, such as blockchain, hockey, healthcare and - education, etc. + This paper introduces ... This dataset utilizes ...
+ "Ayala's Angel": [ + { + "ques": "In novel Ayala's Angel, has someone ever screamed in the novel? If so, how many times have they screamed?", + "ops": [ + "No, never", + "Yes, 1", + "Yes, 2", + "Yes, 3" + ] + }, + ... + ] +
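The sample entry above maps a novel title to a list of questions, each with a `ques` string and an `ops` list of candidate answers. As a minimal sketch of how such an entry might be turned into a multiple-choice prompt, the snippet below hard-codes the sample. The `format_question` helper and the A/B/C/D lettering are assumptions made for illustration and are not part of the dataset.

```python
import string

# Hypothetical in-memory copy of the sample entry shown above.
# Only the "ques" and "ops" keys come from the sample; the rest is assumed.
sample = {
    "Ayala's Angel": [
        {
            "ques": (
                "In novel Ayala's Angel, has someone ever screamed in the novel? "
                "If so, how many times have they screamed?"
            ),
            "ops": ["No, never", "Yes, 1", "Yes, 2", "Yes, 3"],
        },
    ]
}


def format_question(item):
    """Render one question with lettered options (the lettering is an assumption)."""
    lines = [item["ques"]]
    for letter, option in zip(string.ascii_uppercase, item["ops"]):
        lines.append(f"{letter}. {option}")
    return "\n".join(lines)


for title, questions in sample.items():
    for item in questions:
        print(format_question(item))
```

Each option lands on its own line after the question, so a model's single-letter reply can be mapped back to an index in `ops`.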
Subscribe to BIRD Update
-Bird is a long-term research project aimed at bridging the gap between semantic parsing models and the success of database applications. To receive the latest updates of the dataset, you can leave your email address. -
- - +Contributors
+Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, and Yue Zhang
License
+This dataset is released under the Apache-2.0 License.
+Subscribe to BIRD Update
Citation
--@article{li2024can, - title={Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls}, - author={Li, Jinyang and Hui, Binyuan and Qu, Ge and Yang, Jiaxi and Li, Binhua and Li, Bowen and Wang, Bailin and Qin, Bowen and Geng, Ruiying and Huo, Nan and others}, - journal={Advances in Neural Information Processing Systems}, - volume={36}, - year={2024} -}+ +
Citation
Model | -Code | -Size | -Oracle Knowledge | -Dev (%) | -Test (%) | +Parameter Size | +Context Window | +Openbook? | +Acc | Human Performance - Data Engineers + DB Students - |
- - | - | ✔️ | -- | 92.96 | - - - - -
---|---|---|---|---|---|---|---|---|---|---|---|---|
- Jan 14, 2024 - | -MCS-SQL + GPT-4 - Dunamu - |
- - | UNK | -✔️ | -63.36 | -65.45 | -||||||
- Feb 27, 2024 - | -PB-SQL, v1 - Seoul National University - |
- - | UNK | -✔️ | -60.50 | -64.84 | -||||||
- Feb 21, 2024 - | -
- Sense - Anonymous - |
- - | 13B | -✔️ | -55.48 | -63.39 | -||||||
- Nov 16, 2023 - | -Dubo-SQL, v1 - Mercator Technologies - |
- - | UNK | -✔️ | -59.71 | -60.71 | -||||||
- Oct 12, 2023 - | -SFT CodeS-15B - Renmin University of China [Li et al. SIGMOD'24] - |
- - [link] - | -15B | -✔️ | -58.47 | -60.37 | -||||||
- Feb 27, 2024 - | -DTS-SQL + DeepSeek 7B - University of Alberta [Pourreza et al. '24] - |
- - [link] - | -7B | -✔️ | -55.8 | -60.31 | -||||||
- Nov 21, 2023 - | -MAC-SQL + GPT-4 - BUAA & Tencent [Wang et al. '23] |
- - | UNK | -✔️ | -57.56 | -59.59 | -||||||
- Oct 12, 2023 - | -SFT CodeS-7B - Renmin University of China [Li et al. SIGMOD'24] - |
- - [link] - | -7B | -✔️ | -57.17 | -59.25 | ++ | + | β | +90% | ||
- Nov 09, 2023 - | -DAIL-SQL + GPT-4 - Alibaba Group [Gao and Wang et al. VLDB'24] - |
- - [link] - | -UNK | -✔️ | -54.76 | -57.41 | -||||||
- Aug 15, 2023 - | -DIN-SQL + GPT-4 - University of Alberta [Pourreza et al. '23] - |
- - [link] - | -UNK | -✔️ | -50.72 | -55.90 | -||||||
- Jul 01, 2023 + | 🏅1 |
GPT-4 - Baseline |
- - [link] - | -UNK | -✔️ | -46.35 | -54.89 | +- | +- | +- | +- | |
- Jul 16, 2023 - | -Claude-2 - Baseline - |
- - [link] - | -UNK | -✔️ | -42.70 | -49.02 | -||||||
- Nov 23, 2023 + | 🥈2 |
- Open-SQL - Anonymous + | Claude 2.1 |
- - | 7B | -✔️ | -37.68 | -47.74 | +- | +- | +- | +- |
- Mar 17, 2023 + | 🥉3 |
- ChatGPT + CoT - HKU & DAMO - [Li et al. NeurIPS'23] + | InternLM-7b |
- - [link] - | -UNK | -✔️ | -36.64 | -40.08 | +- | +- | +- | +- |
- Mar 17, 2023 - | -ChatGPT - Baseline - |
- - | UNK | -✔️ | -37.22 | -39.30 | -||||||
- Feb 17, 2023 - | -Codex - Baseline - |
- - | 175B | -✔️ | -34.35 | -36.47 | -||||||
- Jul 16, 2023 - | -Palm-2 - Baseline - |
- - [link] - | -UNK | -✔️ | -27.38 | -33.04 | -||||||
- Mar 17, 2023 - | -ChatGPT + CoT - HKU & DAMO [Li et al. NeurIPS'23] - |
- - [link] - | -UNK | -- | 25.88 | -28.95 | -||||||
- Mar 17, 2023 - | -ChatGPT - Baseline - |
- - | UNK | -- | 24.05 | -26.77 | -||||||
- Feb 17, 2023 - | -Codex - Baseline - |
- - | 175B | -- | 25.42 | -24.86 | -||||||
- Feb 5, 2023 - | -T5-3B - Baseline - |
- - | 3B | -✔️ | -23.34 | -24.05 | -||||||
- Feb 3, 2023 - | -T5-Large - Baseline - |
- - | 770M | -✔️ | -19.75 | -20.94 | -||||||
- Feb 3, 2023 + | 4 |
- T5-Base - Baseline + | InternLM-20b |
- - | 220M | -✔️ | -11.54 | -12.89 | +- | +- | +- | +- |
- Feb 5, 2023 + | 5 |
- T5-3B - Baseline + | - |
- - | 3B | -- | 10.37 | -11.17 | +- | +- | +- | +- |
- Feb 3, 2023 + | 6 |
- T5-Large - Baseline + | - |
- - | 770M | -- | 9.71 | -10.38 | +- | +- | +- | +- |
- Feb 3, 2023 + | 7 |
- T5-Base - Baseline + | - |
- - | 220M | -- | 6.32 | -7.06 | +- | +- | +- | +- |
Model | -Code | -Size | -Oracle Knowledge | -Dev | -Test | +Parameter Size | +Context Window | +Openbook? | +Acc | Human Performance - Data Engineers + DB Students - |
- - | - | ✔️ | -- | 90.27 | - - -
---|---|---|---|---|---|---|---|---|---|---|---|---|
- Jan 14, 2024 - | -MCS-SQL + GPT-4 - Dunamu - |
- - | UNK | -✔️ | -64.82 | -71.35 | -||||||
- Feb 27, 2024 - | -PB-SQL - Seoul National University - |
- - | UNK | -✔️ | -71.31 | -68.90 | -||||||
- Nov 21, 2023 - | -MAC-SQL + GPT-4 - BUAA & Tencent [Wang et al. '23] - |
- - | UNK | -✔️ | -58.76 | -67.68 | -||||||
- Feb 27, 2024 - | -DTS-SQL + DeepSeek 7B - University of Alberta [Pourreza et al. '24] - |
- - [link] - | -7B | -✔️ | -60.31 | -64.52 | -||||||
- Oct 12, 2023 - | -SFT CodeS-15B - Renmin University of China [Li et al. SIGMOD'24] - |
- - [link] - | -15B | -✔️ | -59.87 | -64.22 | -||||||
- Oct 12, 2023 - | -SFT CodeS-7B - Renmin University of China [Li et al. SIGMOD'24] - |
- - [link] - | -7B | -✔️ | -58.80 | -63.62 | -||||||
- Nov 16, 2023 - | -Dubo-SQL, v1 - Mercator Technologies - |
- - | UNK | -✔️ | -66.01 | -63.00 | ++ | + | β | +97% | ||
- Nov 09, 2023 - | -DAIL-SQL + GPT-4 - Alibaba Group [Gao and Wang et al. VLDB'24] - |
- - [link] - | -UNK | -✔️ | -56.08 | -61.95 | -||||||
- Jul 01, 2023 + | 🏅1 |
GPT-4 - Baseline - |
- - [link] | -UNK | -✔️ | -49.77 | -60.77 | -|||||
- Aug 15, 2023 - | -DIN-SQL + GPT-4 - University of Alberta [Pourreza et al. '23] - |
- - [link] - | -UNK | -✔️ | -58.79 | -59.44 | +- | +- | +- | +- | ||
- Mar 17, 2023 - | -ChatGPT + CoT - HKU & DAMO [Li et al. NeurIPS'23] - |
- - [link] - | -UNK | -✔️ | -42.30 | -56.56 | -||||||
- Mar 17, 2023 - | -ChatGPT - Baseline - |
- - | UNK | -✔️ | -43.81 | -51.40 | -||||||
- Mar 17, 2023 - | -ChatGPT + CoT - HKU & DAMO [Li et al. NeurIPS'23] - |
- - [link] - | -UNK | -- | 32.33 | -49.69 | -||||||
- Nov 23, 2023 - | -OPEN-SQL - Anonymous - |
- - | 7B | -✔️ | -41.56 | -48.08 | -||||||
- Feb 17, 2023 + | 🥈2 |
- Codex - Baseline + | Claude 3 |
- - | 175B | -✔️ | -43.41 | -41.60 | +- | +- | +- | +- |
- Mar 17, 2023 + | 🥉3 |
- ChatGPT - Baseline + | Claude 2.1k |
- - | UNK | -- | 27.97 | -36.68 | +- | +- | +- | +- |
- Feb 17, 2023 - | -Codex - Baseline - |
- - | 175B | -- | 33.37 | -35.40 | -||||||
- Feb 5, 2023 - | -T5-3B - Baseline - |
- - | 3B | -✔️ | -25.57 | -27.80 | -||||||
- Feb 3, 2023 - | -T5-Large - Baseline - |
- - | 770M | -✔️ | -22.74 | -25.00 | -||||||
- Feb 5, 2023 + | 4 |
- T5-3B - Baseline + | InternLM-7b |
- - | 3B | -- | 13.62 | -15.17 | +- | +- | +- | +- |
- Feb 3, 2023 + | 5 |
- T5-Base - Baseline + | InternLM-20b |
- - | 220M | -✔️ | -12.90 | -14.70 | +- | +- | +- | +- |
- Feb 3, 2023 + | 6 |
- T5-Large - Baseline + | - |
- - | 770M | -- | 9.90 | -12.25 | +- | +- | +- | +- |
- Feb 3, 2023 + | 7 |
- T5-Base - Baseline + | - |
- - | 220M | -- | 7.78 | -8.97 | +- | +- | +- | +- |