[Feat] Add qwen2_audio model support and Automatic speech recognition task with LibriSpeech dataset #289

Prophet-C · 2024-09-30T15:41:09Z

[Feat] Add qwen2_audio model support
[Feat] Add automatic speech recognition (ASR) task with LibriSpeech dataset
[Feat] Add word error rate (WER) evaluation metric for ASR task
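
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch, not the code in this PR: the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization (no lowercasing or punctuation stripping).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.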

[Need Check] ASR evaluation requires new packages to be installed (no specific versions required): zhconv, editdistance, more_itertools
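
One way to confirm the environment is ready before running the ASR task is to check that these modules can be imported. This is a small sketch under stated assumptions: the helper name `missing_packages` is hypothetical and is not part of lmms-eval, and the import names are assumed to match the package names listed above.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Package names as listed in this PR; import names assumed identical.
    required = ["zhconv", "editdistance", "more_itertools"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All ASR evaluation packages are available.")
```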

Luodian · 2024-09-30T15:49:14Z

Thanks for this PR. It seems the lint check has not passed.

You may consider using pre-commit and running pre-commit run --all-files

… task with LibriSpeech dataset (#289) * "add qwen2_audio model, asr librispeech eval task" * lint update for PR #289 --------- Co-authored-by: Pengyun <u7978909@anu.edu.au>

* [Feat] Add qwen2_audio model support and Automatic speech recognition task with LibriSpeech dataset (#289)
* "add qwen2_audio model, asr librispeech eval task"
* lint update for PR #289
---------
Co-authored-by: Pengyun <u7978909@anu.edu.au>
* add clotho_aqa task
* Apply black formatting
* formatting
* excluding xl due to downloading issue.
* [Feat] add audiobench version of clothoaqa (#302)
* add clothoaqa task
* formatting
* minor fixes
* minor fixes
* Add AIR_bench task (#315)
* add air_bench
* minor changes
* add common_voice_15 and people_speech tasks (#316)
Co-authored-by: Pengyun <u7978909@anu.edu.au>
* add indent to yaml
* Add openhermes task (#323)
* add openhermes task
* formatting
* [Refactor] Fixing doc to audio return type, qwen_audio revise (#329)
* Add downsample function for audio array
* Batch support for qwen2 and use apply chat template
* Return sr for common voice
* Doc to audio to return the whole dict
* add muchomusic and vocalsound task (#331)
* add alpaca audio task (#333)
* [feat] added gigaspeech config (#334)
* fix xl yaml
* Fixed config for gigaspeech_xl. gigaspeech_xl_test has intermittent problem.
* add alpaca audio task (#333)
* pre-committed utils.py
---------
Co-authored-by: Cong <101887866+pbcong@users.noreply.github.com>
* add tedlium_long_form and tedlium_dev_test tasks (#345)
Co-authored-by: Pengyun <u7978909@anu.edu.au>
* [Feat] add-wavcaps (#349)
* fix xl yaml
* Fixed config for gigaspeech_xl. gigaspeech_xl_test has intermittent problem.
* add alpaca audio task (#333)
* pre-committed utils.py
* add wavcaps
* add wavcaps
---------
Co-authored-by: Cong <101887866+pbcong@users.noreply.github.com>
* Update dep and fix log samples for audio (#355)
* Update dep
* Fix saved audio OOM error
* Fix typing
* Fix librispeech dataset name
* Add add_generation_prompt as option for Qwen audio
* Add add system propmt as optional
* fix vocalsound (#362)
* Add using simple prompt for Qwen2 Audio to align (#360)
* Add retry for gpt api call and improve air_bench aggregation function (#376)
* add retry for api calls and change air_bench_foundation aggregation function
* make azure default api
* minor changes
* [Feat] Add mix_evals audio2text (#420)
* Add mix_evals audio2text
* Fix task tags in datasets
* Gemini Audio (#421)
* gemini audio
* better variable naming
* Revise prompt
* delete redundant tasks in gigaspeech
* Fix wavcaps bugs
* Add lmms-eval-0.3 docs Update lmms-eval-0.3.md fix errors in markdown and add hyperlinks proofread markdown and fix errors rewrite some parts to fix errors rewrite some parts to fix errors rewrite some parts to fix errors try optimize the table format using html try optimize the table format using html try optimize the table 2 format final proofread final proofread final proofread add explanantion for AIF and ASR standardize WER to WER(↓) final proofread final proofread final proofread final proofread correct hyperlink errors modify readme to support lmms-eval0.3.0 release modify icon fix typos
Co-Authored-By: KairuiHu <kairuih12@gmail.com>
---------
Co-authored-by: Pengyun Wang <91826032+Prophet-C@users.noreply.github.com>
Co-authored-by: Pengyun <u7978909@anu.edu.au>
Co-authored-by: pbcong <congphamba2005@gmail.com>
Co-authored-by: Li Bo <drluodian@gmail.com>
Co-authored-by: Cong <101887866+pbcong@users.noreply.github.com>
Co-authored-by: Yingluo <liyingluo57@gmail.com>
Co-authored-by: Totoluo <52833580+Yingluo-momo@users.noreply.github.com>
Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
Co-authored-by: KairuiHu <kairuih12@gmail.com>

Prophet-C added 2 commits September 30, 2024 23:30

"add qwen2_audio model, asr librispeech eval task"

4bb0b38

Merge branch 'main' of https://github.com/Prophet-C/lmms-eval into main

b122c34

Luodian approved these changes Sep 30, 2024

View reviewed changes

Luodian changed the base branch from main to feat-dev September 30, 2024 15:51

lint update for PR EvolvingLMMs-Lab#289

1ce6b48

Luodian merged commit 4b7ccea into EvolvingLMMs-Lab:feat-dev Oct 1, 2024
1 check passed
