Adds audio querying to MultimodalQ&A Example #1225

mhbuehler · 2024-12-04T23:02:02Z

Description

Adds the option to speak into the browser microphone or upload an audio file to the main query tab. The speech audio is POSTed as a base64 string to the MultimodalQnA gateway where it is transcribed to text by the ASR whisper service and then passed on to the LVM service.

Note: This PR has been updated so that the changes to the megaservice are included in this repo in multimodalqna.py, so a corresponding PR to GenAIComps is not needed.

Issues

This is a part of the MultimodalQnA Image & Audio Enhancements RFC

Type of change

New feature (non-breaking change which adds new functionality)

Dependencies

N/A

Tests

New tests were added to the example's README.md, test_compose_on_xeon.sh, and test_compose_on_gaudi.sh.

Signed-off-by: okhleif-IL <omar.khleif@intel.com> * validated, updated tests Signed-off-by: okhleif-IL <omar.khleif@intel.com> * added one more curl test for audio Signed-off-by: okhleif-IL <omar.khleif@intel.com> * fixed typo Signed-off-by: okhleif-IL <omar.khleif@intel.com> * reverted git clone command Signed-off-by: okhleif-IL <omar.khleif@intel.com> * added ASR test Signed-off-by: okhleif-IL <omar.khleif@intel.com> * fixed command with backslashes Signed-off-by: okhleif-IL <omar.khleif@intel.com> --------- Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com> Signed-off-by: okhleif-IL <omar.khleif@intel.com> Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* MMQnA doc update correcting ASR and whisper image names Signed-off-by: dmsuehir <dina.s.jones@intel.com> * Add image tags Signed-off-by: dmsuehir <dina.s.jones@intel.com> --------- Signed-off-by: dmsuehir <dina.s.jones@intel.com>

* Enabled audio query functionality in the MultimodalQnA UI Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Temporarily redirect clones for tests

for more information, see https://pre-commit.ci

MultimodalQnA/tests/test_compose_on_xeon.sh

ashahba

LGTM but don't merge please unless the branch is changed to point to main again.

* Add services to tests and correct small text error Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com> * Revert unintended changes Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com> --------- Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

ashahba

Holding off on this PR until opea-project/GenAIComps#974 is merged.

Fixed build.yaml inconsistency

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Update repo clones for MultimodalQnA E2E tests

* Moved gateway changes to multimodalqna.py Signed-off-by: okhleif-IL <omar.khleif@intel.com> * reverted port changes Signed-off-by: okhleif-IL <omar.khleif@intel.com> * addressed review comments Signed-off-by: okhleif-IL <omar.khleif@intel.com> * reverted print statement Signed-off-by: okhleif-IL <omar.khleif@intel.com> --------- Signed-off-by: okhleif-IL <omar.khleif@intel.com>

* Moved gateway changes to multimodalqna.py Signed-off-by: okhleif-IL <omar.khleif@intel.com> * reverted port changes Signed-off-by: okhleif-IL <omar.khleif@intel.com> * addressed review comments Signed-off-by: okhleif-IL <omar.khleif@intel.com> * reverted print statement Signed-off-by: okhleif-IL <omar.khleif@intel.com> * removed proxies Signed-off-by: okhleif-IL <omar.khleif@intel.com> --------- Signed-off-by: okhleif-IL <omar.khleif@intel.com>

ashahba

LGTM!

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com> Signed-off-by: okhleif-IL <omar.khleif@intel.com> Signed-off-by: dmsuehir <dina.s.jones@intel.com> Co-authored-by: Omar Khleif <omar.khleif@intel.com> Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com> Signed-off-by: Chingis Yundunov <YundunovCN@sibedge.com>

okhleif-IL and others added 5 commits December 2, 2024 15:40

Integrate audio query into UI (#22)

84ec278

* Enabled audio query functionality in the MultimodalQnA UI Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Temporarily redirect clones for tests

c9fe70e

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Merge pull request #25 from mhbuehler/melanie/redirect_clones_for_tests

7f7236d

Temporarily redirect clones for tests

mhbuehler requested a review from lvliang-intel as a code owner December 4, 2024 23:02

mhbuehler and others added 2 commits December 4, 2024 15:02

Merge branch 'main' into mmqna-audio-query

56db11a

[pre-commit.ci] auto fixes from pre-commit.com hooks

f67146f

for more information, see https://pre-commit.ci

mhbuehler mentioned this pull request Dec 4, 2024

Adds audio querying to MultimodalQ&A gateway opea-project/GenAIComps#974

Closed

1 task

ashahba added WIP r1.2 OPEA 1.2 RELEASE TAG labels Dec 4, 2024

ashahba added this to the v1.2 milestone Dec 4, 2024

chensuyue reviewed Dec 5, 2024

View reviewed changes

MultimodalQnA/tests/test_compose_on_xeon.sh Outdated Show resolved Hide resolved

chensuyue reviewed Dec 5, 2024

View reviewed changes

MultimodalQnA/tests/test_compose_on_xeon.sh Outdated Show resolved Hide resolved

Merge branch 'main' into mmqna-audio-query

fdf5a08

ashahba approved these changes Dec 5, 2024

View reviewed changes

mhbuehler and others added 3 commits December 5, 2024 14:42

Fixed build.yaml inconsistency

30e33a6

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Merge branch 'main' into mmqna-audio-query

9ba341b

ashahba requested changes Dec 6, 2024

View reviewed changes

mhbuehler and others added 6 commits December 6, 2024 16:48

Merge pull request #27 from mhbuehler/melanie/whisper_image_name

54c82ac

Fixed build.yaml inconsistency

Merge branch 'main' into mmqna-audio-query

bcabb36

Update repo clones for E2E tests

02b87b0

Signed-off-by: Melanie Buehler <melanie.h.buehler@intel.com>

Merge pull request #30 from mhbuehler/melanie/revert_clones

c421e68

Update repo clones for MultimodalQnA E2E tests

Merge branch 'main' into mmqna-audio-query

674c975

mhbuehler changed the title ~~Adds audio querying to MultimodalQ&A UI~~ Adds audio querying to MultimodalQ&A Example Dec 10, 2024

okhleif-IL and others added 2 commits December 10, 2024 14:54

Merge branch 'main' into mmqna-audio-query

ba1fd52

ashahba removed the WIP label Dec 10, 2024

ashahba approved these changes Dec 10, 2024

View reviewed changes

Merge branch 'main' into mmqna-audio-query

55585ab

lvliang-intel approved these changes Dec 12, 2024

View reviewed changes

lvliang-intel merged commit c760cac into opea-project:main Dec 12, 2024
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds audio querying to MultimodalQ&A Example #1225

Adds audio querying to MultimodalQ&A Example #1225

mhbuehler commented Dec 4, 2024 •

edited

Loading

ashahba left a comment

ashahba left a comment

ashahba left a comment

Adds audio querying to MultimodalQ&A Example #1225

Adds audio querying to MultimodalQ&A Example #1225

Conversation

mhbuehler commented Dec 4, 2024 • edited Loading

Description

Issues

Type of change

Dependencies

Tests

ashahba left a comment

Choose a reason for hiding this comment

ashahba left a comment

Choose a reason for hiding this comment

ashahba left a comment

Choose a reason for hiding this comment

mhbuehler commented Dec 4, 2024 •

edited

Loading