Transcribe audio files using the OpenAI Whisper (speech-to-text) API #451

dkotter · 2023-05-04T18:56:32Z

Description of the Change

This PR introduces a new integration with the OpenAI Whisper (speech-to-text) API. The integration automatically creates a transcript for any supported audio file and stores that transcript as the item's post_content (shown in the Description field).

Workflow

A new settings section is added under Tools > ClassifAI > Language Processing > OpenAI Whisper. Here there are three options to choose from:

Enter your API key
Turn the transcription feature on
Choose which roles have access

Once configured, whenever a valid audio file is uploaded, we send that file to the Whisper API. A valid file must be under 25 MB and be one of the following types: mp3, mp4, mpeg, mpga, m4a, wav, or webm. If we get a successful response back, we store the transcript as the post_content of the attachment item.
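The upload requirements above can be sketched as a small validation helper. This is an illustrative JavaScript sketch only, not the plugin's implementation (the PR's server-side code is PHP). The names `MAX_BYTES`, `ALLOWED_EXTENSIONS`, and `checkAudioFile` are hypothetical, and treating 25 MB as 25 × 1024 × 1024 bytes is an assumption.

```javascript
// Hypothetical names throughout; the limits come from the PR description.
// Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
const MAX_BYTES = 25 * 1024 * 1024;
const ALLOWED_EXTENSIONS = new Set([
  'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'wav', 'webm',
]);

/**
 * Decide whether a file meets the stated transcription requirements.
 *
 * @param {{name: string, size: number}} file File name and size in bytes.
 * @returns {{ok: boolean, reason?: string}} Result, with a reason on failure.
 */
function checkAudioFile(file) {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = file.name.lastIndexOf('.');
  const ext = dot === -1 ? '' : file.name.slice(dot + 1).toLowerCase();

  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `Unsupported file type: ${ext || '(none)'}` };
  }

  // The description says the file must be *under* 25 MB.
  if (file.size >= MAX_BYTES) {
    return { ok: false, reason: 'File must be under 25 MB' };
  }

  return { ok: true };
}

console.log(checkAudioFile({ name: 'episode.MP3', size: 1024 }));
console.log(checkAudioFile({ name: 'notes.txt', size: 1024 }));
```

Checking the extension before the size means an unsupported file gets the more specific error first; the order is a design choice for this sketch, not something the PR specifies.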

For existing audio items, there are a few ways to generate transcripts. You can go to the Media Library grid view and click on an audio file. The modal that opens includes an option to (Re-) Transcribe:

You can also go to the single media view and there will be a custom metabox with an option to (Re-) Transcribe the item. Check the box and then save the item:

If you prefer using the Media Library list view, there's a Transcribe audio bulk edit option as well as an inline Transcribe option:

Bulk transcribe	Inline transcribe

How to test the Change

Go to Tools > ClassifAI > Language Processing > OpenAI Whisper and configure the feature
Upload a new audio file that meets the requirements
Go to the audio file and ensure the Description field has content
Test the other ways to transcribe described above: the media modal, the single media view, and the bulk edit options

Changelog Entry

Added - Automatically create transcripts of audio files using the OpenAI Whisper API

Credits

Props @dkotter

Checklist:

I agree to follow this project's Code of Conduct.
I have updated the documentation accordingly.
I have added tests to cover my change.
All new and existing tests pass.


iamdharmesh

@dkotter Thanks for the amazing work and very detailed information in the PR description. PR looks good to me and it's working fine.

Just added 2 minor notes to check and we are ready to merge this.

Thanks.

iamdharmesh · 2023-05-17T15:14:12Z

includes/Classifai/Providers/OpenAI/Whisper.php

+			return new WP_Error( 'not_enabled', esc_html__( 'Transcripts are not enabled.', 'classifai' ) );
+		}
+
+		return true;


I think we should also add a check for authentication. Without it, when the API key isn't saved but the enable checkbox is checked, the "Transcribe audio" option still shows in bulk actions.

// Check if valid authentication is in place.
if ( empty( $settings ) || ( isset( $settings['authenticated'] ) && false === $settings['authenticated'] ) ) {
	return new WP_Error( 'auth', esc_html__( 'Please set up valid authentication with OpenAI.', 'classifai' ) );
}
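To illustrate why the extra check matters, here is a minimal JavaScript sketch of the combined gating logic. It is not the plugin's code (which is PHP). The function name `checkTranscriptsAvailable` and the `enableTranscripts` setting key are hypothetical, only `authenticated` appears in the snippet above, and the order of the two checks is an assumption.

```javascript
/**
 * Hypothetical mirror of the suggested gating logic.
 * Only the `authenticated` key comes from the review snippet;
 * `enableTranscripts` is an assumed name for the "enable" checkbox.
 *
 * @param {object|null|undefined} settings Saved provider settings.
 * @returns {{ok: boolean, code?: string, message?: string}}
 */
function checkTranscriptsAvailable(settings) {
  // Missing/empty settings, or an explicit `authenticated: false`,
  // both count as "authentication not set up".
  const isEmpty = !settings || Object.keys(settings).length === 0;
  if (isEmpty || settings.authenticated === false) {
    return {
      ok: false,
      code: 'auth',
      message: 'Please set up valid authentication with OpenAI.',
    };
  }

  // Only after authentication passes do we look at the enable checkbox.
  if (!settings.enableTranscripts) {
    return {
      ok: false,
      code: 'not_enabled',
      message: 'Transcripts are not enabled.',
    };
  }

  return { ok: true };
}

// The case from the review: checkbox ticked, but no valid API key saved.
console.log(checkTranscriptsAvailable({ enableTranscripts: true, authenticated: false }));
```

With only the enabled check in place, the last call would pass, which is how the bulk action could appear without a working API key.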

iamdharmesh · 2023-05-17T15:25:07Z

src/js/media.js

+								textField.value = resp;
+							}
+						}
+					},


Suggested change

-					},
+					},
+					buttonText: __('Re-transcribe', 'classifai'),

This keeps the button text as "Re-transcribe" after the process completes. Currently, it changes to "Rescan".


dkotter added 8 commits May 3, 2023 10:41

Add initial structure for OpenAI Whisper integration

30549f6

Add a Transcribe class that handles making API requests and storing the result. Fire this when a new attachment is added

c1e6c7d

Add transcribe button to media modal. Add new REST endpoint that this button uses

6946e63

Fix eslint errors

c64924e

Add metabox on single audio attachment pages that can be used to generate transcriptions

9591869

Add bulk action support to transcribe audio files. Refactor our BulkActions class to better support multiple providers, following what was done in #437

81bdf1a

Update docs

b0ee07a

Add tests

a208fcd

dkotter added this to the 2.2.0 milestone May 4, 2023

dkotter self-assigned this May 4, 2023

dkotter requested review from jeffpaul and a team as code owners May 4, 2023 18:56

Add new helper method to determine if the feature is enabled and a user has access. Use this method anywhere we output our functionality. This fixes a bug our e2e tests found; thanks tests :)

bcec22a

iamdharmesh previously approved these changes May 17, 2023


jeffpaul mentioned this pull request May 17, 2023

Release version 2.2.0 #457

Closed

16 tasks

dkotter added 2 commits May 18, 2023 11:05

Merge branch 'develop' into feature/openai-transcriptions

38e9389

Remove unneeded method. Add auth check when determining if a feature is enabled. Ensure button text stays correct

34d3d29

dkotter dismissed iamdharmesh’s stale review via 34d3d29 May 18, 2023 17:12

dkotter merged commit 9d99bc1 into develop May 18, 2023

dkotter deleted the feature/openai-transcriptions branch May 18, 2023 17:17

dkotter mentioned this pull request Jun 20, 2023

Add WP-CLI command to bulk transcribe audio files #498

Closed

1 task

dkotter mentioned this pull request Jun 28, 2023

Add WP-CLI command to bulk process audio transcriptions #514

Merged

4 tasks

Sidsector9 mentioned this pull request Jul 13, 2023

Rendering of the "Re-Transcribe" button should independent of the "Generate transcripts from audio files" setting #534

Open

1 task
