Transcribe audio files using the OpenAI Whisper (speech-to-text) API #451

Merged
merged 11 commits into from
May 18, 2023

Conversation

dkotter
Collaborator

@dkotter dkotter commented May 4, 2023

Description of the Change

This PR introduces a new integration with the OpenAI Whisper (speech-to-text) API. This integration automatically creates a transcript for any supported audio file and stores that transcript as the post_content for the item (shown in the Description field).

Workflow

A new settings section is added under Tools > ClassifAI > Language Processing > OpenAI Whisper. Here there are three options to choose from:

  1. Enter your API key
  2. Turn the transcription feature on
  3. Choose which roles have access

OpenAI Whisper settings

Once configured, whenever a valid audio file is uploaded (it must be under 25 MB and one of these file types: mp3, mp4, mpeg, mpga, m4a, wav, or webm), we send that file to the Whisper API. If we get a successful response back, we store the transcript as the post_content of the attachment item.
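
For reference, the upload requirements above boil down to a check like the one below. This is a minimal sketch in JavaScript, not the code in this PR: the function name isTranscribable, the constant names, and reading 25 MB as 25 * 1024 * 1024 bytes are all assumptions made for illustration.

```javascript
// Hypothetical helper, for illustration only; not part of the plugin.
// File types listed in the PR description.
const SUPPORTED_EXTENSIONS = ['mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'wav', 'webm'];

// Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024;

/**
 * Returns true when a file meets both requirements:
 * a supported extension and a size strictly under the limit.
 */
function isTranscribable(fileName, sizeInBytes) {
	const dot = fileName.lastIndexOf('.');
	if (dot === -1) {
		return false; // No extension, so the type cannot be determined.
	}

	const extension = fileName.slice(dot + 1).toLowerCase();

	return (
		SUPPORTED_EXTENSIONS.includes(extension) &&
		sizeInBytes < MAX_FILE_SIZE_BYTES
	);
}

console.log(isTranscribable('interview.mp3', 3 * 1024 * 1024)); // true
console.log(isTranscribable('interview.ogg', 3 * 1024 * 1024)); // false
```

The size comparison is strict because the description says files must be under 25 MB; a file of exactly the limit is rejected in this sketch.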

For existing audio items, there are a few options to generate transcripts. You can go to the Media Library grid view and click on an audio file. Within the modal that pops up, there will be an option to (Re-) Transcribe:

Transcribe audio file in media modal

You can also go to the single media view and there will be a custom metabox with an option to (Re-) Transcribe the item. Check the box and then save the item:

Transcribe audio file in single view

If you prefer using the Media Library list view, there's a Transcribe audio bulk edit option as well as an inline Transcribe option:

Bulk transcribe Inline transcribe

How to test the Change

  1. Go to Tools > ClassifAI > Language Processing > OpenAI Whisper and configure the feature
  2. Upload a new audio file that meets the requirements
  3. Go to the audio file and ensure the Description field has content
  4. Test out the other ways to transcribe as explained above, using the media modal, using the single media view and using bulk edit options

Changelog Entry

Added - Automatically create transcripts of audio files using the OpenAI Whisper API

Credits

Props @dkotter

Checklist:

  • I agree to follow this project's Code of Conduct.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests pass.

@dkotter dkotter added this to the 2.2.0 milestone May 4, 2023
@dkotter dkotter self-assigned this May 4, 2023
@dkotter dkotter requested review from jeffpaul and a team as code owners May 4, 2023 18:56
…er has access. Use this method anywhere we output our functionality. This fixes a bug our e2e tests found; thanks tests :)
iamdharmesh
iamdharmesh previously approved these changes May 17, 2023
Member

@iamdharmesh iamdharmesh left a comment

@dkotter Thanks for the amazing work and very detailed information in the PR description. PR looks good to me and it's working fine.

Just added 2 minor notes to check and we are ready to merge this.

Thanks.

return new WP_Error( 'not_enabled', esc_html__( 'Transcripts are not enabled.', 'classifai' ) );
}

return true;

I think we should also add a check for authenticated. Without it, when no API key is saved and the enable checkbox is checked, the "Transcribe audio" option still shows in bulk actions.

// Check if valid authentication is in place.
if ( empty( $settings ) || ( isset( $settings['authenticated'] ) && false === $settings['authenticated'] ) ) {
	return new WP_Error( 'auth', esc_html__( 'Please set up valid authentication with OpenAI.', 'classifai' ) );
}

textField.value = resp;
}
}
},

Suggested change
},
},
buttonText: __('Re-transcribe', 'classifai'),

This keeps the button text as "Re-transcribe" after the process is complete. Currently, it becomes "Rescan".

@jeffpaul jeffpaul mentioned this pull request May 17, 2023
16 tasks