Merge pull request #451 from 10up/feature/openai-transcriptions
Transcribe audio files using the OpenAI Whisper (speech-to-text) API
dkotter committed May 18, 2023
2 parents 7a8d8cb + 34d3d29 commit 9d99bc1
Showing 18 changed files with 1,071 additions and 113 deletions.
58 changes: 44 additions & 14 deletions README.md
@@ -13,9 +13,10 @@
* [Installation](#installation)
* [Register ClassifAI account](#register-classifai-account)
* [Set Up NLU Language Processing](#set-up-language-processing-via-ibm-watson)
-* [Set Up ChatGPT Language Processing](#set-up-language-processing-via-openai)
+* [Set Up OpenAI ChatGPT Language Processing](#set-up-language-processing-via-openai-chatgpt)
+* [Set Up OpenAI Whisper Language Processing](#set-up-language-processing-via-openai-whisper)
* [Set Up Computer Vision Image Processing](#set-up-image-processing-via-microsoft-azure)
-* [Set Up DALL·E Image Processing](#set-up-image-processing-via-openai)
+* [Set Up OpenAI DALL·E Image Processing](#set-up-image-processing-via-openai)
* [Set Up Recommended Content](#set-up-recommended-content-via-microsoft-azure-personalizer)
* [WP CLI Commands](#wp-cli-commands)
* [FAQs](#frequently-asked-questions)
@@ -27,6 +28,7 @@

* Automatically generate a summary of your content and store that as an excerpt using [OpenAI's ChatGPT](https://platform.openai.com/docs/guides/chat)
* Generate new images on demand to use in-content or as a featured image using [OpenAI's DALL·E](https://platform.openai.com/docs/guides/images)
+* Automatically generate transcripts of your audio files using [OpenAI's Whisper](https://platform.openai.com/docs/guides/speech-to-text)
* Classify your content using [IBM Watson's Natural Language Understanding API](https://www.ibm.com/watson/services/natural-language-understanding/) and [Microsoft Azure's Computer Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
* Supports Watson's [Categories](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about#categories), [Keywords](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about#keywords), [Concepts](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about#concepts) & [Entities](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-about#entities) and Azure's [Describe Image](https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/56f91f2e778daf14a499e1fe)
* Automatically classify content and images on save
@@ -36,9 +38,9 @@
* BETA: Recommend content based on overall site traffic via [Azure Personalizer](https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) (note that we're gathering feedback on this feature and may significantly iterate depending on community input)
* Bulk classify content with [WP-CLI](https://wp-cli.org/)

-| Language Processing - Tagging | Recommended Content | Excerpt Generation |
-| :-: | :-: | :-: |
-| ![Screenshot of ClassifAI post tagging](assets/img/screenshot-1.png "Example of a Block Editor post with Watson Categories, Keywords, Concepts, and Entities.") | ![Screenshot of ClassifAI recommended content](assets/img/screenshot-2.png "Example of a Recommended Content Block with Azure Personalizer.") | ![Screenshot of ClassifAI excerpt generation](assets/img/screenshot-7.png "Example of automatic excerpt generation with OpenAI.") |
+| Language Processing - Tagging | Recommended Content | Excerpt Generation | Audio Transcripts |
+| :-: | :-: | :-: | :-: |
+| ![Screenshot of ClassifAI post tagging](assets/img/screenshot-1.png "Example of a Block Editor post with Watson Categories, Keywords, Concepts, and Entities.") | ![Screenshot of ClassifAI recommended content](assets/img/screenshot-2.png "Example of a Recommended Content Block with Azure Personalizer.") | ![Screenshot of ClassifAI excerpt generation](assets/img/screenshot-7.png "Example of automatic excerpt generation with OpenAI.") | ![Screenshot of ClassifAI audio transcript generation](assets/img/screenshot-9.png "Example of automatic audio transcript generation with OpenAI.") |

| Image Processing - Alt Text | Image Processing - Smart Cropping | Image Processing - Tagging | Image Processing - Generate Images |
| :-: | :-: | :-: | :-: |
@@ -49,7 +51,7 @@
* PHP 7.4+
* [WordPress](http://wordpress.org) 5.7+
* To utilize the NLU Language Processing functionality, you will need an active [IBM Watson](https://cloud.ibm.com/registration) account.
-* To utilize the ChatGPT Language Processing functionality or DALL·E Image Processing functionality, you will need an active [OpenAI](https://platform.openai.com/signup) account.
+* To utilize the ChatGPT or Whisper Language Processing functionality or DALL·E Image Processing functionality, you will need an active [OpenAI](https://platform.openai.com/signup) account.
* To utilize the Computer Vision Image Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account.

## Pricing
@@ -58,7 +60,7 @@ Note that there is no cost to using ClassifAI itself. Both IBM Watson and Micros

The service that powers ClassifAI's NLU Language Processing, IBM Watson's Natural Language Understanding ("NLU"), has a ["lite" pricing tier](https://www.ibm.com/cloud/watson-natural-language-understanding/pricing) that offers 30,000 free NLU items per month.

-The service that powers ClassifAI's ChatGPT Language Processing and DALL·E Image Processing, OpenAI, has a limited free trial and then requires a [pay per usage](https://openai.com/pricing) plan.
+The service that powers ClassifAI's ChatGPT and Whisper Language Processing and DALL·E Image Processing, OpenAI, has a limited free trial and then requires a [pay per usage](https://openai.com/pricing) plan.

The service that powers ClassifAI's Computer Vision Image Processing, Microsoft Azure, has a ["free" pricing tier](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/) that offers 20 transactions per minute and 5,000 transactions per month.
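
As a rough planning aid, the Azure free-tier limits quoted above can be turned into a back-of-the-envelope estimate. The sketch below is not part of ClassifAI: the function name and the `transactions_per_image` parameter are hypothetical, and it assumes each enabled feature costs one transaction per image. Check the Azure pricing page for current limits.

```python
import math

# Free-tier limits as quoted in this README; they may change over time.
AZURE_TRANSACTIONS_PER_MINUTE = 20
AZURE_TRANSACTIONS_PER_MONTH = 5_000


def azure_free_tier_estimate(image_count: int, transactions_per_image: int = 1) -> dict:
    """Estimate how long a batch of images takes within the free tier.

    `transactions_per_image` is an assumption: one transaction per enabled
    feature (for example alt text, tagging, smart cropping) per image.
    """
    transactions = image_count * transactions_per_image
    return {
        # Lower bound imposed by the per-minute rate limit.
        "min_minutes": math.ceil(transactions / AZURE_TRANSACTIONS_PER_MINUTE),
        # Number of monthly quotas the batch would consume.
        "months_needed": math.ceil(transactions / AZURE_TRANSACTIONS_PER_MONTH),
    }


print(azure_free_tier_estimate(12_000))
# {'min_minutes': 600, 'months_needed': 3}
```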

@@ -134,7 +136,7 @@ ClassifAI is a sophisticated solution that we want organizations of all shapes a
- Check for an email from `ClassifAI Team` which contains the registration key.
- Note that the email will be sent from `opensource@10up.com`, so please whitelist this email address if needed.

-### 2. Configure ClassifAI Registration Key under ClassifAI > ClassifAI
+### 2. Configure ClassifAI Registration Key under Tools > ClassifAI

- In the `Registered Email` field, enter the email you used for registration.
- In the `Registration Key` field, enter the registration key from the email in step 1 above.
@@ -150,7 +152,7 @@ ClassifAI is a sophisticated solution that we want organizations of all shapes a
- Log into your account (accepting the privacy policy) and create a new [*Natural Language Understanding*](https://cloud.ibm.com/catalog/services/natural-language-understanding) Resource if you do not already have one. It may take a minute for your account to fully populate with the default resource group to use.
- Click `Manage` in the left hand menu, then `Show credentials` on the Manage page to view the credentials for this resource.

-### 2. Configure IBM Watson API Keys under ClassifAI > Language Processing > IBM Watson
+### 2. Configure IBM Watson API Keys under Tools > ClassifAI > Language Processing > IBM Watson

**The credentials screen will show either an API key or a username/password combination.**

@@ -178,7 +180,7 @@ For more information, see https://cloud.ibm.com/docs/watson?topic=watson-endpoin

### 4. Save a Post/Page/CPT or run WP CLI command to batch classify your content

-## Set Up Language Processing (via OpenAI)
+## Set Up Language Processing (via OpenAI ChatGPT)

### 1. Sign up for OpenAI

@@ -187,7 +189,7 @@ For more information, see https://cloud.ibm.com/docs/watson?topic=watson-endpoin
* Log into your account and go to the [API key page](https://platform.openai.com/account/api-keys).
* Click `Create new secret key` and copy the key that is shown.

-### 2. Configure OpenAI API Keys under ClassifAI > Language Processing > OpenAI
+### 2. Configure OpenAI API Keys under Tools > ClassifAI > Language Processing > OpenAI ChatGPT

* Enter your API Key copied from the above step into the `API Key` field.

@@ -203,6 +205,34 @@ For more information, see https://cloud.ibm.com/docs/watson?topic=watson-endpoin
* Ensure this item has content saved.
* Open the Excerpt panel in the sidebar and click on `Generate Excerpt`

+## Set Up Language Processing (via OpenAI Whisper)
+
+Note that [OpenAI](https://platform.openai.com/docs/guides/speech-to-text) can create a transcript for audio files that meet the following requirements:
+* The file must be presented in mp3, mp4, mpeg, mpga, m4a, wav, or webm format
+* The file size must be less than 25 megabytes (MB)
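
The two requirements above can be checked before an upload is attempted. The following is a minimal sketch, not ClassifAI's own validation: the function name is hypothetical, the format is inferred from the file extension only, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from pathlib import Path

# Formats and size limit as listed in this README.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def can_transcribe(filename: str, size_bytes: int) -> bool:
    """Return True if the file looks eligible for transcription.

    Only the extension and the size are inspected; the actual audio
    encoding is not verified.
    """
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS and size_bytes < MAX_BYTES
```

For a file on disk, the size can be read with `Path(filename).stat().st_size` and passed as `size_bytes`.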
+
+### 1. Sign up for OpenAI
+
+* [Sign up for an OpenAI account](https://platform.openai.com/signup) or sign into your existing one.
+* If creating a new account, complete the verification process (requires confirming your email and phone number).
+* Log into your account and go to the [API key page](https://platform.openai.com/account/api-keys).
+* Click `Create new secret key` and copy the key that is shown.
+
+### 2. Configure OpenAI API Keys under Tools > ClassifAI > Language Processing > OpenAI Whisper
+
+* Enter your API Key copied from the above step into the `API Key` field.
+
+### 3. Enable specific features
+
+* Choose to enable the ability to automatically generate transcripts from supported audio files.
+* Choose which user roles have access to this ability.
+* Save changes and ensure a success message is shown. An error will show if API authentication fails.
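
If authentication fails on save, the key can be tested outside WordPress. The sketch below assumes that OpenAI's `GET /v1/models` endpoint is a reasonable way to check a key; how ClassifAI itself verifies credentials is not described in this README, and both function names are hypothetical. Nothing is sent over the network by the code as written.

```python
import urllib.request

OPENAI_MODELS_URL = "https://api.openai.com/v1/models"


def build_key_check_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to list models."""
    return urllib.request.Request(
        OPENAI_MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def describe_status(status_code: int) -> str:
    """Map the HTTP status of the check to a short message."""
    if status_code == 200:
        return "API key accepted"
    if status_code == 401:
        return "API key rejected"
    return f"unexpected response ({status_code})"
```

To run the check, pass the request to `urllib.request.urlopen`; a rejected key raises `urllib.error.HTTPError`, whose `code` attribute can be passed to `describe_status`.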
+
+### 4. Upload a new audio file
+
+* Upload a new audio file.
+* Check to make sure the transcript was stored in the Description field.
+
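
The check in step 4 can also be scripted against the WordPress REST API, where a media item exposes its Description as a `description` object. This is a sketch under assumptions: the helper name is hypothetical, and it inspects only `description.raw`, which WordPress returns for authenticated requests made with `context=edit`. The `rendered` value is ignored because it may include markup beyond the stored text.

```python
def has_transcript(media: dict) -> bool:
    """Return True if a media item's Description holds non-empty text.

    `media` is the parsed JSON of a single item from the
    /wp-json/wp/v2/media/<id>?context=edit endpoint.
    """
    description = media.get("description") or {}
    raw_text = description.get("raw") or ""
    return bool(raw_text.strip())
```

A media item with an empty Description shortly after upload may simply mean the transcript is not stored yet, so a script would likely want to retry before reporting a failure.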
## Set Up Image Processing (via Microsoft Azure)

Note that [Computer Vision](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home#image-requirements) can analyze and crop images that meet the following requirements:
@@ -218,7 +248,7 @@ Note that [Computer Vision](https://docs.microsoft.com/en-us/azure/cognitive-ser
- Click `Keys and Endpoint` in the left hand Resource Management menu to view the `Endpoint` URL for this resource.
- Click the copy icon next to `KEY 1` to copy the API Key credential for this resource.

-### 2. Configure Microsoft Azure API and Key under ClassifAI > Image Processing
+### 2. Configure Microsoft Azure API and Key under Tools > ClassifAI > Image Processing

- In the `Endpoint URL` field, enter your `API endpoint`.
- In the `API Key` field, enter your `KEY 1`.
@@ -239,7 +269,7 @@ Note that [Computer Vision](https://docs.microsoft.com/en-us/azure/cognitive-ser
* Log into your account and go to the [API key page](https://platform.openai.com/account/api-keys).
* Click `Create new secret key` and copy the key that is shown.

-### 2. Configure OpenAI API Keys under ClassifAI > Image Processing > OpenAI
+### 2. Configure OpenAI API Keys under Tools > ClassifAI > Image Processing > OpenAI

* Enter your API Key copied from the above step into the `API Key` field.

@@ -274,7 +304,7 @@ Note that [Personalizer](https://azure.microsoft.com/en-us/services/cognitive-se

For more information, see https://docs.microsoft.com/en-us/azure/cognitive-services/personalizer/how-to-create-resource

-### 2. Configure Microsoft Azure API and Key under ClassifAI > Recommended Content
+### 2. Configure Microsoft Azure API and Key under Tools > ClassifAI > Recommended Content

- In the `Endpoint URL` field, enter your `Endpoint` URL from Step 1 above.
- In the `API Key` field, enter your `KEY 1` from Step 1 above.
Binary file modified assets/img/screenshot-6.png
Binary file added assets/img/screenshot-9.png