Image generation using OpenAI's DALL·E API #419

dkotter · 2023-03-23T02:45:29Z

Description of the Change

This PR adds OpenAI as a new provider in the Image Processing service, specifically integrating with DALL·E. The integration uses DALL·E to generate one or more images from a user-provided prompt. You can then import those images into your Media Library and insert them into your content.

Closes #398.

Setup

Setup with this provider only requires an API key. The settings page validates the API key whenever settings are saved. This is the same validation described in #405.
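As a rough illustration of the save-time check, the sketch below builds an authenticated request and interprets the HTTP status. The helper names and the choice of OpenAI's `GET /v1/models` endpoint are assumptions made for this example; the PR does not state which endpoint the plugin calls, and the actual validation lives in the plugin's PHP code.

```javascript
// Hypothetical helpers; the names and the endpoint choice are assumptions.
const OPENAI_MODELS_URL = 'https://api.openai.com/v1/models';

// Build the request used to check a key.
function buildAuthCheckRequest(apiKey) {
  return {
    url: OPENAI_MODELS_URL,
    options: {
      method: 'GET',
      headers: { Authorization: `Bearer ${apiKey}` },
    },
  };
}

// Map the HTTP status to a result the settings page could display.
function interpretAuthStatus(status) {
  if (status === 200) {
    return { valid: true, message: '' };
  }
  if (status === 401) {
    return { valid: false, message: 'The API key was rejected.' };
  }
  return { valid: false, message: `Unexpected response (HTTP ${status}).` };
}

// Run the check with any fetch-compatible function.
async function verifyApiKey(apiKey, fetchImpl = fetch) {
  if (!apiKey || apiKey.trim() === '') {
    return { valid: false, message: 'Please enter an API key.' };
  }
  const { url, options } = buildAuthCheckRequest(apiKey.trim());
  const response = await fetchImpl(url, options);
  return interpretAuthStatus(response.status);
}
```

Splitting request building from status handling keeps both parts testable without a network call.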

Other than the API key, there are a few settings that can be modified:

Enable image generation: the most important setting. If a valid API key is added but this setting isn't on, no integration will happen.
Allowed roles: lets you choose which roles are allowed to generate images. This list is filtered down to only include roles that also have the upload_files capability.
Number of images: controls how many images are generated for a single prompt. This can be set from 1 to 10 and defaults to 1.
Image size: controls the size of the generated images. This can be set to 1024x1024, 512x512 or 256x256 (the only sizes supported by the API) and defaults to 1024x1024.
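The limits on the last two settings can be sketched as a small normalization step. This is a minimal illustration, not the plugin's code: the function name and the setting keys (`numberOfImages`, `imageSize`) are hypothetical, and clamping out-of-range values (rather than rejecting them) is a design choice made for the example.

```javascript
// Allowed values and defaults taken from the settings described above.
const ALLOWED_SIZES = ['1024x1024', '512x512', '256x256'];
const DEFAULTS = { numberOfImages: 1, imageSize: '1024x1024' };

// Hypothetical helper: coerce raw settings into values the API accepts.
function normalizeImageSettings(raw = {}) {
  const count = Number.parseInt(raw.numberOfImages, 10);

  // Missing or non-numeric counts fall back to the default;
  // numeric counts are clamped to the 1-10 range.
  const numberOfImages = Number.isNaN(count)
    ? DEFAULTS.numberOfImages
    : Math.min(10, Math.max(1, count));

  // Unknown sizes fall back to the default size.
  const imageSize = ALLOWED_SIZES.includes(raw.imageSize)
    ? raw.imageSize
    : DEFAULTS.imageSize;

  return { numberOfImages, imageSize };
}
```

For example, `normalizeImageSettings({ numberOfImages: '25', imageSize: '64x64' })` yields ten images at 1024x1024.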

Image generation

My initial thought was to integrate image generation directly into the Featured Image flow. This would allow easy creation of Featured Images before publishing. But after giving it more thought, I came to the decision that having this functionality in other places would also be nice (like inserting images in content).

I then landed on integrating this directly into the existing Media Modal flow. This supports both the Featured Image flow and any blocks that use the normal Media Modal (like the core Image block, core Media & Text block and core Cover block).

API integration

When the media modal is loaded and the Generate images tab is clicked, we show some helper text as well as a prompt input.

Once a prompt is entered, a request is made to a new REST endpoint (wp-json/classifai/v1/openai/generate-image). This endpoint verifies that the current user has permission to upload files, that we are properly authenticated with OpenAI and that the Enable image generation setting is on.
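The three checks can be pictured as a single guard that runs before any request is sent to OpenAI. The sketch below is illustrative only: the real permission callback is implemented in PHP, and the function name, field names and reason codes here are hypothetical.

```javascript
// Hypothetical sketch of the three checks, in the order listed above.
function checkGenerateImagePermission({
  canUploadFiles,
  isAuthenticated,
  imageGenerationEnabled,
}) {
  if (!canUploadFiles) {
    return { allowed: false, reason: 'insufficient_capability' };
  }
  if (!isAuthenticated) {
    return { allowed: false, reason: 'not_authenticated' };
  }
  if (!imageGenerationEnabled) {
    return { allowed: false, reason: 'feature_disabled' };
  }
  return { allowed: true, reason: null };
}
```

Returning a reason alongside the boolean lets the caller show a specific error rather than a generic failure.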

This endpoint then uses the APIRequest class to send a request to the DALL·E API with the passed-in prompt. We parse that response, ensure it contains what we expect and return it. This data is then parsed and the image(s) are rendered to the user.
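A minimal sketch of the "ensure it contains what we expect" step, assuming the data handed to the client keeps the shape of OpenAI's image response (a `data` array whose items carry either `url` or `b64_json`). Whether the plugin's endpoint passes that shape through unchanged is an assumption, as are the function name and the PNG MIME type.

```javascript
// Hypothetical helper: validate an OpenAI-style image response and
// return values that can be assigned to an <img> element's src.
function extractImageSources(response) {
  if (!response || !Array.isArray(response.data) || response.data.length === 0) {
    throw new Error('No images were returned.');
  }

  return response.data.map((item, index) => {
    // Base64 payloads become data URIs (assumes PNG output).
    if (typeof item.b64_json === 'string' && item.b64_json !== '') {
      return `data:image/png;base64,${item.b64_json}`;
    }
    // Otherwise fall back to a hosted URL.
    if (typeof item.url === 'string' && item.url !== '') {
      return item.url;
    }
    throw new Error(`Image ${index + 1} has neither b64_json nor url.`);
  });
}
```

Preferring the base64 payload matches the commit history for this PR, which mentions switching to base64-encoded images to avoid CORS issues.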

A primary Import into Media Library button and a secondary Import and insert button are shown beneath each image. Clicking the first imports the image into your site's Media Library. The button then changes to say Select image. Clicking that sends you to the normal Media Library tab in the media modal, with your image selected. This allows you to add alt text, a caption or other details before finally inserting the image (either into the content or as a featured image). Clicking the second imports the image and immediately sends you to the Media Library.

I debated on this flow a bit and added the second button after feedback. The current approach allows someone to import multiple images before finally choosing one to insert. They then are sent to the normal Media Library screen which allows them to add alt text or other details (and is the same flow that happens when manually uploading an image). If they only want one image, they can click on that second button which skips a step.
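The two-button flow can be summarized as a small state transition function. This is only a sketch of the behavior described above: the state names, action names and return shape are hypothetical, and the actual UI is built on Backbone.js views.

```javascript
// Hypothetical transition function for the buttons under one generated image.
// States: 'generated' (not yet imported) and 'imported'.
function nextImageState(state, action) {
  if (state === 'generated' && action === 'import') {
    // Primary button: import only, then relabel the button.
    return { state: 'imported', primaryLabel: 'Select image', openLibrary: false };
  }
  if (state === 'generated' && action === 'importAndInsert') {
    // Secondary button: import and jump straight to the Media Library tab.
    return { state: 'imported', primaryLabel: 'Select image', openLibrary: true };
  }
  if (state === 'imported' && action === 'select') {
    // Relabeled primary button: open the Media Library with the image selected.
    return { state: 'imported', primaryLabel: 'Select image', openLibrary: true };
  }
  throw new Error(`Invalid action "${action}" in state "${state}".`);
}
```

Because the import action does not open the library, several images can be imported before one is chosen.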

Reviewer notes

The JS integration into the media modal follows (or tries to follow) the approach WP core uses with Backbone.js. I'm sure there are improvements that can be made there, but it seems to work as intended
I think there could definitely be some text and UX/design tweaks here, so I'm happy to hear any feedback on those items
There is no integration into the Classic Editor or directly into the Media Library for now. I think the latter is interesting as a follow-up, the former we can probably hold off on unless we get user feedback requesting it

How to test the Change

A valid OpenAI API key is needed to fully test this feature. OpenAI does offer a free $5 credit for new users so if you haven't signed up before, you can sign up and get an API key.

Log in to your OpenAI account and go to your API key section. Generate a new API key there and copy it
Go to ClassifAI > Image Processing > OpenAI and paste in your API key
Turn on the Enable image generation setting. The other settings can be left default. Save changes and ensure no error message is shown
Create a new post (or edit an existing one) and either insert an Image block and click on the Media Library button or click on the Featured Image panel
Within the modal, click on the Generate images tab
Type in a prompt and hit enter (or press the button)
Ensure images are generated (should be the number of images you had selected in settings)
Click the button underneath the image to import it into your Media Library. Click the button again to go to the Media Library and insert the image into the content
Ensure no errors happen in any step

Changelog Entry

Added - Generate images using OpenAI's DALL·E API

Credits

Props @dkotter

Checklist:

I agree to follow this project's Code of Conduct.
I have updated the documentation accordingly.
I have added tests to cover my change.
All new and existing tests pass.

jeffpaul · 2023-03-23T14:00:02Z

Import > Insert

I debated on this flow a bit and can definitely make changes (as it does require a number of clicks to actually insert an image). The reason I did this is so if someone likes multiple images that are generated, they can import each of those individually before finally selecting one image to insert. If the import and insert process happened at the same time, we would save the user an extra click but they wouldn't be able to import more than one image.

I agree with this rationale, but would add that next to each Import into Media Library button we could add a secondary action link of Import into Media Library and Insert into Post (or perhaps the shorter Import and Insert). That would give folks the quickest route to "done", assuming they only want to select one of the images.

Attribution

Perhaps we include a default caption of "Image generated by OpenAI's DALL·E / <whatever default license DALL·E applies>"?

Media Library integration

The JS integration into the media modal follows (or tries to follow) the approach WP core uses with Backbone.js. I'm sure there are improvements that can be made there, but it seems to work as intended

@joemcgill any advice from your end / experience with ML & Backbone?

Classic Editor & direct ML integration

There is no integration into the Classic Editor or directly into the Media Library for now. I think the latter is interesting as a follow-up, the former we can probably hold off on unless we get user feedback requesting it

I agree on both parts here.

iamdharmesh

Thanks for adding this, @dkotter. This looks great 🎉

Everything is working as expected and as described in the PR. I added one minor code suggestion to apply; otherwise, all looks good to me.

Thanks again for the great work here.

includes/Classifai/Providers/OpenAI/ChatGPT.php

iamdharmesh · 2023-03-28T15:22:34Z

Hi @dkotter,

ChatGPT and DALL·E can both work with the same OpenAI API key. Could we add some UI, like a checkbox, to share the same key between both services, or maybe auto-fill the API key if either service already has one saved in settings? What do you think?

Thanks

joemcgill · 2023-03-28T20:47:57Z

@joemcgill any advice from your end / experience with ML & Backbone?

I've had a quick look at the approach here, and it generally seems to follow the extend/replace pattern for parts of the Backbone.js code that the media library is built from, so all looks good there. The one thing to be mindful of is that the Select media frame (i.e. wp.media.view.MediaFrame.Select) is not used consistently in all UI instances. For example, the image gallery still uses the Post frame, so you may need to double-check that all your intended use cases are covered.
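To make the point about frames concrete, here is a hedged sketch of applying the same extend/replace step to more than one frame. It is not the code from this PR: `wpMedia` stands in for the global `wp.media` object, the helper name, tab id, label and priority are illustrative assumptions, and rendering the tab's content is omitted.

```javascript
// Sketch only: `wpMedia` stands in for the global `wp.media` object.
// The tab id, label, and priority are illustrative assumptions.
function addGenerateTab(wpMedia, frameNames = ['Select', 'Post']) {
  const patched = [];

  for (const name of frameNames) {
    const Original = wpMedia.view.MediaFrame[name];
    if (!Original) {
      continue; // Frame not available on this screen.
    }

    // Replace the frame with a subclass that adds one more router tab.
    wpMedia.view.MediaFrame[name] = Original.extend({
      browseRouter(routerView) {
        // Keep the tabs core registers, then append ours.
        Original.prototype.browseRouter.apply(this, arguments);
        routerView.set({
          generate: { text: 'Generate images', priority: 60 },
        });
      },
    });
    patched.push(name);
  }

  return patched;
}
```

Patching each frame by name, instead of only Select, is one way to cover UI paths such as the gallery that open a different frame. The returned list shows which frames were actually found.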

dkotter · 2023-03-28T21:29:28Z

Hi @dkotter,

ChatGPT and DALL·E can both work with the same OpenAI API key. Could we add some UI, like a checkbox, to share the same key between both services, or maybe auto-fill the API key if either service already has one saved in settings? What do you think?

Thanks

Yeah, I had debated doing something like this but ended up not (it's a good idea though).

I've gone ahead and moved more duplicate code to the OpenAI trait and modified the API handling a bit so if you have a valid API key already (for instance, you've set up ChatGPT but not DALL·E yet) it will autofill with that key and give you a message so you know what's going on: 6018638
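The autofill behavior described here boils down to a simple fallback. The sketch below is an illustration, not the code in 6018638: the function name, argument names and return shape are hypothetical, and the real logic lives in the PHP OpenAI trait.

```javascript
// Hypothetical helper: choose the API key to show on a provider's settings page.
function resolveApiKey(ownKey, otherProviderKey) {
  if (ownKey) {
    // This provider already has its own key; use it as-is.
    return { apiKey: ownKey, prefilled: false };
  }
  if (otherProviderKey) {
    // Borrow the other OpenAI provider's key and flag it so a notice can be shown.
    return { apiKey: otherProviderKey, prefilled: true };
  }
  return { apiKey: '', prefilled: false };
}
```

The `prefilled` flag is what would drive the message telling the user where the value came from.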

src/scss/language-processing.scss

mehidi258 · 2023-04-06T08:06:45Z

+
+			button {
+				display: block;
+				float: left;


Nitpicky, but it would be great to get rid of the float here. We could use flex properties to achieve a similar UI.

mehidi258 · 2023-04-06T08:08:01Z

src/scss/language-processing.scss

+		.prompt {
+			margin-right: 10px;
+			padding: 1px 8px;
+			width: 25%;


Here I can see the width is 25%, but I'm not seeing anything related to media queries. Are we not considering mobile versions?

mehidi258

Overall this looks great to me, though I have suggested a few improvements from a styling perspective

dkotter added 21 commits March 15, 2023 16:09

Add initial integration with DALLE. Add base class and copy over code…

65ed99e

… from the ChatGPT integration that isn't merged yet. Add a basic REST endpoint

Add proper permission callback to our image gen endpoint

e3125b7

Add two more settings in the admin and in our REST endpoint

01e726c

Update option key that stores last response

7aed8ed

Initial work on integrating into the media modal flow. A new Generate…

320f142

… image tab has been added. This tab loads a text input and button that allows you to enter a prompt. This then fires a request to our custom endpoint and loads in generated images

Connect up our REST endpoint to the OpenAI API

d13e41c

Allow params passed in to the REST endpoint to override settings in t…

a758283

…he admin. Enqueue our JS only on the pages we want.

Start moving templates out of our JS file for easier managing. Replac…

2f1f2da

…e values that are hardcoded that need to be dynamic in our script. Fix script loading

Add a loading state. Disable button when a request happens. Clear out…

5c53d97

… previous items and prompt text when request is done. Adjust styling

Wire up the import button

4a2522e

Merge branch 'develop' into feature/openai-images

91fb9a8

Add loading state for image import. Handle error state. Tweak styles …

2c244ed

…a bit

After an image is imported and a user wants to insert it, switch back…

7557f28

… properly to the Media Library tab

Add prompt helper text. Update button text

b75816a

Add a shared OpenAI trait and move shared code into there so we aren'…

280cdbd

…t duplicating that. Change our auth callback to use the cheapest model, since API keys work the same for all endpoints

Add first pass at E2E tests. Adjust styling a bit

8227ac4

Move our JS code into individual files instead of one big file. Ensur…

ba3a135

…e this file is built and we load the built version. Add an argument to our REST endpoint to set the format of returned images. Modify our JS to use the base64 encoded images to avoid CORS issues

Fix lint errors

97c8a8f

Update readmes

9c72481

Merge branch 'develop' into feature/openai-images

f74d623

Add better error handling. Adjust some text a bit

afb70ca

dkotter self-assigned this Mar 23, 2023

dkotter added 2 commits March 22, 2023 20:57

Fix VIP PHPCS issue and try and fix failing test

e715018

Fix typo. Ensure we mock our new auth endpoint correctly

53597ac

jeffpaul added this to the 1.9.0 milestone Mar 23, 2023

dkotter added 4 commits March 23, 2023 09:29

Fix the data our test plugin uses to match what we expect now that we…

b30d9b3

… use base64 encoded images. Fix a logic error in how we save our auth data. Add test to ensure disabling image generation works

Change our tests to trigger the media modal from the featured image p…

937ddf2

…anel to see if this fixes an issue on trunk

Add a setting to choose which roles are allowed to generate images. U…

edecbb2

…se this setting to limit functionality from loading and to limit access to the REST endpoint. Add a test around this new setting

Add another import button that both imports and sends you to the medi…

56a8f3b

…a library in one step

dkotter requested review from jeffpaul and a team as code owners March 23, 2023 18:45

dkotter requested review from a team and iamdharmesh and removed request for a team March 24, 2023 20:58

iamdharmesh previously approved these changes Mar 28, 2023

dkotter dismissed iamdharmesh’s stale review via ec53133 March 28, 2023 17:28

Make sure auth settings save correctly for ChatGPT

ec53133

Co-authored-by: Dharmesh Patel <dspatel44@gmail.com>

Move more duplicate code to our OpenAI trait. If one of the OpenAI pr…

6018638

…oviders has been setup, use that API key as the default for the other since both can use the same key. Add a note that we are prefilling that value from the other provider

dkotter requested a review from iamdharmesh March 28, 2023 21:29

iamdharmesh approved these changes Mar 29, 2023

dkotter merged commit 1772cf9 into develop Mar 29, 2023

dkotter deleted the feature/openai-images branch March 29, 2023 19:50

This was referenced Mar 30, 2023

Content Generation #398

Closed

Add default attribution to generated images #429

Closed

Use proper plurals based on image settings #430

Closed

Add new filters to provide more granular control over new OpenAI features #432

Closed