Image generation using OpenAI's DALL·E API #419

Merged: 29 commits merged on Mar 29, 2023


@dkotter (Collaborator) commented Mar 23, 2023

Description of the Change

This PR adds OpenAI as a new provider in the Image Processing service, specifically integrating with DALL·E. The integration uses DALL·E to generate one or more images from a user-provided prompt, which you can then import into your Media Library and insert into content.

Closes #398.

Setup

Setting up this provider only requires an API key. Whenever settings are saved, the settings page validates the API key. This is exactly the same validation described in #405.
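A minimal sketch of validate-on-save, written in TypeScript for illustration even though the plugin does this in PHP. It assumes the key is probed with OpenAI's `GET https://api.openai.com/v1/models` endpoint and a Bearer token, treating HTTP 200 as valid. One of the commits mentions the auth callback using "the cheapest model", so the real check may hit a different endpoint. `buildAuthRequest`, `interpretAuthStatus` and `validateApiKeyOnSave` are hypothetical names.

```typescript
// Hypothetical helpers; the real plugin performs this check in PHP.
interface AuthRequest {
  url: string;
  method: "GET";
  headers: Record<string, string>;
}

interface AuthResult {
  authenticated: boolean;
  error?: string;
}

// Assumption: listing models is used as a cheap "is this key valid?" probe.
function buildAuthRequest(apiKey: string): AuthRequest {
  return {
    url: "https://api.openai.com/v1/models",
    method: "GET",
    headers: { Authorization: `Bearer ${apiKey.trim()}` },
  };
}

// Map the HTTP status of the probe onto the state stored alongside the settings.
function interpretAuthStatus(status: number): AuthResult {
  if (status === 200) {
    return { authenticated: true };
  }
  if (status === 401) {
    return { authenticated: false, error: "Invalid API key." };
  }
  return { authenticated: false, error: `Unexpected response (HTTP ${status}).` };
}

// Runs whenever settings are saved; an empty key is rejected without sending a request.
async function validateApiKeyOnSave(
  apiKey: string,
  send: (req: AuthRequest) => Promise<number>, // hypothetical transport returning the HTTP status
): Promise<AuthResult> {
  if (apiKey.trim() === "") {
    return { authenticated: false, error: "Please enter an API key." };
  }
  return interpretAuthStatus(await send(buildAuthRequest(apiKey)));
}
```

Injecting `send` keeps the decision logic testable without a network call.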

Other than the API key, there are a few settings that can be modified:

  • Enable image generation: the most important one. If a valid API key is added but this setting isn't on, no integration will happen.
  • Allowed roles: which roles are allowed to generate images. This list is filtered down to only include roles that also have the upload_files capability.
  • Number of images: how many images are generated per prompt. This can be set from 1 to 10 and defaults to 1.
  • Image size: the size of the generated images. This can be 1024x1024, 512x512 or 256x256 (the only sizes supported by the API) and defaults to 1024x1024.
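The setting rules above could be enforced by a small sanitization step on save. This is a hypothetical TypeScript sketch (the plugin stores these settings in PHP): `sanitizeSettings`, `eligibleRoles` and the setting keys are illustrative names, and it assumes out-of-range values fall back to the defaults rather than being clamped. Role data is modeled as a plain map from role name to capability list.

```typescript
// Hypothetical shape of the saved DALL·E settings.
type ImageSize = "1024x1024" | "512x512" | "256x256";

interface ImageGenerationSettings {
  enableImageGen: boolean;
  allowedRoles: string[];
  numberOfImages: number;
  imageSize: ImageSize;
}

// The only sizes the API accepts, per the PR description.
const ALLOWED_SIZES: readonly ImageSize[] = ["1024x1024", "512x512", "256x256"];

// Role name -> capabilities, standing in for WordPress role data.
type RoleMap = Record<string, string[]>;

// Only roles that can upload files are offered as options.
function eligibleRoles(roles: RoleMap): string[] {
  return Object.keys(roles).filter((role) => roles[role].includes("upload_files"));
}

function sanitizeSettings(raw: Record<string, unknown>, roles: RoleMap): ImageGenerationSettings {
  const n = Number(raw.numberOfImages);
  const size = ALLOWED_SIZES.find((s) => s === raw.imageSize);
  const eligible = eligibleRoles(roles);
  const requested = Array.isArray(raw.allowedRoles) ? raw.allowedRoles : [];
  return {
    // Off unless explicitly enabled, so a saved key alone does nothing.
    enableImageGen: raw.enableImageGen === true,
    // Drop any role that lacks the upload capability.
    allowedRoles: requested.filter((r): r is string => typeof r === "string" && eligible.includes(r)),
    // Whole numbers from 1 to 10; anything else falls back to the default of 1.
    numberOfImages: Number.isInteger(n) && n >= 1 && n <= 10 ? n : 1,
    imageSize: size ?? "1024x1024",
  };
}
```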

[Screenshot: DALL·E settings]

Image generation

My initial thought was to integrate image generation directly into the Featured Image flow, which would allow easy creation of Featured Images before publishing. But after giving it more thought, I decided that having this functionality in other places (like inserting images into content) would also be nice.

I then landed on integrating this directly into the existing Media Modal flow. This supports both the Featured Image flow and any blocks that use the normal Media Modal (like the core Image, Media & Text and Cover blocks).

[Screenshot: New tab in Media Modal]

API integration

When the media modal is open and the Generate images tab is clicked, we show some helper text along with a prompt input:

[Screenshot: Generate images prompt]

Once a prompt is entered, a request is made to a new REST endpoint (wp-json/classifai/v1/openai/generate-image). This endpoint verifies that the current user has permission to upload files, that we are properly authenticated with OpenAI, and that the Enable image generation setting is on.
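A sketch of the endpoint's permission logic, in TypeScript for illustration (the real check lives in the PHP REST route). The first three checks mirror the paragraph above; the extra check against the Allowed roles setting is an assumption based on that setting's description. `RequestContext` and `canGenerateImages` are hypothetical names.

```typescript
// Hypothetical inputs; in the plugin these come from WordPress and the saved settings.
interface RequestContext {
  userCaps: string[]; // capabilities of the current user
  userRoles: string[]; // roles of the current user
  authenticated: boolean; // result of the last API key validation
  enableImageGen: boolean; // the Enable image generation setting
  allowedRoles: string[]; // the Allowed roles setting
}

type PermissionResult = { ok: true } | { ok: false; reason: string };

function canGenerateImages(ctx: RequestContext): PermissionResult {
  // 1. The user must be able to upload files.
  if (!ctx.userCaps.includes("upload_files")) {
    return { ok: false, reason: "You are not allowed to upload files." };
  }
  // 2. A validated OpenAI API key must be saved.
  if (!ctx.authenticated) {
    return { ok: false, reason: "Not authenticated with OpenAI." };
  }
  // 3. The feature must be switched on.
  if (!ctx.enableImageGen) {
    return { ok: false, reason: "Image generation is disabled." };
  }
  // Assumption: the user must also hold at least one allowed role.
  if (!ctx.userRoles.some((role) => ctx.allowedRoles.includes(role))) {
    return { ok: false, reason: "Your role is not allowed to generate images." };
  }
  return { ok: true };
}
```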

[Screenshot: Sending prompt]

This endpoint then uses the APIRequest class to send the prompt to the DALL·E API. We parse the response, ensure it contains what we expect, and return it. The front end then parses this data and renders the image(s) to the user.
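The request and response handling might look roughly like this. The sketch uses the public shape of OpenAI's `POST /v1/images/generations` endpoint (`prompt`, `n`, `size` and `response_format` in the request; a `data` array in the response) and asks for `b64_json`, in line with the commit that switched to base64 images to avoid CORS issues. Treating every image as PNG when building the data URL, and the helper names, are assumptions.

```typescript
// Request body for OpenAI's image generation endpoint (POST /v1/images/generations).
interface GenerationRequest {
  prompt: string;
  n: number;
  size: "1024x1024" | "512x512" | "256x256";
  response_format: "url" | "b64_json";
}

// Hypothetical helper: build the body from the prompt and the saved settings.
function buildGenerationRequest(prompt: string, n: number, size: GenerationRequest["size"]): GenerationRequest {
  const trimmed = prompt.trim();
  if (trimmed === "") {
    throw new Error("A prompt is required.");
  }
  // Base64 output lets the browser render images without cross-origin fetches.
  return { prompt: trimmed, n, size, response_format: "b64_json" };
}

// Hypothetical helper: validate the decoded JSON and return data URLs the UI can render.
function parseGenerationResponse(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    throw new Error("Unexpected response from DALL·E.");
  }
  const record = body as { data?: unknown; error?: { message?: unknown } };
  // Surface OpenAI's own error message when one is present.
  if (record.error && typeof record.error.message === "string") {
    throw new Error(record.error.message);
  }
  if (!Array.isArray(record.data) || record.data.length === 0) {
    throw new Error("No images were returned.");
  }
  return record.data.map((item: unknown) => {
    const b64 = (item as { b64_json?: unknown } | null)?.b64_json;
    if (typeof b64 !== "string") {
      throw new Error("Image data is missing.");
    }
    // Assumption: generated images are PNGs.
    return `data:image/png;base64,${b64}`;
  });
}
```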

[Screenshot: Generated images]

A primary Import into Media Library button and a secondary Import and insert button are shown beneath each image. Clicking the first imports the image into your site's Media Library, and the button then changes to say Select image. Clicking that sends you to the normal Media Library tab in the media modal, with your image selected. This lets you add alt text, a caption or other details before finally inserting the image (either into the content or as a featured image). Clicking the second button imports the image and immediately sends you to the Media Library.
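The two-button flow can be modeled as a tiny state machine. This is a simplified, hypothetical TypeScript sketch: the real UI is a Backbone view, importing is asynchronous, and `importImage` stands in for whatever actually sideloads the image and returns an attachment ID.

```typescript
// Hypothetical model of one generated image in the Generate images tab.
interface ImageCard {
  attachmentId?: number; // set once the image has been imported
}

type Action = "primary" | "importAndInsert";

interface Outcome {
  card: ImageCard;
  openLibraryWith?: number; // attachment to select in the Media Library tab
}

// The primary button's label changes once the image has been imported.
function primaryLabel(card: ImageCard): string {
  return card.attachmentId === undefined ? "Import into Media Library" : "Select image";
}

// `importImage` is a synchronous stand-in for the real import; it returns an attachment ID.
function handleClick(card: ImageCard, action: Action, importImage: () => number): Outcome {
  const alreadyImported = card.attachmentId !== undefined;
  // Reuse an existing attachment so clicking again never imports a duplicate.
  const id = card.attachmentId ?? importImage();
  const next: ImageCard = { attachmentId: id };
  if (action === "primary" && !alreadyImported) {
    // Stay on the Generate images tab so more images can be imported.
    return { card: next };
  }
  // "Select image", or "Import and insert": switch to the Media Library with this image selected.
  return { card: next, openLibraryWith: id };
}
```

Keeping the attachment ID on the card is what lets someone import several images first and pick one later without re-importing.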

I debated this flow a bit and added the second button after feedback. The current approach lets someone import multiple images before finally choosing one to insert. They are then sent to the normal Media Library screen, where they can add alt text or other details (the same flow that happens when manually uploading an image). If they only want one image, they can click the second button, which skips a step.

Reviewer notes

  • The JS integration into the media modal follows (or tries to follow) the approach WP core uses with Backbone.js. I'm sure there are improvements that could be made there, but it seems to work as intended.
  • There could certainly be some text tweaks and UX/design tweaks here, so I'm happy to hear any feedback on those items.
  • There is no integration into the Classic Editor or directly into the Media Library for now. I think the latter is interesting as a follow-up; the former we can probably hold off on unless we get user feedback requesting it.

How to test the Change

A valid OpenAI API key is needed to fully test this feature. OpenAI offers a free $5 credit to new users, so if you haven't signed up before, you can sign up and get an API key.

  1. Log in to your OpenAI account and go to your API key section. Generate a new API key there and copy it
  2. Go to ClassifAI > Image Processing > OpenAI and paste in your API key
  3. Turn on the Enable image generation setting. The other settings can be left at their defaults. Save changes and ensure no error message is shown
  4. Create a new post (or edit an existing one) and either insert an Image block and click on the Media Library button or click on the Featured Image panel
  5. Within the modal, click on the Generate images tab
  6. Type in a prompt and hit enter (or press the button)
  7. Ensure images are generated (the count should match the Number of images setting)
  8. Click the button underneath the image to import it into your Media Library. Click the button again to go to the Media Library and insert the image into the content
  9. Ensure no errors happen in any step

Changelog Entry

Added - Generate images using OpenAI's DALL·E API

Credits

Props @dkotter

Checklist:

  • I agree to follow this project's Code of Conduct.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests pass.

… from the ChatGPT integration that isn't merged yet. Add a basic REST endpoint
… image tab has been added. This tab loads a text input and button that allows you to enter a prompt. This then fires a request to our custom endpoint and loads in generated images
…he admin. Enqueue our JS only on the pages we want.
…e values that are hardcoded that need to be dynamic in our script. Fix script loading
… previous items and prompt text when request is done. Adjust styling
…t duplicating that. Change our auth callback to use the cheapest model, since API keys work the same for all endpoints
…e this file is built and we load the built version. Add an argument to our REST endpoint to set the format of returned images. Modify our JS to use the base64 encoded images to avoid CORS issues
@dkotter self-assigned this Mar 23, 2023
@jeffpaul (Member)

  1. Import > Insert

I debated on this flow a bit and can definitely make changes (as it does require a number of clicks to actually insert an image). The reason I did this is so if someone likes multiple images that are generated, they can import each of those individually before finally selecting one image to insert. If the import and insert process happened at the same time, we would save the user an extra click but they wouldn't be able to import more than one image.

I agree with this rationale, but would add that next to each Import into Media Library button we could perhaps add a secondary action link, Import into Media Library and Insert into Post (or, shorter, Import and Insert), to give folks the quickest route to "done" when they only want one of the images.

  2. Attribution

Perhaps we include a default caption of "Image generated by OpenAI's DALL·E / <whatever default license DALL·E applies>"?

  3. Media Library integration

The JS integration into the media modal follows (or tries to follow) the approach WP core uses with Backbone.js. I'm sure there are improvements that could be made there, but it seems to work as intended.

@joemcgill any advice from your end / experience with ML & Backbone?

  4. Classic Editor & direct ML integration

There is no integration into the Classic Editor or directly into the Media Library for now. I think the latter is interesting as a follow-up, the former we can probably hold off on unless we get user feedback requesting it

I agree on both parts here.

@jeffpaul added this to the 1.9.0 milestone Mar 23, 2023
… use base64 encoded images. Fix a logic error in how we save our auth data. Add test to ensure disabling image generation works
…se this setting to limit functionality from loading and to limit access to the REST endpoint. Add a test around this new setting
@dkotter requested review from jeffpaul and a team as code owners March 23, 2023 18:45
@dkotter requested review from a team and iamdharmesh and removed request for a team March 24, 2023 20:58
@iamdharmesh (Member) reviewed and previously approved these changes Mar 28, 2023

Thanks for adding this, @dkotter. This looks great 🎉

Everything works as expected and as described in the PR description. I added a minor code suggestion to apply; otherwise, everything looks good to me.

Thanks again for the great work here.

Review thread on includes/Classifai/Providers/OpenAI/ChatGPT.php (outdated, resolved)
@iamdharmesh (Member)

Hi @dkotter,

ChatGPT and DALL·E can both use the same OpenAI API key. Could we add some UI, like a checkbox, to share the same key between both services, or maybe auto-fill the API key if either service already has a saved API key in settings? What do you think?

Thanks

Co-authored-by: Dharmesh Patel <dspatel44@gmail.com>
@joemcgill
@joemcgill any advice from your end / experience with ML & Backbone?

I've had a quick look at the approach here, and it generally follows the extend/replace pattern for the parts of the Backbone.js code that the media library is built from, so all looks good there. The one thing to be mindful of is that the Select media frame (i.e. wp.media.view.MediaFrame.Select) is not used consistently across all UI instances. For example, the image gallery still uses the Post frame (wp.media.view.MediaFrame.Post), so you may need to double-check that all your intended use cases are covered.

…oviders has been setup, use that API key as the default for the other since both can use the same key. Add a note that we are prefilling that value from the other provider
@dkotter (Collaborator, Author) commented Mar 28, 2023

Hi @dkotter,

ChatGPT and DALL·E can both use the same OpenAI API key. Could we add some UI, like a checkbox, to share the same key between both services, or maybe auto-fill the API key if either service already has a saved API key in settings? What do you think?

Thanks

Yeah, I had debated doing something like this but ended up not (it's a good idea though).

I've gone ahead and moved more duplicate code into the OpenAI trait and modified the API handling a bit, so if you already have a valid API key (for instance, you've set up ChatGPT but not DALL·E yet), it will autofill that key and show a message so you know what's going on: 6018638
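A minimal sketch of the prefill rule described above, assuming a key is only borrowed when this provider has none of its own and the other provider's key has already been validated. `ProviderSettings` and `prefillApiKey` are hypothetical names, and the notice text is illustrative rather than the string the commit actually uses.

```typescript
// Hypothetical per-provider settings; ChatGPT and DALL·E accept the same OpenAI key.
interface ProviderSettings {
  apiKey: string;
  authenticated: boolean;
}

interface PrefillResult {
  apiKey: string;
  notice?: string; // shown so the user knows where the key came from
}

function prefillApiKey(current: ProviderSettings, other: ProviderSettings, otherName: string): PrefillResult {
  // Never overwrite a key the user has already entered for this provider.
  if (current.apiKey !== "") {
    return { apiKey: current.apiKey };
  }
  // Assumption: only reuse a key that the other provider has already validated.
  if (other.authenticated && other.apiKey !== "") {
    return {
      apiKey: other.apiKey,
      notice: `API key has been prefilled from the ${otherName} settings.`,
    };
  }
  return { apiKey: "" };
}
```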


button {
	display: block;
	float: left;
}

Nitpicky, but it would be great to get rid of the float. We could use flex properties here to achieve a similar UI.

.prompt {
	margin-right: 10px;
	padding: 1px 8px;
	width: 25%;
}

Here I can see the width is 25%, but I don't see any media queries. Are we not considering mobile viewports?

@mehidi258 left a comment

Overall this looks great to me, though I have suggested a few improvements from a styling perspective.

@jeffpaul mentioned this pull request Apr 6, 2023 (1 task)
@dkotter mentioned this pull request Apr 26, 2023 (4 tasks)
Labels: None yet
Projects: None yet
Development: successfully merging this pull request may close the issue "Content Generation".
5 participants