Image generation using OpenAI's DALL·E API #419
… from the ChatGPT integration that isn't merged yet. Add a basic REST endpoint
… image tab has been added. This tab loads a text input and button that allows you to enter a prompt. This then fires a request to our custom endpoint and loads in generated images
…he admin. Enqueue our JS only on the pages we want.
…e values that are hardcoded that need to be dynamic in our script. Fix script loading
… previous items and prompt text when request is done. Adjust styling
… properly to the Media Library tab
…t duplicating that. Change our auth callback to use the cheapest model, since API keys work the same for all endpoints
…e this file is built and we load the built version. Add an argument to our REST endpoint to set the format of returned images. Modify our JS to use the base64 encoded images to avoid CORS issues
I agree with this rationale, but would add on to this that perhaps next to those
@joemcgill any advice from your end / experience with ML & Backbone?
I agree on both parts here.
… use base64 encoded images. Fix a logic error in how we save our auth data. Add test to ensure disabling image generation works
…anel to see if this fixes an issue on trunk
…se this setting to limit functionality from loading and to limit access to the REST endpoint. Add a test around this new setting
…a library in one step
Thanks for adding this, @dkotter. This looks great 🎉
Everything is working as expected and as described in the PR. I added one minor code suggestion to apply; otherwise, all looks good to me.
Thanks again for the great work here.
Hi @dkotter, ChatGPT and DALL·E can both work under the same OpenAI API key. Could we add some UI, like a checkbox, to share the same key between both services, or maybe auto-fill the API key if either service already has a saved API key in settings? What do you think? Thanks
Co-authored-by: Dharmesh Patel <dspatel44@gmail.com>
I've had a quick look at the approach here, and it generally seems to follow the extend/replace pattern for parts of the Backbone.js code that the media library is built from, so all looks good there. The one thing to be mindful of is that the Select media frame (i.e. |
…oviders has been setup, use that API key as the default for the other since both can use the same key. Add a note that we are prefilling that value from the other provider
Yeah, I had debated doing something like this but ended up not (it's a good idea though). I've gone ahead and moved more duplicate code to the OpenAI |
button {
  display: block;
  float: left;
Nitpicky, but it would be great to remove the float here. We could use flex properties to achieve a similar UI.
.prompt {
  margin-right: 10px;
  padding: 1px 8px;
  width: 25%;
Here I can see the width is 25%, but I'm not seeing anything related to media queries. Are we not considering mobile versions?
Overall this looks great to me, though I have added a few suggestions from a styling perspective.
Description of the Change
This PR adds an integration with OpenAI as a new provider in the Image Processing service, specifically integrating with DALL·E. This integration utilizes DALL·E to generate one or more images from a user provided prompt, allowing you to then import those images into your Media Library and insert those in-content.
Closes #398.
Setup
Setup with this provider only requires an API key. There's validation done on the settings page, anytime settings are saved, to verify if the API key is valid. This will be the exact same as the validation described in #405.
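As a rough illustration of that kind of check (not the plugin's actual validation, which is the one described in #405), a key can be verified with any inexpensive authenticated request. The sketch below assumes OpenAI's list-models endpoint as the cheap call; the function name `isApiKeyValid` and the injected `fetchImpl` parameter are hypothetical.

```javascript
// Hypothetical helper: treats a key as valid if a cheap authenticated
// request succeeds. Using GET /v1/models is an assumption for this sketch;
// the plugin's own validation may call a different endpoint.
// `fetchImpl` is injectable so the function can be run without network access.
async function isApiKeyValid(apiKey, fetchImpl = globalThis.fetch) {
  // An empty or non-string key can be rejected without making a request.
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    return false;
  }

  const response = await fetchImpl('https://api.openai.com/v1/models', {
    method: 'GET',
    headers: { Authorization: `Bearer ${apiKey.trim()}` },
  });

  // A rejected key produces a non-2xx status, so `ok` is false.
  return response.ok;
}
```

Since the same key works for every OpenAI endpoint, the choice of endpoint only affects cost and latency, not the outcome of the check.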
Other than the API key, there's a few settings that can be modified. The most important is turning on the Enable image generation option. If a valid API key is added but this setting isn't on, no integration will happen. The Allowed roles setting lets you choose which roles are allowed to generate images. This list is filtered down to only include roles that also have the upload_media capability. The Number of images setting controls how many images will be generated in a single prompt. This can be set from 1 to 10, defaulting to 1. The Image size setting controls the size of the generated images. This can be set at 1024x1024, 512x512 or 256x256 (the only sizes supported by the API). Defaults to 1024x1024.
Image generation
My initial thought was to integrate image generation directly into the Featured Image flow. This would allow easy creation of Featured Images before publishing. But after giving it more thought, I came to the decision that having this functionality in other places would also be nice (like inserting images in content).
I landed then on integrating this directly into the existing Media Modal flow. This supports both the Featured Image flow and any blocks that utilize the normal Media Modal (like the core Image block, core Media & Text block and core Cover block).
API integration
When the media modal is loaded in and the Generate images tab is clicked, we show some helper text as well as a prompt input.
Once a prompt is entered, a request is made to a new REST endpoint (wp-json/classifai/v1/openai/generate-image). This endpoint verifies that the current user has permission to upload files, that we are properly authenticated with OpenAI and that the Enable image generation setting is on.
This endpoint then utilizes the APIRequest class to send a request to the DALL·E API, with the passed in prompt. We then parse that response, ensure it contains what we expect and then return that back. This data is then parsed out and the image(s) are rendered to the user.
A primary Import into Media Library button and secondary Import and insert button will be shown beneath each image. Clicking on the first will import the image into your site's Media Library. The button then changes to say Select image. Clicking on that will send you to the normal Media Library tab in the media modal, with your image selected. This allows you to add alt text or a caption or other details before finally inserting the image (either into the content or as a featured image). Clicking on the second imports the image and immediately sends you to the Media Library.
I debated on this flow a bit and added the second button after feedback. The current approach allows someone to import multiple images before finally choosing one to insert. They then are sent to the normal Media Library screen which allows them to add alt text or other details (and is the same flow that happens when manually uploading an image). If they only want one image, they can click on that second button which skips a step.
Reviewer notes
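For reviewers who want a compact picture of the request flow described under API integration, here is a minimal sketch in JavaScript. It is not the plugin's code (the real endpoint is PHP and goes through the APIRequest class). The helper names `buildImageRequest` and `parseImageResponse` and the `settings` keys are hypothetical; only the field names `prompt`, `n`, `size`, `response_format` and `data[].b64_json` come from OpenAI's public image generation API. The sketch assumes the base64 response format mentioned in the commits and assumes the returned images are PNGs.

```javascript
// The three sizes listed in the settings description above.
const ALLOWED_SIZES = ['256x256', '512x512', '1024x1024'];

// Hypothetical helper: builds the JSON body for an image generation request
// from the user's prompt and the saved settings.
function buildImageRequest(prompt, settings = {}) {
  const text = String(prompt ?? '').trim();
  if (text === '') {
    throw new Error('A prompt is required.');
  }

  // Number of images: clamp to 1..10 and fall back to 1 when unset or invalid.
  const requested = Number.parseInt(settings.numberOfImages, 10);
  const n = Number.isNaN(requested) ? 1 : Math.min(10, Math.max(1, requested));

  // Image size: anything outside the allowed list falls back to 1024x1024.
  const size = ALLOWED_SIZES.includes(settings.size) ? settings.size : '1024x1024';

  // Base64 output lets the browser render images without cross-origin requests.
  return { prompt: text, n, size, response_format: 'b64_json' };
}

// Hypothetical helper: checks the response shape and converts each image
// into a data URI that can be used directly as an <img> src.
function parseImageResponse(body) {
  if (!body || !Array.isArray(body.data)) {
    throw new Error('Unexpected response shape.');
  }

  return body.data
    .filter((item) => item && typeof item.b64_json === 'string' && item.b64_json !== '')
    .map((item) => `data:image/png;base64,${item.b64_json}`); // PNG is an assumption
}
```

For example, `buildImageRequest('a red bicycle', { numberOfImages: 25, size: '300x300' })` yields a body with `n` clamped to 10 and `size` reset to `1024x1024`, so an out-of-range setting never reaches the API.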
How to test the Change
A valid OpenAI API key is needed to fully test this feature. OpenAI does offer a free $5 credit for new users so if you haven't signed up before, you can sign up and get an API key.
… ClassifAI > Image Processing > OpenAI and paste in your API key
… Enable image generation setting. The other settings can be left default. Save changes and ensure no error message is shown
… Generate images tab
Changelog Entry
Credits
Props @dkotter
Checklist: