
[Feature] Adds Image Generation Capability 2.0 #1907

Merged: 55 commits into main, Mar 15, 2024
Conversation

WaelKarkoub (Contributor)

@BeibinLi @rickyloynd-microsoft @ekzhu I created this PR because the other PR's (#1874) branch was based on my fork, which doesn't allow me to run the OpenAI tests. Closing #1874.

Why are these changes needed?

Proof of concept for using agent capabilities as a vehicle for multimodal communication. I found it difficult to extend agent capabilities to include multimodal interactions without undertaking extensive refactoring.

I went with a modular approach, treating different modalities as distinct agent capabilities. This strategy streamlines the integration of multimodal functions and enhances the versatility of "simple" agents with minimal adjustments to the existing architecture.

For this PR, I experimented with image generation, since I've seen quite a bit of great work already done by @BeibinLi. The idea is that users can add image generation to any of their existing agents. I architected the code around an abstract class called ImageGenerator, which users can implement for their favorite API provider (there's a DALL-E example, DalleImageGenerator). The user then passes their chosen generator to ImageGeneration (the agent's capability to generate images) and adds that capability to the agent.
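The pattern above can be sketched without the autogen wiring. The names ImageGenerator, ImageGeneration, and add_to_agent mirror the PR, but everything else here (DummyImageGenerator, SimpleAgent, the "draw:" request check) is a hypothetical stand-in for the real agent framework, not the merged implementation:

```python
from abc import ABC, abstractmethod

class ImageGenerator(ABC):
    """Abstract interface a user implements for their preferred image API."""
    @abstractmethod
    def generate_image(self, prompt: str) -> bytes:
        ...

class DummyImageGenerator(ImageGenerator):
    """Hypothetical stand-in for e.g. a DALL-E-backed generator."""
    def generate_image(self, prompt: str) -> bytes:
        return f"<image for: {prompt}>".encode()

class ImageGeneration:
    """Capability: wraps a generator and hooks a reply function onto an agent."""
    def __init__(self, image_generator: ImageGenerator):
        self._generator = image_generator

    def add_to_agent(self, agent) -> None:
        # Register our reply function so it runs before the agent's defaults.
        agent.register_reply(self._image_gen_reply)

    def _image_gen_reply(self, messages):
        last = messages[-1]
        if last.lower().startswith("draw:"):  # crude image-request check
            prompt = last.split(":", 1)[1].strip()
            return True, self._generator.generate_image(prompt)
        return False, None  # not an image request; let other replies handle it

class SimpleAgent:
    """Minimal agent stub with a reply-function registry."""
    def __init__(self):
        self._reply_funcs = []
    def register_reply(self, fn):
        self._reply_funcs.insert(0, fn)
    def reply(self, messages):
        for fn in self._reply_funcs:
            handled, out = fn(messages)
            if handled:
                return out
        return "no image requested"

agent = SimpleAgent()
ImageGeneration(DummyImageGenerator()).add_to_agent(agent)
print(agent.reply(["draw: a cat on a skateboard"]))
```

Swapping DummyImageGenerator for a provider-backed implementation is the only change a user would make; the capability and agent stay untouched, which is the modularity the PR is after.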

ImageGeneration works by adding a custom reply function that checks: "Did I receive a message asking me to generate an image? If so, what is the prompt?" and generates the image accordingly.
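The check described above is answered in the PR by a TextAnalyzerAgent, i.e. by LLM calls. A cheap keyword heuristic can stand in for it in a sketch; the regex and function name below are illustrative, not the PR's actual logic:

```python
import re

def extract_image_prompt(message: str):
    """Return the image prompt if the message asks for an image, else None.

    Hypothetical heuristic standing in for the PR's TextAnalyzerAgent, which
    uses an LLM to answer: "Was an image requested? If so, what is the prompt?"
    """
    pattern = r"(?:generate|draw|create)\s+(?:an?\s+)?(?:image|picture)\s+of\s+(.+)"
    m = re.search(pattern, message, flags=re.IGNORECASE)
    return m.group(1).strip() if m else None

print(extract_image_prompt("Please generate an image of a red fox"))  # prints: a red fox
print(extract_image_prompt("What's the weather?"))                    # prints: None
```

The trade-off the PR notes applies here in reverse: a heuristic costs no tokens but misses paraphrased requests, which is exactly why the real implementation pays for LLM-based analysis.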

The design laid out by this image generation capability will allow for future extensions, such as:

  • sound generation,
  • image description,
  • video generation, etc.

Design-wise, it does have some drawbacks:

  • I'm using TextAnalyzerAgent, so there are more LLM calls and more token usage.
  • If an image was generated, we treat it as the final reply. Should it be the final reply? Or should we pass it along the reply chain?
  • The current implementation assumes only one image is to be generated, even though most APIs can generate more.
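The single-image assumption in the last point could be relaxed by threading an `n` parameter through the generator interface, as most provider APIs already accept a count. This is a hypothetical extension, not part of the PR; MultiImageGenerator and EchoGenerator are invented names for illustration:

```python
from abc import ABC, abstractmethod
from typing import List

class MultiImageGenerator(ABC):
    """Hypothetical variant of the generator interface returning n images."""
    @abstractmethod
    def generate_images(self, prompt: str, n: int = 1) -> List[bytes]:
        ...

class EchoGenerator(MultiImageGenerator):
    """Toy implementation that fabricates n placeholder images."""
    def generate_images(self, prompt: str, n: int = 1) -> List[bytes]:
        return [f"<image {i}: {prompt}>".encode() for i in range(n)]

images = EchoGenerator().generate_images("sunset over water", n=3)
print(len(images))  # prints: 3
```

Defaulting `n=1` keeps the change backward compatible with the single-image behavior described above.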

I wrote a quick script, test_image_generation.py, to test the functionality (I'm planning to remove it and add a notebook instead).

(Video demo: image_gen_poc.mp4)


@codecov-commenter commented Mar 7, 2024

Codecov Report

Attention: Patch coverage is 78.00000%, with 22 lines in your changes missing coverage. Please review.

Project coverage is 60.87%. Comparing base (ea2c1b2) to head (a250c3d).

Files Patch % Lines
.../agentchat/contrib/capabilities/generate_images.py 78.00% 17 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1907       +/-   ##
===========================================
+ Coverage   37.53%   60.87%   +23.33%     
===========================================
  Files          65       66        +1     
  Lines        6913     7013      +100     
  Branches     1521     1660      +139     
===========================================
+ Hits         2595     4269     +1674     
+ Misses       4092     2357     -1735     
- Partials      226      387      +161     
Flag Coverage Δ
unittests 60.58% <78.00%> (+23.04%) ⬆️

Flags with carried forward coverage won't be shown.


@sonichi sonichi added this pull request to the merge queue Mar 15, 2024
Merged via the queue into main with commit c5536ee Mar 15, 2024
64 of 68 checks passed
@WaelKarkoub WaelKarkoub deleted the describe-image-capability branch March 16, 2024 00:52
whiskyboy pushed a commit to whiskyboy/autogen that referenced this pull request Apr 17, 2024
* adds image generation capability
* add todo
* readded cache
* wip
* fix content str bugs
* removed todo: delete imshow
* wip
* fix circular imports
* add notebook
* improve prompt
* improved text analyzer + notebook
* notebook update
* improve notebook
* smaller notebook size
* made changes to the wrong branch :(
* resolve comments + 1
* adds doc strings
* adds cache doc string
* adds doc string to add_to_agent
* adds doc string to ImageGeneration
* instructions are not configurable
* removed unnecessary imports
* changed doc string location
* more doc strings
* improves testability
* adds tests
* adds cache test
* added test to github workflow
* compatible llm config format
* configurable reply function position
* skip_openai + better comments
* fix test
* fix test?
* please fix test?
* last fix test?
* remove type hint
* skip cache test
* adds mock api key
* dalle-2 test
* fix dalle config
* use apu key function

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Labels
alt-models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.) enhancement New feature or request multimodal language + vision, speech etc.

7 participants