
DALLEePaperFrame

Ever wanted to display never-before-seen art, on demand, using AI? Press a button, speak a "prompt" for the AI artist, and see the new art!

What is this project?

By now everyone has seen AI-generated art. There has been a lot of amazing work in this field, perhaps most notably DALL·E 2 by OpenAI. In my opinion, the best way to view art is not on a computer screen, but in a frame on the wall.

This project uses a local server to host the art generation AI and automatic speech recognition capabilities. The ePaper frame acts as a client to the server, requesting new art to be generated on demand.

How does it work?

The server runs on a machine with an NVIDIA GPU (e.g. a Jetson or another discrete GPU), and the ePaper frame "client" runs on a Raspberry Pi. The frame has four buttons and a microphone.
The four buttons have the following functions (a sketch of the client-side wiring follows the list):

  1. Request a new generation of art with the same prompt previously used (and currently displayed on the ePaper frame).
  2. Request a new generation of art with a new prompt chosen from the pre-built prompts (see prompts.txt).
  3. Request a new generation of art with a new prompt spoken into the microphone. After the button is pressed, the microphone records for 3 seconds.
  4. Enable/disable automatic art generation (based on previously used prompt or pre-built prompts).
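
Here is a minimal sketch of that client-side wiring in Python, assuming the gpiozero and sounddevice libraries; the /generate and /generate_from_speech endpoints, their payloads, and the server address are illustrative assumptions, not the repository's actual API:

from signal import pause
import requests
import sounddevice as sd
from gpiozero import Button

SERVER = "http://192.168.1.10:8000"  # assumed server address
auto_mode = False

def regenerate():
    # Button 1: regenerate using the prompt currently on display
    requests.post(f"{SERVER}/generate", json={"reuse_prompt": True})

def prebuilt():
    # Button 2: have the server pick a prompt from prompts.txt
    requests.post(f"{SERVER}/generate", json={"use_prebuilt": True})

def spoken():
    # Button 3: record 3 seconds of 16 kHz audio and send it for ASR
    audio = sd.rec(int(3 * 16000), samplerate=16000, channels=1, dtype="float32")
    sd.wait()
    requests.post(f"{SERVER}/generate_from_speech", data=audio.tobytes())

def toggle_auto():
    # Button 4: flip the automatic-generation flag
    global auto_mode
    auto_mode = not auto_mode

# GPIO pins 5, 6, 16 and 24 correspond to the Inky Impression's four buttons
buttons = [Button(pin) for pin in (5, 6, 16, 24)]
for button, handler in zip(buttons, (regenerate, prebuilt, spoken, toggle_auto)):
    button.when_pressed = handler
pause()  # block forever, reacting to button presses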

Extra Technical Details

The display I used was an Inky Impression 5.7" ePaper frame. It is connected to a Raspberry Pi 1B+, but any other Raspberry Pi should work. The client/ directory contains the single Python script used to control the frame and connect to the server.
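
For reference, pushing a generated image to this display with Pimoroni's inky library looks roughly like the following (a minimal sketch; generated_art.png is a placeholder filename):

from PIL import Image
from inky.auto import auto  # auto-detects the attached Inky display

display = auto()
img = Image.open("generated_art.png").resize(display.resolution)
display.set_image(img)  # maps the image onto the panel's 7-color palette
display.show()          # triggers the slow ePaper refresh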

The server runs two Docker containers, orchestrated by a Docker Compose file; a sketch of how the API container calls Triton follows the list. The two containers are:

  • triton-inference-server: Uses NVIDIA's Triton Inference Server to host the art generation AI model and the automatic speech recognition (ASR) model.
    • The ASR model is a wav2vec 2.0 Large model converted to ONNX format for inference.
    • The art generation model is a DALLE-mini variant called min-dalle (massive shoutout to Brett Kuprel for this incredible PyTorch port).
  • art-generator-api: a FastAPI server that acts as a clean endpoint for the client to request new art. The server/ directory contains the code for the server.
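
To give a feel for that flow, here is a minimal sketch of the FastAPI service calling Triton through the tritonclient package; the model name and the tensor names/shapes are assumptions, not the repository's actual configuration:

import numpy as np
import tritonclient.http as httpclient
from fastapi import FastAPI

app = FastAPI()
triton = httpclient.InferenceServerClient(url="triton-inference-server:8000")

@app.post("/generate")
def generate(prompt: str):
    # Pack the prompt as a BYTES tensor (tensor names are assumed)
    text = np.array([prompt.encode()], dtype=object)
    inp = httpclient.InferInput("PROMPT", [1], "BYTES")
    inp.set_data_from_numpy(text)
    result = triton.infer("min-dalle", inputs=[inp])
    image = result.as_numpy("IMAGE")  # e.g. an H x W x 3 uint8 array
    return {"shape": list(image.shape)}  # a real server would return the image itself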

How do I use this project?

Set up the Server

Set up the server with the script setup_server.sh:

cd server/
bash setup_server.sh

Run the Server

cd server/
bash run_server.sh

Set up the Client

Set up the client with the script setup_client.sh:

cd client/
bash setup_client.sh 

Run the Client

cd client/
bash run_client.sh <ip_address> # the IP address of the server
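
For example, if the server's IP address is 192.168.1.10:

bash run_client.sh 192.168.1.10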

Final thoughts

Power considerations

If you want to have this hanging on a wall like I do, you can connect the Raspberry Pi to a cellphone battery pack.

There are also useful notes elsewhere on reducing power consumption on Raspberry Pis.

System requirements

The min-dalle model and the ASR model take around 8 GB and 4 GB of GPU memory, respectively, so ensure you have at least 12 GB of GPU memory. If your GPU does not have enough memory, consider running only the min-dalle model for generating art.
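
A quick way to check free GPU memory before starting the containers, using the nvidia-ml-py (pynvml) bindings:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"Free GPU memory: {info.free / 1024**3:.1f} GiB")  # want at least 12 GiB
pynvml.nvmlShutdown()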

Generation Time

Art generation takes about 10 seconds on an NVIDIA Jetson AGX Orin and about 7 seconds on an NVIDIA RTX 2070.

ePaper

One thing to note about ePaper is that it is not a perfect display. The particular panel I chose has only 7 colors, which can make some images look a bit odd.
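
To preview how an image will look once reduced to the panel's palette, you can quantize it with Pillow; the RGB values below only approximate the Impression's seven inks and are assumptions:

from PIL import Image

# Rough 7-color palette of the Inky Impression (assumed RGB values)
PALETTE = [
    0, 0, 0,        # black
    255, 255, 255,  # white
    0, 255, 0,      # green
    0, 0, 255,      # blue
    255, 0, 0,      # red
    255, 255, 0,    # yellow
    255, 140, 0,    # orange
]

pal_img = Image.new("P", (1, 1))
pal_img.putpalette(PALETTE + [0, 0, 0] * (256 - 7))  # pad to 256 palette entries
preview = Image.open("generated_art.png").convert("RGB").quantize(palette=pal_img)
preview.show()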

Another note is the refresh: updating an ePaper display is slow, and a full refresh on the panel I chose takes about 30 seconds.
