Imagine capturing the essence of a book in a single image! This project tackles the challenge of processing summaries from a dataset of 16,559 books. The goal? To generate an image that reflects the vibe of each book.
| Phase | Description |
|---|---|
| Data Preprocessing | Cleaning and standardizing the raw data |
| EDA | Exploratory Data Analysis |
| NLP | Condensing the book summaries with a text summarization model |
| Vision | Generating book images with a text-to-image model |
| Sample Outputs | 20 sample outputs |
Here's a quick breakdown of each phase. For a deeper look, check out the details in the notebook!
In this step, the raw crawled data of book titles and summaries is cleaned and transformed into a format suitable for analysis and for the NLP model. This involves removing duplicates, handling missing values, and standardizing the text.
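A minimal sketch of this cleaning pass with pandas. The file and column names (`books_raw.csv`, `title`, `summary`) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Load the raw crawled data (file and column names are assumptions).
df = pd.read_csv("books_raw.csv")  # columns: "title", "summary"

# Remove duplicates and rows with missing titles or summaries.
df = df.drop_duplicates(subset=["title", "summary"])
df = df.dropna(subset=["title", "summary"])

# Standardize the text: trim whitespace and collapse internal runs of spaces.
for col in ["title", "summary"]:
    df[col] = df[col].str.strip().str.replace(r"\s+", " ", regex=True)

df.to_csv("books_clean.csv", index=False)
```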
This step involves exploring the data to gain insights, in particular the distribution of summary lengths in tokens.
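One way to measure those token counts, assuming the cleaned CSV from the previous step and using an off-the-shelf LongT5 tokenizer so the counts reflect what the summarization model will actually see (the checkpoint name here is an assumption):

```python
import pandas as pd
from transformers import AutoTokenizer

df = pd.read_csv("books_clean.csv")

# Count tokens with the summarization model's own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
df["n_tokens"] = df["summary"].map(lambda s: len(tokenizer.encode(s)))

print(df["n_tokens"].describe())        # min / mean / max summary lengths
print((df["n_tokens"] > 16384).mean())  # share exceeding a 16k-token context
```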
The challenge here is to find the best generation parameters for the text summarization model (see the sketch in the NLP section below).
*(Figures: "Overview" and "Inliers" plots of the summary token distribution — see the notebook.)*
The NLP model condenses the book summaries so they are suitable as prompts for the text-to-image model.
The most effective model I found for summarizing the long texts in this dataset is a fine-tuned version of LongT5, an encoder-decoder model trained on book summaries.
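A sketch of this step with the Hugging Face `transformers` pipeline. The checkpoint shown (`pszemraj/long-t5-tglobal-base-16384-book-summary`, a public LongT5 fine-tuned on book summaries) is one example of such a model and may not be the exact checkpoint used here; the generation parameters are the kind of knobs mentioned above and would need tuning:

```python
from transformers import pipeline

# A public LongT5 checkpoint fine-tuned on book summaries;
# the exact checkpoint used in this project may differ.
summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",
)

long_summary = "..."  # one cleaned book summary from the preprocessing step

condensed = summarizer(
    long_summary,
    max_length=70,           # rough budget so the prompt fits SDXL's 77-token CLIP limit (assumed)
    min_length=20,
    num_beams=4,             # example generation parameters; these are what needs tuning
    no_repeat_ngram_size=3,
)[0]["summary_text"]
print(condensed)
```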
Finally, in this step, images are generated from the condensed summaries. I decided to use the SDXL 1.0-base model, which rendered the condensed summaries quite accurately.
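A minimal generation sketch with the `diffusers` library and the public `stabilityai/stable-diffusion-xl-base-1.0` checkpoint; the inference settings are illustrative defaults, not the project's exact configuration:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL 1.0 base model in half precision to fit on a single GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "..."  # a condensed summary produced by the NLP step

image = pipe(prompt=prompt, num_inference_steps=30).images[0]
image.save("book_image.png")
```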
Click on each image to see it in its actual size.