Test custom scene graph #12

Open

HuilingSun opened this issue Dec 25, 2023 · 2 comments

@HuilingSun

Hi Ling Yang,
If I want to test generating an image from a custom scene graph, what data do I need to prepare, and which part of the code should I change?

@marquies

marquies commented Mar 2, 2024

I also want to test with my own scene graphs. I modified testset_ddim_sampler.py so that it only performs the loading/setup, and then created my own data:

    import torch

    objs = torch.LongTensor([1, 35, 118, 3, 134, 2, 4, 0]).cuda()                    # object category indices
    # imgs = torch.LongTensor([]).cuda()
    triples = torch.LongTensor([[3, 3, 6], [3, 3, 2], [1, 3, 4], [3, 1, 1]]).cuda()  # (subject idx, predicate, object idx) triples
    obj_to_img = torch.zeros(8, dtype=torch.long).cuda()                             # all 8 objects belong to image 0
    triple_to_img = torch.zeros(4, dtype=torch.long).cuda()                          # all 4 triples belong to image 0

Then I used it for the sampler (image generation):

    graph_info = [imgs, objs, None, triples, obj_to_img, triple_to_img]
    cond = model.get_learned_conditioning(graph_info)

The result is worse, but I don't know whether that is due to my model (trained only to epoch 35) or to the image. I also wonder why I need to add the image to the data for the generation process.
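
For anyone adapting this, here is a minimal sketch of going from human-readable object and predicate names to the index tensors above. It assumes an sg2im-style vocab JSON with object_name_to_idx and pred_name_to_idx mappings and a special __image__ node; the file name, keys, and example entries are assumptions and should be checked against this repo's Visual Genome data-loading code.

    import json
    import torch

    # Assumed vocab layout (sg2im-style Visual Genome preprocessing); verify the
    # actual path and keys against this repo's dataset code.
    with open("vocab.json") as f:
        vocab = json.load(f)
    obj_name_to_idx = vocab["object_name_to_idx"]
    pred_name_to_idx = vocab["pred_name_to_idx"]

    # A toy single-image graph: object names plus the special __image__ node, and
    # (subject_position, predicate_name, object_position) triples, where the
    # positions index into the objs tensor built below.
    object_names = ["sky", "tree", "grass", "dog", "__image__"]
    triple_specs = [(3, "on", 2), (1, "behind", 3), (0, "above", 1)]

    objs = torch.LongTensor([obj_name_to_idx[n] for n in object_names]).cuda()
    triples = torch.LongTensor(
        [[s, pred_name_to_idx[p], o] for s, p, o in triple_specs]
    ).cuda()
    obj_to_img = torch.zeros(len(object_names), dtype=torch.long).cuda()    # every object belongs to image 0
    triple_to_img = torch.zeros(len(triple_specs), dtype=torch.long).cuda() # every triple belongs to image 0

These tensors can then be packed into graph_info exactly as in the snippet above; whether the imgs entry can be left out (or replaced by a dummy) at sampling time is precisely the open question raised here.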

@Maelic
Contributor

Maelic commented Mar 21, 2024

> I also want to test with my own scene graphs. [...] I wonder why I need to add the image to the data for the generation process.

You need to train the model for much longer if you want to obtain good results; it took me roughly 8 days and 335 epochs to reproduce the authors' results, see #7 (comment).

You will also need to design your custom scene graphs carefully: the original Visual Genome dataset is highly unbalanced, so the diffusion model does not learn efficient representations for all types of relations. In my experiments it works relatively well for reconstructing images from graphs composed of spatial relations, but it fails on more complex, semantic relations (e.g. "person eating sandwich", "person drinking wine").
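
As a quick way to see which relations are well represented (and therefore more likely to be modeled well), one can count predicate frequencies in the raw Visual Genome annotations. This is a rough sketch, assuming the standard relationships.json layout (a list of images, each with a "relationships" list whose entries carry a "predicate" string); the file path and field names are assumptions.

    import json
    from collections import Counter

    # Count how often each predicate string occurs across the dataset.
    with open("relationships.json") as f:
        images = json.load(f)

    predicate_counts = Counter(
        rel["predicate"].lower().strip()
        for image in images
        for rel in image["relationships"]
    )

    # The most frequent predicates are mostly spatial ("on", "in", "near", ...);
    # custom graphs built from these are more likely to produce sensible images
    # than graphs relying on rare semantic predicates.
    for pred, count in predicate_counts.most_common(20):
        print(f"{pred:15s} {count}")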
