Test custom scene graph #12

Open

HuilingSun opened this issue Dec 25, 2023 · 2 comments

@HuilingSun

Hi Ling Yang,
If I want to test generating an image from a custom scene graph, what data do I need to prepare, and which part of the code should I change?

@marquies

marquies commented Mar 2, 2024

I also want to test with my own scene graphs. I modified testset_ddim_sampler.py so that it only performs the loading/setup, and then created my own data:

    import torch

    objs = torch.LongTensor([1, 35, 118, 3, 134, 2, 4, 0]).cuda()                    # object category indices
    # imgs = torch.LongTensor([]).cuda()
    triples = torch.LongTensor([[3, 3, 6], [3, 3, 2], [1, 3, 4], [3, 1, 1]]).cuda()  # (subject idx, predicate, object idx) triples
    obj_to_img = torch.zeros(8, dtype=torch.long).cuda()                             # all 8 objects belong to image 0
    triple_to_img = torch.zeros(4, dtype=torch.long).cuda()                          # all 4 triples belong to image 0

Then I used it for the sampler (image generation):

    graph_info = [imgs, objs, None, triples, obj_to_img, triple_to_img]
    cond = model.get_learned_conditioning(graph_info)

The result is worse, but I don't know whether that is due to my model (trained only to epoch 35) or to the image. I also wonder why I need to add the image to the data for the generation process.
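
For anyone adapting this, here is a minimal sketch of going from human-readable object and predicate names to the index tensors above. It assumes an sg2im-style vocab JSON with object_name_to_idx and pred_name_to_idx mappings and a special __image__ node; the file name, keys, and example entries are assumptions and should be checked against this repo's Visual Genome data-loading code.

    import json
    import torch

    # Assumed vocab layout (sg2im-style Visual Genome preprocessing); verify the
    # actual path and keys against this repo's dataset code.
    with open("vocab.json") as f:
        vocab = json.load(f)
    obj_name_to_idx = vocab["object_name_to_idx"]
    pred_name_to_idx = vocab["pred_name_to_idx"]

    # A toy single-image graph: object names plus the special __image__ node, and
    # (subject_position, predicate_name, object_position) triples, where the
    # positions index into the objs tensor built below.
    object_names = ["sky", "tree", "grass", "dog", "__image__"]
    triple_specs = [(3, "on", 2), (1, "behind", 3), (0, "above", 1)]

    objs = torch.LongTensor([obj_name_to_idx[n] for n in object_names]).cuda()
    triples = torch.LongTensor(
        [[s, pred_name_to_idx[p], o] for s, p, o in triple_specs]
    ).cuda()
    obj_to_img = torch.zeros(len(object_names), dtype=torch.long).cuda()    # every object belongs to image 0
    triple_to_img = torch.zeros(len(triple_specs), dtype=torch.long).cuda() # every triple belongs to image 0

These tensors can then be packed into graph_info exactly as in the snippet above; whether the imgs entry can be left out (or replaced by a dummy) at sampling time is precisely the open question raised here.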

@Maelic
Contributor

Maelic commented Mar 21, 2024

> I also want to test with my own scene graphs. [...] I wonder why I need to add the image to the data for the generation process.

You need to train the model for much longer if you want to obtain good results; it took me roughly 8 days and 335 epochs to reproduce the authors' results, see #7 (comment).

You will also need to design your custom scene graphs carefully: the original Visual Genome dataset is highly unbalanced, so the diffusion model does not learn efficient representations for all types of relations. In my experiments it works relatively well for reconstructing images from graphs composed of spatial relations, but it fails on more complex, semantic relations (e.g. "person eating sandwich", "person drinking wine").
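
As a quick way to see which relations are well represented (and therefore more likely to be modeled well), one can count predicate frequencies in the raw Visual Genome annotations. This is a rough sketch, assuming the standard relationships.json layout (a list of images, each with a "relationships" list whose entries carry a "predicate" string); the file path and field names are assumptions.

    import json
    from collections import Counter

    # Count how often each predicate string occurs across the dataset.
    with open("relationships.json") as f:
        images = json.load(f)

    predicate_counts = Counter(
        rel["predicate"].lower().strip()
        for image in images
        for rel in image["relationships"]
    )

    # The most frequent predicates are mostly spatial ("on", "in", "near", ...);
    # custom graphs built from these are more likely to produce sensible images
    # than graphs relying on rare semantic predicates.
    for pred, count in predicate_counts.most_common(20):
        print(f"{pred:15s} {count}")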
