
About train dataset processing #28

Closed

TomSuen opened this issue Jun 28, 2024 · 11 comments
TomSuen commented Jun 28, 2024

Hi, thank you for such wonderful work! I would like to ask a question about the preparation of the training set. I noticed that you mention in the paper: "During training, we randomly sample 14 video frames with a stride of 4. ... with a resolution of 256 × 256. We first train ... and directly taking the first frame together with the estimated optical flow from Unimatch."

So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?

By the way, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample there, so that object lacks guidance in the sparse flow guidance sequence and the reconstruction is not ideal. Is that right?

MyNiuuu (Owner) commented Jun 28, 2024

> So my question is: what value of nms_ks did you use in the flow_sampler function of the watershed sampler? I set it to 3 to get as many sampling points as possible, but it is hard to reconstruct the original video from just these points. Is this normal?

We set nms_ks=15 during training. I think nms_ks=3 may be too small for training the model.
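For intuition, here is a minimal sketch of the kind of max-pooling non-maximum suppression an nms_ks kernel performs. This is an illustrative stand-in, not the repository's actual flow_sampler; the function name and signature are assumptions:

```python
import torch
import torch.nn.functional as F

def nms_sample_points(flow_mag: torch.Tensor, nms_ks: int = 15) -> torch.Tensor:
    """Keep only local maxima of a flow-magnitude map.

    flow_mag: (H, W) tensor of optical-flow magnitudes.
    nms_ks:   NMS kernel size (assumed odd); a larger kernel suppresses
              more neighbors, so nms_ks=15 yields far fewer, more
              spread-out points than nms_ks=3.
    """
    x = flow_mag[None, None]  # (1, 1, H, W) for pooling
    pooled = F.max_pool2d(x, kernel_size=nms_ks, stride=1, padding=nms_ks // 2)
    keep = (x == pooled) & (x > 0)  # local maxima that actually move
    ys, xs = torch.nonzero(keep[0, 0], as_tuple=True)
    return torch.stack([xs, ys], dim=1)  # (N, 2) sampled (x, y) coordinates
```

A larger kernel trades point density for spatial spread, which is why nms_ks=3 and nms_ks=15 behave so differently in practice.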

> By the way, I found that one possible reason is that the masks are all taken from the first frame. If an object in the first frame does not move, it is difficult for the watershed algorithm to sample there, so that object lacks guidance in the sparse flow guidance sequence and the reconstruction is not ideal. Is that right?

Yes, the watershed algorithm is unable to sample points that remain stationary in the initial frame. However, this may not significantly impact the model's training, as it is unnecessary and even inadvisable to sample every moving part throughout the video.

TomSuen (Author) commented Jul 1, 2024

Okay, thank you for your reply. When will you open-source the training code?

TomSuen (Author) commented Jul 1, 2024

And I have another question: if I resize a (336, 596) video to (256, 256) and then predict the optical flow, versus predicting the optical flow on the original-size video and then resizing it to (256, 256), will there be any difference between the two flow fields? Normally, Unimatch should not be too sensitive to input size.

MyNiuuu (Owner) commented Jul 1, 2024

> And I have another question: if I resize a (336, 596) video to (256, 256) and then predict the optical flow, versus predicting the optical flow on the original-size video and then resizing it to (256, 256), will there be any difference between the two flow fields? Normally, Unimatch should not be too sensitive to input size.

We found that Unimatch is actually (relatively) sensitive to input size. It produces sharper predictions when the input frames are resized to its training size of [384, 512], and we adopted this setting during training.
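As an illustration, here is a minimal sketch of this setting: run flow estimation at [384, 512], then resize the result to the target size, remembering that flow vectors are pixel displacements and must be rescaled along with the spatial resize. flow_fn is a hypothetical stand-in for a Unimatch forward call, not its actual API:

```python
import torch
import torch.nn.functional as F

def flow_at_target_size(frame0, frame1, flow_fn,
                        infer_hw=(384, 512), out_hw=(256, 256)):
    """Estimate flow at the estimator's training resolution, then resize.

    frame0, frame1: (1, 3, H, W) image tensors.
    flow_fn:        any callable returning a (1, 2, h, w) forward flow
                    (placeholder for the actual flow estimator).
    """
    f0 = F.interpolate(frame0, size=infer_hw, mode="bilinear", align_corners=False)
    f1 = F.interpolate(frame1, size=infer_hw, mode="bilinear", align_corners=False)
    flow = flow_fn(f0, f1)                                # (1, 2, 384, 512)
    flow = F.interpolate(flow, size=out_hw, mode="bilinear", align_corners=False)
    # Flow values are in pixels, so rescale them with the spatial resize.
    flow[:, 0] *= out_hw[1] / infer_hw[1]                 # x-displacements
    flow[:, 1] *= out_hw[0] / infer_hw[0]                 # y-displacements
    return flow
```

Skipping the final rescaling step is a common bug: the resized flow map would then point to the wrong pixels at (256, 256).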

TomSuen (Author) commented Jul 2, 2024

Thanks again, and forgive me for having so many questions. I noticed that the pre-trained models you provide are all for 25 frames. Can I fine-tune them on 14-frame data?

MyNiuuu (Owner) commented Jul 2, 2024

> Thanks again, and forgive me for having so many questions. I noticed that the pre-trained models you provide are all for 25 frames. Can I fine-tune them on 14-frame data?

Yes, you can fine-tune the model on 14-frame data. I don't think it will negatively impact the model's performance.

MyNiuuu (Owner) commented Jul 2, 2024

> Okay, thank you for your reply. When will you open-source the training code?

I am preparing the release of the training code now that our paper has been accepted to ECCV'24. I expect the training code to be available within a week 🤔.

TomSuen (Author) commented Jul 2, 2024

> I am preparing the release of the training code now that our paper has been accepted to ECCV'24. I expect the training code to be available within a week 🤔.

Wow, great news 😄

TomSuen (Author) commented Jul 12, 2024

Hi, I have some other questions.

  1. For the first frame's optical flow, will the watershed sampler always sample the position with the maximum optical-flow magnitude?
  2. During training, do you expect the sparse optical flow obtained after CMP to match the dense optical flow extracted from the original video exactly? I mean, training just needs to teach the model the guiding role of any optical flow, not to reconstruct the video completely, right?

MyNiuuu (Owner) commented Jul 14, 2024


Sorry for the late reply, busy days.

> For the first frame's optical flow, will the watershed sampler always sample the position with the maximum optical-flow magnitude?

Actually, I am not completely sure about this. I believe the answer is yes: the watershed sampler should sample the position with the maximum optical-flow magnitude. You can check the code yourself, since I have just released the training code.

> During training, do you expect the sparse optical flow obtained after CMP to match the dense optical flow extracted from the original video exactly?

No. That is the last thing I would want 😂. It would make the model depend too much on the flow and lack sufficient generative ability.

> I mean, training just needs to teach the model the guiding role of any optical flow, not to reconstruct the video completely, right?

Yes, our goal is to use a rough flow as guidance. That is to say, given an inaccurate optical flow from CMP, the model can still generate semantically meaningful videos that correctly reflect the user's intention.
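To make this concrete, here is a minimal sketch of the general idea behind sparse flow guidance: keep the dense flow only at the sampled points and add a visibility mask, so the model receives motion hints rather than a full reconstruction target. The guidance format here is an assumption for illustration, not the repository's exact implementation (in practice CMP densifies such hints before they reach the model):

```python
import torch

def sparse_guidance(dense_flow: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Build a sparse flow-guidance map from a dense flow field.

    dense_flow: (2, H, W) dense optical flow.
    points:     (N, 2) integer (x, y) coordinates from the sampler.
    Returns a (3, H, W) tensor: masked flow plus a binary mask channel,
    so motion information survives only at the sampled locations.
    """
    _, H, W = dense_flow.shape
    mask = torch.zeros(1, H, W)
    mask[0, points[:, 1].long(), points[:, 0].long()] = 1.0
    return torch.cat([dense_flow * mask, mask], dim=0)
```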

By the way, I have just released the training code; you can check it for the details.

Best regards,

TomSuen (Author) commented Jul 17, 2024

Thank you very much for your kind reply! I feel I have almost fully understood your work.

TomSuen closed this as completed Jul 17, 2024