
DataPreparation understanding #16

Open
eduardatmadenn opened this issue Jul 10, 2024 · 3 comments

@eduardatmadenn

Hi, first off, congratulations on your work. The use of event data for this is fascinating.

I'm trying to reproduce your work using simulated data, and I have a couple of questions:

  1. What FPS do you recommend for frame interpolation? Since you use B=5, would it be a safe bet to assume 5 extra frames between every two frames?

  2. Am I understanding this correctly? For calendar.h5, the data looks like this:

dict_keys(['images/000000', 'images/000001', …, 'images/000040',
           'voxels_b/000000', 'voxels_b/000001', …, 'voxels_b/000039',
           'voxels_f/000000', 'voxels_f/000001', …, 'voxels_f/000039'])

where the shape of each individual voxel tensor is [B, H, W].

So, in order to replicate this, should I use events_to_voxel_torch on the event data of each real frame individually? (By "real frame" I mean the original frame plus all the interpolated frames between t and t+1.)

  3. Also, 'images/000000' looks to be just the LR image, saved as h5. Would it be enough to open the image as a NumPy array and save it without any other processing? Maybe I missed this detail, but I don't see any processing of the LR images in the DataPreparation step.

Thank you

@DachunKai
Owner

  1. For the video's meta information, if you know the original FPS, use it; if not, you can set it to 25 or 30 FPS. For intuition, you can think of B=5 as meaning there are 5 extra frames between two frames. However, these extra frames come from event signals, and they are still different in form from frame signals.
  2. The images in dict_keys are the frames from the original video and do not include the interpolated frames. The interpolated frames are only used to simulate better event signals and are not packaged into the h5 file. So you should use events_to_voxel_torch to convert the events between two original frames into voxels.
  3. The LR image is downsampled from the GT image. We use a MATLAB script to downsample the GT image. You can refer to generate_bicubic_img.m.
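To make the voxelization step above concrete: the sketch below is an illustrative NumPy re-implementation of what a function like events_to_voxel_torch does, i.e. accumulating all events between two original frames into a [B, H, W] grid with bilinear weighting in time. The function name, shapes, and toy values here are assumptions for illustration, not the repo's exact implementation.

```python
import numpy as np

def events_to_voxel(xs, ys, ts, ps, B, H, W):
    """Accumulate events (pixel coords xs/ys, timestamps ts, polarities ps)
    into B temporal bins with bilinear weighting in time."""
    voxel = np.zeros((B, H, W), dtype=np.float32)
    # Normalize this chunk's timestamps to the range [0, B - 1].
    t = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (B - 1)
    left = np.clip(np.floor(t).astype(int), 0, B - 1)
    right = np.clip(left + 1, 0, B - 1)
    w = t - left  # fraction of each event's weight that goes to the right bin
    # np.add.at handles repeated (bin, y, x) indices correctly.
    np.add.at(voxel, (left, ys, xs), ps * (1.0 - w))
    np.add.at(voxel, (right, ys, xs), ps * w)
    return voxel

# Toy usage: 4 events on a 5x6 sensor, packed into 5 temporal bins.
xs = np.array([0, 1, 2, 3]); ys = np.array([0, 0, 1, 1])
ts = np.array([0.0, 0.3, 0.6, 1.0]); ps = np.array([1, -1, 1, 1])
v = events_to_voxel(xs, ys, ts, ps, B=5, H=5, W=6)
print(v.shape)  # (5, 5, 6)
```

Each event contributes its full polarity split across at most two adjacent bins, so the voxel grid's total sum equals the sum of the polarities.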

@eduardatmadenn
Author

Hi, thanks for your answer. Just to clarify:

  1. I am asking how many interpolated frames you actually use between two (real) frames.
  2. In the event signal data, do the timestamps (first column, below) reset to 0 for every real frame, or are they continuous?
```
674 571 274 0
687 632 314 0
706 632 313 0
707 639 336 0
710 639 329 0
716 639 328 0
718 0 294 0
718 639 332 0
720 639 335 0
724 639 330 0
724 639 331 0
729 550 163 0
735 550 164 0
```

@DachunKai
Owner

  1. We interpolate 7 frames between two (real) frames for the Vimeo90k dataset, and interpolate 3 frames between two (real) frames for the REDS and Vid4 datasets using the RIFE interpolation model.
  2. I don't quite understand your second question. Event data consists of x, y, t, and p, where p equals +1 or -1, indicating event polarity, and t is continuous with a very small delay.
     Thanks!
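If it helps settle the timestamp question empirically, here is a small sanity-check sketch: parse rows in the "t x y p" column order shown in the pasted snippet and verify that the timestamps are globally non-decreasing (continuous) rather than resetting per frame. The inline rows stand in for reading your actual events file.

```python
# Stand-in for something like open("events.txt").read().splitlines();
# the file name and column order are taken from the snippet above.
rows = """674 571 274 0
687 632 314 0
706 632 313 0""".splitlines()

ts = [int(line.split()[0]) for line in rows]
assert all(a <= b for a, b in zip(ts, ts[1:])), "timestamps reset somewhere"
print("monotonic:", True)
```

If this assertion ever fires partway through the file, the timestamps are being reset per frame; if it holds for the whole file, they are continuous.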
