
DataPreparation understanding #16

Open
eduardatmadenn opened this issue Jul 10, 2024 · 3 comments

@eduardatmadenn

Hi, first off, congratulations on your work. The use of event data for this is fascinating.

I'm trying to reproduce your work using simulated data, and I have a couple of questions:

  1. What FPS do you recommend for frame interpolation? Since you use B=5, would it be a safe bet to assume 5 extra frames between every two frames?

  2. Am I understanding this correctly? For calendar.h5, the data looks like this:

dict_keys(['images/000000', 'images/000001', …, 'images/000040',
           'voxels_b/000000', 'voxels_b/000001', …, 'voxels_b/000039',
           'voxels_f/000000', 'voxels_f/000001', …, 'voxels_f/000039'])

where the shape of each individual voxel tensor is [B, H, W].

So, in order to replicate this, should I use events_to_voxel_torch on the event data of each real frame individually? (By "real frame" I mean the original frame plus all the interpolated frames between t and t+1.)

  3. Also, 'images/000000' looks to be just the LR image, saved as h5. Would it be enough to open the image as a NumPy array and save it without any other processing? Maybe I missed this detail, but I don't see any processing of the LR images in the DataPreparation step.

Thank you

@DachunKai
Owner

  1. For the video's meta information, if you know the original FPS, use it; if not, you can set it to 25 or 30 FPS. For intuition, you can think of B=5 as meaning there are 5 extra frames between two frames. However, these extra frames come from event signals, and they are still different in form from frame signals.
  2. The images in dict_keys are the frames from the original video and do not include the interpolated frames. The interpolated frames are only used to simulate better event signals and are not packaged into the h5 file. So you should use events_to_voxel_torch to convert the events between two original frames into voxels.
  3. The LR image is downsampled from the GT image. We use a MATLAB script to downsample the GT image. You can refer to generate_bicubic_img.m.
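To make the voxelization step above concrete: the sketch below is an illustrative NumPy re-implementation of what a function like events_to_voxel_torch does, i.e. accumulating all events between two original frames into a [B, H, W] grid with bilinear weighting in time. The function name, shapes, and toy values here are assumptions for illustration, not the repo's exact implementation.

```python
import numpy as np

def events_to_voxel(xs, ys, ts, ps, B, H, W):
    """Accumulate events (pixel coords xs/ys, timestamps ts, polarities ps)
    into B temporal bins with bilinear weighting in time."""
    voxel = np.zeros((B, H, W), dtype=np.float32)
    # Normalize this chunk's timestamps to the range [0, B - 1].
    t = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (B - 1)
    left = np.clip(np.floor(t).astype(int), 0, B - 1)
    right = np.clip(left + 1, 0, B - 1)
    w = t - left  # fraction of each event's weight that goes to the right bin
    # np.add.at handles repeated (bin, y, x) indices correctly.
    np.add.at(voxel, (left, ys, xs), ps * (1.0 - w))
    np.add.at(voxel, (right, ys, xs), ps * w)
    return voxel

# Toy usage: 4 events on a 5x6 sensor, packed into 5 temporal bins.
xs = np.array([0, 1, 2, 3]); ys = np.array([0, 0, 1, 1])
ts = np.array([0.0, 0.3, 0.6, 1.0]); ps = np.array([1, -1, 1, 1])
v = events_to_voxel(xs, ys, ts, ps, B=5, H=5, W=6)
print(v.shape)  # (5, 5, 6)
```

Each event contributes its full polarity split across at most two adjacent bins, so the voxel grid's total sum equals the sum of the polarities.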

@eduardatmadenn
Author

Hi, thanks for your answer. Just to clarify:

  1. I am asking how many interpolated frames you actually use between two (real) frames.
  2. In the event signal data, do the timestamps (first column, below) reset to 0 for every real frame, or are they continuous?
```
674 571 274 0
687 632 314 0
706 632 313 0
707 639 336 0
710 639 329 0
716 639 328 0
718 0 294 0
718 639 332 0
720 639 335 0
724 639 330 0
724 639 331 0
729 550 163 0
735 550 164 0
```

@DachunKai
Owner

  1. We interpolate 7 frames between two (real) frames for the Vimeo90k dataset, and interpolate 3 frames between two (real) frames for the REDS and Vid4 datasets using the RIFE interpolation model.
  2. I don't quite understand your second question. Event data consists of x, y, t, and p, where p equals +1 or -1, indicating event polarity, and t is continuous with a very small delay.
     Thanks!
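If it helps settle the timestamp question empirically, here is a small sanity-check sketch: parse rows in the "t x y p" column order shown in the pasted snippet and verify that the timestamps are globally non-decreasing (continuous) rather than resetting per frame. The inline rows stand in for reading your actual events file.

```python
# Stand-in for something like open("events.txt").read().splitlines();
# the file name and column order are taken from the snippet above.
rows = """674 571 274 0
687 632 314 0
706 632 313 0""".splitlines()

ts = [int(line.split()[0]) for line in rows]
assert all(a <= b for a, b in zip(ts, ts[1:])), "timestamps reset somewhere"
print("monotonic:", True)
```

If this assertion ever fires partway through the file, the timestamps are being reset per frame; if it holds for the whole file, they are continuous.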
