Speed up random access time of PyAVReaderIndexed. #340

Merged: 4 commits, merged May 4, 2020

keunhong
Contributor

@keunhong keunhong commented Apr 14, 2020

Previously, PyAVReaderIndexed reached a specific packet by decoding every packet before it. This PR speeds this up by instead indexing the timestamp of each packet and seeking directly to it with container.seek().
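A minimal sketch of the indexing idea, assuming the index is simply the sorted list of packet presentation timestamps (pts) collected while demuxing, without decoding. The helper names are hypothetical and this is not the PIMS implementation. Sorting matters because packets arrive in decode order, which differs from presentation order when a stream uses B-frames.

```python
from typing import Iterable, List, Optional


def build_frame_index(packet_pts: Iterable[Optional[int]]) -> List[int]:
    """Map frame number -> pts (hypothetical helper, not the PIMS API).

    packet_pts holds the pts of each video packet in demux (decode) order.
    Flush packets carry no timestamp (None) and are skipped.
    """
    return sorted(pts for pts in packet_pts if pts is not None)


def pts_for_frame(frame_no: int, frame_pts: List[int]) -> int:
    """Timestamp to seek to for the requested frame number."""
    return frame_pts[frame_no]


if __name__ == "__main__":
    # Decode order of an I P B B group plus a flush packet (made-up timestamps).
    index = build_frame_index([0, 1536, 512, 1024, None])
    print(index)                    # [0, 512, 1024, 1536]
    print(pts_for_frame(3, index))  # 1536
```

With PyAV, that timestamp would be passed as `container.seek(pts, stream=stream)`. When `stream` is given, the offset is in `stream.time_base` units, and by default the seek lands on a keyframe at or before it, so the reader still has to decode forward until a frame's `pts` equals the target.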

This speeds up random access by more than an order of magnitude (41.1 s down to 1.78 s wall time in the test below). Sequential access time remains unchanged.
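To see why the gain can be this large, here is a simplified cost model (an assumption, not a measurement) that counts frames decoded per request. The old path is modeled as decoding from frame 0 on every request; the new path as seeking to the last keyframe at or before the target and decoding forward. The keyframe interval of 250 frames is assumed, since the actual interval of the test video is not stated.

```python
import bisect
from typing import List


def cost_decode_from_start(target: int) -> int:
    """Frames decoded if every request starts at frame 0 (simplified old path)."""
    return target + 1


def cost_seek_then_decode(target: int, keyframes: List[int]) -> int:
    """Frames decoded after seeking to the last keyframe at or before target."""
    # keyframes: sorted frame numbers of keyframes; must start with 0
    k = keyframes[bisect.bisect_right(keyframes, target) - 1]
    return target - k + 1


if __name__ == "__main__":
    keyframes = list(range(0, 9061, 250))  # assumed keyframe every 250 frames
    targets = [673, 7158, 5975]  # indices from the test run
    print(sum(cost_decode_from_start(t) for t in targets))  # 13809
    print(sum(cost_seek_then_decode(t, keyframes) for t in targets))  # 559
```

Under these assumptions the three test indices need 559 decoded frames instead of 13809. The real speed-up also depends on seek overhead and on how the old reader actually traversed the stream.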

Test Code

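import numpy as np
from pims import PyAVReaderIndexed

test_reader = PyAVReaderIndexed(...)
# Pick 3 random frame indices to access out of order
inds = np.random.randint(low=0, high=len(test_reader), size=3)
print(test_reader)

def test():
    for i in inds:
        print(i)
        test_reader.get_frame(i)

%time test()  # IPython/Jupyter timing magic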

Before

<Frames>
Source: /local1/kpar/spatialaudiogen/data/preproc-hr/cA88rNC2uuM-video.mp4
Length: 9061 frames
Frame Shape: (1080, 1920, 3)

673
7158
5975
CPU times: user 41.1 s, sys: 7.8 ms, total: 41.1 s
Wall time: 41.1 s

After

<Frames>
Source: /local1/kpar/spatialaudiogen/data/preproc-hr/cA88rNC2uuM-video.mp4
Length: 9061 frames
Frame Shape: (1080, 1920, 3)

673
7158
5975
CPU times: user 1.78 s, sys: 8.01 ms, total: 1.79 s
Wall time: 1.78 s

@keunhong keunhong changed the title Speed up random access time of PyAVReaderTimed. Speed up random access time of PyAVReaderIndexed. Apr 14, 2020
@keunhong
Contributor Author

Hi, can anyone take a look at this? I've been using this for the past 2 weeks in my machine learning workflow and I haven't run into any issues yet.

You can get even larger gains by re-encoding videos with more frequent keyframes, since fewer frames have to be decoded after seeking.
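The keyframe remark can be quantified with the same kind of simplified model: if keyframes sit every `gop` frames and targets are uniformly random, frame `i` costs `i % gop + 1` decoded frames after the seek, so the average is roughly `(gop + 1) / 2`. The function name and the model are illustrative assumptions; the keyframe interval is typically set at encode time (for example with ffmpeg's `-g` option).

```python
def mean_frames_after_seek(n_frames: int, gop: int) -> float:
    """Average frames decoded per random access with a keyframe every `gop` frames.

    Frame i is reached by decoding forward from keyframe gop * (i // gop),
    which costs i % gop + 1 decoded frames.
    """
    total = sum(i % gop + 1 for i in range(n_frames))
    return total / n_frames


if __name__ == "__main__":
    # Shorter keyframe intervals mean less decoding after each seek.
    for gop in (250, 60, 10, 1):
        print(gop, mean_frames_after_seek(9061, gop))
```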

@nkeim nkeim mentioned this pull request Apr 30, 2020
@nkeim nkeim added this to the v0.5 milestone Apr 30, 2020
@sofroniewn

@keunhong I'm not a PIMS maintainer, but I'm very excited about this speed-up. I'm not sure if you've seen my post at dask/dask-image#134 asking about this kind of usage with dask, but I'm curious whether you'd expect speed-ups when using dask-image's imread, which uses PIMS under the hood. I can give it a try and report back if others are interested.

@caspervdw caspervdw self-requested a review May 2, 2020 11:57
Member

@caspervdw caspervdw left a comment


Thanks a lot for this improvement. I looked through the code and it looks good. Tests also pass, so this is good to merge. I'll leave it up to @rbnvrw or @nkeim to decide whether it should go in v0.5 or not.

@nkeim nkeim merged commit 25e9357 into soft-matter:master May 4, 2020
@nkeim
Contributor

nkeim commented May 4, 2020

Thanks for submitting this great PR, @keunhong ! And thanks for the review, @caspervdw . I think this should go into v0.5 since it is such a major improvement. The logical alternative would be to wait until v0.6, and I'm scared of that 😉.

@sofroniewn Since this is for random access, I imagine this would help tremendously with any kind of distributed loading of images from PyAV. I hope I am right!
