Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize the badfiles plugin #3006

Merged
merged 3 commits into from
Feb 20, 2019
Merged

Conversation

lovesegfault
Copy link
Contributor

Following my work on #3003 I figured I'd do the same thing for beet bad.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool! But what do you think about making a new function in util to do the parallel map? That way, we can avoid copying & pasting the necessary boilerplate code.

return self.check_mp3val
elif ext == "flac":
if ext == "flac":
return self.check_flac
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why these were changed from elif to if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They all return, so it's pointless to have them aselif.

@lovesegfault
Copy link
Contributor Author

I can try getting this in util, @sampsyo, but it's a little sad that we can't use multiprocessing because self has thread- local members. I think that should be the first step taken, to do this the right way everywhere possible.

I can try and investigate, any ideas why is it that self can't be shared across processes?

@sampsyo
Copy link
Member

sampsyo commented Aug 20, 2018

No, I don’t have an immediate guess about what can’t be shared on self.

But I actually disagree that multiprocessing would be better than threading here. Since both cases work by shelling out to external programs, we get parallelism outside the Python VM—in other words, the GIL is released during the parallel section. So threading will work just fine to get the tasks to execute in parallel, and may even be faster because threads are faster to create and destroy.

@lovesegfault
Copy link
Contributor Author

@sampsyo Sure, that's assuming that the slow path lies in the code executed outside of beets, and thus when the GIL is freed, which is true for beet bad, but not true for beet absubmit.

After my previous PR got merged I started seeing what speed I would get, and was a little underwhelmed, I think this was due to GIL contention during the upload step. I think this would be improved by multiprocessing, no? I think there are valid places within beets for both threads (badfiles) and processes (absubmit).

@sampsyo
Copy link
Member

sampsyo commented Aug 20, 2018

Hmm, I’m not sure. Unless something’s going very wrong, most of the time for absubmit should be spent on invoking the analysis binary and sending the results over the network, both of which are OS calls and should release the GIL. If that’s not happening, maybe some profiling is in order?

@lovesegfault lovesegfault force-pushed the parallel-bad branch 2 times, most recently from b1d0a30 to dddec73 Compare February 19, 2019 06:39
@lovesegfault lovesegfault changed the title Parallelize the badfiles plugin [WIP] Parallelize the badfiles plugin Feb 19, 2019
@lovesegfault
Copy link
Contributor Author

@sampsyo Do not merge this until #3153 has been merged (then the diff will make sense again, I have this branch rebased on top of that one)

@lovesegfault lovesegfault changed the title [WIP] Parallelize the badfiles plugin Parallelize the badfiles plugin Feb 19, 2019
@lovesegfault
Copy link
Contributor Author

@sampsyo This is ready to be merged!

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks lovely! I suggested one small change, and it looks like we’re having another problem on master that we’ll need to fix.

docs/changelog.rst Outdated Show resolved Hide resolved
@lovesegfault
Copy link
Contributor Author

@sampsyo What's the problem in master, I can take a look :)

@sampsyo
Copy link
Member

sampsyo commented Feb 20, 2019

Oops; never mind! It must have been intermittent. There was a failure on AppVeyor that looked like it was coming from a dependency, but that seems to have cleared itself up. Thank you for coming back to this!!

@sampsyo sampsyo merged commit 9d0f2c1 into beetbox:master Feb 20, 2019
@lovesegfault lovesegfault deleted the parallel-bad branch February 20, 2019 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants