Parallelize the badfiles plugin #3006
Looks cool! But what do you think about making a new function in util to do the parallel map? That way, we can avoid copying and pasting the necessary boilerplate code.
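As a rough illustration of the kind of shared helper being suggested here, the following is a minimal sketch of a thread-based parallel map. The name `parallel_map` and its signature are assumptions made for this example and are not taken from the beets code base.

```python
from multiprocessing.pool import ThreadPool


def parallel_map(transform, items):
    """Apply `transform` to every item using a pool of worker threads.

    Hypothetical helper: results come back in the same order as `items`.
    """
    # ThreadPool defaults to one worker per CPU reported by the OS.
    with ThreadPool() as pool:
        return pool.map(transform, items)


if __name__ == "__main__":
    print(parallel_map(len, ["a.mp3", "b.flac"]))
```

A plugin could then call such a helper with its per-file check function instead of repeating the pool setup and teardown in each place.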
    return self.check_mp3val
- elif ext == "flac":
+ if ext == "flac":
    return self.check_flac
Any particular reason why these were changed from elif to if?
They all return, so it's pointless to have them as elif.
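To make that point concrete, here is a small sketch comparing the two forms. The function names and the string return values are stand-ins invented for this example; the real code returns bound methods such as `self.check_flac`.

```python
def checker_with_elif(ext):
    # Each branch returns, so a later branch is reached only if
    # the earlier conditions were false.
    if ext == "mp3":
        return "check_mp3val"
    elif ext == "flac":
        return "check_flac"


def checker_with_if(ext):
    # Same behavior: the early return already prevents fall-through.
    if ext == "mp3":
        return "check_mp3val"
    if ext == "flac":
        return "check_flac"
```

Both versions return `None` for an extension that matches neither branch.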
I can try getting this in I can try and investigate, any ideas why is it that
No, I don’t have an immediate guess about what can’t be shared on But I actually disagree that multiprocessing would be better than threading here. Since both cases work by shelling out to external programs, we get parallelism outside the Python VM; in other words, the GIL is released during the parallel section. So threading will work just fine to get the tasks to execute in parallel, and may even be faster because threads are cheaper to create and destroy than processes.
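The argument about threads and external programs can be sketched as follows. This is only an illustration under stated assumptions: a child Python process that sleeps stands in for an external checker binary, and `run_checker` and `timed_parallel` are hypothetical names, not beets functions.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_checker(seconds):
    """Stand-in for an external checker: a child process that just sleeps."""
    cmd = [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    # While the child runs, this thread blocks in an OS wait call,
    # so other Python threads are free to run.
    return subprocess.run(cmd).returncode


def timed_parallel(n_tasks, seconds):
    """Run n_tasks checkers on n_tasks threads; return exit codes and wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        codes = list(executor.map(run_checker, [seconds] * n_tasks))
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    codes, elapsed = timed_parallel(4, 0.5)
    print(codes, f"{elapsed:.2f}s")
```

If the children really do run concurrently, the wall time should be much closer to one sleep interval than to four, although the exact figure depends on the machine.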
@sampsyo Sure, that's assuming that the slow path lies in the code executed outside of beets, and thus when the GIL is freed, which is true for After my previous PR got merged I started seeing what speed I would get, and was a little underwhelmed. I think this was due to GIL contention during the upload step. I think this would be improved by multiprocessing, no? I think there are valid places within beets for both threads (badfiles) and processes (absubmit).
Hmm, I’m not sure. Unless something’s going very wrong, most of the time for absubmit should be spent on invoking the analysis binary and sending the results over the network, both of which are OS calls and should release the GIL. If that’s not happening, maybe some profiling is in order?
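One possible way to start the suggested profiling is the standard library's `cProfile`. The sketch below wraps a single call and returns a text report; `profile_call` is a hypothetical helper written for this example, and note that `cProfile` only records the thread it is invoked in, so it would need to wrap the per-item work rather than the thread pool itself.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the ten entries with the highest cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(sorted, list(range(100000, 0, -1)))
    print(report)
```

Time spent waiting on a subprocess or a socket shows up under the corresponding blocking call, which would help tell GIL-bound Python work apart from time spent in the OS.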
b1d0a30 to dddec73
dddec73 to 295a27a
@sampsyo This is ready to be merged!
Looks lovely! I suggested one small change, and it looks like we’re having another problem on master that we’ll need to fix.
295a27a to 0483e56
0483e56 to c3c7aa6
@sampsyo What's the problem in master? I can take a look :)
Oops; never mind! It must have been intermittent. There was a failure on AppVeyor that looked like it was coming from a dependency, but that seems to have cleared itself up. Thank you for coming back to this!!
Following my work on #3003, I figured I'd do the same thing for beet bad.