When using joblib to run the function in parallel, the behavior is similar to the linked wrapt issue, with one difference: in that issue, multi-process mode raised a serialization error like the pandarallel example above, but in my test no error is thrown; the decorator just seems to be ignored (it fails silently).
```python
from joblib import Parallel, delayed

groups = df.groupby("C", axis=0, group_keys=False)
with Parallel(n_jobs=2, prefer="threads") as parallel:
    result = parallel(delayed(fraction)(grp) for idx, grp in groups)
out3 = pd.concat(result)  # Works correctly, throws a SchemaError

groups = df.groupby("C", axis=0, group_keys=False)
with Parallel(n_jobs=2, prefer="processes") as parallel:
    result = parallel(delayed(fraction)(grp) for idx, grp in groups)
out4 = pd.concat(result)  # This does not seem to check the output type and throws no error
```
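For contrast, the thread-vs-process difference is easier to reason about with a plain `functools.wraps` decorator. The `require_str` decorator below is a hypothetical stand-in for pandera's check, not its real implementation; the point is that with threads the wrapper runs in the parent process, so the output check always fires.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def require_str(func):
    """Hypothetical stand-in for pandera's check_types: validate output type."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if not isinstance(result, str):
            raise TypeError(f"expected str, got {type(result).__name__}")
        return result
    return wrapper


@require_str
def fraction(x):
    return x  # passes non-str inputs through, so the check should trip


with ThreadPoolExecutor(max_workers=2) as pool:
    # Thread workers call the decorated wrapper directly: checks run.
    ok = list(pool.map(fraction, ["a", "b"]))
    try:
        bad = list(pool.map(fraction, [1, 2]))
    except TypeError as exc:
        print("raised:", exc)  # raised: expected str, got int
```

With process workers, by contrast, whether the check survives depends on how the decorated function is serialized to the worker, which is exactly where wrapt-based wrappers run into trouble.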
Expected behavior
In the example with pandarallel, both groupby-apply operations should throw a `SchemaError` when type-checking the output dataframe (this can be confirmed by replacing `.parallel_apply` with `.apply`).
Expected output: `SchemaError: error in check_types decorator of function 'fraction': expected series 'A' to have type str`
For joblib, both multi-thread and multi-process mode should throw the same `SchemaError`.
Hi @dcnadler, there have been other issues with the `wrapt` dependency in the past. I don't have much context right now on what can be done within `wrapt` to fix this.
Perhaps a longer-term solution would be to refactor these decorators so that they work with plain Python or `functools.wraps` (I used `wrapt` early in pandera's development mainly because I found it convenient).
I'd support a PR to make this change (with updated tests) if you're open to making one!
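A minimal sketch of what that refactor could look like, with a hypothetical `check_output` decorator standing in for pandera's actual API: a module-level function wrapped with `functools.wraps` is pickled by qualified name, so it round-trips to worker processes with the check intact.

```python
import functools
import pickle


def check_output(expected_type):
    """Hypothetical type-checking decorator built on functools.wraps
    rather than wrapt (not pandera's real check_types)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not isinstance(result, expected_type):
                raise TypeError(
                    f"error in check_output decorator of function "
                    f"'{func.__name__}': expected {expected_type.__name__}, "
                    f"got {type(result).__name__}"
                )
            return result
        return wrapper
    return decorator


@check_output(str)
def shout(text):
    return text.upper()


# pickle serializes module-level functions by qualified name, so the
# deserialized object is the decorated wrapper, check included.
restored = pickle.loads(pickle.dumps(shout))
print(restored("ok"))  # OK
```

Because `functools.wraps` copies `__qualname__` and `__module__` from the wrapped function, the wrapper is found at the same import path as the original, which is what makes it serializable to joblib/pandarallel workers.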
When a function is wrapped by a pandera type-checking decorator, it can't be used in parallel execution without either raising an error or failing silently.
I believe this is because of a known issue with wrapt-based decorators.
We would like to use the `check_io` decorator in our existing project, but this bug blocks us since we run most of our functions in parallel. The issue on wrapt is over three years old, so it may be worth rewriting the decorators without wrapt.