REF: BlockManager.apply_allow_failures #34714

Closed
jbrockmendel opened this issue Jun 11, 2020 · 2 comments · Fixed by #35900
Labels
Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code

Comments

@jbrockmendel
Member

cc @mroeschke w/r/t rolling ._apply, @WillAyd w/r/t _cython_agg_blocks, and anyone else w/r/t DataFrame._reduce with numeric_only=None and apply.FrameApply with ignore_failures=True.

All of these do something along the lines of:

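# mgr, func and reconstruct are placeholders for the manager, the per-block
# operation and the reassembly step used at each call site
results = []
exclude = []
for i, block in enumerate(mgr.blocks):
    try:
        res = func(block.values)
        results.append(res)
    except Exception:
        # the block could not be processed: skip it and remember its position
        exclude.append(i)

out = reconstruct(results, exclude)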
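To make the pattern concrete, here is a small runnable sketch that uses plain NumPy arrays as stand-ins for blocks. The `blocks` list, `func`, and the dict-based reconstruction step are illustrative assumptions, not pandas internals. It catches only `TypeError`/`ValueError` rather than everything, since a bare `except:` would also swallow `KeyboardInterrupt` and hide genuine bugs.

```python
import numpy as np

# Hypothetical stand-in for mgr.blocks: each "block" pairs the column labels
# it covers with a 2D values array shaped (n_columns, n_rows).
blocks = [
    (["a", "b"], np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])),
    (["c"], np.array([["x", "y", "z"]], dtype=object)),
    (["d"], np.array([[10, 20, 30]])),
]


def func(values):
    # Sum each column; astype(float) raises ValueError for string data.
    return values.astype(float).sum(axis=1)


results = []
exclude = []
for i, (items, values) in enumerate(blocks):
    try:
        results.append((items, func(values)))
    except (TypeError, ValueError):
        # Drop the whole block instead of propagating the error.
        exclude.append(i)

# "reconstruct": stitch the surviving per-column results back together.
out = {label: float(v) for items, res in results for label, v in zip(items, res)}
print(exclude)  # [1]
print(out)      # {'a': 6.0, 'b': 15.0, 'd': 60.0}
```

The object block silently disappears from the result, which is exactly the behavior that makes these code paths hard to reason about.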

Two things should be done with this pattern:

  1. Deprecate it, since it is a disproportionate maintenance burden
  2. Implement something like BlockManager.apply_with_ignore_failures that would resemble BlockManager.apply but with the try/except logic*

* We could add that logic into BlockManager.apply, but BM.apply assumes the output has the same shape as the original frame, which I don't think holds for the cases mentioned above, so that would be a little more invasive than just adding a try/except.
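BlockManager.apply_with_ignore_failures is only proposed here, so what follows is a minimal sketch under stated assumptions: `ToyBlock` and `ToyManager` are hypothetical classes, not pandas' real `Block`/`BlockManager`, and the `errors` parameter is invented for illustration. The point it shows is the one raised in the footnote: unlike a shape-preserving apply, this helper drops failing blocks and lets each result change shape (e.g. a reduction collapsing the rows).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Type

import numpy as np


@dataclass
class ToyBlock:
    # Hypothetical stand-in for a pandas Block.
    items: List[str]    # column labels held by this block
    values: np.ndarray  # shape (len(items), n_rows)


@dataclass
class ToyManager:
    # Hypothetical stand-in for a BlockManager.
    blocks: List[ToyBlock]

    @property
    def items(self) -> List[str]:
        return [label for blk in self.blocks for label in blk.items]

    def apply_with_ignore_failures(
        self,
        func: Callable[[np.ndarray], np.ndarray],
        errors: Tuple[Type[Exception], ...] = (TypeError, ValueError),
    ) -> "ToyManager":
        new_blocks = []
        for blk in self.blocks:
            try:
                res = np.asarray(func(blk.values))
            except errors:
                # Drop the failing block, and with it its items.
                continue
            # Keep one row per item but let the second axis change length,
            # e.g. a reduction turns (n_items, n_rows) into (n_items, 1).
            new_blocks.append(ToyBlock(blk.items, res.reshape(len(blk.items), -1)))
        return ToyManager(new_blocks)


mgr = ToyManager([
    ToyBlock(["a", "b"], np.array([[1.0, 2.0], [3.0, 4.0]])),
    ToyBlock(["c"], np.array([["x", "y"]], dtype=object)),
])
reduced = mgr.apply_with_ignore_failures(lambda v: v.astype(float).mean(axis=1))
print(reduced.items)                   # ['a', 'b']
print(reduced.blocks[0].values.shape)  # (2, 1)
```

Because the returned manager carries its own `items`, callers no longer need to track excluded block positions themselves; that bookkeeping lives in one place.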

@jbrockmendel added the Bug and Needs Triage labels on Jun 11, 2020
@WillAyd
Member

WillAyd commented Jun 11, 2020 via email

@mroeschke
Member

FWIW, if we're open to allowing numba in the internals (with some opt-in, global execution engine setting), this is a great candidate for numba parallelism and can ameliorate the performance hit if we decide to deprecate this behavior.


6 participants