Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF/INT: preserve block type in joining when only having a single block #17954

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Oct 23, 2017

xref #17283

@gfyoung gfyoung added the Internals Related to non-user accessible pandas implementation label Oct 23, 2017
@codecov
Copy link

codecov bot commented Oct 24, 2017

Codecov Report

Merging #17954 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17954      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%     
==========================================
  Files         163      163              
  Lines       50113    50121       +8     
==========================================
- Hits        45723    45722       -1     
- Misses       4390     4399       +9
Flag Coverage Δ
#multiple 89.03% <100%> (ø) ⬆️
#single 40.27% <77.77%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals.py 94.47% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1dabf3...2125ac0. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 24, 2017

Codecov Report

Merging #17954 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17954      +/-   ##
==========================================
+ Coverage   91.23%   91.25%   +0.01%     
==========================================
  Files         163      163              
  Lines       50113    50173      +60     
==========================================
+ Hits        45723    45786      +63     
+ Misses       4390     4387       -3
Flag Coverage Δ
#multiple 89.06% <100%> (+0.03%) ⬆️
#single 40.29% <77.77%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals.py 94.47% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/generic.py 92.44% <0%> (-0.09%) ⬇️
pandas/core/panel4d.py 100% <0%> (ø) ⬆️
pandas/io/sql.py 94.8% <0%> (ø) ⬆️
pandas/core/ops.py 91.77% <0%> (ø) ⬆️
pandas/core/reshape/concat.py 97.58% <0%> (ø) ⬆️
pandas/core/categorical.py 95.75% <0%> (+0.02%) ⬆️
pandas/core/panel.py 97% <0%> (+0.04%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1dabf3...3cee1aa. Read the comment docs.

if len(join_units) == 1 and not join_units[0].indexers:
b = join_units[0].block
values = b.values
if copy: # and values.base is not None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I first had this with and values.base is not None: but that failed one test in merge. But I copied that logic from concatenate_join_units:

https://github.com/jorisvandenbossche/pandas/blob/2125ac09726cfc720def79e836398ed04ecae04a/pandas/core/internals.py#L5332

Not sure why there it is needed, but here it fails.

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Oct 25, 2017

@jreback can you have a look at this one? (would like to merge it for 0.21.0, although it is messing with the internals after the rc. But I think you have a better idea on how safe this change is)

values = b.values
if copy: # and values.base is not None:
values = values.copy()
elif not copy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for this special case? IOW, why can't you handle this in concat_same_type (maybe pass in copy flag if that helps), IOW you can test there that you only have 1 block and write the same code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the separate special case because it occurs in different cases (and so how it is here, is more close to the original code), but could add it to the other special case. But, that makes concat_same_type more complex. Currently it does not need to bother about copying the data or not in this special case of to_concat values of 1, because concatenate always takes a copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I guess. Still like to come back in and re-org this to make as clear as possible during next release. Do appreciate the support for CustomBlocks, though worry we are adding even more technical debt here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah what I would actually do is define a _concatentor for the base case (e.g. where we have staticmethod(np.concatenate); in pandas/core/dtypes/concat.py, call it _concat_arrays or something. This (and you would have to modify all the others to take a copy= kwarg, then this code become much simpler & less special cased.

Then all of the concat logic is in a single place and copies are handled there. Pls create an issue for 0.22 for this (you may already have one).

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Oct 26, 2017
@TomAugspurger
Copy link
Contributor

Added to the 0.21 milestone, but will leave up to @jreback to decide if it should be pushed.

@TomAugspurger TomAugspurger mentioned this pull request Oct 26, 2017
64 tasks
@jorisvandenbossche
Copy link
Member Author

Opened issue at #17997

@jreback
Copy link
Contributor

jreback commented Oct 27, 2017

lgtm.

@TomAugspurger
Copy link
Contributor

Appveyor finished, but hasn't updated the checkmark yet. Merging.

@TomAugspurger TomAugspurger merged commit e38bd74 into pandas-dev:master Oct 27, 2017
@jorisvandenbossche jorisvandenbossche deleted the internals-concat-axis1 branch October 27, 2017 12:08
peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
…ck (pandas-dev#17954)

* REF/INT: preserve block type in joining when only having a single block

* add test

* remove checking of values.base

* clean-up comment
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
…ck (pandas-dev#17954)

* REF/INT: preserve block type in joining when only having a single block

* add test

* remove checking of values.base

* clean-up comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants