BUG: (GH3602) Concat produces non-unique columns when duplicates are across dtypes #3647

Merged 1 commit on May 19, 2013
2 changes: 2 additions & 0 deletions RELEASE.rst
@@ -111,6 +111,7 @@ pandas 0.11.1
- Duplicate indexes with getitem will return items in the correct order (GH3455_, GH3457_)
and handle missing elements like unique indices (GH3561_)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562_)
- Concat producing non-unique columns when duplicates exist across dtypes is fixed (GH3602_)
- Fixed bug in groupby with empty series referencing a variable before assignment. (GH3510_)
- Fixed bug in mixed-frame assignment with aligned series (GH3492_)
- Fixed bug in selecting month/quarter/year from a series would not select the time element
@@ -196,6 +197,7 @@ pandas 0.11.1
.. _GH3626: https://github.com/pydata/pandas/issues/3626
.. _GH3601: https://github.com/pydata/pandas/issues/3601
.. _GH3631: https://github.com/pydata/pandas/issues/3631
.. _GH3602: https://github.com/pydata/pandas/issues/3602
.. _GH1512: https://github.com/pydata/pandas/issues/1512
.. _GH3571: https://github.com/pydata/pandas/issues/3571
28 changes: 28 additions & 0 deletions doc/source/v0.11.1.txt
@@ -64,6 +64,7 @@ API changes

Enhancements
~~~~~~~~~~~~

- ``pd.read_html()`` can now parse HTML string, files or urls and return dataframes
courtesy of @cpcloud. (GH3477_)
- ``HDFStore``
@@ -114,10 +115,37 @@ Enhancements
import os
os.remove('mi.csv')

Bug Fixes
~~~~~~~~~

- Non-unique index support clarified (GH3468_).

- Fix for assigning a new index to a DataFrame with a duplicate index, which previously failed (GH3468_)
- Fix construction of a DataFrame with a duplicate index
- ref_locs support to allow duplicate indices across dtypes;
allows iget to always find the correct location (even across dtypes) (GH2194_)
- applymap on a DataFrame with a non-unique index now works
(removed warning) (GH2786_), and fix (GH3230_)
- Fix to_csv to handle non-unique columns (GH3495_)
- Duplicate indexes with getitem will return items in the correct order (GH3455_, GH3457_)
and handle missing elements like unique indices (GH3561_)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562_)
- Concat producing non-unique columns when duplicates exist across dtypes is fixed (GH3602_)

See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.
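The GH3602 fix can be demonstrated with a short session mirroring the new test case (a sketch against a current pandas; the frame contents are illustrative):

```python
import pandas as pd

# Two mixed-dtype frames sharing the column label 'prc'; concatenating
# along axis=1 yields a non-unique column index whose duplicated labels
# map into different internal blocks.
df1 = pd.DataFrame({'firmNo': [0, 0], 'prc': [6, 6], 'stringvar': ['rrr', 'rrr']})
df2 = pd.DataFrame({'C': [9, 10], 'misc': [1, 2], 'prc': [6, 6]})

result = pd.concat([df1, df2], axis=1)
print(list(result.columns))
# ['firmNo', 'prc', 'stringvar', 'C', 'misc', 'prc']
```

Before the fix, the duplicated labels could be mis-mapped because the concatenated blocks carried no per-column location mapping.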

.. _GH3468: https://github.com/pydata/pandas/issues/3468
.. _GH2194: https://github.com/pydata/pandas/issues/2194
.. _GH2786: https://github.com/pydata/pandas/issues/2786
.. _GH3230: https://github.com/pydata/pandas/issues/3230
.. _GH3495: https://github.com/pydata/pandas/issues/3495
.. _GH3455: https://github.com/pydata/pandas/issues/3455
.. _GH3457: https://github.com/pydata/pandas/issues/3457
.. _GH3561: https://github.com/pydata/pandas/issues/3561
.. _GH3562: https://github.com/pydata/pandas/issues/3562
.. _GH3602: https://github.com/pydata/pandas/issues/3602
.. _GH2437: https://github.com/pydata/pandas/issues/2437
.. _GH2852: https://github.com/pydata/pandas/issues/2852
.. _GH3477: https://github.com/pydata/pandas/issues/3477
14 changes: 12 additions & 2 deletions pandas/tools/merge.py
@@ -1043,6 +1043,7 @@ def _concat_blocks(self, blocks):
'DataFrames')
return make_block(concat_values, blocks[0].items, self.new_axes[0])
else:

offsets = np.r_[0, np.cumsum([len(x._data.axes[0]) for
x in self.objs])]
indexer = np.concatenate([offsets[i] + b.ref_locs
@@ -1052,12 +1053,21 @@ def _concat_blocks(self, blocks):
concat_items = indexer
else:
concat_items = self.new_axes[0].take(indexer)

if self.ignore_index:
ref_items = self._get_fresh_axis()
return make_block(concat_values, concat_items, ref_items)

return make_block(concat_values, concat_items, self.new_axes[0])
block = make_block(concat_values, concat_items, self.new_axes[0])

# we need to set the ref_locs in this block so we have the mapping
# as we now have a non-unique index across dtypes, and we need to
# map the column location to the block location
# GH3602
if not self.new_axes[0].is_unique:
block._ref_locs = indexer

return block

def _concat_single_item(self, objs, item):
all_values = []
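The offsets/indexer arithmetic in the patched `_concat_blocks` can be sketched without pandas internals (the block layout below is hypothetical; only the arithmetic mirrors the patch):

```python
import numpy as np

# Two hypothetical frames with 3 columns each; suppose one dtype's
# blocks hold columns [0, 1] of the first frame and column [2] of
# the second.
ncols = [3, 3]
offsets = np.r_[0, np.cumsum(ncols)]      # array([0, 3, 6])

block_ref_locs = [np.array([0, 1]),       # positions within frame 0
                  np.array([2])]          # positions within frame 1

# Global column positions covered by the concatenated block -- the
# mapping the patch stores as block._ref_locs when the combined
# column index is non-unique (GH3602).
indexer = np.concatenate([offsets[i] + locs
                          for i, locs in enumerate(block_ref_locs)])
print(indexer)  # [0 1 5]
```

With a non-unique column index, label-based lookup alone cannot distinguish the duplicates, so this positional mapping is what lets the block manager resolve each column to its block slot.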
14 changes: 14 additions & 0 deletions pandas/tools/tests/test_merge.py
@@ -1682,6 +1682,20 @@ def test_concat_bug_2972(self):
expected.columns=['same name', 'same name']
assert_frame_equal(result, expected)

def test_concat_bug_3602(self):

# GH 3602, duplicate columns
df1 = DataFrame({'firmNo': [0, 0, 0, 0], 'stringvar': ['rrr', 'rrr', 'rrr', 'rrr'], 'prc': [6, 6, 6, 6]})
df2 = DataFrame({'misc': [1, 2, 3, 4], 'prc': [6, 6, 6, 6], 'C': [9, 10, 11, 12]})
expected = DataFrame([[0, 6, 'rrr', 9, 1, 6],
                      [0, 6, 'rrr', 10, 2, 6],
                      [0, 6, 'rrr', 11, 3, 6],
                      [0, 6, 'rrr', 12, 4, 6]])
expected.columns = ['firmNo', 'prc', 'stringvar', 'C', 'misc', 'prc']

result = concat([df1, df2], axis=1)
assert_frame_equal(result, expected)
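Beyond equality with the expected frame, the behavior under test can be checked interactively: selecting a duplicated label returns every matching column (a sketch, assuming a current pandas):

```python
import pandas as pd

df1 = pd.DataFrame({'firmNo': [0], 'stringvar': ['rrr'], 'prc': [6]})
df2 = pd.DataFrame({'misc': [1], 'prc': [6], 'C': [9]})
result = pd.concat([df1, df2], axis=1)

# 'prc' appears twice, so selection returns a two-column DataFrame
both = result['prc']
print(both.shape)  # (1, 2)
```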

def test_concat_series_axis1_same_names_ignore_index(self):
dates = date_range('01-Jan-2013', '01-Jan-2014', freq='MS')[0:-1]
s1 = Series(randn(len(dates)), index=dates, name='value')