Can't create a DataFrame from an empty Series #2234

Closed
grsr opened this issue Nov 12, 2012 · 3 comments

grsr commented Nov 12, 2012

For a project I am working on, it would be convenient to be able to create a DataFrame from an empty Series. I have some library code that creates a Series from a dict. Most of the time this dict has some entries, but occasionally it is empty. Currently you can create an empty DataFrame by passing nothing to the constructor, and you can create a DataFrame or a Series from an empty dict. However, if you try to create a DataFrame from a Series constructed from an empty dict, pandas raises an AssertionError. I can code around this easily enough, but it would be preferable for pandas to return an empty DataFrame. Example code below:

In [73]: pandas.__version__
Out[73]: '0.9.0'

In [74]: df = DataFrame()

In [75]: df = DataFrame({})

In [76]: s = Series({}, name="foo")

In [77]: df = DataFrame(s)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-77-4c4337b321e1> in <module>()
----> 1 df = DataFrame(s)

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    392             else:
    393                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 394                                          copy=copy)
    395         elif isinstance(data, list):
    396             if len(data) > 0:

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
    504             columns = _ensure_index(columns)
    505 
--> 506         block = make_block(values.T, columns, columns)
    507         return BlockManager([block], [columns, index])
    508 

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in make_block(values, items, ref_items, do_integrity_check)
    459 
    460     return klass(values, items, ref_items, ndim=values.ndim,
--> 461                  do_integrity_check=do_integrity_check)
    462 
    463 # TODO: flexible with index=None and/or items=None

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in __init__(self, values, items, ref_items, ndim, do_integrity_check)
     24 
     25         assert(values.ndim == ndim)
---> 26         assert(len(items) == len(values))
     27 
     28         self.values = values

AssertionError: 

Cheers,

Graham

grsr commented Nov 12, 2012

Actually, on further inspection, the problem seems to be creating a DataFrame from an empty Series that has been given a name. A Series constructed only from an empty dict, with no name, can be used to construct a DataFrame. I guess the issue has something to do with how DataFrame constructs its list of column names. However, I can successfully create a DataFrame from an empty dict while explicitly passing some columns, and I get what I want: a DataFrame with no rows but with entries in the column index. It would be handy to get the same behaviour from my existing code, i.e. when creating a DataFrame from an empty Series that has a name.

This isn't a major issue now, though. I hadn't realised that you can join a Series with a DataFrame, which is why I was converting my Series into a DataFrame in the first place. If you join a (populated) DataFrame with an empty, named Series, pandas does exactly what I want: it creates a column in the resulting DataFrame with NA entries for all the rows.

In [19]: pandas.__version__
Out[19]: '0.9.0'

In [20]: DataFrame(Series({})) # works
Out[20]: 
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=object)

In [21]: DataFrame({}, columns = ['foo']) # works, and adds the 'foo' column
Out[21]: 
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)

In [31]: df = DataFrame({'foo': {'item1': 1, 'item2': 2}, 'bar': {'item2': 3}}) # create a populated df

In [32]: s = Series({},name = 'baz') # and an empty Series with a name

In [33]: df.join(s)
Out[33]: 
       bar  foo  baz
item1  NaN    1  NaN
item2    3    2  NaN
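
For readers on a current pandas release, the join behaviour described above can be sketched roughly as follows. This is a minimal illustration rather than a transcript of the 0.9.0 session: the `pd` alias, the explicit `dtype=float` and the explicit empty index are assumptions added so the result does not depend on version-specific defaults.

```python
import pandas as pd

# A populated frame, mirroring the one in the session above.
df = pd.DataFrame({"foo": {"item1": 1, "item2": 2}, "bar": {"item2": 3}})

# An empty Series with a name. The dtype and the zero-length index (sliced
# from df.index so the index types match) are assumptions of this sketch.
s = pd.Series(name="baz", dtype=float, index=df.index[:0])

# A left join keeps every row of df and adds an all-NaN "baz" column.
joined = df.join(s)
print(joined)
```

The join is a left join by default, so the rows of `df` are kept and the new column is filled with missing values.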

grsr commented Nov 12, 2012

OK, I understand what's going on now. I can indeed create a DataFrame from an empty Series, but I have to do so by passing a dict with the name of the Series as the key and the Series as the corresponding value. Otherwise, the DataFrame constructor interprets the Series as a numpy ndarray rather than as a pandas Series object. This isn't a bug, so I'm going to close this issue. Thanks for reading anyway!

In [56]: s = Series({}, name = 'foo')

In [57]: DataFrame({s.name: s})
Out[57]: 
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)
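
The dict-based workaround above, written for a current pandas release, might look like the sketch below. The explicit `dtype=float` is an assumption added to avoid relying on the default dtype of an empty Series, and `Series.to_frame()` is not mentioned anywhere in this thread; it is shown only as an alternative spelling available in current pandas.

```python
import pandas as pd

# Empty Series with a name; the dtype is set explicitly (assumption of this sketch).
s = pd.Series({}, name="foo", dtype=float)

# Workaround from the comment above: key the Series by its own name.
via_dict = pd.DataFrame({s.name: s})

# Alternative in current pandas (not part of the original thread):
# to_frame() uses the Series name as the single column label.
via_to_frame = s.to_frame()

print(via_dict.columns.tolist(), len(via_dict))
print(via_to_frame.columns.tolist(), len(via_to_frame))
```

Both results have no rows and a single `foo` column, which is the shape asked for earlier in the thread.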

@grsr grsr closed this as completed Nov 12, 2012
Member

wesm commented Nov 12, 2012

I actually might have expected what you typed to work. I'll reopen until someone can have a look.

@wesm wesm reopened this Nov 12, 2012
@wesm wesm closed this as completed in 60e69a3 Nov 14, 2012
yarikoptic added a commit to neurodebian/pandas that referenced this issue Nov 15, 2012
Version 0.9.1

* tag 'v0.9.1':
  RLS: Version 0.9.1 final
  BUG: icol() should propegate fill_value for sparse data frames pandas-dev#2249
  TST: icol() should propegate fill_value for sparse data frames
  BUG: override SparseDataFrame.icol to use __getitem__ instead of accessing _data internals. close pandas-dev#2251
  BUG: make Series.tz_localize work with length-0 non-DatetimeIndex. close pandas-dev#2248
  BUG: parallel_coordinates bugfix with matplotlib 1.2.0. close pandas-dev#2237
  BUG: issue constructing DataFrame from empty Series with name. close pandas-dev#2234
  ENH: disable repr dependence on terminal width when running non-interactively. pandas-dev#1610
  BUG: ExcelWriter raises exception on PeriodIndex pandas-dev#2240
  BUG: SparseDataFrame.icol return SparseSeries. SparseSeries.from_array return SparseSeries. close pandas-dev#2227, pandas-dev#2229
  BUG: fix tz-aware resampling issue. close pandas-dev#2245
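
On a release that includes the fix listed above, the call from the original report can be checked with a short script such as the one below. It is a sketch written against current pandas, not the test added in the fixing commit; the `pd` alias and the explicit `dtype=float` are assumptions.

```python
import pandas as pd

# The construction that raised AssertionError in 0.9.0:
# an empty Series that carries a name, passed directly to DataFrame.
s = pd.Series({}, name="foo", dtype=float)
df = pd.DataFrame(s)

# Expect no rows and one column labelled with the Series name.
print(df.shape, df.columns.tolist())
```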