Can't create a DataFrame from an empty Series #2234

grsr · 2012-11-12T13:56:11Z

For a project I am working on, it would be convenient to be able to create a DataFrame from an empty Series. I have some library code that creates a Series from a dict. Most of the time this dict has some entries, but occasionally it is empty. Currently you can create an empty DataFrame by passing nothing to the constructor, and you can create a DataFrame or a Series from an empty dict. But if you try to create a DataFrame from a Series constructed from an empty dict, pandas raises an AssertionError. I can code around this easily enough, but it would be preferable to just get an empty DataFrame back. Example code below:

In [73]: pandas.__version__
Out[73]: '0.9.0'

In [74]: df = DataFrame()

In [75]: df = DataFrame({})

In [76]: s = Series({}, name="foo")

In [77]: df = DataFrame(s)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-77-4c4337b321e1> in <module>()
----> 1 df = DataFrame(s)

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    392             else:
    393                 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 394                                          copy=copy)
    395         elif isinstance(data, list):
    396             if len(data) > 0:

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
    504             columns = _ensure_index(columns)
    505 
--> 506         block = make_block(values.T, columns, columns)
    507         return BlockManager([block], [columns, index])
    508 

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in make_block(values, items, ref_items, do_integrity_check)
    459 
    460     return klass(values, items, ref_items, ndim=values.ndim,
--> 461                  do_integrity_check=do_integrity_check)
    462 
    463 # TODO: flexible with index=None and/or items=None

/nfs/users/nfs_g/gr5/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/pandas-0.9.0-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in __init__(self, values, items, ref_items, ndim, do_integrity_check)
     24 
     25         assert(values.ndim == ndim)
---> 26         assert(len(items) == len(values))
     27 
     28         self.values = values

AssertionError:

Cheers,

Graham

grsr · 2012-11-12T14:34:42Z

Actually, on further inspection, the problem seems to be creating a DataFrame from an empty Series that has been given a name. A Series constructed only from an empty dict, with no name, can be used to construct a DataFrame. I guess the issue has something to do with how DataFrame constructs its list of column names. However, I can successfully create a DataFrame from an empty dict while explicitly passing some columns, and I get what I want: a DataFrame with no rows but with entries in the column index. It would be handy to get the same behaviour from my existing code, i.e. when creating a DataFrame from an empty Series that has a name.

This isn't a major issue now, though. I hadn't realised that you can join a Series with a DataFrame, which is why I was converting my Series into a DataFrame in the first place. If you join a (populated) DataFrame with an empty Series that has a name, pandas does exactly what I want: it creates a column in the resulting DataFrame with NA entries for all the rows.

In [19]: pandas.__version__
Out[19]: '0.9.0'

In [20]: DataFrame(Series({})) # works
Out[20]: 
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=object)

In [21]: DataFrame({}, columns = ['foo']) # works, and adds the 'foo' column
Out[21]: 
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)

In [31]: df = DataFrame({'foo': {'item1': 1, 'item2': 2}, 'bar': {'item2': 3}}) # create a populated df

In [32]: s = Series({},name = 'baz') # and an empty Series with a name

In [33]: df.join(s)
Out[33]: 
       bar  foo  baz
item1  NaN    1  NaN
item2    3    2  NaN
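
For anyone reading this later, here is a minimal standalone sketch of the same join, assuming a recent pandas release rather than 0.9.0. The explicit `dtype` and empty object `index` on the Series are my own additions to keep the example unambiguous; they are not part of the session above. Column order may also differ from the 0.9.0 output, since newer versions keep dict insertion order.

```python
import pandas as pd

# A populated DataFrame, as in the session above.
df = pd.DataFrame({"foo": {"item1": 1, "item2": 2}, "bar": {"item2": 3}})

# An empty Series with a name. The dtype and index are spelled out
# explicitly (an assumption made for this sketch, not in the original).
s = pd.Series([], index=pd.Index([], dtype=object), name="baz", dtype="float64")

# A left join on the index keeps every row of df and fills the new
# "baz" column with NaN, because the Series has no matching labels.
joined = df.join(s)
print(joined)
```

The Series needs a name here, since the name becomes the label of the new column.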

grsr · 2012-11-12T15:08:50Z

OK, I understand what's going on now. I can indeed create a DataFrame from an empty Series, but I have to do so by passing a dict with the name of the Series as the key and the Series as the corresponding value. Otherwise the Series is being interpreted as a numpy ndarray rather than a pandas Series object in the DataFrame constructor. This isn't a bug, so I'm going to close this issue. Thanks for reading anyway!

In [56]: s = Series({}, name = 'foo')

In [57]: DataFrame({s.name: s})
Out[57]: 
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)
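
The same workaround as a small script, written as a sketch that assumes a recent pandas release. `Series.to_frame()` and the explicit `dtype` are not used in the session above; they are included only for comparison and to avoid relying on the default dtype of an empty Series. On recent releases, passing the named Series straight to the constructor is expected to give the same result.

```python
import pandas as pd

# Empty Series with a name; dtype given explicitly (an assumption for this sketch).
s = pd.Series({}, name="foo", dtype="float64")

# Workaround from this comment: wrap the Series in a dict keyed by its name.
via_dict = pd.DataFrame({s.name: s})

# Alternative available in later pandas versions.
via_to_frame = s.to_frame()

# Passing the Series directly, which is what originally failed on 0.9.0.
direct = pd.DataFrame(s)

for label, frame in [("dict", via_dict), ("to_frame", via_to_frame), ("direct", direct)]:
    print(label, list(frame.columns), len(frame))
```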

wesm · 2012-11-12T16:48:12Z

I actually might have expected what you typed to work. I'll reopen until someone can have a look.

Version 0.9.1
* tag 'v0.9.1':
  RLS: Version 0.9.1 final
  BUG: icol() should propegate fill_value for sparse data frames pandas-dev#2249
  TST: icol() should propegate fill_value for sparse data frames
  BUG: override SparseDataFrame.icol to use __getitem__ instead of accessing _data internals. close pandas-dev#2251
  BUG: make Series.tz_localize work with length-0 non-DatetimeIndex. close pandas-dev#2248
  BUG: parallel_coordinates bugfix with matplotlib 1.2.0. close pandas-dev#2237
  BUG: issue constructing DataFrame from empty Series with name. close pandas-dev#2234
  ENH: disable repr dependence on terminal width when running non-interactively. pandas-dev#1610
  BUG: ExcelWriter raises exception on PeriodIndex pandas-dev#2240
  BUG: SparseDataFrame.icol return SparseSeries. SparseSeries.from_array return SparseSeries. close pandas-dev#2227, pandas-dev#2229
  BUG: fix tz-aware resampling issue. close pandas-dev#2245

grsr closed this as completed Nov 12, 2012

wesm reopened this Nov 12, 2012

wesm closed this as completed in 60e69a3 Nov 14, 2012
