Can't create a DataFrame from an empty Series #2234
Actually, on further inspection, the problem seems to be specific to creating a DataFrame from an empty Series that has been given a name. A Series constructed only from an empty dict, with no name, can be used to construct a DataFrame. I guess the issue is something to do with how DataFrame builds its list of column names. I can successfully create a DataFrame from an empty dict and explicitly pass some columns, and that gives me what I want: a DataFrame with no rows but with entries in the column index. It would be handy to get the same behaviour from my existing code, i.e. by creating a DataFrame from an empty Series that has a name.

This isn't a major issue for me now, though. I didn't realise that you can join a Series with a DataFrame, which is why I was converting my Series into a DataFrame in the first place. If you join a (populated) DataFrame with an empty Series that has a name, pandas does exactly what I want: it creates a column in the resulting DataFrame with NA entries for all the rows.

In [19]: pandas.__version__
Out[19]: '0.9.0'
In [20]: DataFrame(Series({})) # works
Out[20]:
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=object)
In [21]: DataFrame({}, columns = ['foo']) # works, and adds the 'foo' column
Out[21]:
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)
In [31]: df = DataFrame({'foo': {'item1': 1, 'item2': 2}, 'bar': {'item2': 3}}) # create a populated df
In [32]: s = Series({},name = 'baz') # and an empty Series with a name
In [33]: df.join(s)
Out[33]:
       bar  foo  baz
item1  NaN    1  NaN
item2    3    2  NaN
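The join session above can be reproduced as a self-contained script. This is a sketch assuming a recent pandas release (1.x or 2.x), not the 0.9.0 build shown in the transcript. Newer releases keep dict insertion order, so the columns come out as foo, bar, baz rather than bar, foo, baz. The explicit `dtype="float64"` is an addition of this sketch, used to sidestep the empty-Series default-dtype warning that some 1.x releases emit.

```python
import pandas as pd

# Populated DataFrame; 'bar' has no entry for item1, so that cell is NaN.
df = pd.DataFrame({"foo": {"item1": 1, "item2": 2}, "bar": {"item2": 3}})

# Empty Series that carries a name (dtype given explicitly, see lead-in).
s = pd.Series({}, name="baz", dtype="float64")

# Joining on the index (a left join by default) adds a 'baz' column
# that is NaN for every row of df.
joined = df.join(s)
print(joined)
```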
OK, I understand what's going on now. I can indeed create a DataFrame from an empty Series, but I have to do so by passing a dict with the name of the Series as the key and the Series as the corresponding value. Otherwise, the DataFrame constructor treats the Series as a plain numpy ndarray rather than as a pandas Series object. This isn't a bug, so I'm going to close this issue. Thanks for reading anyway!

In [56]: s = Series({}, name = 'foo')
In [57]: DataFrame({s.name: s})
Out[57]:
Empty DataFrame
Columns: array([foo], dtype=object)
Index: array([], dtype=object)
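The dict-keyed-by-name workaround can be wrapped in a small helper so calling code does not need to special-case empty Series. This is a sketch assuming a recent pandas; the name `series_to_frame` and its `default_name` parameter are hypothetical, introduced only for illustration (the fallback covers a Series whose `name` is `None`).

```python
import pandas as pd


def series_to_frame(s: pd.Series, default_name="value") -> pd.DataFrame:
    """Hypothetical helper: wrap a Series as a one-column DataFrame.

    Uses the workaround from this thread, DataFrame({name: series}), which
    keeps the column label even when the Series has no rows.
    """
    # Fall back to a placeholder label if the Series is unnamed.
    name = s.name if s.name is not None else default_name
    return pd.DataFrame({name: s})


empty = pd.Series({}, name="foo", dtype="float64")
print(series_to_frame(empty).shape)          # (0, 1)
print(list(series_to_frame(empty).columns))  # ['foo']

full = pd.Series({"item1": 1, "item2": 2}, name="foo")
print(series_to_frame(full))
```

Keying the dict by the Series name makes the column label explicit, so the result has the same one-column shape whether or not the Series has any rows.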
I actually might have expected what you typed to work. I'll reopen until someone can have a look.
Version 0.9.1
* tag 'v0.9.1':
  RLS: Version 0.9.1 final
  BUG: icol() should propegate fill_value for sparse data frames pandas-dev#2249
  TST: icol() should propegate fill_value for sparse data frames
  BUG: override SparseDataFrame.icol to use __getitem__ instead of accessing _data internals. close pandas-dev#2251
  BUG: make Series.tz_localize work with length-0 non-DatetimeIndex. close pandas-dev#2248
  BUG: parallel_coordinates bugfix with matplotlib 1.2.0. close pandas-dev#2237
  BUG: issue constructing DataFrame from empty Series with name. close pandas-dev#2234
  ENH: disable repr dependence on terminal width when running non-interactively. pandas-dev#1610
  BUG: ExcelWriter raises exception on PeriodIndex pandas-dev#2240
  BUG: SparseDataFrame.icol return SparseSeries. SparseSeries.from_array return SparseSeries. close pandas-dev#2227, pandas-dev#2229
  BUG: fix tz-aware resampling issue. close pandas-dev#2245
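The original call can be checked directly against a current pandas. This is a sketch assuming a pandas 1.x or 2.x release, not the 0.9.x versions discussed here. There, the constructor uses the Series name as the column label, so an empty named Series yields a DataFrame with no rows and one column. `Series.to_frame()` is shown as an equivalent, more explicit spelling; the explicit float64 dtype is again an addition to avoid an empty-Series dtype warning on some 1.x releases.

```python
import pandas as pd

# Empty Series that carries a name, as in the original report.
s = pd.Series({}, name="foo", dtype="float64")

# The constructor picks up the Series name as the column label.
df = pd.DataFrame(s)
print(df.shape)          # (0, 1)
print(list(df.columns))  # ['foo']

# Series.to_frame() produces the same one-column, zero-row frame.
print(s.to_frame().shape)  # (0, 1)
```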
For a project I am working on, it would be convenient to be able to create a DataFrame from an empty Series. I have some library code that creates a Series object from a dict; most of the time this dict has some entries, but occasionally it is empty. Currently you can create an empty DataFrame by passing nothing to the constructor, and you can create a DataFrame or a Series from an empty dict, but if you try to create a DataFrame from a Series constructed from an empty dict, pandas throws an AssertionError. I can code around this easily enough, but it would be preferable for pandas simply to return an empty DataFrame. Example code below:
Cheers,
Graham