Merge tag 'v0.9.0' into debian
Version 0.9.0

* tag 'v0.9.0': (43 commits)
  RLS: Version 0.9.0 final
  Fix groupby.median documentation
  BUG: need extra slash on windows for file://
  BUG: default pandas.io.data start date 1/1/2000 per docs. close pandas-dev#2011
  clean up tests
  Allow DataFrame.update to accept non DataFrame object and attempt to coerce.
  ENH: Use given name for DataFrame column name for FRED API
  BLD: quiet tox warning about missing dep
  BUG: reset_index fails with MultiIndex in columns pandas-dev#2017
  BUG: with_statement in test_console_encode() (3a11f00) broke 2.5 test suite
  BUG: dict comprehension in (af3e13c) broke 2.6 test suite
  BUG: Timestamp dayofyear returns day of month pandas-dev#2021
  BUG: pandas breaks mpl plot_date
  DOC: update parsers header, names args doc
  BUG: read_csv regression, moved date parsing to before type conversions now so can parse yymmdd hhmm format now pandas-dev#1905
  Fix naming of ewmvar and ewmstd in documentation
  DOC: whats new for pandas-dev#2000
  ENH: change default header names in read_* functions from X.1, X.2, ... to X0, X1, ... close pandas-dev#2000
  TST: make test suite pass cleanly on python 3 with no matplotlib
  BUG: datetime64 formatting issues in DataFrame.to_csv. close pandas-dev#1993
  ...
yarikoptic committed Oct 8, 2012
2 parents e654346 + b5956fd commit 9084be0
Showing 30 changed files with 646 additions and 125 deletions.
9 changes: 8 additions & 1 deletion RELEASE.rst
@@ -25,7 +25,7 @@ Where to get it
pandas 0.9.0
============

**Release date:** NOT YET RELEASED
**Release date:** 10/7/2012

**New features**

@@ -36,9 +36,11 @@ pandas 0.9.0
Finance (#1748, #1739)
- Recognize and convert more boolean values in file parsing (Yes, No, TRUE,
FALSE, variants thereof) (#1691, #1295)
- Add Panel.update method, analogous to DataFrame.update (#1999, #1988)

**Improvements to existing features**

- Proper handling of NA values in merge operations (#1990)
- Add ``flags`` option for ``re.compile`` in some Series.str methods (#1659)
- Parsing of UTC date strings in read_* functions (#1693)
- Handle generator input to Series (#1679)
@@ -62,6 +64,8 @@ pandas 0.9.0

**API Changes**

- Change default header names in read_* functions to more Pythonic X0, X1,
etc. instead of X.1, X.2. (#2000)
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
(#1723)
- Don't modify NumPy suppress printoption at import time
@@ -240,6 +244,9 @@ pandas 0.9.0
- Fix BlockManager.iget bug when dealing with non-unique MultiIndex as columns
(#1970)
- Fix reset_index bug if both drop and level are specified (#1957)
- Work around unsafe NumPy object->int casting with Cython function (#1987)
- Fix datetime64 formatting bug in DataFrame.to_csv (#1993)
- Default start date in pandas.io.data to 1/1/2000 as the docs say (#2011)


pandas 0.8.1
4 changes: 2 additions & 2 deletions doc/source/computation.rst
@@ -397,8 +397,8 @@ available:
:widths: 20, 80

``ewma``, EW moving average
``ewvar``, EW moving variance
``ewstd``, EW moving standard deviation
``ewmvar``, EW moving variance
``ewmstd``, EW moving standard deviation
``ewmcorr``, EW moving correlation
``ewmcov``, EW moving covariance

85 changes: 58 additions & 27 deletions doc/source/v0.9.0.txt
@@ -1,7 +1,7 @@
.. _whatsnew_0900:

v0.9.0 (September 25, 2012)
---------------------------
v0.9.0 (October 7, 2012)
------------------------

This is a major release from 0.8.1 and includes several new features and
enhancements along with a large number of bug fixes. New features include
@@ -30,31 +30,62 @@ New features
API changes
~~~~~~~~~~~

- Creating a Series from another Series, passing an index, will cause
reindexing to happen inside rather than treating the Series like an
ndarray. Technically improper usages like Series(df[col1], index=df[col2])
that worked before "by accident" (this was never intended) will lead to all
NA Series in some cases.
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
(GH1723_)
- Don't modify NumPy suppress printoption to True at import time
- The internal HDF5 data arrangement for DataFrames has been transposed.
Legacy files will still be readable by HDFStore (GH1834_, GH1824_)
- Legacy cruft removed: pandas.stats.misc.quantileTS
- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)
- Empty DataFrame columns are now created as object dtype. This will prevent
a class of TypeErrors that was occurring in code where the dtype of a
column would depend on the presence of data or not (e.g. a SQL query having
results) (GH1783_)
- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
(GH1630_)
- ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
columns (GH1809_)
- Resolved inconsistencies in specifying custom NA values in text parser.
`na_values` of type dict no longer override default NAs unless
`keep_default_na` is set to false explicitly (GH1657_)
- DataFrame.dot will not do data alignment, and also work with Series
(GH1915_)
- The default column names when ``header=None`` and no columns names passed to
functions like ``read_csv`` has changed to be more Pythonic and amenable to
attribute access:

.. ipython:: python

from StringIO import StringIO

data = '0,0,1\n1,1,0\n0,1,0'
df = read_csv(StringIO(data), header=None)
df


- Creating a Series from another Series, passing an index, will cause reindexing
to happen inside rather than treating the Series like an ndarray. Technically
improper usages like ``Series(df[col1], index=df[col2])`` that worked before
"by accident" (this was never intended) will lead to all NA Series in some
cases. To be perfectly clear:

.. ipython:: python

s1 = Series([1, 2, 3])
s1

s2 = Series(s1, index=['foo', 'bar', 'baz'])
s2

- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
(GH1723_)

- Don't modify NumPy suppress printoption to True at import time

- The internal HDF5 data arrangement for DataFrames has been transposed. Legacy
files will still be readable by HDFStore (GH1834_, GH1824_)

- Legacy cruft removed: pandas.stats.misc.quantileTS

- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)

- Empty DataFrame columns are now created as object dtype. This will prevent a
class of TypeErrors that was occurring in code where the dtype of a column
would depend on the presence of data or not (e.g. a SQL query having results)
(GH1783_)

- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
(GH1630_)

- ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
columns (GH1809_)

- Resolved inconsistencies in specifying custom NA values in text parser.
``na_values`` of type dict no longer override default NAs unless
``keep_default_na`` is set to false explicitly (GH1657_)

- ``DataFrame.dot`` will not do data alignment, and also work with Series
(GH1915_)
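
For reference, the header renaming described in the first bullet of the new list can be sketched in plain Python. This only illustrates the two naming schemes named in the release notes (X.1, X.2, ... versus X0, X1, ...); the helper name `default_column_names` and its `legacy` flag are hypothetical, and the actual parser code is not part of this diff.

```python
def default_column_names(n_columns, legacy=False):
    """Build default names for a header-less table (hypothetical helper)."""
    if legacy:
        # Pre-0.9.0 scheme: 1-based with a dot, e.g. X.1, X.2
        return ['X.%d' % (i + 1) for i in range(n_columns)]
    # 0.9.0 scheme: 0-based and usable as Python attribute names
    return ['X%d' % i for i in range(n_columns)]


new_names = default_column_names(3)
old_names = default_column_names(3, legacy=True)
```

The new names contain no dot, which is what makes them amenable to attribute access such as `df.X0`.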


See the `full release notes
5 changes: 4 additions & 1 deletion pandas/core/common.py
@@ -786,7 +786,7 @@ def is_list_like(arg):


def _astype_nansafe(arr, dtype):
if isinstance(dtype, basestring):
if not isinstance(dtype, np.dtype):
dtype = np.dtype(dtype)

if issubclass(arr.dtype.type, np.datetime64):
@@ -797,6 +797,9 @@ def _astype_nansafe(arr, dtype):

if np.isnan(arr).any():
raise ValueError('Cannot convert NA to integer')
elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
# work around NumPy brokenness, #1987
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

return arr.astype(dtype)

2 changes: 1 addition & 1 deletion pandas/core/format.py
@@ -298,7 +298,7 @@ def to_latex(self, force_unicode=False, column_format=None):
if column_format is None:
column_format = '|l|%s|' % '|'.join('c' for _ in strcols)
else:
assert isinstance(column_format, str)
assert isinstance(column_format, basestring)

self.buf.write('\\begin{tabular}{%s}\n' % column_format)
self.buf.write('\\hline\n')
50 changes: 45 additions & 5 deletions pandas/core/frame.py
@@ -1106,8 +1106,11 @@ def _helper_csvexcel(self, writer, na_rep=None, cols=None,
val = series[col][j]
if lib.checknull(val):
val = na_rep

if float_format is not None and com.is_float(val):
val = float_format % val
elif isinstance(val, np.datetime64):
val = lib.Timestamp(val)._repr_base

row_fields.append(val)

@@ -1366,7 +1369,7 @@ def info(self, verbose=True, buf=None):
counts = self.count()
assert(len(cols) == len(counts))
for col, count in counts.iteritems():
if not isinstance(col, (unicode, str)):
if not isinstance(col, basestring):
col = str(col)
lines.append(_put_str(col, space) +
'%d non-null values' % count)
@@ -2458,7 +2461,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
frame.index = index
return frame

def reset_index(self, level=None, drop=False, inplace=False):
def reset_index(self, level=None, drop=False, inplace=False, col_level=0,
col_fill=''):
"""
For DataFrame with multi-level index, return new DataFrame with
labeling information in the columns under the index names, defaulting
@@ -2476,6 +2480,13 @@ def reset_index(self, level=None, drop=False, inplace=False):
the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
col_level : int or str, default 0
If the columns have multiple levels, determines which level the
labels are inserted into. By default it is inserted into the first
level.
col_fill : object, default ''
If the columns have multiple levels, determines how the other levels
are named. If None then the index name is repeated.
Returns
-------
@@ -2504,11 +2515,22 @@ def _maybe_cast(values):
names = self.index.names
zipped = zip(self.index.levels, self.index.labels)

multi_col = isinstance(self.columns, MultiIndex)
for i, (lev, lab) in reversed(list(enumerate(zipped))):
col_name = names[i]
if col_name is None:
col_name = 'level_%d' % i

if multi_col:
if col_fill is None:
col_name = tuple([col_name] *
self.columns.nlevels)
else:
name_lst = [col_fill] * self.columns.nlevels
lev_num = self.columns._get_level_number(col_level)
name_lst[lev_num] = col_name
col_name = tuple(name_lst)

# to ndarray and maybe infer different dtype
level_values = _maybe_cast(lev.values)
if level is None or i in level:
@@ -2518,6 +2540,14 @@ def _maybe_cast(values):
name = self.index.name
if name is None or name == 'index':
name = 'index' if 'index' not in self else 'level_0'
if isinstance(self.columns, MultiIndex):
if col_fill is None:
name = tuple([name] * self.columns.nlevels)
else:
name_lst = [col_fill] * self.columns.nlevels
lev_num = self.columns._get_level_number(col_level)
name_lst[lev_num] = name
name = tuple(name_lst)
new_obj.insert(0, name, _maybe_cast(self.index.values))

new_obj.index = new_index
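
The column-name expansion that these two hunks add to `reset_index` for MultiIndex columns can be read in isolation as follows. This is a sketch, assuming the target level has already been resolved to an integer position (the diff does that through `self.columns._get_level_number(col_level)`); the function name `expand_column_name` is hypothetical.

```python
def expand_column_name(col_name, nlevels, lev_num=0, col_fill=''):
    """Turn a flat index name into a tuple key for multi-level columns.

    Mirrors the branch in the diff: with col_fill=None the name is
    repeated on every level, otherwise the other levels get col_fill.
    """
    if col_fill is None:
        return tuple([col_name] * nlevels)
    name_lst = [col_fill] * nlevels
    name_lst[lev_num] = col_name
    return tuple(name_lst)


default_key = expand_column_name('index', nlevels=2)
repeated_key = expand_column_name('index', nlevels=2, col_fill=None)
inner_key = expand_column_name('index', nlevels=3, lev_num=1, col_fill='x')
```

With the defaults, the former index lands in the first column level and the remaining levels are left as empty strings.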
@@ -2714,7 +2744,13 @@ def _m8_to_i8(x):
values = list(_m8_to_i8(self.values.T))
else:
if np.iterable(cols) and not isinstance(cols, basestring):
values = [_m8_to_i8(self[x].values) for x in cols]
if isinstance(cols, tuple):
if cols in self.columns:
values = [self[cols]]
else:
values = [_m8_to_i8(self[x].values) for x in cols]
else:
values = [_m8_to_i8(self[x].values) for x in cols]
else:
values = [self[cols]]

@@ -3359,7 +3395,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
Parameters
----------
other : DataFrame
other : DataFrame, or object coercible into a DataFrame
join : {'left', 'right', 'outer', 'inner'}, default 'left'
overwrite : boolean, default True
If True then overwrite values for common keys in the calling frame
@@ -3373,7 +3409,11 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
if join != 'left':
raise NotImplementedError

if not isinstance(other, DataFrame):
other = DataFrame(other)

other = other.reindex_like(self)

for col in self.columns:
this = self[col].values
that = other[col].values
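
The coerce-then-update pattern introduced here can be illustrated without pandas. The sketch below is not the pandas implementation: a dict of equal-length lists stands in for a DataFrame, rows are assumed to be aligned by position, and only the default `overwrite=True` path without `filter_func` is modelled. The names `is_na` and `update_table` are hypothetical.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (a simplification of NA rules)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def update_table(table, other):
    """Overwrite entries of `table` in place with the non-NA entries of `other`."""
    # Coerce: accept anything dict() understands, e.g. (name, column) pairs.
    other = dict(other)
    for col, this in table.items():
        # A column missing from `other` acts like an all-NA column,
        # which is what reindexing against the calling object produces.
        that = other.get(col, [])
        table[col] = [
            this[i] if i >= len(that) or is_na(that[i]) else that[i]
            for i in range(len(this))
        ]


table = {'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]}
update_table(table, [('a', [10.0, float('nan'), None])])
```

Only the first entry of column `a` changes; the NA entries in the update and the untouched column `b` leave the original values in place.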
@@ -4385,7 +4425,7 @@ def var(self, axis=0, skipna=True, level=None, ddof=1):

@Substitution(name='standard deviation', shortname='std',
na_action=_doc_exclude_na, extras='')
@Appender(_stat_doc +
@Appender(_stat_doc +
"""
Normalized by N-1 (unbiased estimator).
""")
2 changes: 1 addition & 1 deletion pandas/core/groupby.py
@@ -303,7 +303,7 @@ def mean(self):

def median(self):
"""
Compute mean of groups, excluding missing values
Compute median of groups, excluding missing values
For multiple groupings, the result index will be a MultiIndex
"""
2 changes: 1 addition & 1 deletion pandas/core/index.py
@@ -196,7 +196,7 @@ def _has_complex_internals(self):

def summary(self, name=None):
if len(self) > 0:
index_summary = ', %s to %s' % (str(self[0]), str(self[-1]))
index_summary = ', %s to %s' % (unicode(self[0]), unicode(self[-1]))
else:
index_summary = ''

30 changes: 30 additions & 0 deletions pandas/core/panel.py
@@ -1318,6 +1318,36 @@ def join(self, other, how='left', lsuffix='', rsuffix=''):
return concat([self] + list(other), axis=0, join=how,
join_axes=join_axes, verify_integrity=True)

def update(self, other, join='left', overwrite=True, filter_func=None,
raise_conflict=False):
"""
Modify Panel in place using non-NA values from passed
Panel, or object coercible to Panel. Aligns on items
Parameters
----------
other : Panel, or object coercible to Panel
join : How to join individual DataFrames
{'left', 'right', 'outer', 'inner'}, default 'left'
overwrite : boolean, default True
If True then overwrite values for common keys in the calling panel
filter_func : callable(1d-array) -> 1d-array<boolean>, default None
Can choose to replace values other than NA. Return True for values
that should be updated
raise_conflict : bool
If True, will raise an error if a DataFrame and other both
contain data in the same place.
"""

if not isinstance(other, Panel):
other = Panel(other)

other = other.reindex(items=self.items)

for frame in self.items:
self[frame].update(other[frame], join, overwrite, filter_func,
raise_conflict)

def _get_join_index(self, other, how):
if how == 'left':
join_major, join_minor = self.major_axis, self.minor_axis
5 changes: 3 additions & 2 deletions pandas/io/data.py
@@ -67,7 +67,7 @@ def _sanitize_dates(start, end):
start = to_datetime(start)
end = to_datetime(end)
if start is None:
start = dt.datetime.today() - dt.timedelta(365)
start = dt.datetime(2010, 1, 1)
if end is None:
end = dt.datetime.today()
return start, end
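
The effect of this hunk is that an omitted start date falls back to a fixed date instead of one year before today. A stdlib-only sketch of that behaviour follows; it skips the `to_datetime` parsing step and assumes callers pass `datetime` objects or `None`. The public-looking name `sanitize_dates` and the `DEFAULT_START` constant are hypothetical, and the fallback date is copied from the hunk above.

```python
import datetime as dt

# Fixed fallback taken from the hunk above (assumed here as a module constant).
DEFAULT_START = dt.datetime(2010, 1, 1)


def sanitize_dates(start=None, end=None):
    """Fill in missing bounds: a fixed start date and today as the end date."""
    if start is None:
        start = DEFAULT_START
    if end is None:
        end = dt.datetime.today()
    return start, end


start, end = sanitize_dates()
explicit = sanitize_dates(dt.datetime(2012, 1, 1), dt.datetime(2012, 6, 1))
```

A fixed default keeps repeated downloads comparable, since the requested window no longer shifts with the day the call is made.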
@@ -178,7 +178,8 @@ def get_data_fred(name=None, start=dt.datetime(2010, 1, 1),

url = fred_URL + '%s' % name + \
'/downloaddata/%s' % name + '.csv'
data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True)
data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True, header=None,
skiprows=1, names=["DATE", name])
return data.truncate(start, end)

