Merge tag 'v0.9.0' into debian

Version 0.9.0

* tag 'v0.9.0': (43 commits)
  RLS: Version 0.9.0 final
  Fix groupby.median documentation
  BUG: need extra slash on windows for file://
  BUG: default pandas.io.data start date 1/1/2000 per docs. close pandas-dev#2011
  clean up tests
  Allow DataFrame.update to accept non DataFrame object and attempt to coerce.
  ENH: Use given name for DataFrame column name for FRED API
  BLD: quiet tox warning about missing dep
  BUG: reset_index fails with MultiIndex in columns pandas-dev#2017
  BUG: with_statement in test_console_encode() (3a11f00) broke 2.5 test suite
  BUG: dict comprehension in (af3e13c) broke 2.6 test suite
  BUG: Timestamp dayofyear returns day of month pandas-dev#2021
  BUG: pandas breaks mpl plot_date
  DOC: update parsers header, names args doc
  BUG: read_csv regression, moved date parsing to before type conversions now so can parse yymmdd hhmm format now pandas-dev#1905
  Fix naming of ewmvar and ewmstd in documentation
  DOC: whats new for pandas-dev#2000
  ENH: change default header names in read_* functions from X.1, X.2, ... to X0, X1, ... close pandas-dev#2000
  TST: make test suite pass cleanly on python 3 with no matplotlib
  BUG: datetime64 formatting issues in DataFrame.to_csv. close pandas-dev#1993
  ...
neurodebian · Oct 8, 2012 · 9084be0
2 parents e654346 + b5956fd

Showing 30 changed files with 646 additions and 125 deletions.
diff --git a/RELEASE.rst b/RELEASE.rst
@@ -25,7 +25,7 @@ Where to get it
 pandas 0.9.0
 ============
 
-**Release date:** NOT YET RELEASED
+**Release date:** 10/7/2012
 
 **New features**
 
@@ -36,9 +36,11 @@ pandas 0.9.0
     Finance (#1748, #1739)
   - Recognize and convert more boolean values in file parsing (Yes, No, TRUE,
     FALSE, variants thereof) (#1691, #1295)
+  - Add Panel.update method, analogous to DataFrame.update (#1999, #1988)
 
 **Improvements to existing features**
 
+  - Proper handling of NA values in merge operations (#1990)
   - Add ``flags`` option for ``re.compile`` in some Series.str methods (#1659)
   - Parsing of UTC date strings in read_* functions (#1693)
   - Handle generator input to Series (#1679)
@@ -62,6 +64,8 @@ pandas 0.9.0
 
 **API Changes**
 
+  - Change default header names in read_* functions to more Pythonic X0, X1,
+    etc. instead of X.1, X.2. (#2000)
   - Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
     (#1723)
   - Don't modify NumPy suppress printoption at import time
@@ -240,6 +244,9 @@ pandas 0.9.0
   - Fix BlockManager.iget bug when dealing with non-unique MultiIndex as columns
     (#1970)
   - Fix reset_index bug if both drop and level are specified (#1957)
+  - Work around unsafe NumPy object->int casting with Cython function (#1987)
+  - Fix datetime64 formatting bug in DataFrame.to_csv (#1993)
+  - Default start date in pandas.io.data to 1/1/2000 as the docs say (#2011)
 
 
 pandas 0.8.1

diff --git a/doc/source/computation.rst b/doc/source/computation.rst
@@ -397,8 +397,8 @@ available:
     :widths: 20, 80
 
     ``ewma``, EW moving average
-    ``ewvar``, EW moving variance
-    ``ewstd``, EW moving standard deviation
+    ``ewmvar``, EW moving variance
+    ``ewmstd``, EW moving standard deviation
     ``ewmcorr``, EW moving correlation
     ``ewmcov``, EW moving covariance
 

diff --git a/doc/source/v0.9.0.txt b/doc/source/v0.9.0.txt
@@ -1,7 +1,7 @@
 .. _whatsnew_0900:
 
-v0.9.0 (September 25, 2012)
----------------------------
+v0.9.0 (October 7, 2012)
+------------------------
 
 This is a major release from 0.8.1 and includes several new features and
 enhancements along with a large number of bug fixes. New features include
@@ -30,31 +30,62 @@ New features
 API changes
 ~~~~~~~~~~~
 
-  - Creating a Series from another Series, passing an index, will cause
-    reindexing to happen inside rather than treating the Series like an
-    ndarray. Technically improper usages like Series(df[col1], index=df[col2])
-    that worked before "by accident" (this was never intended) will lead to all
-    NA Series in some cases.
-  - Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
-    (GH1723_)
-  - Don't modify NumPy suppress printoption to True at import time
-  - The internal HDF5 data arrangement for DataFrames has been transposed.
-    Legacy files will still be readable by HDFStore (GH1834_, GH1824_)
-  - Legacy cruft removed: pandas.stats.misc.quantileTS
-  - Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)
-  - Empty DataFrame columns are now created as object dtype. This will prevent
-    a class of TypeErrors that was occurring in code where the dtype of a
-    column would depend on the presence of data or not (e.g. a SQL query having
-    results) (GH1783_)
-  - Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
-    (GH1630_)
-  - ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
-    columns (GH1809_)
-  - Resolved inconsistencies in specifying custom NA values in text parser.
-    `na_values` of type dict no longer override default NAs unless
-    `keep_default_na` is set to false explicitly (GH1657_)
-  - DataFrame.dot will not do data alignment, and also work with Series
-    (GH1915_)
+  - The default column names used when ``header=None`` and no column names are
+    passed to functions like ``read_csv`` have changed to be more Pythonic and
+    amenable to attribute access:
+
+.. ipython:: python
+
+   from StringIO import StringIO
+
+   data = '0,0,1\n1,1,0\n0,1,0'
+   df = read_csv(StringIO(data), header=None)
+   df
+
+
+- Creating a Series from another Series, passing an index, will cause reindexing
+  to happen inside rather than treating the Series like an ndarray. Technically
+  improper usages like ``Series(df[col1], index=df[col2])`` that worked before
+  "by accident" (this was never intended) will lead to all NA Series in some
+  cases. To be perfectly clear:
+
+.. ipython:: python
+
+   s1 = Series([1, 2, 3])
+   s1
+
+   s2 = Series(s1, index=['foo', 'bar', 'baz'])
+   s2
+
+- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
+  (GH1723_)
+
+- Don't modify NumPy suppress printoption to True at import time
+
+- The internal HDF5 data arrangement for DataFrames has been transposed.  Legacy
+  files will still be readable by HDFStore (GH1834_, GH1824_)
+
+- Legacy cruft removed: pandas.stats.misc.quantileTS
+
+- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776_)
+
+- Empty DataFrame columns are now created as object dtype. This will prevent a
+  class of TypeErrors that was occurring in code where the dtype of a column
+  would depend on the presence of data or not (e.g. a SQL query having results)
+  (GH1783_)
+
+- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame
+  (GH1630_)
+
+- ``first`` and ``last`` methods in ``GroupBy`` no longer drop non-numeric
+  columns (GH1809_)
+
+- Resolved inconsistencies in specifying custom NA values in text parser.
+  ``na_values`` of type dict no longer override default NAs unless
+  ``keep_default_na`` is set to false explicitly (GH1657_)
+
+- ``DataFrame.dot`` will not do data alignment, and also work with Series
+  (GH1915_)
 
 
 See the `full release notes

diff --git a/pandas/core/common.py b/pandas/core/common.py
@@ -786,7 +786,7 @@ def is_list_like(arg):
 
 
 def _astype_nansafe(arr, dtype):
-    if isinstance(dtype, basestring):
+    if not isinstance(dtype, np.dtype):
         dtype = np.dtype(dtype)
 
     if issubclass(arr.dtype.type, np.datetime64):
@@ -797,6 +797,9 @@ def _astype_nansafe(arr, dtype):
 
         if np.isnan(arr).any():
             raise ValueError('Cannot convert NA to integer')
+    elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
+        # work around NumPy brokenness, #1987
+        return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
 
     return arr.astype(dtype)
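
The first hunk above replaces a string-only check with `not isinstance(dtype, np.dtype)`, so any dtype specification that NumPy understands is normalized, not only strings. A minimal sketch of that normalization step, assuming a current NumPy on Python 3; `normalize_dtype` is a hypothetical name used only for illustration, not a pandas function.

```python
import numpy as np


def normalize_dtype(dtype):
    """Return an np.dtype for a string, a Python type, a NumPy scalar type,
    or an existing np.dtype (hypothetical helper mirroring the check above)."""
    if not isinstance(dtype, np.dtype):
        dtype = np.dtype(dtype)
    return dtype


# A string, a NumPy scalar type, and an np.dtype instance all end up
# as the same np.dtype object.
specs = ['int64', np.int64, np.dtype('int64')]
normalized = [normalize_dtype(spec) for spec in specs]
print(normalized)
```

With the old string-only check, a scalar type such as `np.int64` would have been passed along without conversion, and the `dtype.type` lookup added later in the same function relies on having a real `np.dtype`.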
 

diff --git a/pandas/core/format.py b/pandas/core/format.py
@@ -298,7 +298,7 @@ def to_latex(self, force_unicode=False, column_format=None):
         if column_format is None:
             column_format = '|l|%s|' % '|'.join('c' for _ in strcols)
         else:
-            assert isinstance(column_format, str)
+            assert isinstance(column_format, basestring)
 
         self.buf.write('\\begin{tabular}{%s}\n' % column_format)
         self.buf.write('\\hline\n')

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -1106,8 +1106,11 @@ def _helper_csvexcel(self, writer, na_rep=None, cols=None,
                 val = series[col][j]
                 if lib.checknull(val):
                     val = na_rep
+
                 if float_format is not None and com.is_float(val):
                     val = float_format % val
+                elif isinstance(val, np.datetime64):
+                    val = lib.Timestamp(val)._repr_base
 
                 row_fields.append(val)
 
@@ -1366,7 +1369,7 @@ def info(self, verbose=True, buf=None):
             counts = self.count()
             assert(len(cols) == len(counts))
             for col, count in counts.iteritems():
-                if not isinstance(col, (unicode, str)):
+                if not isinstance(col, basestring):
                     col = str(col)
                 lines.append(_put_str(col, space) +
                              '%d  non-null values' % count)
@@ -2458,7 +2461,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
         frame.index = index
         return frame
 
-    def reset_index(self, level=None, drop=False, inplace=False):
+    def reset_index(self, level=None, drop=False, inplace=False, col_level=0,
+                    col_fill=''):
         """
         For DataFrame with multi-level index, return new DataFrame with
         labeling information in the columns under the index names, defaulting
@@ -2476,6 +2480,13 @@ def reset_index(self, level=None, drop=False, inplace=False):
             the index to the default integer index.
         inplace : boolean, default False
             Modify the DataFrame in place (do not create a new object)
+        col_level : int or str, default 0
+            If the columns have multiple levels, determines which level the
+            labels are inserted into. By default it is inserted into the first
+            level.
+        col_fill : object, default ''
+            If the columns have multiple levels, determines how the other levels
+            are named. If None then the index name is repeated.
 
         Returns
         -------
@@ -2504,11 +2515,22 @@ def _maybe_cast(values):
                 names = self.index.names
                 zipped = zip(self.index.levels, self.index.labels)
 
+                multi_col = isinstance(self.columns, MultiIndex)
                 for i, (lev, lab) in reversed(list(enumerate(zipped))):
                     col_name = names[i]
                     if col_name is None:
                         col_name = 'level_%d' % i
 
+                    if multi_col:
+                        if col_fill is None:
+                            col_name = tuple([col_name] *
+                                             self.columns.nlevels)
+                        else:
+                            name_lst = [col_fill] * self.columns.nlevels
+                            lev_num = self.columns._get_level_number(col_level)
+                            name_lst[lev_num] = col_name
+                            col_name = tuple(name_lst)
+
                     # to ndarray and maybe infer different dtype
                     level_values = _maybe_cast(lev.values)
                     if level is None or i in level:
@@ -2518,6 +2540,14 @@ def _maybe_cast(values):
             name = self.index.name
             if name is None or name == 'index':
                 name = 'index' if 'index' not in self else 'level_0'
+            if isinstance(self.columns, MultiIndex):
+                if col_fill is None:
+                    name = tuple([name] * self.columns.nlevels)
+                else:
+                    name_lst = [col_fill] * self.columns.nlevels
+                    lev_num = self.columns._get_level_number(col_level)
+                    name_lst[lev_num] = name
+                    name = tuple(name_lst)
             new_obj.insert(0, name, _maybe_cast(self.index.values))
 
         new_obj.index = new_index
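
Both hunks above build the tuple key under which a former index level is inserted when the columns are a MultiIndex. A minimal stand-alone sketch of that logic in plain Python follows. `build_column_key` is a hypothetical helper name, not part of pandas, and it assumes `col_level` is already an integer position (the diff resolves names to positions with `self.columns._get_level_number`).

```python
def build_column_key(col_name, nlevels, col_level=0, col_fill=''):
    """Return the tuple key for a new column in an nlevels-deep column index.

    Hypothetical helper that mirrors the branch added to reset_index.
    """
    if col_fill is None:
        # Repeat the index name on every column level.
        return tuple([col_name] * nlevels)
    # Fill every level with col_fill, then put the name on the chosen level.
    name_lst = [col_fill] * nlevels
    name_lst[col_level] = col_name
    return tuple(name_lst)


print(build_column_key('index', 2))                            # ('index', '')
print(build_column_key('index', 2, col_level=1, col_fill='x')) # ('x', 'index')
print(build_column_key('index', 3, col_fill=None))             # ('index', 'index', 'index')
```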
@@ -2714,7 +2744,13 @@ def _m8_to_i8(x):
             values = list(_m8_to_i8(self.values.T))
         else:
             if np.iterable(cols) and not isinstance(cols, basestring):
-                values = [_m8_to_i8(self[x].values) for x in cols]
+                if isinstance(cols, tuple):
+                    if cols in self.columns:
+                        values = [self[cols]]
+                    else:
+                        values = [_m8_to_i8(self[x].values) for x in cols]
+                else:
+                    values = [_m8_to_i8(self[x].values) for x in cols]
             else:
                 values = [self[cols]]
 
@@ -3359,7 +3395,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
 
         Parameters
         ----------
-        other : DataFrame
+        other : DataFrame, or object coercible into a DataFrame
         join : {'left', 'right', 'outer', 'inner'}, default 'left'
         overwrite : boolean, default True
             If True then overwrite values for common keys in the calling frame
@@ -3373,7 +3409,11 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
         if join != 'left':
             raise NotImplementedError
 
+        if not isinstance(other, DataFrame):
+            other = DataFrame(other)
+
         other = other.reindex_like(self)
+
         for col in self.columns:
             this = self[col].values
             that = other[col].values
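
The coercion added above means `update` can be handed anything the `DataFrame` constructor accepts. A short usage sketch, written against a current pandas on Python 3 and not the 0.9.0 tree in this diff; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})

# A plain dict is not a DataFrame, so update() wraps it in DataFrame(other)
# before aligning it to df. The None becomes NaN and is skipped, and
# column 'b' is absent from the dict, so it is left untouched.
df.update({'a': [10.0, None, 30.0]})

print(df)
```

Only non-NA values from the coerced object overwrite the caller, so the middle row of `a` keeps its original value.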
@@ -4385,7 +4425,7 @@ def var(self, axis=0, skipna=True, level=None, ddof=1):
 
     @Substitution(name='standard deviation', shortname='std',
                   na_action=_doc_exclude_na, extras='')
-    @Appender(_stat_doc + 
+    @Appender(_stat_doc +
         """
         Normalized by N-1 (unbiased estimator).
         """)

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
@@ -303,7 +303,7 @@ def mean(self):
 
     def median(self):
         """
-        Compute mean of groups, excluding missing values
+        Compute median of groups, excluding missing values
 
         For multiple groupings, the result index will be a MultiIndex
         """

diff --git a/pandas/core/index.py b/pandas/core/index.py
@@ -196,7 +196,7 @@ def _has_complex_internals(self):
 
     def summary(self, name=None):
         if len(self) > 0:
-            index_summary = ', %s to %s' % (str(self[0]), str(self[-1]))
+            index_summary = ', %s to %s' % (unicode(self[0]), unicode(self[-1]))
         else:
             index_summary = ''
 

diff --git a/pandas/core/panel.py b/pandas/core/panel.py
@@ -1318,6 +1318,36 @@ def join(self, other, how='left', lsuffix='', rsuffix=''):
             return concat([self] + list(other), axis=0, join=how,
                           join_axes=join_axes, verify_integrity=True)
 
+    def update(self, other, join='left', overwrite=True, filter_func=None,
+                     raise_conflict=False):
+        """
+        Modify Panel in place using non-NA values from passed
+        Panel, or object coercible to Panel. Aligns on items
+
+        Parameters
+        ----------
+        other : Panel, or object coercible to Panel
+        join : How to join individual DataFrames
+            {'left', 'right', 'outer', 'inner'}, default 'left'
+        overwrite : boolean, default True
+            If True then overwrite values for common keys in the calling panel
+        filter_func : callable(1d-array) -> 1d-array<boolean>, default None
+            Can choose to replace values other than NA. Return True for values
+            that should be updated
+        raise_conflict : bool
+            If True, will raise an error if a DataFrame and other both
+            contain data in the same place.
+        """
+
+        if not isinstance(other, Panel):
+            other = Panel(other)
+
+        other = other.reindex(items=self.items)
+
+        for frame in self.items:
+            self[frame].update(other[frame], join, overwrite, filter_func,
+                               raise_conflict)
+
     def _get_join_index(self, other, how):
         if how == 'left':
             join_major, join_minor = self.major_axis, self.minor_axis

diff --git a/pandas/io/data.py b/pandas/io/data.py
@@ -67,7 +67,7 @@ def _sanitize_dates(start, end):
     start = to_datetime(start)
     end = to_datetime(end)
     if start is None:
-        start = dt.datetime.today() - dt.timedelta(365)
+        start = dt.datetime(2010, 1, 1)
     if end is None:
         end = dt.datetime.today()
     return start, end
@@ -178,7 +178,8 @@ def get_data_fred(name=None, start=dt.datetime(2010, 1, 1),
 
     url = fred_URL + '%s' % name + \
       '/downloaddata/%s' % name + '.csv'
-    data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True)
+    data = read_csv(urllib.urlopen(url), index_col=0, parse_dates=True, header=None,
+                    skiprows=1, names=["DATE", name])
     return data.truncate(start, end)