Merge commit 'v0.8.0rc2-26-g76c6351' into debian-0.8

* commit 'v0.8.0rc2-26-g76c6351': (42 commits)
  BUG/TST: typo caused read_csv to lose index name pandas-dev#1536
  BUG: incorrect tick label positions pandas-dev#1531 (zooming is still wrong)
  ENH: register converters with matplotlib for better datetime conversion
  ENH: handle datetime.date in Period constructor
  DOC: small doc for pandas-dev#1450
  BUG: repr of pre-1900 datetime64 values in a DataFrame column close pandas-dev#1518
  BUG: workaround vstack/concat bug in numpy 1.6 pandas-dev#1518
  DOC: lreshape docstring, release note
  ENH: experimental lreshape function
  BUG: plotting DataFrame with freq with offset
  BUG: DataFrame plotting with inferred freq
  BUG: timedelta.total_seconds only in 2.7 and 3.2
  DOC: release notes
  ENH: Add raise on conflict keyword to update
  DOC: release notes re: pandas-dev#921
  overload header keyword instead of extra col_aliases keyword
  ENH: column aliases for to_csv/to_excel pandas-dev#921
  ENH: handle weekly resampling via daily
  BUG: plot mixed frequencies pandas-dev#1517
  BUG/TST: plot irregular and reg freq on same subplot
  ...
neurodebian · Jun 27, 2012 · 1f2b250
2 parents 1c08383 + 76c6351

Showing 43 changed files with 1,091 additions and 199 deletions.
diff --git a/RELEASE.rst b/RELEASE.rst
@@ -25,7 +25,7 @@ Where to get it
 pandas 0.8.0
 ============
 
-**Release date:** NOT YET RELEASED
+**Release date:** 6/26/2012
 
 **New features**
 
@@ -43,7 +43,7 @@ pandas 0.8.0
     conversion method (#1018)
   - Implement robust frequency inference function and `inferred_freq` attribute
     on DatetimeIndex (#391)
-  - New ``tz_convert`` methods in Series / DataFrame
+  - New ``tz_convert`` and ``tz_localize`` methods in Series / DataFrame
   - Convert DatetimeIndexes to UTC if time zones are different in join/setops
     (#864)
   - Add limit argument for forward/backward filling to reindex, fillna,
@@ -86,7 +86,10 @@ pandas 0.8.0
   - Add lag plot (#1440)
   - Add autocorrelation_plot (#1425)
   - Add support for tox and Travis CI (#1382)
-  - Add support for ordered factors and use in GroupBy (#292)
+  - Add support for Categorical use in GroupBy (#292)
+  - Add ``any`` and ``all`` methods to DataFrame (#1416)
+  - Add ``secondary_y`` option to Series.plot
+  - Add experimental ``lreshape`` function for reshaping wide to long
 
 **Improvements to existing features**
 
@@ -124,9 +127,20 @@ pandas 0.8.0
   - Add ``convert_dtype`` option to Series.apply to be able to leave data as
     dtype=object (#1414)
   - Can specify all index level names in concat (#1419)
+  - Add ``dialect`` keyword to parsers for quoting conventions (#1363)
+  - Enable DataFrame[bool_DataFrame] += value (#1366)
+  - Add ``retries`` argument to ``get_data_yahoo`` to try to prevent Yahoo! API
+    404s (#826)
+  - Improve performance of reshaping by using O(N) categorical sorting
+  - Series names will be used for index of DataFrame if no index passed (#1494)
+  - Header argument in DataFrame.to_csv can accept a list of column names to
+    use instead of the object's columns (#921)
+  - Add ``raise_conflict`` argument to DataFrame.update (#1526)
 
 **API Changes**
 
+  - Rename Factor to Categorical and add improvements. Numerous Categorical bug
+    fixes
   - Frequency name overhaul, WEEKDAY/EOM and rules with @
     deprecated. get_legacy_offset_name backwards compatibility function added
   - Raise ValueError in DataFrame.__nonzero__, so "if df" no longer works
@@ -190,6 +204,11 @@ pandas 0.8.0
   - Fix outer/inner DataFrame.join with non-unique indexes (#1421)
   - Fix MultiIndex groupby bugs with empty lower levels (#1401)
   - Calling fillna with a Series will have same behavior as with dict (#1486)
+  - Fix SparseSeries reduction bug (#1375)
+  - Fix unicode serialization issue in HDFStore (#1361)
+  - Pass keywords to pyplot.boxplot in DataFrame.boxplot (#1493)
+  - Bug fixes in MonthBegin (#1483)
+  - Preserve MultiIndex names in drop (#1513)
 
 pandas 0.7.3
 ============

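The #921 entry above lets the ``header`` argument of ``to_csv`` take a list of names that replace the column labels in the output. A minimal sketch, assuming a recent pandas whose ``to_csv`` still accepts a list for ``header`` (the 2012 behaviour may differ in details); the frame and alias names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# A list passed as ``header`` is written in place of the real column labels;
# the DataFrame itself keeps its original columns.
csv_text = df.to_csv(index=False, header=["alpha", "beta"])
print(csv_text)
```

The aliases must match the number of columns being written; only the output file or string sees them.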
diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -217,3 +217,27 @@ passed in the index, thus finding the integers ``0`` and ``1``. While it would
 be possible to insert some logic to check whether a passed sequence is all
 contained in the index, that logic would exact a very high cost in large data
 sets.
+
+Timestamp limitations
+---------------------
+
+Minimum and maximum timestamps
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since pandas represents timestamps in nanosecond resolution, the timespan that
+can be represented using a 64-bit integer is limited to approximately 584 years:
+
+.. ipython:: python
+
+   begin = Timestamp(-9223285636854775809)
+   begin
+   end = Timestamp(np.iinfo(np.int64).max)
+   end
+
+If you need to represent time series data outside the nanosecond timespan, use
+PeriodIndex:
+
+.. ipython:: python
+
+   span = period_range('1215-01-01', '1381-01-01', freq='D')
+   span
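The "approximately 584 years" figure in the new gotchas section follows directly from the size of a signed 64-bit nanosecond counter. A short worked calculation, assuming a 365.25-day year:

```python
# Nanoseconds in a (Julian) year of 365.25 days.
NS_PER_YEAR = 365.25 * 24 * 60 * 60 * 10**9

# A 64-bit integer has 2**64 distinct values, i.e. 2**64 nanosecond ticks.
total_span_years = 2**64 / NS_PER_YEAR

# Signed storage splits that range roughly in half around the 1970 epoch.
one_side_years = 2**63 / NS_PER_YEAR

print(round(total_span_years, 1), round(one_side_years, 1))
```

About 292 years on either side of 1970 gives the familiar bounds of roughly 1677 to 2262, which is why the ``period_range`` example reaching back to 1215 needs ``PeriodIndex`` rather than nanosecond timestamps.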
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -83,8 +83,10 @@ data into a DataFrame object. They can take a number of arguments:
     as the index.
   - ``names``: List of column names to use. If passed, header will be
     implicitly set to None.
-  - ``na_values``: optional list of strings to recognize as NaN (missing values),
-    in addition to a default set.
+  - ``na_values``: optional list of strings to recognize as NaN (missing
+    values), in addition to a default set. If you pass an empty list, no values
+    (including empty strings) will be considered NA; passing an empty list for a
+    particular column does the same for that column only.
   - ``parse_dates``: if True then index will be parsed as dates
     (False by default). You can specify more complicated options to parse
     a subset of columns or a combination of columns into a single date column

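The ``na_values`` rule above can be modelled as a small lookup. This is a hypothetical sketch of the documented behaviour, not the parser's actual code: ``DEFAULT_NA`` is a made-up stand-in for the real default set, and the treatment of columns missing from a per-column dict (falling back to the defaults) is an assumption.

```python
# Hypothetical stand-in for the parser's built-in NA strings (not the real list).
DEFAULT_NA = {"", "NA", "NaN", "nan", "NULL"}


def na_set_for(column, na_values=None, defaults=DEFAULT_NA):
    """Return the set of strings treated as NA for ``column`` (illustrative only)."""
    if na_values is None:
        # Nothing passed: only the default set applies.
        return set(defaults)
    if isinstance(na_values, dict):
        if column not in na_values:
            # Assumption: unlisted columns keep the default set.
            return set(defaults)
        extra = na_values[column]
    else:
        extra = na_values
    if len(extra) == 0:
        # An explicitly empty list disables NA detection, even for "".
        return set()
    # Otherwise the user's strings are added to the defaults.
    return set(defaults) | set(extra)
```

For example, ``na_set_for("a", [])`` yields an empty set, so an empty field stays an empty string instead of becoming NaN.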
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
@@ -297,7 +297,7 @@ We could have done the same thing with ``DateOffset``:
 
 .. ipython:: python
 
-   from pandas.core.datetools import *
+   from pandas.tseries.offsets import *
    d + DateOffset(months=4, days=5)
 
 The key features of a ``DateOffset`` object are:

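The hunk above only moves the import to ``pandas.tseries.offsets``. A minimal runnable version of the same calculation, assuming a recent pandas where ``DateOffset`` is still importable from that module; the explicit import replaces the ``*`` import, and the starting date is chosen here for illustration:

```python
from datetime import datetime

from pandas.tseries.offsets import DateOffset

d = datetime(2008, 8, 18, 9, 0)

# Shift by four calendar months and five days; the time of day is preserved.
shifted = d + DateOffset(months=4, days=5)
print(shifted)
```

Adding an offset to a plain ``datetime`` returns a pandas ``Timestamp``, which still compares equal to the corresponding ``datetime``.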
diff --git a/doc/source/whatsnew/v0.4.x.txt b/doc/source/v0.4.x.txt
diff --git a/doc/source/whatsnew/v0.5.0.txt b/doc/source/v0.5.0.txt
diff --git a/doc/source/whatsnew/v0.6.0.txt b/doc/source/v0.6.0.txt
diff --git a/doc/source/whatsnew/v0.6.1.txt b/doc/source/v0.6.1.txt
diff --git a/doc/source/whatsnew/v0.7.0.txt b/doc/source/v0.7.0.txt
diff --git a/doc/source/whatsnew/v0.7.1.txt b/doc/source/v0.7.1.txt
diff --git a/doc/source/whatsnew/v0.7.2.txt b/doc/source/v0.7.2.txt
diff --git a/doc/source/whatsnew/v0.7.3.txt b/doc/source/v0.7.3.txt
diff --git a/doc/source/whatsnew/v0.8.0.txt b/doc/source/v0.8.0.txt
@@ -67,15 +67,16 @@ Time series changes and improvements
   PeriodIndex and DatetimeIndex
 - New Timestamp data type subclasses `datetime.datetime`, providing the same
   interface while enabling working with nanosecond-resolution data. Also
-  provides **easy time zone conversions**
-- Enhanced support for **time zones**. Add `tz_convert` methods to TimeSeries
-  and DataFrame. All timestamps are stored as UTC; Timestamps from
-  DatetimeIndex objects with time zone set will be localized to localtime. Time
-  zone conversions are therefore essentially free. User needs to know very
-  little about pytz library now; only time zone names as as strings are
-  required. Timestamps are equal if and only if their UTC timestamps
-  match. Operations between time series with different time zones will result
-  in a UTC-indexed time series
+  provides :ref:`easy time zone conversions <timeseries.timezone>`.
+- Enhanced support for :ref:`time zones <timeseries.timezone>`. Add
+  ``tz_convert`` and ``tz_localize`` methods to TimeSeries and DataFrame. All
+  timestamps are stored as UTC; Timestamps from DatetimeIndex objects with a
+  time zone set will be localized to local time. Time zone conversions are
+  therefore essentially free. Users need to know very little about the pytz
+  library now; only time zone names as strings are required. Time zone-aware
+  timestamps are equal if and only if their UTC timestamps match. Operations
+  between time zone-aware time series with different time zones will result
+  in a UTC-indexed time series.
 - Time series **string indexing conveniences** / shortcuts: slice years, year
   and month, and index values with strings
 - Enhanced time series **plotting**; adaptation of scikits.timeseries
@@ -111,8 +112,11 @@ index duplication in many-to-many joins)
 Other new features
 ~~~~~~~~~~~~~~~~~~
 
-- New :ref:`cut <reshaping.tile.cut>` function (like R's cut function) for
-  computing a categorical variable from a continuous variable by binning values
+- New :ref:`cut <reshaping.tile.cut>` and ``qcut`` functions (like R's cut
+  function) for computing a categorical variable from a continuous variable by
+  binning values either into value-based (``cut``) or quantile-based (``qcut``)
+  bins
+- Rename ``Factor`` to ``Categorical`` and add a number of usability features
 - Add :ref:`limit <missing_data.fillna.limit>` argument to fillna/reindex
 - More flexible multiple function application in GroupBy, and can pass list
   (name, function) tuples to get result in particular order with given names
@@ -133,8 +137,8 @@ Other new features
   memory usage than Python's dict
 - Add first, last, min, max, and prod optimized GroupBy functions
 - New :ref:`ordered_merge <merging.ordered_merge>` function
-- Add flexible :ref:`comparison <basics.binop>` instance methods eq, ne, lt, gt, etc. to DataFrame,
-  Series
+- Add flexible :ref:`comparison <basics.binop>` instance methods eq, ne, lt,
+  gt, etc. to DataFrame, Series
 - Improve :ref:`scatter_matrix <visualization.scatter_matrix>` plotting
   function and add histogram or kernel density estimates to diagonal
 - Add :ref:`'kde' <visualization.kde>` plot option for density plots
@@ -146,6 +150,42 @@ Other new features
 - Can select multiple columns from GroupBy
 - Add :ref:`update <merging.combine_first.update>` methods to Series/DataFrame
   for updating values in place
+- Add ``any`` and ``all`` methods to DataFrame
+
+New plotting methods
+~~~~~~~~~~~~~~~~~~~~
+
+.. ipython:: python
+   :suppress:
+
+   import pandas as pd
+   fx = pd.load('data/fx_prices')
+   import matplotlib.pyplot as plt
+
+``Series.plot`` now supports a ``secondary_y`` option:
+
+.. ipython:: python
+
+   plt.figure()
+
+   fx['FR'].plot(style='g')
+
+   @savefig whatsnew_secondary_y.png width=4.5in
+   fx['IT'].plot(style='k--', secondary_y=True)
+
+Vytautas Jancauskas, the 2012 GSoC participant, has added many new plot
+types. For example, ``'kde'`` is a new option:
+
+.. ipython:: python
+
+   s = Series(np.concatenate((np.random.randn(1000),
+                              np.random.randn(1000) * 0.5 + 3)))
+   plt.figure()
+   s.hist(normed=True, alpha=0.2)
+   @savefig whatsnew_kde.png width=4.5in
+   s.plot(kind='kde')
+
+See :ref:`the plotting page <visualization.other>` for much more.
 
 Other API changes
 ~~~~~~~~~~~~~~~~~

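The ``cut``/``qcut`` entry above distinguishes value-based from quantile-based binning. A small comparison, assuming a recent pandas where both functions accept ``labels=False`` to return integer bin codes; the sample values are made up to include one outlier:

```python
import pandas as pd

values = [1, 2, 3, 4, 100]

# cut: two bins of equal *width* over the value range, so the outlier
# ends up alone in the upper bin.
by_value = pd.cut(values, bins=2, labels=False)

# qcut: two bins holding roughly equal *counts*, split at the median.
by_quantile = pd.qcut(values, q=2, labels=False)

print(by_value.tolist(), by_quantile.tolist())
```

With skewed data the two approaches diverge sharply: equal-width bins are dominated by the outlier, while quantile bins keep the group sizes balanced.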
diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst
@@ -91,6 +91,20 @@ You may pass ``logy`` to get a log-scale Y axis.
    @savefig series_plot_logy.png width=4.5in
    ts.plot(logy=True)
 
+Plotting on a Secondary Y-axis
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To plot data on a secondary y-axis, use the ``secondary_y`` keyword:
+
+.. ipython:: python
+
+   plt.figure()
+
+   df.A.plot()
+
+   @savefig series_plot_secondary_y.png width=4.5in
+   df.B.plot(secondary_y=True, style='g')
+
 
 Targeting different subplots
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -107,6 +121,8 @@ You can pass an ``ax`` argument to ``Series.plot`` to plot on a particular axis:
    @savefig series_plot_multi.png width=4.5in
    df['D'].plot(ax=axes[1,1]); axes[1,1].set_title('D')
 
+.. _visualization.other:
+
 Other plotting features
 -----------------------
 

diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst
@@ -16,21 +16,21 @@ What's New
 
 These are new features and improvements of note in each release.
 
-.. include:: whatsnew/v0.8.0.txt
+.. include:: v0.8.0.txt
 
-.. include:: whatsnew/v0.7.3.txt
+.. include:: v0.7.3.txt
 
-.. include:: whatsnew/v0.7.2.txt
+.. include:: v0.7.2.txt
 
-.. include:: whatsnew/v0.7.1.txt
+.. include:: v0.7.1.txt
 
-.. include:: whatsnew/v0.7.0.txt
+.. include:: v0.7.0.txt
 
-.. include:: whatsnew/v0.6.1.txt
+.. include:: v0.6.1.txt
 
-.. include:: whatsnew/v0.6.0.txt
+.. include:: v0.6.0.txt
 
-.. include:: whatsnew/v0.5.0.txt
+.. include:: v0.5.0.txt
 
-.. include:: whatsnew/v0.4.x.txt
+.. include:: v0.4.x.txt
 
diff --git a/pandas/core/api.py b/pandas/core/api.py
@@ -15,7 +15,8 @@
 from pandas.core.frame import DataFrame
 from pandas.core.panel import Panel
 from pandas.core.groupby import groupby
-from pandas.core.reshape import pivot_simple as pivot, get_dummies
+from pandas.core.reshape import (pivot_simple as pivot, get_dummies,
+                                 lreshape)
 
 WidePanel = Panel
 

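The hunk above exports the new experimental ``lreshape`` function. A minimal usage sketch, assuming a pandas version that still exposes ``lreshape`` at the top level; each key of the ``groups`` dict names a new long column built by stacking the listed wide columns, and columns not mentioned are repeated for each block:

```python
import pandas as pd

wide = pd.DataFrame({
    "hr1": [514, 573],
    "hr2": [545, 526],
    "team": ["Red Sox", "Yankees"],
    "year1": [2007, 2007],
    "year2": [2008, 2008],
})

# year1/year2 are stacked into "year" and hr1/hr2 into "hr"; the untouched
# "team" column is tiled once per stacked block.
long = pd.lreshape(wide, {"year": ["year1", "year2"], "hr": ["hr1", "hr2"]})
print(long)
```

Every list in ``groups`` must name the same number of columns, since the i-th entries of each list are stacked together.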
diff --git a/pandas/core/common.py b/pandas/core/common.py
@@ -914,3 +914,15 @@ def writerow(self, row):
             self.stream.write(data)
             # empty queue
             self.queue.truncate(0)
+
+
+_NS_DTYPE = np.dtype('M8[ns]')
+
+def _concat_compat(to_concat):
+    if all(x.dtype == _NS_DTYPE for x in to_concat):
+        # work around NumPy 1.6 bug
+        new_values = np.concatenate([x.view(np.int64) for x in to_concat])
+        return new_values.view(_NS_DTYPE)
+    else:
+        return np.concatenate(to_concat)
+
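``_concat_compat`` sidesteps the NumPy 1.6 problem by concatenating the raw 64-bit integers behind ``datetime64[ns]`` values and reinterpreting the result. A standalone sketch of the same idea; ``concat_datetime64`` is a hypothetical name used only here, and the sample dates are illustrative:

```python
import numpy as np

NS_DTYPE = np.dtype("M8[ns]")


def concat_datetime64(arrays):
    """Concatenate datetime64[ns] arrays via their int64 representation."""
    if all(a.dtype == NS_DTYPE for a in arrays):
        # datetime64[ns] is stored as int64 nanoseconds since the epoch, so
        # viewing as int64 (no copy) and back again is lossless.
        ints = np.concatenate([a.view(np.int64) for a in arrays])
        return ints.view(NS_DTYPE)
    # Mixed or non-datetime dtypes fall back to plain concatenation.
    return np.concatenate(arrays)


a = np.array(["2012-06-26", "2012-06-27"], dtype=NS_DTYPE)
b = np.array(["1899-12-31"], dtype=NS_DTYPE)
combined = concat_datetime64([a, b])
print(combined)
```

On current NumPy a plain ``np.concatenate`` gives the same result; the int64 round trip matters only where the datetime code path itself is unreliable.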
diff --git a/pandas/core/format.py b/pandas/core/format.py
@@ -594,19 +594,7 @@ def _format_datetime64(x, tz=None):
         return 'NaT'
 
     stamp = lib.Timestamp(x, tz=tz)
-    base = stamp.strftime('%Y-%m-%d %H:%M:%S')
-
-    fraction = stamp.microsecond * 1000 + stamp.nanosecond
-    digits = 9
-
-    if fraction == 0:
-        return base
-
-    while (fraction % 10) == 0:
-        fraction /= 10
-        digits -= 1
-
-    return base + ('.%%.%id' % digits) % fraction
+    return stamp._repr_base
 
 
 def _make_fixed_width(strings, justify='right'):
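The lines removed from ``_format_datetime64`` appended a nanosecond fraction to the timestamp string with trailing zeros trimmed; the commit delegates this to ``Timestamp._repr_base`` instead. A standalone sketch of the removed logic; ``format_fraction`` is a hypothetical name, and ``//=`` replaces the original ``/=`` so the value stays an integer under Python 3:

```python
def format_fraction(base, microsecond, nanosecond):
    """Append sub-second digits to ``base``, dropping trailing zeros."""
    # Total nanoseconds past the whole second: 0 .. 999,999,999.
    fraction = microsecond * 1000 + nanosecond
    digits = 9
    if fraction == 0:
        return base
    # Strip trailing zeros, shrinking the number of digits to print.
    while fraction % 10 == 0:
        fraction //= 10
        digits -= 1
    # Zero-pad on the left to the remaining width, e.g. 1 with 6 digits -> 000001.
    return "%s.%0*d" % (base, digits, fraction)
```

For instance, half a second (``microsecond=500000``) renders as ``.5``, while a single nanosecond renders as ``.000000001``.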