Errors with nosetests-3.3 pandas.io.tests.test_yahoo.TestYahoo pandas, version b0ee363408336 #2847

selasley · 2013-02-12T02:20:34Z

I think get_quote_yahoo and get_components_yahoo in data.py need to decode the bytes returned by urllib.request.urlopen(urlStr).readlines() and urllib.urlopen(urlStr).read(), respectively, before string operations such as .strip().strip('"').split('"\r\n"') are applied in Python 3.3.
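A minimal sketch of the idea, assuming the response is UTF-8 and has the quoted, CRLF-separated layout that the split calls in get_components_yahoo imply. The helper name to_text and the sample payload are hypothetical, not part of pandas or of Yahoo's real output:

```python
def to_text(raw, encoding="utf-8"):
    """Return text for `raw`, decoding only when it is a bytes object."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Hypothetical payload in the shape the parsing code expects.
payload = b'"AA","Alcoa Inc."\r\n"AXP","American Express"\r\n'

# Same chain of string operations as in get_components_yahoo,
# applied after the bytes have been turned into text.
lines = to_text(payload).strip().strip('"').split('"\r\n"')
rows = [line.strip().split('","') for line in lines]
```

Decoding once, right after the read, keeps every later strip and split call working on text.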

Also, the expected INTC values in test_get_data() in test_yahoo.py do not agree with the values Yahoo returns. The Adj Close value for 1-18-12 is 24.28 and the calculated Ret_Index values do not agree with the values in the d array.

The tests pass after making these changes. I have not tested the changes in Python 2.x.

diff --git a/pandas/io/data.py b/pandas/io/data.py
index 5e92fca..dfba151 100644
--- a/pandas/io/data.py
+++ b/pandas/io/data.py
@@ -115,7 +115,7 @@ def get_quote_yahoo(symbols):
         return None

     for line in lines:
-        fields = line.strip().split(',')
+        fields = line.decode().strip().split(',')
         for i, field in enumerate(fields):
             if field[-2:] == '%"':
                 data[header[i]].append(float(field.strip('"%')))
@@ -241,7 +241,7 @@ def get_components_yahoo(idx_sym):
     #break when no new components are found
     while (True in mask):
         urlStr = url.format(idx_mod, stats,  comp_idx)
-        lines = (urllib.urlopen(urlStr).read().strip().
+        lines = (urllib.urlopen(urlStr).read().decode().strip().
                  strip('"').split('"\r\n"'))

         lines = [line.strip().split('","') for line in lines]


diff --git a/pandas/io/tests/test_yahoo.py b/pandas/io/tests/test_yahoo.py
index 1f25e3c..d07c5fd 100644
--- a/pandas/io/tests/test_yahoo.py
+++ b/pandas/io/tests/test_yahoo.py
@@ -92,15 +92,15 @@ class TestYahoo(unittest.TestCase):

         pan = web.get_data_yahoo(dfi, 'JAN-01-12', 'JAN-31-12',
                                  adjust_price=True)
-        expected = [18.38, 27.45, 24.54]
+        expected = [18.38, 27.45, 24.28]
         result = pan.Close.ix['01-18-12'][['GE', 'MSFT', 'INTC']].tolist()
         assert result == expected

         pan = web.get_data_yahoo(dfi, '2011', ret_index=True)
-        d = [[ 1.01757469,  1.01130524,  1.02414183],
-             [ 1.00292912,  1.00770812,  1.01735194],
-             [ 1.00820152,  1.00462487,  1.01320257],
-             [ 1.08025776,  0.99845838,  1.00113165]]
+        d = [[ 1.01757469,  1.01142857,  1.02414183],
+             [ 1.00292912,  1.00779221,  1.01735194],
+             [ 1.00820152,  1.00519481,  1.01320257],
+             [ 1.08025776,  0.99896104,  1.00113165]]

         expected = pd.DataFrame(d)
         result = pan.Ret_Index.ix['01-18-11':'01-21-11'][['GE', 'INTC', 'MSFT']]

original errors
nosetests-3.3 pandas.io.tests.test_yahoo.TestYahoo pandas

.EEE

ERROR: test_get_quote (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 49, in test_get_quote
df = web.get_quote_yahoo(pd.Series(['GOOG', 'AAPL', 'GOOG']))
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 118, in get_quote_yahoo
fields = line.strip().split(',')
TypeError: Type str doesn't support the buffer API

ERROR: test_get_components (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 57, in test_get_components
df = web.get_components_yahoo('^DJI') #Dow Jones
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 245, in get_components_yahoo
strip('"').split('"\r\n"'))
TypeError: Type str doesn't support the buffer API

ERROR: test_get_data (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 87, in test_get_data
dfi = web.get_components_yahoo('^DJI')
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 245, in get_components_yahoo
strip('"').split('"\r\n"'))
TypeError: Type str doesn't support the buffer API

Ran 4 tests in 0.962s

FAILED (errors=3)
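The TypeError in the tracebacks above can be reproduced without any network access. This is a small sketch with a made-up quote line; the exact wording of the exception message differs between Python 3 versions, so only the exception type is checked:

```python
# Made-up line in the comma-separated shape get_quote_yahoo parses.
raw_line = b'"GOOG",785.37\r\n'

# In Python 3, splitting bytes with a str separator raises TypeError.
try:
    raw_line.strip().split(",")
except TypeError as exc:
    error = exc
else:
    error = None

# Decoding first gives a str, so the str separator is accepted.
fields = raw_line.decode("utf-8").strip().split(",")
```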

nehalecky · 2013-02-12T05:52:30Z

Interesting, I developed those tests (and the methods behind them), and they were passing previously, and certainly before I submitted the PR.

Regarding the decode issue, I know there were changes to how text and data are handled in Python 3 (see: http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit), but I mostly develop in Python 2.7 and was counting on Travis to catch things like that for me. It didn't, I guess. Thanks for the heads-up and the suggested fixes; I'll implement them soon to see how they affect the code in 2.7. Anyone's thoughts on this would be appreciated.

Regarding the tests failing on incorrect hardcoded data, I am somewhat at a loss, as I referenced specific dates and date ranges to ensure that these particular values would be absolute. These tests also passed previously, and I can only think that there was either a copy-paste slip right before submitting or that I am overlooking some aspect of data queries from Yahoo! Finance.

Either way, your values seem correct, and I'll include them in a PR soon to see if we can squash these errors. Still, I am going to dig a bit deeper to make sure I'm not missing something more fundamental.

Thanks!

lodagro · 2013-02-13T08:30:43Z

test_get_data() in test_yahoo.py also fails for me; see also #2853.
The failure matches the description from @selasley. I don't get the other failures (but I am on Python 2.7).

nehalecky · 2013-02-13T08:58:22Z

Thanks @lodagro and @selasley for the feedback.

It was something fundamental!

With Yahoo's documentation of its adjusted close calculation being quite lacking, I spent a bit more time digging around. Judging by the number of threads, blog posts, and white papers I found, I wasn't alone, so I thought it would be helpful to comment on the topic.

I was incorrect in my assumption of how Yahoo calculates its adjusted close price, which I had assumed to be forward adjusted (i.e., based on the starting date of the date range of interest and projected forward). As it turns out, it is a blanket backward-adjusted calculation from the present, and it includes adjustments for all dividends and stock splits back from the present day.
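To make the backward adjustment concrete, here is a rough sketch for cash dividends only. It assumes a multiplier of 1 - dividend / previous close applied to every price before the ex-dividend date; the function name, prices, and dividend are illustrative and are not Yahoo's code or data:

```python
def backward_adjust(closes, dividends):
    """Adjust closes backward from the most recent price.

    closes:    close prices in chronological order
    dividends: {index: cash dividend whose ex-date is closes[index]}
    """
    adjusted = list(closes)
    factor = 1.0
    # Walk from the present into the past, accumulating the factor.
    for i in range(len(closes) - 1, -1, -1):
        adjusted[i] = closes[i] * factor
        if i in dividends and i > 0:
            # Every price before this ex-date is scaled down.
            factor *= 1.0 - dividends[i] / closes[i - 1]
    return adjusted


closes = [20.0, 20.0, 19.8, 19.9]

before = backward_adjust(closes, {})        # no dividend known yet
after = backward_adjust(closes, {2: 0.2})   # dividend goes ex on day 2
```

With no dividend the adjusted series equals the raw closes. Once the dividend is recorded, the two prices before the ex-date drop to roughly 19.8, even though those trading days are long past. This is why a hardcoded adjusted close for a historical date can stop matching.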

The error from test_get_data() with the hardcoded value in the line:

expected = [18.38, 27.45, 24.54]  #error
expected = [18.38, 27.45, 24.28]  #corrected value

refers to 'INTC' (Intel's) adjusted close price, the result from the query:

result = pan.Close.ix['01-18-12'][['GE', 'MSFT', 'INTC']].tolist()

The story is that on Feb. 5th, the same day I submitted PR #2795, Intel had a dividend payout of 0.225. This caused all adjusted close prices before that date to be recalculated, resulting in the small change observed, from 24.54 to 24.28, and in everyone's development install of pandas throwing these errors (apologies). For the exact same reason, the error from the query:

result = pan.Ret_Index.ix['01-18-11':'01-21-11'][['GE', 'INTC', 'MSFT']]

is a result of the return index values being dependent on Intel's adjusted close price.

As such, note that any value derived from Yahoo's adjusted close price is not absolute. I will be updating the tests in test_yahoo.py to reflect this soon, with a PR to follow. In the meantime, you can comment out those two specific assert statements, as they aren't good tests. Again, apologies for the hassle, and thanks.

Oh, if you'd like to learn more, check out some helpful discussions on the topic:

Lots of detail on how Yahoo calculates adjusted close:
http://marubozu.blogspot.com/2006/09/how-yahoo-calculates-adjusted-closing.html
Discussion regarding quality of Yahoo data with adjusted close:
http://quant.stackexchange.com/questions/7216/daily-returns-using-adjusted-close

EDIT: Also, I'll check out using decode() in 2.7 soon, thanks.

ghost · 2013-02-13T13:53:47Z

Travis runs nosetests --exe -w /tmp -A "not slow" pandas,
and these tests are marked @slow, so they are skipped there.
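For readers unfamiliar with attribute-based selection, the effect can be sketched as follows. The decorator and the selected helper below are simplified stand-ins written for illustration; they are not the pandas decorator or nose's own implementation:

```python
def slow(test_func):
    """Tag a test function with a `slow` attribute."""
    test_func.slow = True
    return test_func


def selected(test_func):
    """Mimic the selection expression "not slow" for one test."""
    return not getattr(test_func, "slow", False)


@slow
def test_get_quote():
    pass  # would hit the network in the real test suite


def test_fast():
    pass


# Only untagged tests survive the filter, so the tagged one never runs.
names = [f.__name__ for f in (test_get_quote, test_fast) if selected(f)]
```

A test that is filtered out this way can neither pass nor fail on the CI run, which would explain why the Python 3 errors went unnoticed.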

lodagro · 2013-02-13T13:58:25Z

We could add the -v flag to nosetests to see in the Travis logs which tests ran and which were skipped.

ghost · 2013-02-13T14:08:46Z

I have done that for debugging in the past, but Travis's UI is sluggish at best
and the log is huge already; adding 3000 lines would not improve things IMO.

ghost · 2013-02-15T18:51:22Z

@selasley, would you like to open a PR for the decode('utf8') bit? I'll merge it.

IIUC, fixing the test vectors will still have the test failing come the next dividend.
So either the test needs to be less sensitive or the asserts need to be disabled for now.
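One possible reading of "less sensitive" is a tolerance-based comparison. The sketch below reuses the INTC Ret_Index values and the INTC close prices from the diff at the top of this issue; the helper name all_close and the 1e-3 tolerance are arbitrary choices for illustration, not a recommendation from this thread:

```python
import math

# INTC column of the Ret_Index array, before and after the change.
old_intc = [1.01130524, 1.00770812, 1.00462487, 0.99845838]
new_intc = [1.01142857, 1.00779221, 1.00519481, 0.99896104]


def all_close(actual, expected, rel_tol):
    """True when every pair agrees within the relative tolerance."""
    return all(math.isclose(a, e, rel_tol=rel_tol)
               for a, e in zip(actual, expected))


exact_match = new_intc == old_intc
tolerant_match = all_close(new_intc, old_intc, rel_tol=1e-3)

# The adjusted close itself moved by about 1%, so the same
# tolerance does not rescue the hardcoded price check.
price_match = math.isclose(24.28, 24.54, rel_tol=1e-3)
```

Here the return index values agree within 0.1% while the adjusted price does not, so a tolerance alone would only cover part of the test, and any fixed tolerance could still be exceeded as further dividends accumulate.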

ghost · 2013-02-16T20:54:31Z

closed via 44b2495

nehalecky mentioned this issue Feb 15, 2013

BUG/TST: Yahoo Finance (a foundational resolve of GH: #2847, #2853) #2878

Closed

ghost closed this as completed Feb 16, 2013
