Errors with nosetests-3.3 pandas.io.tests.test_yahoo.TestYahoo pandas, version b0ee363408336 #2847

Closed
selasley opened this issue Feb 12, 2013 · 8 comments

@selasley
Contributor

I think get_quote_yahoo and get_components_yahoo in data.py need to decode the bytes returned by urllib.request.urlopen(urlStr).readlines() and urllib.urlopen(urlStr).read() before calling strip() and split() on them in Python 3.3.

Also, the expected INTC values in test_get_data() in test_yahoo.py do not agree with the values Yahoo returns. The Adj Close value for 1-18-12 is 24.28 and the calculated Ret_Index values do not agree with the values in the d array.

The tests pass after making these changes. I have not tested the changes in Python 2.x.

diff --git a/pandas/io/data.py b/pandas/io/data.py
index 5e92fca..dfba151 100644
--- a/pandas/io/data.py
+++ b/pandas/io/data.py
@@ -115,7 +115,7 @@ def get_quote_yahoo(symbols):
         return None

     for line in lines:
-        fields = line.strip().split(',')
+        fields = line.decode().strip().split(',')
         for i, field in enumerate(fields):
             if field[-2:] == '%"':
                 data[header[i]].append(float(field.strip('"%')))
@@ -241,7 +241,7 @@ def get_components_yahoo(idx_sym):
     #break when no new components are found
     while (True in mask):
         urlStr = url.format(idx_mod, stats,  comp_idx)
-        lines = (urllib.urlopen(urlStr).read().strip().
+        lines = (urllib.urlopen(urlStr).read().decode().strip().
                  strip('"').split('"\r\n"'))

         lines = [line.strip().split('","') for line in lines]


diff --git a/pandas/io/tests/test_yahoo.py b/pandas/io/tests/test_yahoo.py
index 1f25e3c..d07c5fd 100644
--- a/pandas/io/tests/test_yahoo.py
+++ b/pandas/io/tests/test_yahoo.py
@@ -92,15 +92,15 @@ class TestYahoo(unittest.TestCase):

         pan = web.get_data_yahoo(dfi, 'JAN-01-12', 'JAN-31-12',
                                  adjust_price=True)
-        expected = [18.38, 27.45, 24.54]
+        expected = [18.38, 27.45, 24.28]
         result = pan.Close.ix['01-18-12'][['GE', 'MSFT', 'INTC']].tolist()
         assert result == expected

         pan = web.get_data_yahoo(dfi, '2011', ret_index=True)
-        d = [[ 1.01757469,  1.01130524,  1.02414183],
-             [ 1.00292912,  1.00770812,  1.01735194],
-             [ 1.00820152,  1.00462487,  1.01320257],
-             [ 1.08025776,  0.99845838,  1.00113165]]
+        d = [[ 1.01757469,  1.01142857,  1.02414183],
+             [ 1.00292912,  1.00779221,  1.01735194],
+             [ 1.00820152,  1.00519481,  1.01320257],
+             [ 1.08025776,  0.99896104,  1.00113165]]

         expected = pd.DataFrame(d)
         result = pan.Ret_Index.ix['01-18-11':'01-21-11'][['GE', 'INTC', 'MSFT']]

original errors
nosetests-3.3 pandas.io.tests.test_yahoo.TestYahoo pandas

.EEE

ERROR: test_get_quote (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 49, in test_get_quote
df = web.get_quote_yahoo(pd.Series(['GOOG', 'AAPL', 'GOOG']))
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 118, in get_quote_yahoo
fields = line.strip().split(',')
TypeError: Type str doesn't support the buffer API

ERROR: test_get_components (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 57, in test_get_components
df = web.get_components_yahoo('^DJI') #Dow Jones
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 245, in get_components_yahoo
strip('"').split('"\r\n"'))
TypeError: Type str doesn't support the buffer API

ERROR: test_get_data (pandas.io.tests.test_yahoo.TestYahoo)

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/tests/test_yahoo.py", line 87, in test_get_data
dfi = web.get_components_yahoo('^DJI')
File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/pandas-0.11.0.dev_b0ee363-py3.3-macosx-10.6-intel.egg/pandas/io/data.py", line 245, in get_components_yahoo
strip('"').split('"\r\n"'))
TypeError: Type str doesn't support the buffer API


Ran 4 tests in 0.962s

FAILED (errors=3)

@nehalecky
Contributor

Interesting, I developed those tests (and the methods behind them), and they were passing previously, and certainly before I submitted the PR.

Regarding the decode issue, I know there were changes to how text and data are interpreted in Python 3 (see: http://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit), but I mostly develop in Python 2.7 and was relying on Travis to catch things like that for me. It didn't, I guess. Thanks for the heads up and suggested fixes; I'll implement them soon to see how they affect code in 2.7. Anyone's thoughts on this would be appreciated.
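The bytes/str mismatch behind the TypeError can be sketched as follows. This is only an illustration, not the code in data.py: the helper name parse_quote_line, the utf-8 default, and the sample line are all assumptions made up for the example.

```python
def parse_quote_line(raw_line, encoding="utf-8"):
    """Split one CSV line into fields, accepting bytes or text.

    Hypothetical helper for illustration; not part of pandas.
    """
    # On Python 3, urlopen(...).readlines() yields bytes, and calling
    # split(',') on bytes with a str separator raises TypeError.
    if isinstance(raw_line, bytes):
        raw_line = raw_line.decode(encoding)
    return raw_line.strip().split(",")


# Made-up response line, not real Yahoo output.
sample = b'"GOOG",785.37,"+0.45%"\r\n'
fields = parse_quote_line(sample)
print(fields)
```

On Python 2.7, bytes is an alias for str, so the same call decodes to unicode objects rather than failing; whether that is acceptable for the rest of data.py would still need to be checked there.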

Regarding the tests failing because of incorrect hardcoded data, I am somewhat at a loss, as I referenced specific dates and date ranges to ensure that these particular values would be absolute. These tests also passed previously, and I can only think that there was either a copy-paste slip right before submitting or I am overlooking some aspect of data queries from Yahoo! Finance.

Either way, your values seem correct, and I'll include them in a PR soon to see if we can squash these errors. Still, I am going to dig a bit deeper to make sure I'm not missing something more fundamental.

Thanks!

@lodagro
Contributor

lodagro commented Feb 13, 2013

test_get_data() in test_yahoo.py also fails for me; see also #2853.
The failure matches the description from @selasley. I don't get the other failures (but I am on Python 2.7).

@nehalecky
Contributor

Thanks @lodagro and @selasley for the feedback.

It was something fundamental!

With Yahoo's documentation of its adjusted close calculation being quite lacking, I spent a little more time digging around. Judging by the number of threads, blog posts, and white papers I discovered, I wasn't alone, and I thought it helpful to comment a bit on the topic.

I was incorrect in my assumption of how Yahoo calculates its adjusted close price, which I had assumed to be forward adjusted (i.e., anchored at the starting date of the date range of interest and projected forward). As it turns out, it is a blanket backward adjustment from the present, and it includes adjustments for all dividends and stock splits up to the present day.

The error from test_get_data() with the hardcoded value in the line:

expected = [18.38, 27.45, 24.54]  #error
expected = [18.38, 27.45, 24.28]  #corrected value

refers to 'INTC' (Intel's) adjusted close price, the result from the query:

result = pan.Close.ix['01-18-12'][['GE', 'MSFT', 'INTC']].tolist()

The story is, the same day I submitted PR #2795 on Feb. 5th, Intel had a dividend payout of 0.225, which caused all adjusted close prices before that date to be recalculated and updated. That produced the small change observed, from 24.54 to 24.28, and caused everyone's development install of pandas to throw these errors (apologies). For the exact same reason, the error from the query:

result = pan.Ret_Index.ix['01-18-11':'01-21-11'][['GE', 'INTC', 'MSFT']]

is a result of the return index values being dependent on Intel's adjusted close price.
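To make the effect concrete, here is a simplified sketch of one common backward-adjustment scheme for cash dividends. It is an assumption for illustration only: Yahoo's exact calculation is not documented in this thread, the function name backward_adjust is hypothetical, and the prices are made up.

```python
def backward_adjust(closes, dividends):
    """Return backward-adjusted closes (illustrative scheme, not Yahoo's).

    closes: raw closing prices, oldest first.
    dividends: dict mapping an index in `closes` (the ex-dividend day)
               to the cash dividend paid per share.
    """
    adjusted = list(closes)
    factor = 1.0
    # Walk from newest to oldest so each dividend rescales every earlier price.
    for i in range(len(closes) - 1, -1, -1):
        adjusted[i] = round(closes[i] * factor, 2)
        if i in dividends and i > 0:
            # Scale by the dividend relative to the prior day's close.
            factor *= 1.0 - dividends[i] / closes[i - 1]
    return adjusted


raw = [25.0, 25.0, 25.0]                 # made-up prices
before = backward_adjust(raw, {})        # no dividend yet
after = backward_adjust(raw, {2: 0.25})  # dividend on the latest day
print(before, after)
```

Under this scheme the newest price is untouched while every earlier adjusted close shrinks once the dividend is recorded, which is the kind of retroactive change that invalidated the hardcoded values.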

As such, note that any value derived from Yahoo's adjusted close price is not absolute. I will be updating the tests in test_yahoo.py to reflect this soon, with a PR to follow. In the meantime, you can comment out those two specific assert statements, as they aren't good tests. Again, apologies for the hassle, and thanks.

Oh, if you'd like to learn more, check out some helpful discussions on the topic:

EDIT: Also, I'll check out using decode() in 2.7 soon, thanks.

@ghost

ghost commented Feb 13, 2013

travis uses nosetests --exe -w /tmp -A "not slow" pandas;
and these tests are marked @slow.
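A minimal sketch of why the failures went unnoticed, assuming the slow marker works by setting an attribute on the test function, which is what attribute-based selection such as -A "not slow" inspects. The decorator and the selected helper below are stand-ins written for illustration, not pandas' or nose's actual code.

```python
def slow(test_func):
    """Tag a test so attribute-based selection can skip it (stand-in)."""
    test_func.slow = True
    return test_func


def selected(test_func):
    """Mimic the expression "not slow": keep tests without a truthy `slow`."""
    return not getattr(test_func, "slow", False)


@slow
def test_get_data():  # stand-in for a network-bound test
    pass


def test_fast():  # stand-in for an ordinary test
    pass


ran = [t.__name__ for t in (test_get_data, test_fast) if selected(t)]
print(ran)
```

Only the untagged test is selected, so a tagged test that breaks on Python 3 never runs on the CI machine.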

@lodagro
Contributor

lodagro commented Feb 13, 2013

Could add the -v flag to nosetests, to see in the Travis logs which tests ran and which were skipped.

@ghost

ghost commented Feb 13, 2013

I have done that for debugging in the past, but Travis's UI is sluggish at best
and the log is huge already; adding 3000 lines would not improve things IMO.

@ghost

ghost commented Feb 15, 2013

@selasley, would you like to open a PR for the decode('utf8') bit? I'll merge it.

IIUC, fixing the test vectors will still leave the test failing come the next dividend.
So either the test needs to be less sensitive or the asserts disabled for now.
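One way to read "less sensitive" is a tolerance-based comparison, sketched below. The helper name assert_all_close and the 5% tolerance are arbitrary assumptions for illustration; a loose tolerance only delays the problem, since enough future dividends would eventually push the values past it.

```python
import math


def assert_all_close(result, expected, rel_tol=0.05):
    """Raise AssertionError unless each pair agrees within rel_tol."""
    assert len(result) == len(expected), "length mismatch"
    for got, want in zip(result, expected):
        assert math.isclose(got, want, rel_tol=rel_tol), (got, want)


# The old and new INTC values quoted in this thread differ by about 1%,
# so this comparison passes with the 5% tolerance assumed here.
assert_all_close([18.38, 27.45, 24.28], [18.38, 27.45, 24.54])
```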

@ghost

ghost commented Feb 16, 2013

closed via 44b2495

@ghost ghost closed this as completed Feb 16, 2013