read_csv should accept unicode objects as urls #2564

Closed
richbwood opened this issue Dec 19, 2012 · 5 comments
Labels: IO Data, Testing, Unicode
Comments

@richbwood

Hi,

I have tried the following with
pandas-f014b01af7bd2d03266697e672c2d41daded3fca.zip
and
pandas-0.10.0.tar.gz

The unit tests on my Pandas build are failing with:

FAIL: test_read_csv (pandas.io.tests.test_parsers.TestCParserHighMemory)

Traceback (most recent call last):
File "/home/woodri/build/out/lib/python2.7/site-packages/pandas-0.10.0-py2.7-linux-x86_64.egg/pandas/io/tests/test_parsers.py", line 109, in test_read_csv
assert(False), "read_csv should accept unicode objects as urls"
AssertionError: read_csv should accept unicode objects as urls

FAIL: test_read_csv (pandas.io.tests.test_parsers.TestCParserLowMemory)

Traceback (most recent call last):
File "/home/woodri/build/out/lib/python2.7/site-packages/pandas-0.10.0-py2.7-linux-x86_64.egg/pandas/io/tests/test_parsers.py", line 109, in test_read_csv
assert(False), "read_csv should accept unicode objects as urls"
AssertionError: read_csv should accept unicode objects as urls

FAIL: test_read_csv (pandas.io.tests.test_parsers.TestPythonParser)

Traceback (most recent call last):
File "/home/woodri/build/out/lib/python2.7/site-packages/pandas-0.10.0-py2.7-linux-x86_64.egg/pandas/io/tests/test_parsers.py", line 109, in test_read_csv
assert(False), "read_csv should accept unicode objects as urls"
AssertionError: read_csv should accept unicode objects as urls


Ran 2934 tests in 193.600s

FAILED (SKIP=44, failures=3)

I have compiled python, cython, numpy, etc. myself. Are there any dependencies or compile options that I am likely to be missing?

Please let me know if there is any other information I can provide to help identify the problem.

Thank you

@wesm
Member

wesm commented Dec 19, 2012

It looks like a red herring; something specific to your system is causing this function to fail:

    def test_read_csv(self):
        if not py3compat.PY3:
            fname=u"file:///"+unicode(self.csv1)
            try:
                df1 = read_csv(fname, index_col=0, parse_dates=True)
            except IOError:
                assert(False), "read_csv should accept unicode objects as urls"

Try loading a CSV from a unicode HTTP link, e.g.:

In [34]: read_csv(u'https://raw.github.com/pydata/pandas/master/pandas/io/tests/test1.csv')
Out[34]: 
                 index         A         B         C         D
0  2000-01-03 00:00:00  0.980269  3.685731 -0.364217 -1.159738
1  2000-01-04 00:00:00  1.047916 -0.041232 -0.161812  0.212549
2  2000-01-05 00:00:00  0.498581  0.731168 -0.537677  1.346270
3  2000-01-06 00:00:00  1.120202  1.567621  0.003641  0.675253
4  2000-01-07 00:00:00 -0.487094  0.571455 -1.611639  0.103469
5  2000-01-10 00:00:00  0.836649  0.246462  0.588543  1.062782
6  2000-01-11 00:00:00 -0.157161  1.340307  1.195778 -1.097007

Please try that and report back here if it does not work. We'll have to figure out how to make the unit test more portable to other platforms (this is the first failure I've seen).

@richbwood
Author

I can't use the unicode HTTP link because my machine doesn't have internet access.

However, the following works:

In [109]: read_csv(u'file:///home/woodri/python-work/xy.csv')
Out[109]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 536267 entries, 0 to 536266
Data columns:
timeStamp    536267  non-null values
x            536267  non-null values
y            536267  non-null values
dtypes: float64(2), object(1)

Thank you

@richbwood
Author

The unit test is trying to get

 fname=u'file:////sbclocal/fmat_ir_local/64-bit/lib/python2.7/site-packages/pandas-0.10.0-py2.7-linux-x86_64.egg/pandas/io/tests/test1.csv'

(note the 4 forward slashes)
and throws the exception:

 <urlopen error ftp error: no host given>

Removing one of the slashes fixes the unit test for me, but probably breaks it on Windows.

@wesm
Copy link
Member

wesm commented Dec 20, 2012

that's helpful, thanks. will fix the test

wesm closed this as completed in dcd9df7 Dec 29, 2012
@wesm
Member

wesm commented Dec 29, 2012

Should be fixed now. Removed the extra slash on non-Windows platforms-- thanks
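
For readers following along, here is a minimal sketch of the slash problem discussed in this thread. It is not the actual change in dcd9df7: `file_url` is a hypothetical helper, both paths are made up, and the code is written for Python 3 rather than the Python 2.7 used above. It only illustrates why prepending `file:///` to an absolute POSIX path yields four slashes, and one way to avoid that while keeping three slashes for a Windows drive path.

```python
from urllib.parse import urlparse


def naive_file_url(path):
    # Mirrors the concatenation in the original test: always prepend three
    # slashes, even when the path already starts with one.
    return "file:///" + path


def file_url(path):
    # Hypothetical helper for illustration; not the actual pandas fix.
    if path.startswith("/"):
        # POSIX absolute path: its own leading "/" supplies the third slash.
        return "file://" + path
    # Windows drive path such as C:\dir\x.csv: add the third slash and
    # normalise backslashes to forward slashes.
    return "file:///" + path.replace("\\", "/")


posix_path = "/tmp/pandas/io/tests/test1.csv"      # made-up path
windows_path = "C:\\pandas\\io\\tests\\test1.csv"  # made-up path

for url in (naive_file_url(posix_path), file_url(posix_path), file_url(windows_path)):
    # Show each URL next to the path component that urlparse extracts from it.
    print(url, "-> path component:", urlparse(url).path)
```

With the naive concatenation, the path component that `urlparse` reports begins with `//` instead of a single `/`, which matches the four-slash URL reported above. With the helper, the POSIX URL round-trips to the original path.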
