Test failures on FreeBSD 9.1 #3360

neirbowj · 2013-04-14T17:17:32Z

% uname -a
FreeBSD XXXX.saltant.net 9.1-STABLE FreeBSD 9.1-STABLE #0 r248078: Fri Mar  8 20:36:00 EST 2013     root@XXXX.saltant.net:/usr/obj/usr/src/sys/NIPPL  amd64
% pkg_info -xE pandas
py27-pandas-0.11.0.r1
% pkg_info -xE ^python
python27-2.7.3_6
% pkg_info -xE numpy
py27-numpy-1.6.2_1,1
% nosetests pandas.tests.test_index:TestMultiIndex.test_legacy_pickle
E
======================================================================
ERROR: test_legacy_pickle (pandas.tests.test_index.TestMultiIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/tests/test_index.py", line 1060, in test_legacy_pickle
    obj = pickle.load(open(ppath, 'r'))
  File "/usr/local/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/local/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/local/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named multiarray

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)


jreback · 2013-04-14T18:35:50Z

are there other failures?
what does test_fast.sh return?

neirbowj · 2013-04-14T19:04:04Z

I installed from the rc1 tarball, so I don't have test_fast.sh handy. Give me a moment to bring up a dev't environment and I'll get back to you about test_fast. In the meantime, here's the complete output from nose.

% nosetests --exe pandas

** (process:86069): WARNING **: Trying to register gtype 'GMountMountFlags' as enum when in fact it is of type 'GFlags'

** (process:86069): WARNING **: Trying to register gtype 'GDriveStartFlags' as enum when in fact it is of type 'GFlags'

** (process:86069): WARNING **: Trying to register gtype 'GSocketMsgFlags' as enum when in fact it is of type 'GFlags'
..........................SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.........................................................................................................................................................................................................................................................................................SSSSSSSSSSS..........SS...................................................................................................................................................................................S..................S..S.SSSSSSS......................................S.....SSSSSSSSSSSSSSSSSSSSSSSS..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S.S....................................................E................................................................................................................................................................................................................................SSS.............................................................................................................................................................................................................................
S.........................S.......................................................................................................................................................................................................................................................................................................................................E..............................................................................................................................................................S.............................................................................................................................................................S..................S...S............SSS.........S............SS...........SS.....SSSS.......................................................................................................................S......................................................................................................E.......................................S............................................................................................................
======================================================================
ERROR: test_to_string_repr_unicode (pandas.tests.test_format.TestDataFrameFormatting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/tests/test_format.py", line 237, in test_to_string_repr_unicode
    rs = repr(ser).split('\n')
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1138, in __repr__
    return str(self)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1097, in __str__
    return self.__bytes__()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1107, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1125, in __unicode__
    dtype=True)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1214, in _get_repr
    result = formatter.to_string()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 146, in to_string
    idx = k.ljust(pad_space + _encode_diff(k))
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 168, in _encode_diff
    return len(x) - len(x.decode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

======================================================================
ERROR: test_legacy_pickle (pandas.tests.test_index.TestMultiIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/tests/test_index.py", line 1060, in test_legacy_pickle
    obj = pickle.load(open(ppath, 'r'))
  File "/usr/local/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/local/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/local/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named multiarray

======================================================================
ERROR: http://docs.python.org/py3k/reference/datamodel.html#object.__repr__
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/tests/test_series.py", line 1361, in test_repr_should_return_str
    self.assertTrue(type(df.__repr__() == str))  # both py2 / 3
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1138, in __repr__
    return str(self)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1097, in __str__
    return self.__bytes__()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1107, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1125, in __unicode__
    dtype=True)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1214, in _get_repr
    result = formatter.to_string()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 146, in to_string
    idx = k.ljust(pad_space + _encode_diff(k))
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 168, in _encode_diff
    return len(x) - len(x.decode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

----------------------------------------------------------------------
Ran 3125 tests in 297.710s

FAILED (SKIP=115, errors=3)

I'm looking into the GFlags warnings now. TestDataFrameFormatting.test_eng_float_formatter generates them in the final call to fmt.reset_printoptions()

I'm also working on a minimal failing test for TestDataFrameFormatting.test_to_string_repr_unicode, because:

% nosetests pandas.tests.test_format:TestDataFrameFormatting
.
** (process:86110): WARNING **: Trying to register gtype 'GMountMountFlags' as enum when in fact it is of type 'GFlags'

** (process:86110): WARNING **: Trying to register gtype 'GDriveStartFlags' as enum when in fact it is of type 'GFlags'

** (process:86110): WARNING **: Trying to register gtype 'GSocketMsgFlags' as enum when in fact it is of type 'GFlags'
...............................................E................
======================================================================
ERROR: test_to_string_repr_unicode (pandas.tests.test_format.TestDataFrameFormatting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/tests/test_format.py", line 237, in test_to_string_repr_unicode
    rs = repr(ser).split('\n')
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1138, in __repr__
    return str(self)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1097, in __str__
    return self.__bytes__()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1107, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1125, in __unicode__
    dtype=True)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/series.py", line 1214, in _get_repr
    result = formatter.to_string()
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 146, in to_string
    idx = k.ljust(pad_space + _encode_diff(k))
  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 168, in _encode_diff
    return len(x) - len(x.decode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

----------------------------------------------------------------------
Ran 65 tests in 3.022s

FAILED (errors=1)

But...

% nosetests pandas.tests.test_format:TestDataFrameFormatting.test_to_string_repr_unicode
.
----------------------------------------------------------------------
Ran 1 test in 0.024s

OK

test_to_string_repr_unicode uses np.random.randn to generate test data, so pass/fail depends on earlier consumption of random numbers.

neirbowj · 2013-04-14T20:05:28Z

OK, when I

git clone --branch v0.11.0rc1 https://github.com/pydata/pandas pandas-0.11.0rc1
cd pandas-0.11.0rc1
python setup.py build_ext --inplace
./test_fast.sh

I get the warnings, the failure in pandas.tests.test_format:TestDataFrameFormatting.test_to_string_repr_unicode and the associated http://docs.python.org/py3k/reference/datamodel.html#object.__repr__ error, but pandas.tests.test_index:TestMultiIndex.test_legacy_pickle apparently passes.

I don't know what to do next, but will gladly take direction and respond to requests.

ghost · 2013-04-15T10:44:10Z

Please pull the latest master (I pushed some changes today), and post the output of ci/print_versions.py.

also, what is the value of pandas.options.display.encoding?

edit: The test name above is now fixed in master.

neirbowj · 2013-04-15T13:30:46Z

% git describe
v0.11.0rc1-27-g9c05da7
% ci/print_versions.py

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: FreeBSD 9.1-STABLE FreeBSD 9.1-STABLE #0 r248078: Fri Mar  8 20:36:00 EST 2013     root@XXXX.saltant.net:/usr/obj/usr/src/sys/NIPPL amd64

Cython: 0.17.1
Numpy: 1.6.2
Scipy: 0.11.0
statsmodels: Not installed
    patsy: Not installed
scikits.timeseries: Not installed
dateutil: 2.1
pytz: 2013b
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.2.0
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
sqlalchemy: Not installed

And furthermore...

>>> import pandas
/usr/local/lib/python2.7/site-packages/pytz-2013b-py2.7.egg/pytz/__init__.py:35: UserWarning: Module pandas was already imported from pandas/__init__.pyc, but /usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg is being added to sys.path
>>> pandas.options.display.encoding
'US-ASCII'
>>>

ghost · 2013-04-15T13:46:51Z

Your terminal doesn't support utf-8/unicode, or at least doesn't report it in a way pandas
can discern (the code is in core/format.py:detect_console_encoding()).

Obviously, the test needs to fail more gracefully in such a situation, but really it's signalling
that while pandas supports unicode, your environment does not.

If you're only working with ASCII data, you should be fine in principle btw.
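
For readers following along, here is a minimal sketch of how a console encoding might be guessed. It is not the pandas detect_console_encoding() code; the function name guess_console_encoding and the fallback order are assumptions made for illustration, and it is written for Python 3.

```python
import codecs
import locale
import sys


def guess_console_encoding():
    """Best-effort guess at the encoding the console accepts (hypothetical helper)."""
    candidates = [
        getattr(sys.stdout, "encoding", None),  # may be None if stdout was replaced
        locale.getpreferredencoding(False),     # derived from the locale settings
        sys.getdefaultencoding(),               # last resort
    ]
    for name in candidates:
        if not name:
            continue
        try:
            # Normalize aliases such as 'US-ASCII' to the canonical codec name.
            return codecs.lookup(name).name
        except LookupError:
            continue
    return "ascii"


print(guess_console_encoding())
print(codecs.lookup("US-ASCII").name)  # 'US-ASCII' is an alias of the 'ascii' codec
```

Normalizing through codecs.lookup() means a report of 'US-ASCII' and a report of 'ascii' are treated as the same codec, which avoids comparing raw strings.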

ghost · 2013-04-15T14:32:48Z

But there's something else going on here.

  File "/usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py", line 168, in _encode_diff
    return len(x) - len(x.decode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

That should have been a UnicodeEncodeError (which would have been caught and the test passed).

I expected to see:

In [7]: x=u'\u03c3'
   ...: len(x)-len(x.decode('ascii'))
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-7-33671dd44d61> in <module>()
      1 x=u'\u03c3'
----> 2 len(x)-len(x.decode('ascii'))

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 0: ordinal not in range(128)

x is a unicode string, so calling decode on it first implicitly encodes it to ASCII.
It doesn't make sense that the implicit encode would succeed and the decode immediately
following would fail.

Can you break in with pdb and print out x at the point of failure?

nosetests pandas/tests/test_format.py -m repr --pdb
<break in>
(pdb) print x

also, what are the values of the $LC_ALL and $LANG envars?

neirbowj · 2013-04-15T15:12:47Z

% nosetests pandas/tests/test_format.py -m repr --pdb
.
** (process:2949): WARNING **: Trying to register gtype 'GMountMountFlags' as enum when in fact it is of type 'GFlags'

** (process:2949): WARNING **: Trying to register gtype 'GDriveStartFlags' as enum when in fact it is of type 'GFlags'

** (process:2949): WARNING **: Trying to register gtype 'GSocketMsgFlags' as enum when in fact it is of type 'GFlags'
.............> /home/obrienjw/src/github/pandas/pandas/core/format.py(168)_encode_diff()
-> return len(x) - len(x.decode(encoding))
(Pdb) print x
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 0: ordinal not in range(128)
(Pdb)

Neither $LC_ALL nor $LANG are defined in the environment.

This has been vexing me but I just breathed a sigh of relief since you just pointed out the UnicodeEncodeError vs. UnicodeDecodeError difference. I hadn't noticed it, and was racking my brain to understand why the except block wasn't catching it. D'oh! At least now there is an explanation for why this test passes sometimes and fails other times (i.e. different exceptions).

As for the UTF-8 support, I'll see about turning on support in my terminal so that I can run tests in a richer environment. I'm the pandas maintainer for FreeBSD so it's not just my data I need to worry about.
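
To make the EncodeError vs. DecodeError distinction concrete, here is a small Python 3 sketch. The helper names narrow_handler and broad_handler are hypothetical and are not taken from pandas; they only show that an except clause naming UnicodeEncodeError lets a UnicodeDecodeError escape, while catching the common base class UnicodeError handles both.

```python
SIGMA = "\u03c3"  # the character from the traceback above


def narrow_handler(action):
    """Only expects UnicodeEncodeError; anything else propagates."""
    try:
        action()
    except UnicodeEncodeError:
        return "handled"
    return "ok"


def broad_handler(action):
    """Catches the shared base class of both Unicode errors."""
    try:
        action()
    except UnicodeError:
        return "handled"
    return "ok"


def encode_fail():
    SIGMA.encode("ascii")                  # raises UnicodeEncodeError


def decode_fail():
    SIGMA.encode("utf-8").decode("ascii")  # raises UnicodeDecodeError


print(narrow_handler(encode_fail))
print(broad_handler(decode_fail))

try:
    narrow_handler(decode_fail)
    escaped = False
except UnicodeDecodeError:
    escaped = True
print("DecodeError escaped the narrow handler:", escaped)
```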

ghost · 2013-04-15T15:21:28Z

I'm pretty sure the code that's misbehaving (although the wrong exception is puzzling)
can actually be removed (#3364), it was a stopgap that's no longer necessary after other
unicode-related changes. But that's not a change I'll merge a few days before a major release.

I can't repro any of this on my box, but if you figure out why the DecodeError is happening
instead of the EncodeError that's 99% of it.

ghost · 2013-04-15T18:29:30Z

was the pickle issue a false alarm?

neirbowj · 2013-04-15T18:53:44Z

Oops. Not sure about the pickle test. Will see if I can reproduce it tonight.

jreback · 2013-04-15T18:55:28Z

FYI: do a clean git clone and try again. The error message says it can't find multiarray, which is a numpy module (so also make sure that it is loading the correct numpy).

ghost · 2013-04-15T19:27:24Z

Relevant perhaps:
http://stackoverflow.com/questions/9641916/python-pandas-cant-find-numpy-core-multiarray-when-importing-pandas

might you be using a 32bit compiled numpy on an amd64 system?

neirbowj · 2013-04-16T17:44:13Z

OK, here's the reproducibility pattern I'm seeing for the test_legacy_pickle failure.

build from tarball (via FreeBSD math/py-pandas port with my draft patch applied) and test installed: fails
build from git at 0.11.0rc1 tag and test in-place: passes
build from git at latest master (a9f3f6d) and test in-place: passes

Next I'm going to try installing from git without assistance from the FreeBSD ports machinery and/or point FreeBSD ports at the repo instead of the tarball. Depending on the result I will probably suspect Cython or FreeBSD ports, but hopefully not both.

As for word-length, I see no possibility that pandas and numpy are mismatched on this machine.

% uname -m
amd64
% find \
    /usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg \
    /usr/local/lib/python2.7/site-packages/numpy -name "*.so" \
    | xargs file -b \
    | sort | uniq
ELF 64-bit LSB shared object, x86-64, version 1 (FreeBSD), dynamically linked, not stripped
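
A complementary check from inside the interpreter, as a rough sketch: it reports the word size of the running Python rather than of the .so files inspected above. The function name interpreter_bits is made up for this example.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


print(interpreter_bits(), platform.machine(), sys.maxsize > 2**32)
```

If this disagrees with what `file` reports for the extension modules, the import of those modules would be expected to fail.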

jreback · 2013-04-16T17:49:51Z

so the py-pandas port is the FreeBSD packaging mechanism? looks like it installs 0.10.1?

if you install 0.10.1 from the pypi tarball does it work for you?

neirbowj · 2013-04-17T03:09:39Z

@jreback: Yes, "port" in this context is like the .spec file and associated cruft to generate an RPM. I published a working draft for 0.11.0rc1 for testing to a mailing list. After 0.11.0 is released, I will revise the patch and submit it as a pandas maintainer update. All versions published to FreeBSD ports so far have had all tests passing, including 0.10.1.

neirbowj · 2013-04-17T04:28:32Z

build/install from FreeBSD port + patch for 0.11.0rc1 tarball: fails (as before)
build/install from FreeBSD port + patch for 0.11.0rc1 github tag: passes

It's long past my bedtime, but when I get a chance tomorrow I will take a look at the diff between the tarball-provided .C files and my cython-generated .C files. That is, unless there is something else I should do first.

At this point I think I have a fallback plan. In the event that we cannot determine and resolve the root cause of this issue prior to release, I could convert the port to build from github. I would rather not do that, though, because it adds a build-time dependency (cython), and increases the number of things that could vary across user build environments.

jreback · 2013-04-17T09:21:50Z

what's your cython version?

neirbowj · 2013-04-17T14:04:34Z

Cython 0.17.1. See above for other versions.

ghost · 2013-04-17T18:46:42Z

any new leads on the unexplained UnicodeDecodeError? that's pretty straightforward
to tackle using pdb.

neirbowj · 2013-04-17T21:33:22Z

@y-p: Sorry, but no. I've only had about an hour a day to work on this so far. I should have a good chunk of time this evening, so I hope to be able to report more substantial progress soon.

ghost · 2013-04-17T21:36:25Z

I don't see this blocking 0.11, all the evidence points at the issue being somewhere along
the freebsd pipeline. If there's a freebsd vagrant box and you post steps to reproduce,
maybe we can speed things along towards resolution.

neirbowj · 2013-04-17T23:34:44Z

@y-p: Vagrant is a good idea, but it's not working out in practice. Using xironix/freebsd-vagrant, I got to vagrant ssh before it broke. I'm not about to insert this as a prerequisite problem to solve before I get back to work on solving the pandas problem. If you feel like standing up a stock FreeBSD 9.1 system in VirtualBox, I'll gladly provide steps to reproduce.

neirbowj · 2013-04-18T01:13:05Z

@y-p: For the UnicodeDecodeError in TestDataFrameFormatting.test_to_string_repr_unicode, here's what I've learned so far with the help of pdb.

When the test passes, and the UnicodeEncodeError is correctly caught, the offending value is u'\u03c3a' and the display encoding is 'US-ASCII'. When the test fails, and the unexpected UnicodeDecodeError is raised, the offending value and display encoding are the same as they are when the test passes. Terminal captures available on request.

Next up: I need to find out what else could cause python to raise a different exception.

ghost · 2013-04-18T09:02:42Z

I think I figured out the problem, the encoding detection routine was too rigid.
Pushed 729d333 to master, please see if it resolves the unicode issues.

That also possibly solves the failed unicode support detection on your
system, you can confirm by checking whether display.encoding is now utf-8 or similar.

ghost · 2013-04-20T09:52:49Z

@neirbowj , was the fix effective for you? would be good to clear this up prior to 0.11.0

neirbowj · 2013-04-20T16:20:03Z

@y-p: Sorry for the delay. No, 729d333, when applied in situ to my tarball build, does not change the exception behaviour inside of pandas.core.format:SeriesFormatter.to_string.

This is what is currently bending my brain. In addition to applying 729d333, I have instrumented to_string with pdb so that it will drop me into the debugger at essentially the same point under both passing and failing conditions.

When TestDataFrameFormatting.test_to_string_repr_unicode passes:

% nosetests -s pandas.tests.test_format:TestDataFrameFormatting.test_to_string_repr_unicode  
> /usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py(152)to_string()
-> idx = k.ljust(pad_space)
(Pdb) import codecs
(Pdb) codecs.getdecoder('ascii') 
<built-in function ascii_decode>
(Pdb) shrubbery = u'\u03c3'
(Pdb) shrubbery.decode('ascii')
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 0: ordinal not in range(128)
(Pdb)

When TestDataFrameFormatting.test_to_string_repr_unicode fails:

% nosetests -s pandas.tests.test_format:TestDataFrameFormatting
.
** (process:84153): WARNING **: Trying to register gtype 'GMountMountFlags' as enum when in fact it is of type 'GFlags'

** (process:84153): WARNING **: Trying to register gtype 'GDriveStartFlags' as enum when in fact it is of type 'GFlags'

** (process:84153): WARNING **: Trying to register gtype 'GSocketMsgFlags' as enum when in fact it is of type 'GFlags'
...............................................> /usr/local/lib/python2.7/site-packages/pandas-0.11.0rc1-py2.7-freebsd-9.1-STABLE-amd64.egg/pandas/core/format.py(155)to_string()
-> result[i] = result[i] % (idx, v)
(Pdb) import codecs
(Pdb) codecs.getdecoder('ascii')
<built-in function ascii_decode>
(Pdb) shrubbery = u'\u03c3'
(Pdb) shrubbery.decode('ascii')
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)
(Pdb)

What I think this shows is that some internal state that is non-obvious to me is affecting the behaviour of the ascii_decode builtin. Do any of the tests monkey patch something that ascii_decode uses?

ghost · 2013-04-20T16:25:18Z

There's the stdin_encoding context manager in util/testing.py, but
I don't think it's used anywhere.

you need to check the value of sys.getdefaultencoding() in either situation.
That's what python uses to coerce between string and unicode.

it's ascii on every system I've ever seen.
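
Here is a sketch of why the default encoding decides which exception appears. In Python 2, calling .decode() on a unicode object first encodes it to bytes using sys.getdefaultencoding(), then decodes those bytes. The code below emulates that two-step behaviour in Python 3; py2_unicode_decode and failure_kind are hypothetical helpers, and the 'utf-8' case is an assumption about what the default encoding had been switched to.

```python
def py2_unicode_decode(text, codec, default_encoding):
    """Emulate Python 2's u'...'.decode(codec)."""
    raw = text.encode(default_encoding)  # implicit step performed by Python 2
    return raw.decode(codec)             # the step the caller asked for


def failure_kind(default_encoding):
    """Report which error decoding u'\\u03c3' as ASCII produces."""
    try:
        py2_unicode_decode("\u03c3", "ascii", default_encoding)
    except UnicodeEncodeError:
        return "UnicodeEncodeError"
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError at byte 0x%02x" % exc.object[exc.start]
    return "no error"


print(failure_kind("ascii"))  # the implicit encode fails first
print(failure_kind("utf-8"))  # the encode succeeds, then the ASCII decode fails
```

With an ASCII default the implicit encode raises UnicodeEncodeError; with a UTF-8 default the encode yields the bytes 0xcf 0x83 and the ASCII decode then fails on 0xcf, which matches the byte in the traceback.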

neirbowj · 2013-04-21T01:20:35Z

FreeBSD is off the hook for meddling with line endings. If you recall, test_legacy_pickle passes when I build from GH. We had considered the possibility of a Cython-related root-cause. In fact, it looks like the 0.11.0rc1 tarball may contain nothing but files with CRLF line endings, while the GH repo doesn't.

neirbowj · 2013-04-21T02:49:24Z

Options for test_legacy_pickle failure:

Update the test to use obj = pickle.load(open(ppath, 'rU')) to make it insensitive to line endings.
Update the test (or add a new one) to ensure that multiindex_v1.pickle is installed with correct newlines.

ghost · 2013-04-21T07:34:58Z

On MPL issues.

Changing sys.setdefaultencoding() back is a no-go: it might break the GTK backend, and it's a general
anti-pattern. edit: I dislike changing display encoding mid-run as well. see below.

Try #3409. I think it solves the problem.
Basically, the code that's failing is there to handle bytestrings, not unicode, so the
dependency on the value of sys.getdefaultencoding() is optional.

That doesn't solve the issue of MPL changing sys.getdefaultencoding, but the contract is that
your system reports its capabilities at import time, and that's what we go by.

The rest of pandas shouldn't depend on getdefaultencoding() directly, so the end result
is no unicode support when the system reports it only supports ascii, but no exceptions.

neirbowj · 2013-04-21T15:01:50Z

#3409 does it for me. No more test failures.

ghost · 2013-04-21T15:15:15Z

Good, merged.

Now, about the pickle issues. @jreback what do you say?

jreback · 2013-04-21T15:16:49Z

this code was not changed at all in 0.11, maybe the file has the embedded \r when it was generated?

ghost · 2013-04-21T15:31:43Z

@neirbowj , when you say "remember that it passes when I built from GH", do you mean that you
ran python ./setup.py sdist, got a tarball, installed it and the test passed?

ghost · 2013-04-21T17:12:04Z

Ok, Github repo clone:

λ ll pandas/tests/data/

drwxr-xr-x 2 user1 user1 4096 Apr 13 05:02 ./
drwxr-xr-x 3 user1 user1 4096 Apr 13 05:02 ../
-rw-r--r-- 1 user1 user1 4750 Jun  4  2012 iris.csv
-rw-r--r-- 1 user1 user1  670 Jun 28  2012 mindex_073.pickle
-rw-r--r-- 1 user1 user1 1249 Feb  9  2012 multiindex_v1.pickle
-rw-r--r-- 1 user1 user1 8188 Apr 10 10:02 tips.csv
-rw-r--r-- 1 user1 user1  595 Apr 26  2012 unicode_series.csv

tarball from (windows?) build box:

/tmp/pandas-0.11.0rc1  
λ ll ~/src/pandas/pandas/tests/data/
total 36
drwxr-xr-x 2 user1 user1 4096 Apr 21 16:32 ./
drwxr-xr-x 3 user1 user1 4096 Apr 21 18:29 ../
-rw-r--r-- 1 user1 user1 4600 Apr 21 16:32 iris.csv
-rw-r--r-- 1 user1 user1  670 Apr 21 16:32 mindex_073.pickle
-rw-r--r-- 1 user1 user1 1101 Apr 21 16:32 multiindex_v1.pickle
-rw-r--r-- 1 user1 user1 7943 Apr 21 16:32 tips.csv
-rw-r--r-- 1 user1 user1  577 Apr 21 16:32 unicode_series.csv

NB the changed file size(s) of the pickle file multiindex_v1.pickle
I did a setup.py sdist on linux, and the pickle files are unchanged.

The idea that windows would mangle binary files for line termination just
boggles the mind.

cc @changhiskhan
edit: those are actually reversed

ghost · 2013-04-21T17:14:28Z

possibly related

git config  core.autocrlf

neirbowj · 2013-04-21T17:44:35Z

@y-p: When I talk about building from a tarball, I'm referring to the published sdist. When I refer to building from GH, FreeBSD fetches a GH-produced archive, and thereafter treats it like a regular sdist (i.e. extract into a working directory, configure, build, install). The GH method does not actually use git locally, but it can be configured to ask GH for a tarball from an arbitrary ref.

ghost · 2013-04-21T17:50:33Z

OK, looks like there's something wonky with the build box, don't think it's a pandas issue per se.

changhiskhan · 2013-04-22T19:08:41Z

I downloaded the tarball from http://pandas.pydata.org/pandas-build/dev/
this is what i get. The file sizes seem to match the github repo clone rather than the one you got from the tarball.

 ~/Downloads/pandas-0.11.0rc1/pandas/tests/data $ ll
total 28
-rw-rw-r-- 1 chang chang 4750 Jun  3  2012 iris.csv
-rw-rw-r-- 1 chang chang  670 Jun 28  2012 mindex_073.pickle
-rw-rw-r-- 1 chang chang 1249 Feb  9  2012 multiindex_v1.pickle
-rw-rw-r-- 1 chang chang 8188 Apr 10 00:02 tips.csv
-rw-rw-r-- 1 chang chang  595 Apr 26  2012 unicode_series.csv

ghost · 2013-04-22T19:29:14Z

those file sizes match what I get from the tarball as well,
but not the sizes of my repo clone, which are the sizes correlated
with passing tests.

total 36
drwxr-xr-x 2 user1 user1 4096 Apr 22 20:38 ./
drwxr-xr-x 3 user1 user1 4096 Apr 22 20:55 ../
-rw-r--r-- 1 user1 user1 4600 Apr 22 20:38 iris.csv
-rw-r--r-- 1 user1 user1  670 Apr 22 20:38 mindex_073.pickle
-rw-r--r-- 1 user1 user1 1101 Apr 22 20:38 multiindex_v1.pickle
-rw-r--r-- 1 user1 user1 7943 Apr 22 20:38 tips.csv
-rw-r--r-- 1 user1 user1  577 Apr 22 20:38 unicode_series.csv

note also

λ wget https://github.com/pydata/pandas/raw/master/pandas/tests/data/multiindex_v1.pickle
λ ll
total 2736
drwxr-xr-x  3 user1 user1    4096 Apr 22 22:28 ./
drwxrwxrwt 16 root  root    36864 Apr 22 22:26 ../
-rw-r--r--  1 user1 user1    1101 Apr 22 22:28 multiindex_v1.pickle

neirbowj · 2013-04-22T20:15:41Z

You should be able to replicate the test failure on any system just by

import pickle
obj = pickle.load(open('multiindex_v1.pickle', 'r'))

If your pickle has '\r's this load should always fail, no matter what size it is.

I'm not sure it really matters for anything other than this v0 pickle file, but wouldn't it be best to follow the common practice of shipping source code archives that match the likely convention of their intended platforms (.zip:CRLF, .tar.gz:LF)? If so, there should be a test.
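
The failure can also be reproduced without the pandas data file. This is a Python 3 sketch under stated assumptions: it pickles a standard-library class with protocol 0 as a stand-in for multiindex_v1.pickle, and try_load is a hypothetical helper. Protocol 0 stores the module and name of a global on separate '\n'-terminated lines, so a CRLF conversion leaves a stray '\r' on the module name.

```python
import collections
import pickle


def try_load(data):
    """Return (True, obj) on success or (False, exception) on failure."""
    try:
        return True, pickle.loads(data)
    except Exception as exc:
        return False, exc


# Stand-in for the legacy pickle: a class reference written with protocol 0.
clean = pickle.dumps(collections.OrderedDict, protocol=0)
mangled = clean.replace(b"\n", b"\r\n")     # what a CRLF conversion does
repaired = mangled.replace(b"\r\n", b"\n")  # undo it before loading

ok_clean, _ = try_load(clean)
ok_mangled, err = try_load(mangled)
ok_repaired, obj = try_load(repaired)

print(ok_clean, ok_mangled, ok_repaired)
print(type(err).__name__)
```

Rewriting '\r\n' to '\n' is only reasonable for a line-oriented protocol 0 pickle; doing the same to a binary-protocol pickle could corrupt it.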

ghost · 2013-04-22T20:19:05Z

The contents of binary files should not change to match the platform you're on.

neirbowj · 2013-04-22T21:11:10Z

Absolutely not, but a significant majority of what's in a source code archive is not binary.

% find ./ -path "./.git*" -prune -o -type f -print | sed 's/.*\(\.[^.]*\)/\1/' | sort | uniq -c
   1 ./LICENSE
   1 ./LICENSES/NUMPY_LICENSE
   1 ./LICENSES/OTHER
   1 ./LICENSES/PSF_LICENSE
   1 ./LICENSES/SCIPY_LICENSE
   1 ./Makefile
   1 ./doc/data/fx_prices                       # binary
   1 ./doc/source/_static/stub                  # empty
   1 ./examples/data/SOURCES                    # empty
   1 ./pandas/src/parser/Makefile
   1 ./scripts/git-mrb
   1 ./vb_suite/source/_static/stub             # empty
   5 .R
   7 .bat
   5 .c
   2 .conf
   1 .coveragerc
   2 .css_t
  10 .csv
   1 .data
   2 .gitignore
  17 .h
   6 .h5                                        # binary
   2 .html
   2 .in
   2 .ini
   1 .md
  10 .pickle                                    # binary... except multiindex_v1.pickle (sort of)
   2 .png                                       # binary
   7 .pxd
 263 .py
  16 .pyx
  27 .rst
  12 .sh
   1 .table
  19 .txt
   4 .xls                                       # binary
   1 .xlsx                                      # binary
   1 .yml

So there are two issues:

Automation is confused by multiindex_v1.pickle: it looks like a non-binary file, and is therefore subject to newline conversion, but it should be treated as binary, because pickle protocol v0 cannot handle CRLF newlines, even (I think) on platforms where CRLF is the norm. pandas should either perform the test using universal newlines, so that CRLFs in the file don't cause the test to fail, or perform a new test as described next.
In general, newline conversion to suit the target platform is a good practice, so for files that really are text, pandas should test to ensure that the newlines in its text files match the convention of the host.

The former is the best way to resolve this issue. If you agree with the latter, I will open a new issue targeted to 0.12.
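
To make the first option concrete, here is a minimal sketch of a loader that tolerates CRLF in a protocol 0 pickle. `load_protocol0_pickle` is a hypothetical helper, not anything in pandas, and it is written for Python 3; on Python 2.7 the rough equivalent would be opening the file with mode 'rU'. It assumes the file really is a protocol 0 (ASCII) pickle:

```python
import pickle


def load_protocol0_pickle(path):
    """Unpickle a protocol-0 (ASCII) pickle, tolerating CRLF line endings.

    Hypothetical helper for illustration. Only meant for protocol 0:
    binary pickles can legitimately contain CR LF byte pairs and must
    not be rewritten like this.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    # Undo an LF -> CRLF conversion before handing the bytes to pickle.
    return pickle.loads(data.replace(b"\r\n", b"\n"))
```

Usage would be something like `obj = load_protocol0_pickle('multiindex_v1.pickle')`. The same call works on an undamaged file, since the replacement is then a no-op.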

jreback · 2013-04-22T21:14:28Z

FYI I am not sure when/how multiindex_v1.pickle was generated (it is possible it was not written in binary mode)
and I think it is pretty old anyhow; I'm not even sure which version it is supposed to be testing

we have in place pickle compat tests going forward

changhiskhan · 2013-04-22T22:09:11Z

@y-p oh I see. I think you flipped the file sizes in your original comparison? What I got for the sizes on linux is what you posted under windows and vice versa.

changhiskhan · 2013-04-22T22:26:11Z

at the very least we should switch to creating the tarball/zip on linux instead.
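
Whichever platform builds it, a quick check on the finished tarball could catch this before upload. A rough sketch, assuming a hypothetical helper `members_with_crlf` and a guessed list of text suffixes (neither is part of the pandas build scripts):

```python
import tarfile


def members_with_crlf(tar_path, suffixes=(".py", ".csv", ".txt", ".rst")):
    """Return sorted names of regular files in the archive that contain CRLF.

    Only files whose names end with one of `suffixes` are inspected; the
    default list is a guess at "text" files, adjust it as needed.
    """
    hits = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(suffixes):
                continue
            data = tar.extractfile(member).read()
            if b"\r\n" in data:
                hits.append(member.name)
    return sorted(hits)
```

For a tar.gz meant for Unix-like systems, an empty list from `members_with_crlf('pandas-0.11.0rc1.tar.gz')` would be the expected result. To cover multiindex_v1.pickle as well, pass a suffix tuple that includes its full name.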

ghost · 2013-04-22T22:32:40Z

I sure did. sorry.

doesn't

git config  core.autocrlf false

cure it?

wesm · 2013-04-23T00:09:15Z

Sorry for the trouble I caused by building the source distros on Windows, had been all linux til now =) Chang is uploading new rc1 tarballs and I'm going to work on cutting the 0.11 final now

changhiskhan · 2013-04-23T00:12:59Z

@y-p sorry, it did fix it and I put the new tarball and zip files up there. File sizes look alright to me now.
@neirbowj if you haven't reached your pain tolerance yet, please give it one more shot. Should be fine but can reopen the issue if it's still a problem

neirbowj · 2013-04-23T02:17:28Z

@changhiskhan You've probably noticed that I have what some might call an unhealthy tolerance for pain. I appreciate you and @y-p hanging in and resolving these test failures.

@wesm No trouble. Just another opportunity for me to learn something about something, and a new corner case that might admit a new test or two. Congrats on the latest release.

All tests passing (skipped 115) on FreeBSD 9.1-STABLE (r248078), with:

SHA256 (pandas-0.11.0rc1.tar.gz) = d7adf3cbd7febe4d3ad35cd5cd13f464c0aa9add58b5cf3a19c2444f6dbe1014

Off to update the port for 0.11.0 release.

changhiskhan · 2013-04-23T02:20:27Z

hooray! thank @y-p for the fix.

edit: yp -> y-p

ghost · 2013-04-23T02:22:06Z

no damn it. thank me.

neirbowj · 2013-04-23T03:12:01Z

My work here is done (for now). Good night.

neirbowj closed this as completed Apr 15, 2013

neirbowj reopened this Apr 15, 2013

This was referenced Apr 21, 2013

BUG: don't rely on sys.getdefaultencoding if we don't need to #3409

Merged

test failures and errors with v0.11rc1 #3363

Closed

wesm closed this as completed Apr 23, 2013

ghost mentioned this issue Jun 17, 2013

CLN: Fix CRLFs in repo + add .gitattributes #3915

Merged
