BUG/API: np.array([0, max_uint64]) has float64 dtype #19146

Open
jbrockmendel opened this issue May 31, 2021 · 6 comments
@jbrockmendel
Contributor

I expected to get uint64

umax = np.iinfo(np.uint64).max

np.array([umax]).dtype   # <-- uint64, as expected

np.array([0, umax]).dtype  # <-- float64, surprising

There seems to be something special going on inference-wise around the int64 bound:

imax = np.iinfo(np.int64).max

np.array([imax, umax]).dtype  # <-- float64

np.array([imax+1, umax]).dtype  # <-- uint64
@charris
Member

charris commented May 31, 2021

This is the downside of value-based type inference of Python scalars. The zero is converted as signed and umax as unsigned, and promoting int64 with uint64 then gives float64.

In [5]: array(np.iinfo(np.uint64).max).dtype                                    
Out[5]: dtype('uint64')

In [6]: array(0).dtype                                                          
Out[6]: dtype('int64')

It isn't a bug, but definitely a wart. The best option when mixing unsigned and signed is to specify the dtype.

EDIT: The way it works is that first conversion to signed is tried. If it fails, then conversion to unsigned is tried.
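
A minimal sketch of the two points above, assuming only public NumPy calls (`np.promote_types`, `np.array`, `np.iinfo`): the promotion of int64 with uint64 has no common integer type, and passing the dtype explicitly avoids the inference altogether.

```python
import numpy as np

umax = np.iinfo(np.uint64).max

# No integer dtype can hold every int64 and every uint64 value,
# so promotion of the pair falls back to float64.
promoted = np.promote_types(np.int64, np.uint64)

# Specifying the dtype skips per-element inference entirely.
explicit = np.array([0, umax], dtype=np.uint64)

print(promoted, explicit.dtype)
```

The float64 result cannot represent umax exactly, which is why specifying the dtype matters when signed and unsigned values are mixed.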

@jbrockmendel
Contributor Author

The way it works is that first conversion to signed is tried. If it fails, then conversion to unsigned is tried.

can you point me to the relevant part of the code?

Context: I'm trying to make pandas inference/constructors do fewer passes.

@charris
Member

charris commented Jun 1, 2021

can you point me to the relevant part of the code?

Heh, I saw it once upon a time . . . There is similar code in three files: convert.c, scalarapi.c, and abstractdtypes.c. The repetition with some slight differences is unsettling. The last is probably what you are looking for.

static PyArray_Descr *
discover_descriptor_from_pyint(
        PyArray_DTypeMeta *NPY_UNUSED(cls), PyObject *obj)
{
    assert(PyLong_Check(obj));
    /*
     * We check whether long is good enough. If not, check longlong and
     * unsigned long before falling back to `object`.
     */
    long long value = PyLong_AsLongLong(obj);
    if (error_converting(value)) {
        PyErr_Clear();
    }
    else {
        if (NPY_MIN_LONG <= value && value <= NPY_MAX_LONG) {
            return PyArray_DescrFromType(NPY_LONG);
        }
        return PyArray_DescrFromType(NPY_LONGLONG);
    }

    unsigned long long uvalue = PyLong_AsUnsignedLongLong(obj);
    if (uvalue == (unsigned long long)-1 && PyErr_Occurred()){
        PyErr_Clear();
    }
    else {
        return PyArray_DescrFromType(NPY_ULONGLONG);
    }

    return PyArray_DescrFromType(NPY_OBJECT);
}
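
The ladder in the C function above can be mirrored in pure Python. This is only a sketch: `ladder_dtype_name` and the bounds constants are illustrative names, not NumPy API, and the bounds assume a platform where C `long` and `long long` are both 64-bit (on Windows, `long` is 32-bit, so the first rung would be narrower).

```python
# Assumed bounds for a platform with 64-bit long and long long.
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1
LONGLONG_MIN, LONGLONG_MAX = -(2**63), 2**63 - 1
ULONGLONG_MAX = 2**64 - 1


def ladder_dtype_name(value: int) -> str:
    """Hypothetical helper mirroring long -> long long -> unsigned long long -> object."""
    # First rungs: anything that fits a signed long long is treated as signed.
    if LONGLONG_MIN <= value <= LONGLONG_MAX:
        if LONG_MIN <= value <= LONG_MAX:
            return "long"
        return "longlong"
    # Only values too large for the signed type are tried as unsigned.
    if 0 <= value <= ULONGLONG_MAX:
        return "ulonglong"
    # Everything else falls back to object.
    return "object"
```

Under these assumptions, `ladder_dtype_name(2**63 - 1)` stays signed while `ladder_dtype_name(2**63)` moves to the unsigned rung, which matches the change in behavior at `imax + 1` reported at the top of the thread.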

The topic of value based conversion has been discussed as part of the new dtype work, @seberg might have more to say.

@seberg
Member

seberg commented Jun 2, 2021

Hmm, the last one should be the important one, yeah. There are two things to note here:

  1. The integer "ladder" is a bit distinct from value-based promotion/casting. It is long -> long long -> unsigned long long -> object.
  2. np.array([1, np.uint64(3)]) does not really use value-based promotion. It uses the "default" for the first integer. (I had a first version once that was capable of using value-based promotion, the current code is not. Right now I think that is probably for the better.)

I.e. np.array([1, np.uint64(3)]) just promotes whatever the 1 is considered based on the "ladder". I suppose that is also a form of value-based promotion, but it is distinct from typical value-based promotion.

Future:

My opinion is currently:

  1. We actually attempt to get rid of value-based promotion entirely (hopefully in the next 1-2 months and then see how that goes) – There may be quite a bit of updating in pandas necessary.
  2. We could try to remove that "integer ladder" and always go to the default integer. An error would be raised if assignment fails in that case.

@jbrockmendel
Contributor Author

We actually attempt to get rid of value-based promotion entirely (hopefully in the next 1-2 months and then see how that goes) – There may be quite a bit of updating in pandas necessary.

Do you mean you wouldn't do any inference in np.array when passed a list and no dtype? I must be misunderstanding.

We could try to remove that "integer ladder" and always go to the default integer. An error would be raised if assignment fails in that case.

Is a "best lossless" option on the table? (basically what clean_index_list described below aims for)

can you point me to the relevant part of the code?

Poor wording on my part. I was actually asking about the part of the code that iterates over a not-yet-ndarray sequence to infer a dtype as part of the constructor. This would correspond to some combination of pd._libs' lib.infer_dtype and lib.maybe_convert_objects.

The kind of pattern that I'm looking to avoid is in e.g. lib.clean_index_list where we do

    inferred = infer_dtype(obj, skipna=False)
    [...]
    elif inferred in ['integer']:
        # we infer an integer but it *could* be a uint64

        arr = np.asarray(obj)
        if arr.dtype.kind not in ["i", "u"]:
            # eg [0, uint64max] gets cast to float64,
            #  but then we know we have either uint64 or object
            if (arr < 0).any():
                # TODO: similar to maybe_cast_to_integer_array
                return np.asarray(obj, dtype="object"), 0

            # GH#35481
            guess = np.asarray(obj, dtype="uint64")
            return guess, 0

infer_dtype makes one pass through the array, then np.asarray makes N passes (I'm guessing N is 1 or 2), then (arr < 0).any() is 2 passes plus an allocation, ...

This seems like it should be doable in way fewer passes.
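
As a rough illustration of what "fewer passes" could look like for a sequence of Python ints, here is a single-pass sketch. It is not pandas or NumPy code: `infer_int_dtype_single_pass` and the bounds constants are hypothetical names, and the sketch assumes the only candidate results are int64, uint64, and object.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1


def infer_int_dtype_single_pass(values):
    """Hypothetical helper: choose "int64", "uint64", or "object" in one pass."""
    saw_negative = False
    saw_above_int64 = False
    for value in values:
        # Outside both 64-bit ranges: only object can hold it losslessly.
        if value < INT64_MIN or value > UINT64_MAX:
            return "object"
        if value < 0:
            saw_negative = True
        elif value > INT64_MAX:
            saw_above_int64 = True
        # A negative value and a value above int64 max cannot share
        # a 64-bit integer dtype, so stop early.
        if saw_negative and saw_above_int64:
            return "object"
    return "uint64" if saw_above_int64 else "int64"
```

The result name could then be passed as `dtype=` to a single `np.asarray` call, which avoids the float64 round trip and the separate `(arr < 0).any()` check.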

@seberg
Member

seberg commented Jun 2, 2021

Do you mean you wouldn't do any inference in np.array when passed a list and no dtype?

Sorry, "value-based promotion" is not currently used for np.array, it just uses "plain" promotion! The value-based part comes in mainly for np.result_type and operations, such as float32_array + 4. returning a float32 and not a float64!

EDIT: So there will be no change here, inference of course happens, the question is how smart it is. The different integers being used when integers are large may go away though.
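
A small sketch of the operation case mentioned above, using only public NumPy calls. The variable names are illustrative. The mechanism behind the result has changed across NumPy versions, but the Python float leaving a float32 array at float32 is the behavior being described.

```python
import numpy as np

float32_array = np.ones(3, dtype=np.float32)

# A Python float operand does not force the array up to float64.
op_result = float32_array + 4.0

# With two arrays, ordinary dtype promotion applies instead.
array_result = float32_array + np.ones(3, dtype=np.float64)

print(op_result.dtype, array_result.dtype)
```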

Is a "best lossless" option on the table?

Some thoughts below, but maybe we should chat about this a bit? I think that pandas could leverage NumPy in principle, and that may well be worth the trouble. Although, I am a bit worried that it will also be a bit of a hack. On the other hand, I am not sure how well pandas could currently deal with NumPy user DType, and this might go a long way to that?

A correct "best" lossless for signed or unsigned integers seems pretty tough (but I guess you do not need that?). Also the current NumPy implementation inside of np.array(...) is slightly limited compared to other promotion. That is, it currently only works with dtype instances/descriptors and not really directly with the DType (type/class).
That means it doesn't actually support all of the value-based promotion: np.array([np.float32(3.), 3.]) cannot result in a float32. That is because this would currently be implemented as an abstract DType (for float32_arr + 3. the 3. is an abstract float DType). But even there, I do not want value-based support in any case (it can be done, but would be hackish to fully support and it doesn't seem like anyone actually likes it anyway).

Now for pandas? Even the above np.array([np.float32(3.), 3.]) could be hacked, if we wrote something like:

np.array([np.float32(3.), 3.], dtype=PythonFloatAsFloat16DType)

the problem is, that if you only have python floats, you get a float16 result ;). (The smaller problem is, that I am not sure I have implemented enough of casting yet to do the above.)

For you, even that can't possibly be enough. You would need to track the current state in the form of a dtype instance. That would be something like an "abstract dtype instance": an instance that cannot be attached to an actual array (or if it was, will always result in errors)!

That feels hackish, but to be honest, should work just fine. All we need to ensure is that an error is raised when it's used (we probably need that anyway), and that there is a way to convert that instance to the actual one that can be attached to the NumPy array.

In theory, that could be a method that is automatically called, i.e. dtype = dtype.as_concrete() (where as_concrete() is only called when necessary and a no-op for any normal dtype/descriptor). In practice, it doesn't have to be, since this would be hidden in the pandas internals.

raspbian-autopush pushed a commit to raspbian-packages/pandas that referenced this issue Dec 24, 2021
We test on more architectures, so upstream's xfails are not always
correct everywhere.  On those known to fail:
arm64 xfail -> all non-x86 xfail
x86 or unconditional strict xfail -> unconditional nonstrict xfail

Author: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Bug: pandas-dev/pandas#38921, pandas-dev/pandas#38798, pandas-dev/pandas#41740, numpy/numpy#19146
Forwarded: no


Gbp-Pq: Name fix_overly_arch_specific_xfails.patch
raspbian-autopush pushed a commit to raspbian-packages/pandas that referenced this issue Jan 26, 2023
We test on more architectures, so upstream's xfails are not always
correct everywhere.  On those known to fail:
arm64 xfail -> all non-x86 xfail
x86 or unconditional strict xfail -> unconditional nonstrict xfail

pandas/tests/window/test_rolling.py also gets an i386 xfail for
rounding error that may be x87 excess precision

Author: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Bug: pandas-dev/pandas#38921, pandas-dev/pandas#38798, pandas-dev/pandas#41740, numpy/numpy#19146
Forwarded: no


Gbp-Pq: Name fix_overly_arch_specific_xfails.patch