Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature coordinates accept numpy float64 but float32 is "not a JSON compliant number" #173

Closed
60south opened this issue Dec 30, 2021 · 10 comments

Comments

@60south
Copy link

60south commented Dec 30, 2021

Hi. I'm new to GeoJSON and still learning Python.

I'm reading in coordinate data from various sources. Some arrives as numpy float32 type. Attempting to create a geojson point or polygon using float64 works fine, while float32 produces an error: "ValueError: 1.0 is not a JSON compliant number" (see example below).

The error is very confusing, and it took a long time to figure out what it was complaining about. It's easy enough to convert, but any floating point number should be valid. Thanks.

import geojson
import numpy as np

a = np.float64(1.0)
b = np.float64(2.0)
point = geojson.Point([a, b])

# Successful
print(point)

#------------------------------

a = np.float32(1.0)
b = np.float32(2.0)
point = geojson.Point([a, b])

# Produces error

@trcmohitmandokhot
Copy link
Contributor

I've been staring at this behavior for a bit. I would like to take this on. However, this is my first time tackling an issue. At the very minimum, I would like to advance the conversation around whether what @60south saw was expected behavior and is compliant with RFC 7946.

@60south
Copy link
Author

60south commented Nov 7, 2022

Thanks @trcmohitmandokhot

In defense of the GeoJSON module, the problem appears related to the cpython 'isinstance' function which is used on line 50 of the GeoJSON geometry.py function.

Specifically, there is an inconsistency in how isinstance() treats numpy floats:

import numpy as np
isinstance(np.float64(1), float)

returns True, even though it's actually of type <class 'numpy.float64'>, not 'float', whereas:

import numpy as np
isinstance(np.float32(1), float)

returns False. Ditto for isinstance(np.float128(1), float), which should also be True.

I'm too new at this to say whether this is a problem with isinstance() or the numpy implementation of floats, or not really a problem at all and I'm just a whiner, but it does seem kinda bogus: a float is a float, we're just dickering over precision.

It would be easy enough to import numpy into GeoJSON and test for those types, but then that adds numpy as a dependency to GeoJSON which it shouldn't need. Ugh.

The best suggestion I have is to use the numbers module, i.e., instead of:

elif isinstance(coord, (float, int, Decimal)):

do this:

import numbers
...
elif isinstance(coord, (numbers.Real, Decimal)):
...

If a Decimal is a Number, then the statement could be shortened even further.

I'm sure someone with more knowledge than I will shred this idea, but it's the best I got.

thanx.

@trcmohitmandokhot
Copy link
Contributor

trcmohitmandokhot commented Nov 8, 2022

@60south.
While, I don't yet know what to do about the issue, I looked some more into the numpy library and this is what I found.
I think the built-in python library function isinstance() is doing its job right.
However, the definition of numpy.dtypes appears to be inconsistent.
numpy.double
The numpy.double (which on Linux x86_64 is numpy.float64) is an alias of numpy.float_. The numpy.float_ is a built-in scalar data-type, which inherits from python's float. Built-in Types
Hence your insinstance(np.float64(1),float) returns True.
However, all other numpy.dtypes i.e. numpy.float32 and numpy.float128 are custom data types, which do not have Python equivalents. Hence calls to isinstance() fail and you run into the "ValueError: 1.0 is not a JSON compliant number".
I will look at the code-logic next to see if using the numbers module or Decimals shows some clues to solving this interesting problem.

@trcmohitmandokhot
Copy link
Contributor

There appear to be some previous interesting numpy issues, which also relate to this dtype inconsistency topic. I am attaching links here for some side-reading.
Data type precision problems? #5272

@rayrrr
Copy link
Member

rayrrr commented Nov 8, 2022

Thanks for reporting this, @60south. In section 3.1.1 of RFC 7946 it says (emphasis added):

A position is an array of numbers. There MUST be two or more
elements. The first two elements are longitude and latitude, or
easting and northing, precisely in that order and using decimal
numbers
. Altitude or elevation MAY be included as an optional third
element.

This implies that checking that the coordinates are decimal numbers is the intended behavior for this logic. @trcmohitmandokhot I encourage you to submit a PR to resolve this bug, which I'd be happy to review. Here are some breadcrumbs:

  1. The LoC causing the exception is

    elif isinstance(coord, (float, int, Decimal)):

  2. isinistance(np.float32, float) is false numpy/numpy#13133 (comment) offers a potential solution

@trcmohitmandokhot
Copy link
Contributor

Thank you for the guidance. I will take this up.

@rayrrr
Copy link
Member

rayrrr commented Nov 8, 2022

One other note on the proposed solution, @trcmohitmandokhot. It uses numpy. We would rather not add numpy as a dependency. Let's try to find a way to fix it without numpy.

@60south
Copy link
Author

60south commented Nov 8, 2022

@rayrrr, thanks. I agree that numpy should not be a dependency. Rather, we need an expansive, holistic solution where any Real floating point number passes the test, as it should.

I am cautious about equating the traditional RFC 7946 definition of decimal (i.e., not a fraction) with the python Decimal class, since I don't think they mean the same thing. In fact, looking at the Decimal class definition, I see they actually warn against treating Decimals as Real numbers:

Do not subclass Decimal from numbers.Real and do not register it as such
(because Decimals are not interoperable with floats). See the notes in
numbers.py for more detail.

That shouldn't preclude their use as coordinates, of course.

After some further testing, it looks like any numpy.floatXX, numpy.intXX, and numpy.uintXX, etc., do pass the isinstance(x, numbers.Real) test, so they apparently inherit from the Numbers abstract base class, but not the Decimal class. Complex numbers fail the isinstance() test, as they should.

Seems like:
elif isinstance(coord, (numbers.Real, Decimal)):
should work.

Regardless, I'm very happy to see this issue getting some attention!

@trcmohitmandokhot
Copy link
Contributor

I agree with @60south's and @rayrrr's analysis and suggestions.
The recommendation to stay away from adding numpy as a dependency is a valid point. The suggestion to handle this via Numbers abstract base class seems to me to be the most reasonable. I did some more testing for my own documentation purposes and here is summary of results to support @60south's proposed PR change.

PEP 3141 lays out a really nice Type Hierarchy for Numbers, where Number :> Complex :> Real :> Rational :> Integral. PEP also acknowledges that Decimal does not fit into the hierarchy.

  • Current geojson type-check elif isinstance(coord, (float,int,Decimal))
  • Proposed geojson type-check elif isinstance(coord, (numbers.Real, Decimal))

Results of testing are tabulated below.

Current type-check Proposed type-check
Input (var) Type Class per PEP 3141 Hierarchy isinstance(coords,(float,int,Decimal)) isinstance(coords,(numbers.Real,Decimal))
x = 1 int Integral Number True True
x = np.uint(1) numpy.uint64 Integral Number False True
x = 1.0 float Real Number True True
x = Decimal(1.0) decimal.Decimal Number True True
x = np.float32(1.0) numpy.float32 Real Number False True
x = np.float64(1.0) numpy.float64 Real Number True True
x = np.float128(1.0) numpy.float128 Real Number False True

@rayrrr
Copy link
Member

rayrrr commented Feb 3, 2023

Fix included in version 3.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants