Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More roundoff problems #78

Closed
aidanheerdegen opened this issue Nov 8, 2018 · 16 comments
Closed

More roundoff problems #78

aidanheerdegen opened this issue Nov 8, 2018 · 16 comments

Comments

@aidanheerdegen
Copy link
Contributor

I don't know if this is related to #54 but I have noticed what I would call "unwanted precision" when interconverting between dates and numbers:

In [1]: from cftime import num2date, date2num
In [2]: calendar = 'noleap'
In [3]: u1970 = 'days since 1970-01-01' 
In [4]: u1980 = 'days since 1980-01-01' 
In [5]: date_1980 = num2date(0,u1980,calendar)
In [6]: date_1980
Out[6]: cftime.DatetimeNoLeap(1980, 1, 1, 0, 0, 0, 0, 6, 1)
In [7]: date2num(date_1980,u1970,calendar)
Out[7]: 3650.000000000001

I think this could be characterised as erroneous, The return value of the last date2num call should be exactly 3650.

@aidanheerdegen
Copy link
Contributor Author

This may seem trivial, but I'm using num2date and date2num to rebase the time axes in netCDF files. When I do this operation it doesn't round trip. When I rebase the time dimension of a netCDF file and then rebase back to the original time units they are no longer the same.

So in the case below, ds2 is ds rebased, and then the reverse rebase applied, so they should be the same, and they appear to be on casual inspection:

(Pdb) ds.time
<xarray.DataArray 'time' (time: 120)>
array([693150.5, 693180. , 693209.5, 693240. , 693270.5, 693301. , 693331.5,
       693362.5, 693393. , 693423.5, 693454. , 693484.5, 693515.5, 693545. ,
       693574.5, 693605. , 693635.5, 693666. , 693696.5, 693727.5, 693758. ,
       693788.5, 693819. , 693849.5, 693880.5, 693910. , 693939.5, 693970. ,
       694000.5, 694031. , 694061.5, 694092.5, 694123. , 694153.5, 694184. ,
       694214.5, 694245.5, 694275. , 694304.5, 694335. , 694365.5, 694396. ,
       694426.5, 694457.5, 694488. , 694518.5, 694549. , 694579.5, 694610.5,
       694640. , 694669.5, 694700. , 694730.5, 694761. , 694791.5, 694822.5,
       694853. , 694883.5, 694914. , 694944.5, 694975.5, 695005. , 695034.5,
       695065. , 695095.5, 695126. , 695156.5, 695187.5, 695218. , 695248.5,
       695279. , 695309.5, 695340.5, 695370. , 695399.5, 695430. , 695460.5,
       695491. , 695521.5, 695552.5, 695583. , 695613.5, 695644. , 695674.5,
       695705.5, 695735. , 695764.5, 695795. , 695825.5, 695856. , 695886.5,
       695917.5, 695948. , 695978.5, 696009. , 696039.5, 696070.5, 696100. ,
       696129.5, 696160. , 696190.5, 696221. , 696251.5, 696282.5, 696313. ,
       696343.5, 696374. , 696404.5, 696435.5, 696465. , 696494.5, 696525. ,
       696555.5, 696586. , 696616.5, 696647.5, 696678. , 696708.5, 696739. ,
       696769.5])
Coordinates:
  * time     (time) float64 6.932e+05 6.932e+05 ... 6.967e+05 6.968e+05
Attributes:
    long_name:       time
    units:           days since 0001-01-01 00:00:00
    cartesian_axis:  T
    calendar_type:   NOLEAP
    calendar:        NOLEAP
    bounds:          time_bounds
(Pdb) ds2.time
<xarray.DataArray 'time' (time: 120)>
array([693150.5, 693180. , 693209.5, 693240. , 693270.5, 693301. , 693331.5,
       693362.5, 693393. , 693423.5, 693454. , 693484.5, 693515.5, 693545. ,
       693574.5, 693605. , 693635.5, 693666. , 693696.5, 693727.5, 693758. ,
       693788.5, 693819. , 693849.5, 693880.5, 693910. , 693939.5, 693970. ,
       694000.5, 694031. , 694061.5, 694092.5, 694123. , 694153.5, 694184. ,
       694214.5, 694245.5, 694275. , 694304.5, 694335. , 694365.5, 694396. ,
       694426.5, 694457.5, 694488. , 694518.5, 694549. , 694579.5, 694610.5,
       694640. , 694669.5, 694700. , 694730.5, 694761. , 694791.5, 694822.5,
       694853. , 694883.5, 694914. , 694944.5, 694975.5, 695005. , 695034.5,
       695065. , 695095.5, 695126. , 695156.5, 695187.5, 695218. , 695248.5,
       695279. , 695309.5, 695340.5, 695370. , 695399.5, 695430. , 695460.5,
       695491. , 695521.5, 695552.5, 695583. , 695613.5, 695644. , 695674.5,
       695705.5, 695735. , 695764.5, 695795. , 695825.5, 695856. , 695886.5,
       695917.5, 695948. , 695978.5, 696009. , 696039.5, 696070.5, 696100. ,
       696129.5, 696160. , 696190.5, 696221. , 696251.5, 696282.5, 696313. ,
       696343.5, 696374. , 696404.5, 696435.5, 696465. , 696494.5, 696525. ,
       696555.5, 696586. , 696616.5, 696647.5, 696678. , 696708.5, 696739. ,
       696769.5])
Coordinates:
  * time     (time) float64 6.932e+05 6.932e+05 ... 6.967e+05 6.968e+05
Attributes:
    long_name:       time
    units:           days since 0001-01-01 00:00:00
    cartesian_axis:  T
    calendar_type:   NOLEAP
    calendar:        NOLEAP
    bounds:          time_bounds

but

(Pdb) ds2.equals(ds1)
False
(Pdb) ds.time.values - ds2.time.values
array([-1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10,
       -1.16415322e-10, -1.16415322e-10, -1.16415322e-10, -1.16415322e-10])

My cftime and python versions:

$ conda list cftime
cftime                    1.0.2.1          py36h7eb728f_0    conda-forge
$ python --version
Python 3.6.6

@aidanheerdegen
Copy link
Contributor Author

Another simple example (suitable for testing?)

(Pdb) cftime.date2num(cftime.datetime(1, 12, 1, 0, 0, 0, 0, -1, 1), units='days since 01-01-01',calendar='noleap') 
334.00000000000006

@jswhit
Copy link
Collaborator

jswhit commented Nov 8, 2018

Pull request #79 improves the precision, so now you will get 3650.0000000000007958 instead of 3650.000000000001. I don't see what I can do about those extra annoying digits - that's just the nature of floating point arithmetic.

@aidanheerdegen
Copy link
Contributor Author

I was hoping there might some clever way to enforce the same precision on the number output from date2num as the precision of the date itself. In the example above:

cftime.date2num(cftime.datetime(1, 12, 1, 0, 0, 0, 0, -1, 1), units='days since 01-01-01',calendar='noleap') 

the precision is microseconds. So we need less precision, not more. Or at least, specified precision.

I think it is an error that this doesn't roundtrip:

>>> day=1
>>> roundtripday = cftime.date2num(cftime.num2date(day,units='days since 01-01-01',calendar='noleap'),units='days since 01-01-01',calendar='noleap')
>>> roundtripday
1.0000000000000002
>>> assert(day == roundtripday)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Do you agree? In which case, is there a solution?

@jswhit
Copy link
Collaborator

jswhit commented Nov 9, 2018

I'm working on splitting the Julian day into an integer and a floating point remainder, which will reduce round-off error. There won't be a general solution though - there will always be cases where the round-off error prevents the roundtrip from matching.

@aidanheerdegen
Copy link
Contributor Author

Thanks for the feedback.

Now I can't get it to reproduce the above behaviour. In my python3 environment it will roundtrip ok:

$ conda list cftime
# Name                    Version                   Build  Channel
cftime                    1.0.2.1          py36h7eb728f_0    conda-forge
$ python
Python 3.6.6 | packaged by conda-forge | (default, Oct 12 2018, 14:08:43) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cftime
>>> day=1
>>> print(cftime.date2num(cftime.num2date(day,units='days since 01-01-01',calendar='noleap'),units='days since 01-01-01',calendar='noleap'))
1.0

but not python2:

$ conda list cftime
# Name                    Version                   Build  Channel
cftime                    1.0.2.1          py27h7eb728f_0    conda-forge
$ python
Python 2.7.15 | packaged by conda-forge | (default, Oct 12 2018, 14:10:50) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cftime
>>> day=1
>>> print(cftime.date2num(cftime.num2date(day,units='days since 01-01-01',calendar='noleap'),units='days since 01-01-01',calendar='noleap'))
1.0000000000000002
>>> 

But yesterday I am sure it was failing in all environments (the python3 environment has been updated today). I am mystified.

@jswhit
Copy link
Collaborator

jswhit commented Nov 9, 2018

In this particular case the failure in the roundtrip is caused by Unidata/netcdf4-python#433. If you comment out the code in cftime/_cftime.pyx after

# Add a small offset (proportional to Julian date) for correct re-conversion.

and

# remove the offset from the microsecond calculation.

you will get exactly 3650.0. However, many of the tests will then fail (for reasons described in that pull request).

@jswhit
Copy link
Collaborator

jswhit commented Nov 9, 2018

I think pull request #80 addresses this issue (at least for all the cases you presented). @aidanheerdegen, could you test this (branch msround)?

jswhit added a commit that referenced this issue Nov 9, 2018
@aidanheerdegen
Copy link
Contributor Author

Hi Jeff, thanks for looking into this so promptly and providing a potential solution. I won't be able to test this until Monday (Australian time), but will get back to you when I have. Cheers

@aidanheerdegen
Copy link
Contributor Author

Hi Jeff,

I'm having issues building the package

gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march -DNDEBUG -D_FORTIFY_SOURCE -fPIC -I/g/data3/hh5/public/apps/miniconda3/envs/analysis27-18.10/include/python2.7 -c cftime/_cftime.c -o build/temp.linux-x86_64-2.7/cftime/_cftime.o
gcc: error: unrecognized command line option '-march'; did you mean '-march='?
error: command 'gcc' failed with exit status 1

Some limited googling didn't come up with a quick solution. Do you have any idea of the best way to overcome this?

@jswhit
Copy link
Collaborator

jswhit commented Nov 13, 2018

Something is amiss with your build environment. Are you using 'python setup.py build' or conda?

@jswhit
Copy link
Collaborator

jswhit commented Nov 13, 2018

python -c 'import sysconfig; print(sysconfig.get_config_var("CFLAGS"))' will tell you what your default python compiler build flags are.

@aidanheerdegen
Copy link
Contributor Author

I am using a conda python and

python setup.py build 

Is this a bad combination?

$ python -c 'import sysconfig; print(sysconfig.get_config_var("CFLAGS"))'
-fno-strict-aliasing  -m64 -fPIC -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

@jswhit
Copy link
Collaborator

jswhit commented Nov 14, 2018

No, that looks fine. The problem is the empty '-march', it should be '-march=platform'. Don't know where conda is getting that from then - maybe from your environment? Do you have a CPPFLAGS env var set?

@aidanheerdegen
Copy link
Contributor Author

aidanheerdegen commented Nov 15, 2018

Thanks Jeff, that was the answer. My module load conda is putting all sorts of frankly "interesting" things in my environment:

$ env | grep -i FLAGS
LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now
CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE
DEBUG_CPPFLAGS=-D_DEBUG -D_FORTIFY_SOURCE
CFLAGS=-march
DEBUG_CFLAGS=-march

I 've switched to another conda environment which doesn't do this, and it is compiling ok, thanks for the assistance.

And yes, that does seem to behave as I would wish. Round-tripping works as do other whole number issues I had before.

Thanks for the fix. Does that break anything else?

@jswhit
Copy link
Collaborator

jswhit commented Nov 15, 2018

No, all the tests are passing. I will merge the pull request now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants