Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading SAS time variables #22720

Open
sasutils opened this issue Sep 15, 2018 · 9 comments
Open

Reading SAS time variables #22720

sasutils opened this issue Sep 15, 2018 · 9 comments
Labels

Comments

@sasutils
Copy link

Code Sample, a copy-pastable example if possible

Problem description

I tried to search to see if this issue was raised before, but could not find anything.

I notice that read_sas can convert SAS numbers formatted as DATE values (number of days since 1960) or DATETIME values (number of seconds since 1960) to datetime values supported by the datetime class in Python.

But it does not include any logic to convert numbers formatted as TIME values (number of seconds since midnight) even though it looks like the datetime class in Python does support time of day values.

Why does read_sas not also convert those numbers? Is it because nobody actually uses that Python data type? Nobody asked for that feature?

@TomAugspurger
Copy link
Contributor

But it does not include any logic to convert numbers formatted as TIME values (number of seconds since midnight)

Is it actually seconds since midnight? If so that could be a timedelta. But if its supposed to represent the actual time of day, then no, pandas (nor numpy) have a type for that.

@TomAugspurger TomAugspurger added the IO SAS SAS: read_sas label Sep 15, 2018
@sasutils
Copy link
Author

Yes TIME is the number of seconds in 24 hours, ie the number of seconds since midnight.

Looks like the definition for datetime.time. https://docs.python.org/3/library/datetime.html

class datetime.time
An idealized time, independent of any particular day, assuming that every day has exactly 246060 seconds (there is no notion of “leap seconds” here). Attributes: hour, minute, second, microsecond, and tzinfo.

Perhaps I am misunderstanding the Python documentation?

@troels
Copy link
Contributor

troels commented Sep 16, 2018

It could be converted to an ndarray of dtype object with time objects. It wouldn't be efficiently stored like datetime dtype=datetime, but it would be correct.

@TomAugspurger
Copy link
Contributor

@sasutils yes, I was asking whether SAS's TIME matches Python's datetime.time (no leap seconds) or whether it was more of a datetime.timedelta (with leap seconds).

Loading it as a ndarray of time objects would be unusual for pandas. AFAIK, we only do this type of conversion for dtypes that are actually supported.

I'd recommend users do this themselves by converting the loaded data into an object-dtype column holding datetime.time values (and if someone really wants to they can make an extension array for this).

@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Sep 16, 2018
@TomAugspurger TomAugspurger added this to the No action milestone Sep 16, 2018
@troels
Copy link
Contributor

troels commented Sep 16, 2018

Both e.g. the excel parser and the SQL module handle and do the conversion to and from time just fine, and as far as I know that is true for all other file formats supporting time as a type.
Why should sas-files be treated differently?

Doing the conversion themselves force the users to manually keep track of the types of the respective columns in a way that is unnecessary, as all the information required is right there in the sas-file.
Why would we force the user to do that?

@TomAugspurger
Copy link
Contributor

I didn’t realize we did that for excel and sql.

@ofajardo
Copy link

ofajardo commented Dec 9, 2020

pyreadstat converts SAS TIME to python datetime.time objects. You could add pyreadstat as a backend for read_sas (already used for read_spss) and solve this issue as well as others (#37088, #35545, #31243, #18198)

@jbrockmendel
Copy link
Member

might be able to use new pyarrow time dtype here

@PasamonteG
Copy link

PasamonteG commented Oct 9, 2023

I am having an error with read_sas:

Traceback (most recent call last):
File "pandas/_libs/tslibs/timedeltas.pyx", line 372, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
File "pandas/_libs/tslibs/conversion.pyx", line 126, in pandas._libs.tslibs.conversion.cast_from_unit
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pandas/_libs/tslibs/timedeltas.pyx", line 372, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
  File "pandas/_libs/tslibs/conversion.pyx", line 126, in pandas._libs.tslibs.conversion.cast_from_unit
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/u/workspaces/pythononzos/src/python/sas_reader_pandas_1.py", line 25, in <module>
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sasreader.py", line 161, in read_sas
    reader = SAS7BDATReader(
             ^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sas7bdat.py", line 207, in __init__
    self._get_properties()
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sas7bdat.py", line 294, in _get_properties
    self.date_created = epoch + pd.to_timedelta(x, unit="s")
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 211, in to_timedelta
    return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 219, in _coerce_scalar_to_timedelta_type
    result = Timedelta(r, unit)
             ^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1695, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 351, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 374, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 1.3033002481878809e+41 from s to 'ns' without overflow.

This is the code:

df = pd.read_sas(url,format = 'sas7bdat')

where url contains the path for the file.

The problem here is that I can’t avoid casting. This is what the table looks like:
image

There might be a way to read the SAS file and assign a data type later.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants