Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: date_format and datetime_format arguments to ExcelWriter are currently ignored by default openpyxl engine #44284

Open
3 tasks done
gaibo opened this issue Nov 2, 2021 · 13 comments
Assignees
Labels
Bug Datetime Datetime data dtype IO Excel read_excel, to_excel

Comments

@gaibo
Copy link
Contributor

gaibo commented Nov 2, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
with pd.ExcelWriter('TEST.xlsx', datetime_format='YYYY-MM-DD') as writer:
    test_df = pd.DataFrame([9000]*10, index=[pd.Timestamp('2021-11-02')]*10)
    test_df.to_excel(writer)

Issue Description

In pandas version 0.24.2, the above example would output nicely formatted dates. In pandas version 1.3.4, it outputs YYYY-MM-DD HH:MM:SS format, ignoring the datetime_format field.
This is probably as simple as openpyxl becoming the new default engine but not being configured for date formatting.

Expected Behavior

In written Excel sheet, YYYY-MM-DD format dates instead of YYYY-MM-DD HH:MM:SS format dates.

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.3.3
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@gaibo gaibo added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 2, 2021
@asishm
Copy link
Contributor

asishm commented Nov 2, 2021

This is probably as simple as openpyxl becoming the new default engine but not being configured for date formatting.

The default engine for writing hasn't changed - it's still xlsxwriter, but in this case you don't have xlsxwriter so it uses openpyxl. If you install xlsxwriter, you'll see the previous behavior.

as for the issue - it seems to be because these (date_format, datetime_format) args are not being propagated to the base class

class OpenpyxlWriter(ExcelWriter):
engine = "openpyxl"
supported_extensions = (".xlsx", ".xlsm")
def __init__(
self,
path,
engine=None,
date_format=None,
datetime_format=None,
mode: str = "w",
storage_options: StorageOptions = None,
if_sheet_exists: str | None = None,
engine_kwargs: dict[str, Any] | None = None,
**kwargs,
):
# Use the openpyxl module as the Excel writer.
from openpyxl.workbook import Workbook
engine_kwargs = combine_kwargs(engine_kwargs, kwargs)
super().__init__(
path,
mode=mode,
storage_options=storage_options,
if_sheet_exists=if_sheet_exists,
engine_kwargs=engine_kwargs,
)

@gaibo
Copy link
Contributor Author

gaibo commented Nov 2, 2021

The default engine for writing hasn't changed - it's still xlsxwriter, but in this case you don't have xlsxwriter so it uses openpyxl. If you install xlsxwriter, you'll see the previous behavior.

Hey thanks for the quick response! You are absolutely right, I somehow missed the following in ExcelWriter class while glancing at source yesterday:
Default is to use : * xlwt for xls * xlsxwriter for xlsx if xlsxwriter is installed otherwise openpyxl * odf for ods. See DataFrame.to_excel for typical usage.
I CTRL-F'd for "engine" and read the defaults for the reader instead. Well, this solved my use case haha but hope someone is up for the actual issue.

@lithomas1 lithomas1 added IO Excel read_excel, to_excel Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 2, 2021
@lithomas1 lithomas1 added this to the Contributions Welcome milestone Nov 2, 2021
@sebastian-de
Copy link

sebastian-de commented Dec 13, 2021

The above example works fine for me until v1.2.5.
Since v1.3.0 date_format and datetime_format are ignored. I suspect this is due to PR #40430
In 1.3.x, XlsxWriter super().init() has

            date_format=date_format,
            datetime_format=datetime_format,

which is missing in OpenpyxlWriter super().init()
I'm off to a short holiday, but next week I'll try if adding these lines to OpenpyxlWriter fixes the problem.

@driedwater
Copy link

take

@TimoMorris
Copy link

@driedwater feel free to have a look at my PR as a starting point if you like: #47315 it was all working, had just dragged on a bit waiting for code reviews and by the time the latest one came back I was too busy to sort it and then haven't had a chance to pick it up again.

@driedwater
Copy link

Sure, I'll take a look at it.

@CDWimmer
Copy link

did this ever get anywhere?

@khuseynov-daymon
Copy link

Any Reason why this issue is still being ignored? nobody use openpyxl ?

@CDWimmer
Copy link

CDWimmer commented Apr 4, 2024

Plenty do use it!
@driedwater @TimoMorris Don't suppose you'd have time to round this one up now? Looks like you're still the last ones to have given it a go unless I'm being blind

@asishm
Copy link
Contributor

asishm commented Apr 4, 2024

As a workaround, you can use writer = pd.Excelwriter(..., engine='openpyxl') and override the default date_format and datetime_format attributes using writer.date_format = '...'. For a, fix #47315 was nearly there if someone wants to take it up!

@ma-ef
Copy link

ma-ef commented May 30, 2024

@asishm Unfortunately this does not work:
Traceback (most recent call last):
writer.date_format= "DD.MM.YYYY"
^^^^^^^^^^^^^^^^^^
AttributeError: property 'date_format' of 'OpenpyxlWriter' object has no setter

@asishm
Copy link
Contributor

asishm commented May 30, 2024

Ah you're right. These attributes were changed to properties without setters in 1.5.0. While not recommended, you can set writer._date_format and writer._datetime_format instead. Note that since these use attributes that are not public, it's not guaranteed to work in the future and they may get changed/removed at any time.

@Mitmischer
Copy link

I want to bump this issue as I was also bitten by it recently. Maybe reverting the default engine to xlsxwriter would be a proper hotfix for the time being?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.