Add DataFrame.write_excel #5568
Comments
We have this requirement as well. Currently, we convert the Polars dataframe to Pandas and export it from there. Would be great, of course, if we didn't have to take this detour. |
Another option would be to use the pure-Rust rust_xlsxwriter library. The rust_xlsxwriter roadmap explains the rationale and the current features. I wrote the initial |
Hello, I'm currently working on this feature using a wrapper around openpyxl. The MR should come later this week, as everything is working and I'm fixing tests. I'm also trying to keep the API exactly the same as write_csv, with the same transformations. The current blocker is that the previously implemented read_excel cannot handle the inputs well. |
Made a draft merge request, if any of you can take a look and suggest changes/improvements. |
@jmcnamara / @ritchie46: I believe I have a one line PR in the python XlsxWriter repository, heh ;) Kudos on making a Rust port - the python version is excellent! I used it to add a surprisingly comprehensive Excel export option to one of our major internal data APIs when I worked back at JPMorgan, and it was very well thought of. I'd certainly be interested in helping shape our usage, having done it once before - could start with the python API, and once that looks good we could think about how best to adapt it on the Rust side. Having the same core library features available in both languages seems like a win, and I can speak to the quality/utility of XlsxWriter. |
Cool. :-)
That sounds like a good approach. |
Finally started on this (in Python) ... |
@alexander-beedie good news. If there are any enhancements to XlsxWriter (within reason) that would make integrations with Polars easier/better let me know. |
@jmcnamara: so far it's a breeze, much as I remember ;) Have integrated dtype and/or per-column formatting, float precision, conditional formatting (all flavours), table styling, total row, autoformat/autofit, and so on... Full Almost ready for a first cut; need to polish-up what's there and then take care of docs and do some more validation/testing. Sample output from a single call to (Sparklines can probably wait for a second iteration, though I definitely want to integrate those too). |
Wow. Looks great. I'm looking forward to it. :-) |
Added a few last features today, and polished it all up along with reasonably detailed docstrings and some tests... First iteration is ready to ship: #7251 :) |
@alexander-beedie That is great work. Really strong option support from the start. |
@jmcnamara: I am a conduit for your remarkable library, orchestrating access across the breadth of the xlsxwriter API... ;) Once it has settled in a version or two, what would you think about adding some Polars-specific help pages to the xlsxwriter site, equivalent to the existing Pandas ones? (I'd be more than happy to write/commit them, assuming that the docs are part of the repository). |
Absolutely. I had actually typed a suggestion like that with my previous comment and then thought that might be insensitive because you have already provided some nice examples in your Polars docs. :-) From my point of view, the Working with Python Pandas and XlsxWriter docs were necessary because I kept seeing/answering the same types of questions on StackOverflow. I'll take a stab at creating a "Working with Polars and XlsxWriter" chapter in the next week or two and hook you in. When do you think this feature will be in a public Polars release? |
Perfect; shouldn't be more than a few days until |
@jmcnamara: it's out now - just had time to squeeze-in sparkline support too ;) |
@alexander-beedie Excellent. I'll start on the docs and hook you in once I have a basic framework (in the next couple of days). |
I've added initial docs for this at Working with Polars and XlsxWriter in the main documentation. See also jmcnamara/XlsxWriter#961 |
Any performance measurements against xlsxwriter (in pandas) and especially against PyExcelerate? |
@leonkosak: Feel free to run some and let us know how it goes ;)
Update: a quick & dirty timing check vs pandas shows we're roughly 50-60% faster to write the same amount of data (with default settings) while also creating a "real" Excel table object (with autofilter/etc) and adding default column formats, which pandas doesn't seem to do. (I made sure not to write the extra index col from pandas, so as to make it a fair comparison.)
from codetiming import Timer
from datetime import date
import polars as pl

# quickly spin-up a 1,000,000 element DataFrame (250,000 rows x 4 columns)
df = pl.DataFrame(
    {
        "idx": range(250_000),
        "x": 123.456789,
        "y": date.today(),
        "z": "testing,1.2.3.4.",
    }
)

# export to Excel from polars
with Timer():
    df.write_excel("dataframe_pl.xlsx")

# export to Excel from pandas (without the extra index column)
pf = df.to_pandas()
with Timer():
    pf.to_excel("dataframe_pd.xlsx", index=False)

Results:
Polars ~60% faster. |
I've been working on a data handling section for the It doesn't have a fraction of the functionality of |
I've uploaded a new Rust crate called It provides two interfaces for writing a Polars Rust dataframe to an Excel Xlsx file:
Note, this is for Rust dataframes rather than Python dataframes so folks on this thread may not be as interested. If you are and you try it out you can leave some feedback here. |
Just to note, in case anyone finds it useful, with version 3.2.2 of XlsxWriter you can format boolean values in Polars dataframes as checkboxes in Excel:
import polars as pl

# Create a Polars dataframe with some sample data.
df = pl.DataFrame(
    {
        "Region": ["North", "South", "East", "West"],
        "Target": [100, 70, 90, 120],
        "On-track": [False, True, True, False],
    }
)

# Write the dataframe to a new Excel file with formatting options.
df.write_excel(
    workbook="polars_checkbox.xlsx",
    # Set the checkbox format for the "On-track" boolean column.
    column_formats={"On-track": {"checkbox": True}},
    # Set an alternative table style.
    table_style="Table Style Light 9",
    # Autofit the column widths.
    autofit=True,
) |
Problem description
The read_excel function was added in #3567, and it would be nice if there was also a DataFrame.write_excel function, maybe using xlsxwriter-rs.