
BUG: FileCreateError for large excel file created with ExcelWriter #40302

Closed
2 of 3 tasks
roryburnham opened this issue Mar 8, 2021 · 4 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@roryburnham

roryburnham commented Mar 8, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

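import io

import pandas as pd

# apply_discounts() is the reporter's own helper; its definition was not included in the issue.


def generate_multi_mpan_shape(shape_data, discount_data, mpan_eac_data, days_used):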
    # Evaluate shape volume
    shape_volume = shape_data.fillna(0).to_numpy().sum()

    num_of_mpans = mpan_eac_data.shape[0]

    options = {}
    options["strings_to_formulas"] = False
    options["strings_to_urls"] = False

    # Write the finished .xlsx to an in-memory buffer rather than a file path
    excel_obj = io.BytesIO()

    writer = pd.ExcelWriter(excel_obj, options=options, engine="xlsxwriter")

    for index in range(num_of_mpans):
        # Generate a shape for a given MPAN
        mpan = mpan_eac_data.iloc[index, 0]
        eac = float(mpan_eac_data.iloc[index, 1])
        volume_used = eac * days_used / 365
        volume_ratio = volume_used / shape_volume

        output_data = shape_data.copy(True)
        output_data *= volume_ratio

        # Apply discounts
        if discount_data.iloc[0] != False:
            output_data = apply_discounts(output_data, discount_data)

        output_data = output_data.round(9)
        output_data.insert(0, "MPAN", mpan)
        output_data.insert(1, "Date", output_data.index.values)

        output_data.to_excel(writer, sheet_name=f"{mpan}", index=False, header=False)
        print(index)

    writer.close()

    excel_obj.seek(0)

    return excel_obj

Problem description

I'm using the code above to generate a multi-worksheet Excel file with ExcelWriter. The code runs in a Lambda function, which only has 500 MB available in /tmp, so I've tried to create the file completely in memory. The problem appears when the Excel file is larger than about 100 MB.
The following error is raised when writer.close() is called:

Traceback (most recent call last):
  File "/opt/python/xlsxwriter/workbook.py", line 320, in close
    self._store_workbook()
  File "/opt/python/xlsxwriter/workbook.py", line 685, in _store_workbook
    xml_files = packager._create_package()
  File "/opt/python/xlsxwriter/packager.py", line 135, in _create_package
    self._write_worksheet_files()
  File "/opt/python/xlsxwriter/packager.py", line 190, in _write_worksheet_files
    worksheet._assemble_xml_file()
  File "/opt/python/xlsxwriter/worksheet.py", line 3875, in _assemble_xml_file
    self._write_sheet_data()
  File "/opt/python/xlsxwriter/worksheet.py", line 5483, in _write_sheet_data
    self._write_rows()
  File "/opt/python/xlsxwriter/worksheet.py", line 5676, in _write_rows
    self._write_cell(row_num, col_num, col_ref)
  File "/opt/python/xlsxwriter/worksheet.py", line 5857, in _write_cell
    self._xml_number_element(cell.number, attributes)
  File "/opt/python/xlsxwriter/xmlwriter.py", line 137, in _xml_number_element
    self.fh.write("""<c%s><v>%.16G</v></c>""" % (attr, number))
  File "/var/lang/lib/python3.8/codecs.py", line 721, in write
    return self.writer.write(data)
  File "/var/lang/lib/python3.8/codecs.py", line 378, in write
    self.stream.write(data)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/src/forecasting/shape_tool/shape_tool_controller.py", line 133, in generate_multi_shape
    output_file = service.generate_multi_mpan_shape(
  File "/var/task/src/forecasting/forecasting_logging.py", line 32, in wrapper_func
    obj = func(*args)
  File "/var/task/src/forecasting/shape_tool/shape_tool_service.py", line 520, in generate_multi_mpan_shape
    writer.close()
  File "/opt/python/pandas/io/excel/_base.py", line 898, in close
    content = self.save()
  File "/opt/python/pandas/io/excel/_xlsxwriter.py", line 198, in save
    return self.book.close()
  File "/opt/python/xlsxwriter/workbook.py", line 322, in close
    raise FileCreateError(e)
xlsxwriter.exceptions.FileCreateError: [Errno 28] No space left on device

I'm not sure what is causing this. The error suggests that something is being written to disk and the disk is full, which is exactly what I was trying to avoid by creating the file completely in memory.

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-17763-Microsoft
Version : #1432-Microsoft Mon Aug 18 18:18:00 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@roryburnham roryburnham added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 8, 2021
@twoertwein
Member

Does xlsxwriter create a temporary file? You might want to try other engines to see whether the issue is specific to an engine.
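One way to look into the first question is to inspect the directory used by Python's tempfile module, since, as far as I know, XlsxWriter creates its temporary files there (via tempfile) unless configured otherwise. The sketch below uses only the standard library; the helper name temp_dir_report is hypothetical and not part of pandas or XlsxWriter.

```python
import os
import shutil
import tempfile


def temp_dir_report(path=None):
    """Summarise a temp directory: its path, free space in MB, and current entries.

    Defaults to tempfile.gettempdir(), the directory tempfile.mkstemp() uses when
    no dir argument is given (normally /tmp on Linux).
    """
    tmp = path if path is not None else tempfile.gettempdir()
    usage = shutil.disk_usage(tmp)  # named tuple: total, used, free (bytes)
    return {
        "tempdir": tmp,
        "free_mb": usage.free / (1024 * 1024),
        "entries": sorted(os.listdir(tmp)),
    }


if __name__ == "__main__":
    report = temp_dir_report()
    print(f"{report['tempdir']}: {report['free_mb']:.0f} MB free, "
          f"{len(report['entries'])} entries")
```

The traceback in the report shows the worksheet XML being written from inside writer.close() itself, so a check made between to_excel calls may well show no new files. Comparing free_mb just before close() with the likely size of the uncompressed worksheet XML (usually much larger than the zipped .xlsx) is probably more telling.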

@roryburnham
Author

No, there are no changes to /tmp while the code runs. I have tried openpyxl, but it has efficiency issues: it takes long enough to make the Lambda function time out, and it uses up all of the allocated memory. The file created in my test case shouldn't use up all the disk space on its own, which makes me think something else is going on.

@twoertwein
Member

XlsxWriter can create temporary files here:
https://github.com/jmcnamara/XlsxWriter/blob/6c3ea23a410e8216eab8f5751e5544ffb444b3da/xlsxwriter/workbook.py#L697
There is an in_memory option that can be set by calling Workbook(..., options={'in_memory': True}). I assume pandas would need an option to forward engine-specific arguments to support this.
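For reference, here is a minimal sketch of the in_memory option used directly with XlsxWriter, writing a small workbook to an io.BytesIO buffer. It assumes the XlsxWriter package is installed; the sheet name and cell values are placeholders.

```python
import io

import xlsxwriter  # third-party package: XlsxWriter

# Destination is an in-memory buffer rather than a file path.
buffer = io.BytesIO()

# in_memory=True makes XlsxWriter assemble the XML parts of the package in
# memory instead of creating temporary files in the system temp directory.
workbook = xlsxwriter.Workbook(buffer, {"in_memory": True})
worksheet = workbook.add_worksheet("example")
worksheet.write_row(0, 0, ["label", "value"])
worksheet.write_row(1, 0, ["a", 1.5])
workbook.close()

# The finished .xlsx (a zip archive) is now entirely inside the buffer.
xlsx_bytes = buffer.getvalue()
print(f"{len(xlsx_bytes)} bytes written")
```

XlsxWriter also documents a tmpdir option that points the temporary files at a different directory instead of keeping them in memory, which may be preferable when memory, rather than disk space, is the tighter limit.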

@roryburnham
Copy link
Author

roryburnham commented Mar 8, 2021

Yes, this was the issue! I added {'in_memory': True} to my options dictionary and no longer get the error. It seems pandas can already forward this option to xlsxwriter. Thanks for your help!
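In pandas 1.1.4, the version used here, extra keyword arguments to pd.ExcelWriter (such as options=...) are passed straight to xlsxwriter.Workbook. That is why adding the key to the existing dictionary was enough. In later releases (1.3 onward, as far as I know) the supported route is the engine_kwargs parameter. The sketch below assumes pandas 1.3 or later with XlsxWriter installed; the frames dictionary is a hypothetical stand-in for the per-MPAN data built in the issue's loop.

```python
import io

import pandas as pd  # assumes pandas >= 1.3 and the XlsxWriter package are installed

# Hypothetical stand-ins for the per-MPAN frames built in the issue's loop.
frames = {
    "sheet_a": pd.DataFrame({"value": [1.0, 2.0]}),
    "sheet_b": pd.DataFrame({"value": [3.0, 4.0]}),
}

# The reporter's workbook options plus the in_memory fix from this thread.
options = {
    "strings_to_formulas": False,
    "strings_to_urls": False,
    "in_memory": True,  # keep XlsxWriter's intermediate XML parts out of the temp dir
}

excel_obj = io.BytesIO()

# engine_kwargs is unpacked into xlsxwriter.Workbook(handle, **engine_kwargs),
# so {"options": options} reaches Workbook as its options dict.
with pd.ExcelWriter(
    excel_obj, engine="xlsxwriter", engine_kwargs={"options": options}
) as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False, header=False)

# Rewind so the caller can read the finished .xlsx from the start.
excel_obj.seek(0)
```

Using ExcelWriter as a context manager closes the workbook on exit, which is the step that assembles the package and would otherwise hit the temp directory.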
