Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_excel with openpyxl engine doesn't close file after reading #29803

Open
konst-ivanov opened this issue Nov 22, 2019 · 11 comments
Open

read_excel with openpyxl engine doesn't close file after reading #29803

konst-ivanov opened this issue Nov 22, 2019 · 11 comments
Labels
Bug IO Excel read_excel, to_excel

Comments

@konst-ivanov
Copy link

konst-ivanov commented Nov 22, 2019

import pandas as pd

data = pd.read_excel('excel_file.xlsx', engine='openpyxl')

Problem description:

After running the above code in console, I switch to Excel and try to save 'excel_file.xlsx'. Excel raises an access error.

I tried to figure out if the file remains open using file descriptor. As output shows, it does:

import os

import pandas as pd

xl_file_stat = os.stat(os.path.relpath('excel_file.xlsx'))

try:
    os.fstat(3)
except OSError:
    print('File is not open')

print('Reading excel file...')
data = pd.read_excel('excel_file.xlsx', engine='openpyxl')
print('File has been read')

if os.fstat(3) == xl_file_stat:
    print('File is open')
Output:
File is not open
Reading excel file...
File has been read
File is open

The problem may arise from the openpyxl's class ExcelReader. Its' read() method closes the archive if read-only attribute is set to False (openpyxl/reader/excel.py, rows 281-282):

        if not self.read_only:
            self.archive.close()

I'm not sure if it's a normal behavior of ExcelReader, but I expect pd.read_excel() to close the file in this case.

pandas: 0.25.3
openpyxl: 3.0.0
@Rik-de-Kort
Copy link
Contributor

The class_OpenpyxlReader leaves the file handling of the file to openpyxl, it simply imports load_workbook and calls it.
Sounds like a thing to implement in Openpyxl.

@jbrockmendel jbrockmendel added the IO Excel read_excel, to_excel label Nov 30, 2019
@jbrockmendel
Copy link
Member

Is this fixed by #30096?

@DeoLeung
Copy link

the fix creates a new problem, if I pass in a file object, it closes it automatically...

maybe need to have a check on whether it's a file object when passed in read_excel

@roberthdevries
Copy link
Contributor

@DeoLeung See my PR #32544 for issue #31467

@nora-illanes
Copy link

Same here

@mroeschke mroeschke added the Bug label May 8, 2020
@roberthdevries
Copy link
Contributor

This cannot be reproduced anymore with the latest pandas master.
I changed the code to try to reproduce the error as follows:

import psutil
import pandas as pd

p = psutil.Process()
print(p.open_files())

print('Reading excel file...')
data = pd.read_excel('some_file.xlsx', engine='openpyxl')
print('File has been read')

print(p.open_files())

No left over open file handles are detected. The output is this and means no open file handles before and after the test:

[]
Reading excel file...
File has been read
[]

@johnalupton
Copy link

johnalupton commented Feb 18, 2021

I have this issue persisiting with following
numpy 1.19.2
openpyxl 3.0.5
pandas 1.1.3
Python 3.7.4

I am loading a dataframe into pandas using following:

import pandas as pd

df_factor_histories=pd.read_excel("./eco_factor/eco_factor_test_data_builder.xlsx",
                                  engine='openpyxl', sheet_name=0)

The dataframe loads just fine but the file is left open:

import psutil

p = psutil.Process()
print(p.open_files())

OUTPUT
[popenfile(path='C:\\Users\\xx\\.ipython\\profile_default\\history.sqlite', fd=-1), 
popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1), 
popenfile(path='C:\\Windows\\System32\\en-US\\kernel32.dll.mui', fd=-1), 
popenfile(path='D:\\xxxxx\\data modelling\\eco_factor\\eco_factor_test_data_builder.xlsx', fd=-1)]

I am running Anaconda/Jupyter).

Relevant versions I am running:

numpy                         1.19.2
openpyxl                      3.0.5
pandas                        1.1.3
Python 3.7.4

I would appreciate some suggestions on how to close the files/best work around this, thanks

@JeffryErickson
Copy link

JeffryErickson commented Jan 12, 2022

I can still reliably duplicate this problem. I am on Windows using Pandas 1.3.5. Python version 3.10.0

In this code, the function get_excel() leaves the file locked while it is running (before you dismiss the messagebox). The function get_excel2 clears the lock on the file. I can use the get_excel2 code to fix the problem but it seems like fixing the root of the problem in Pandas read_excel would be a better solution.


import pandas as pd
import tkinter.messagebox as mbox
import io
import psutil


def get_excel():
    str_file = 'C:/.../TestPython.xlsx'
    worksheet_name = "Sheet1"
    goahead = False
    try:
        df_input_file = pd.read_excel(open(str_file, "rb"), sheet_name=worksheet_name, header=0, index_col=None)
        goahead = True
    except Exception:
        mbox.showinfo("Error encountered", 'opening spreadsheet')
    if goahead:
        return df_input_file
    else:
        return None


def get_excel2():
    str_file = 'C:/.../TestPython.xlsx'
    worksheet_name = "Sheet1"
    goahead = False
    # https://stackoverflow.com/questions/66260433/python-pandas-read-excel-engine-openpyxl-not-closing-file
    with open(str_file, "rb") as f:
        file_io_obj = io.BytesIO(f.read())
    df_input_file = pd.read_excel(file_io_obj, sheet_name="Sheet1")
    return df_input_file


def open_files_report():
    # https://github.com/pandas-dev/pandas/issues/29803
    p = psutil.Process()
    print('Open files:')
    print(p.open_files())


open_files_report()  # Are any files open?
df = get_excel()  # Leaves the file locked
# df = get_excel2()  # Unlocks the file
open_files_report()  # Are any files open?
if isinstance(df, bool):
    mbox.showinfo("Error reading Excel", "Error")
else:
    print(df)
mbox.showinfo('Test Excel closing', 'Is Excel file still open before dismissing this?')

@jbrockmendel
Copy link
Member

@JeffryErickson does your example depend on using tkinter?

@JeffryErickson
Copy link

@jbrockmendel No it doesn't depend on using tkinter. When I convert the messagebox lines to print statements and comment out tkinter, psutil.Process() reports the spreadsheet is open the second time it runs before it gets to a closing print() function.

@roberthdevries
Copy link
Contributor

@JeffryErickson In get_excel() you explicitly open the excel file, but you fail to close it. Hence your example code leaves the file locked. pd.read_excel() does not automatically close the file for you if you are the owner of the open file.
Instead always use the following construct:

with open(str_file, "rb") as f:
     df_input_file = pd.read_excel(f, sheet_name=worksheet_name, header=0, index_col=None)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

9 participants