read_excel prevents further saving of the Excel file while the Python session is running #2465
Comments
Hi @GentilsTo, thanks for posting! We should make sure that Modin is not hanging on to any open file handles just in case, but I cannot see any cases of this at a glance in the code. Editing the file while the session is open is a perfectly reasonable workflow, and it is safe as long as you are not saving a new copy and loading at the same time. Out of curiosity, what kinds of changes do you make to the Excel files? We are prototyping an Excel interface that shares the execution, and I'm interested in understanding your workflow because it seems like it would fit this interface quite well.
Scratch that, it only applies to single-threaded reading. We're having issues with parallel read here.
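To make "not hanging on to any open file handles" concrete, here is a minimal sketch of a leak check. It is not how Modin or pandas test this (as far as I can tell, the `check_file_leaks` decorator used in the reproducer below relies on psutil). It assumes Linux, where `/proc/self/fd` lists the descriptors of the current process, and the names `open_paths` and `assert_no_file_leaks` are hypothetical helpers made up for illustration.

```python
import os
from contextlib import contextmanager

FD_DIR = "/proc/self/fd"  # assumption: Linux-only view of this process's descriptors


def open_paths():
    """Return the set of targets that this process's open descriptors point to."""
    paths = set()
    for fd in os.listdir(FD_DIR):
        try:
            paths.add(os.readlink(os.path.join(FD_DIR, fd)))
        except OSError:
            # The descriptor used to list the directory is already closed again.
            continue
    return paths


@contextmanager
def assert_no_file_leaks():
    """Raise AssertionError if the wrapped block leaves new files open."""
    before = open_paths()
    yield
    leaked = open_paths() - before
    if leaked:
        raise AssertionError(f"leaked file handles: {sorted(leaked)}")
```

Wrapping a read call in `with assert_no_file_leaks():` would flag a handle that is still open when the call returns. Two limitations of this sketch: it compares paths rather than descriptors, so a second handle on an already-open file goes unnoticed, and it only sees the current process, not worker processes used for a parallel read.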
Self-contained reproducer:

import pandas
import modin.pandas as pd
from pandas.util._test_decorators import check_file_leaks
import os

@check_file_leaks
def reproduce(name):
    return pd.read_excel(name)

def main():
    df = pd.DataFrame({'a': [1,2], 'b': [3,4]})
    name = '2465-test.xlsx'
    df.to_excel(name)
    try:
        df2 = reproduce(name)
        assert df2.equals(df)
    finally:
        os.unlink(name)

if __name__ == '__main__':
    main()

Currently fails like this:
After digging further I believe this is a bug / design decision (as implied by SO post) of
…eak files Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
… if needed Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
… psutil limitation Signed-off-by: Vasilij Litvinov <vasilij.n.litvinov@intel.com>
System information
Modin version (modin.__version__): 0.8.2
A really simple example file to reproduce:
test.xlsx
Then open the Excel file (if not already open), and try to save it.
Describe the problem
Contrary to the behavior in pandas, using pd.read_excel with Modin prevents any further saving of the Excel file until the Python session is closed.
When trying to save in Excel, I get multiple error pop-ups mentioning a "sharing issue", and the Excel file is then given some random number as its name.
For my use case, it's quite handy to have the Excel file and the Python session open at the same time: loading the data, working on it with pandas, and potentially (especially during development) making small changes in Excel and reloading them right away in Python. Also, this doesn't seem like safe behavior and might hide something bigger, though I don't know.
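For the edit-and-reload workflow described above, one possible stopgap is to read the workbook's bytes into memory and hand the reader an in-memory buffer, so the reader never holds a handle on the file itself. This is only a sketch, not a fix for the underlying issue: `read_via_snapshot` is a hypothetical helper, and while `pandas.read_excel` accepts file-like objects, whether Modin's parallel reader treats a buffer the same way is an assumption that has not been checked here.

```python
import io
from pathlib import Path


def read_via_snapshot(path, reader):
    """Load `path` fully into memory and pass an in-memory buffer to `reader`.

    The file on disk is opened and closed inside read_bytes(), so no handle
    on it remains once this function returns.
    """
    data = Path(path).read_bytes()
    return reader(io.BytesIO(data))
```

With pandas this could be called as `read_via_snapshot("test.xlsx", pandas.read_excel)`. The cost is one full copy of the file in memory, which is fine for small workbooks edited by hand.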
Source code / logs
(sorry, it's in French)