-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: check for memory errors on reading xlsx files and proper formatting of csv sample code #432
Conversation
Check the Vec size against the memory allocated to the machine.
Catch the error from the range variable without a panic fail. Change the writing to write a proper csv with a comma delimiter and quoting.
Could you add a test showing how this can fail? |
tafia, |
tests are failing. |
May be we should check cells.len() == isize::MAX in loop, when add cell to |
First, I should have opened this by saying I think calamine is a great piece of work, and I'm impressed with its performance. |
@SgtPepper47, sorry, cells.len() == usize::MAX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand why you would reserve some capacity on cells
. If anything you actually want to shrink_to_fit
. The capacity and the checks should be reserved in the from_sparse
function instead.
An alternative would also be to enhance the Range
to be an enum, either sparse
or dense
. Basically there should be some checks to determine when using a sparse representation is better (like here, very large dimension but very few cells).
This is of course much more work but it would be better.
@@ -823,6 +824,13 @@ impl<RS: Read + Seek> Xlsx<RS> { | |||
Err(e) => return Err(e), | |||
} | |||
} | |||
/// Chcek for machine memory error | |||
if cells.capacity() < usize_len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
I think this PR makes things worse as it allocates a potentially very large |
Closing it for now, feel free to reopen and come up with an actual improvement. |
Updates to the Excel range creation to check the machine memory against the memory required for the range.
Updates to the sample Excel to csv writer to identify the memory error and to write out a properly formatted csv file with commas instead of semicolons.
Resolves #433