Infinite Loop in DeflatePipelineFilter #318
Comments
Thanks for raising this issue. If you can provide a file to reproduce it that would be best; otherwise I'm not clear how I can investigate this. Do you have many files with this behaviour, or just one? I'm pretty sure scoping the decoder to a single chunk is the correct approach; if it were not, the ability to read slices (hyperslabs) from datasets would be broken. Maybe there is a chunk containing no data and that case is handled incorrectly. Have you confirmed the dataset is actually readable? Can you try opening it using some other tool backed by the HDF5 libs (not jHDF)? It's possible the file is corrupted.
I will try to get you a copy of a file, but it might be later this week.
I extracted a single DataSet from a larger file using HDFView, because GitHub wouldn't take the original file.
Thanks a lot for the file. I have reproduced an issue; the abridged stack I get is:
So I didn't get an infinite loop? Is this the same issue you saw? Either way, it is a bug in jHDF. I successfully opened the file with HDFView. I will have a more in-depth look soon.
Sounds like a different issue than what I was seeing, which was a zero read at DeflatePipelineFilter line 50.
Thanks, I have reproduced the infinite loop issue now as well, so there are two issues to investigate...
Possible fix #321. It's a strange case; the datasets affected seem to have … If you get a chance to have a look at the fix, that would be great. I still want to look at this a bit more, add some testing, and maybe improve logging.
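For context on what the shuffle filter does and where a length mismatch can appear, here is a minimal sketch (not jHDF's actual ShuffleFilter implementation) of un-shuffling a chunk. HDF5's shuffle filter stores byte 0 of every element, then byte 1 of every element, and so on; trailing bytes that don't form a whole element pass through unchanged, which is the mismatched-length case a fix has to handle:

```java
public class ShuffleDemo {
    // Reverses HDF5's shuffle filter layout for one chunk.
    // data: shuffled bytes; elementSize: size in bytes of one dataset element.
    public static byte[] unshuffle(byte[] data, int elementSize) {
        int elements = data.length / elementSize;  // whole elements only
        int shuffledLen = elements * elementSize;
        byte[] out = new byte[data.length];
        int pos = 0;
        for (int i = 0; i < elementSize; i++) {      // byte plane i
            for (int j = 0; j < elements; j++) {     // element j
                out[j * elementSize + i] = data[pos++];
            }
        }
        // Leftover bytes (data.length % elementSize) are copied as-is;
        // if this remainder is not handled, lengths no longer match up.
        System.arraycopy(data, shuffledLen, out, shuffledLen,
                data.length - shuffledLen);
        return out;
    }
}
```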
It looks reasonable to me, but I'm not that familiar with the code. However, I think you should also check the return value from Inflater.inflate(), because it's documented to return zero in certain cases. Maybe just throw an HdfFilterException if it reads 0 bytes but Inflater.finished() == false?
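A minimal sketch of the suggested guard (this is not jHDF's actual DeflatePipelineFilter; it throws IllegalStateException rather than jHDF's HdfFilterException to stay self-contained):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateGuardDemo {
    // Decompresses a deflate stream, bailing out if inflate() stalls
    // (returns 0 before the stream is finished) instead of looping forever.
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int read = inflater.inflate(buffer);
            if (read == 0 && !inflater.finished()) {
                // Stalled: needs more input or a preset dictionary.
                throw new IllegalStateException(
                        "Inflater read 0 bytes but stream is not finished");
            }
            baos.write(buffer, 0, read);
        }
        inflater.end();
        return baos.toByteArray();
    }

    // Helper to produce a valid deflate stream for a round-trip check.
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            baos.write(buffer, 0, n);
        }
        deflater.end();
        return baos.toByteArray();
    }
}
```

On a well-formed chunk the guard never fires; on a truncated or empty input it converts the infinite loop into an immediate, diagnosable failure.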
Incidentally, I don't understand why they would even use a shuffle filter. It's output from this commercial product: https://www.mscsoftware.com/product/msc-nastran |
I applied your patch to my fork, built, deployed, and ran against the few h5 files I currently have, and it now loads them successfully. It's also about twice as fast as the HDF5_Java library, so that's nice. |
#318 Add handling in shuffle filter for mismatched length
Fix for this has been merged. Will make a release soon to include this. Thanks. |
Released v0.6.4 https://github.com/jamesmudd/jhdf/releases/tag/v0.6.4 containing this fix. |
There is a problem in DeflatePipelineFilter.java:
while (!inflater.finished()) {
    int read = inflater.inflate(buffer);  // if read == 0, the loop spins forever
    ...
}
This loop will never exit in cases where read == 0. When I try to load a particular compressed HDF5 file, I quickly run into this case, and in a debugger I can see that inflater.needsInput() == true. Does this mean that compressedData is not self-contained, and we need bytes from the next block? Does this mean that the whole scheme is broken where Inflater is only scoped to decode a single Chunk? Or maybe it's just that it really is finished, but the native code never set finished=true?
I'm going to try simply exiting the loop when read == 0 and inflater.needsInput() to see what effect that has, but I suspect it's just incorrect to assume that you can decompress a chunk at a time. It may be that the Inflater needs to be allocated once and used for the whole DataSet, so that it can resume decompression using leftovers from the previous call as it processes Chunks.