Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite Loop in DeflatePipelineFilter #318

Closed
justin-michel-boeing opened this issue Oct 30, 2021 · 12 comments
Closed

Infinite Loop in DeflatePipelineFilter #318

justin-michel-boeing opened this issue Oct 30, 2021 · 12 comments
Labels
bug Something isn't working

Comments

@justin-michel-boeing
Copy link

There is a problem in DeflatePipelineFilter.java:
while (! inflater.finished()) {
int read = inflater.inflate(buffer);
...

This loop will never exit in cases where read == 0. When I try to load a particular compressed HDF5 file, I quickly run into this case, and in a debugger I can see that inflater.needsInput() == true. Does this mean that compressedData is not self-contained, and we need bytes from the next block? Does this mean that the whole scheme is broken where Inflater is only scoped to decode a single Chunk? Or maybe it's just that it really is finished, but the native code never set finished=true?

I'm going to try simply exiting the loop when read==0 and inflater.needsInput() to see what affect that has, but I suspect it's just incorrect to assume that you can decompress a chunk at a time. It may be that the Inflater needs to be allocated once to be used for the whole DataSet so that it can resume decompression using leftovers from the previous call as it's processing Chunks.

Steps to reproduce the behaviour:

  • This is problematic. I'm not sure I can find a file that shows that problem that I'm allowed to share. I do have one file that consistently fails, but it's 38MB compressed. I'll verify next week whether I can share it, or maybe we'll find a smaller more generic sample that shows the issue.

Please complete the following information:

  • jhdf 0.6.3
  • AdoptOpenJDK 8u265
  • Windows
@jamesmudd
Copy link
Owner

Thanks for raising this issue. If you can provide a file to reproduce it that would be best, otherwise im not clear how I can investigate this. Do you have many files with this behaviour or just one?

I'm pretty sure scoping the decoder to a single chunk is the correct approach. If this was not the case the ability to read slices (hyperslabs) from datasets would be broken. Maybe it's possible there is a chuck containing no data and maybe this case is handled incorrectly.

Have you confirmed the dataset is actually readable? Can you try opening it using some other tool backed by the HDF5 libs (not jHDF), it's possible the file is corrupted?

@justin-michel-boeing
Copy link
Author

I will try to get you a copy of a file, but might be later this week.
In the meantime I have a little more info.

  1. I am able to open the Dataset with HDFView.
  2. The chunk in question was not empty. It read the first part of it, then Inflater.needsInput()==true while bytes read == 0.

@justin-michel-boeing
Copy link
Author

I extracted a single DataSet from a larger file using HDFView, because Github wouldn't take the original file.
Hope this helps
tube_cquad4.zip

@jamesmudd
Copy link
Owner

Thanks a lot for the file. I have reproduced an issue, the abridged stack I get is

io.jhdf.exceptions.HdfFilterException: Inflating failed

	at io.jhdf.filter.DeflatePipelineFilter.decode(DeflatePipelineFilter.java:66)
	at io.jhdf.filter.FilterPipeline$PipelineFilterWithData.decode(FilterPipeline.java:35)
	at io.jhdf.filter.FilterPipeline$PipelineFilterWithData.access$100(FilterPipeline.java:24)
	at io.jhdf.filter.FilterPipeline.decode(FilterPipeline.java:59)
	at io.jhdf.dataset.chunked.ChunkedDatasetBase.decompressChunk(ChunkedDatasetBase.java:243)
Caused by: java.util.zip.DataFormatException: incorrect data check
	at java.base/java.util.zip.Inflater.inflateBytesBytes(Native Method)
	at java.base/java.util.zip.Inflater.inflate(Inflater.java:378)
	at java.base/java.util.zip.Inflater.inflate(Inflater.java:464)
	at io.jhdf.filter.DeflatePipelineFilter.decode(DeflatePipelineFilter.java:50)
	... 96 more

So I didn't get an infinite loop? Is this the same issue you saw? Either way though, it is a bug in jHDF. I successfully opened the file with HDFView.

I will have a more in depth look soon.

@justin-michel-boeing
Copy link
Author

Sounds like a different issue than what I was seeing which was a zero read at DeflatePipeLineFilter line 50.
But sounds like it's some help in finding AN issue, even it it's a different one.
I pushed the original problem file to https://github.com/justin-michel-boeing/jhdf_test_files.
Hopefully that will cause the infinite loop.

@jamesmudd
Copy link
Owner

Thanks I have reproduced the infinite loop issue now as well, so 2 issues to investigate....

@jamesmudd
Copy link
Owner

Possible fix #321. It's a strange case, the datasets affected seem to have shuffle filter applied after deflate. This doesn't make sense as the idea of shuffle to reorder the data to improve the compression. Anyway it seems like somehow this is possible in the HDF libs so had added handling in the shuffle filter for this case.

If you get chance to have a look at the fix that would be great. I still want to look at this a bit more and add some testing and maybe improve logging.

@justin-michel-boeing
Copy link
Author

It looks reasonable to me, but I'm not that familiar with the code. However, I think you should also check the return value from Inflater.inflate() because it's documented to return zero in certain cases. Maybe, just throw a HdfFilterException if it reads 0 bytes but Inflater.isFinished() == false?

@justin-michel-boeing
Copy link
Author

Incidentally, I don't understand why they would even use a shuffle filter. It's output from this commercial product: https://www.mscsoftware.com/product/msc-nastran

@justin-michel-boeing
Copy link
Author

I applied your patch to my fork, built, deployed, and ran against the few h5 files I currently have, and it now loads them successfully. It's also about twice as fast as the HDF5_Java library, so that's nice.

@jamesmudd jamesmudd added the bug Something isn't working label Nov 9, 2021
jamesmudd added a commit that referenced this issue Nov 11, 2021
#318 Add handling in shuffle filter for mismatched length
@jamesmudd
Copy link
Owner

Fix for this has been merged. Will make a release soon to include this. Thanks.

@jamesmudd
Copy link
Owner

Released v0.6.4 https://github.com/jamesmudd/jhdf/releases/tag/v0.6.4 containing this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants