Infinite Loop in DeflatePipelineFilter #318
Comments
Thanks for raising this issue. If you can provide a file to reproduce it that would be best; otherwise I'm not clear how I can investigate this. Do you have many files with this behaviour, or just one? I'm pretty sure scoping the decoder to a single chunk is the correct approach; if it were not, the ability to read slices (hyperslabs) from datasets would be broken. Maybe there is a chunk containing no data and that case is handled incorrectly. Have you confirmed the dataset is actually readable? Can you try opening it using some other tool backed by the HDF5 libs (not jHDF)? It's possible the file is corrupted.
I will try to get you a copy of a file, but it might be later this week.
I extracted a single DataSet from a larger file using HDFView, because GitHub wouldn't take the original file.
Thanks a lot for the file. I have reproduced an issue; the abridged stack I get is:
So I didn't get an infinite loop? Is this the same issue you saw? Either way, it is a bug in jHDF. I successfully opened the file with HDFView. I will have a more in-depth look soon.
Sounds like a different issue than what I was seeing, which was a zero read at DeflatePipelineFilter line 50.
Thanks, I have reproduced the infinite loop issue now as well, so there are two issues to investigate...
Possible fix #321. It's a strange case; the datasets affected seem to have … If you get a chance to have a look at the fix, that would be great. I still want to look at this a bit more, add some testing, and maybe improve logging.
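For context on what the shuffle filter does and where a length mismatch can appear, here is a minimal sketch (not jHDF's actual ShuffleFilter implementation) of un-shuffling a chunk. HDF5's shuffle filter stores byte 0 of every element, then byte 1 of every element, and so on; trailing bytes that don't form a whole element pass through unchanged, which is the mismatched-length case a fix has to handle:

```java
public class ShuffleDemo {
    // Reverses HDF5's shuffle filter layout for one chunk.
    // data: shuffled bytes; elementSize: size in bytes of one dataset element.
    public static byte[] unshuffle(byte[] data, int elementSize) {
        int elements = data.length / elementSize;  // whole elements only
        int shuffledLen = elements * elementSize;
        byte[] out = new byte[data.length];
        int pos = 0;
        for (int i = 0; i < elementSize; i++) {      // byte plane i
            for (int j = 0; j < elements; j++) {     // element j
                out[j * elementSize + i] = data[pos++];
            }
        }
        // Leftover bytes (data.length % elementSize) are copied as-is;
        // if this remainder is not handled, lengths no longer match up.
        System.arraycopy(data, shuffledLen, out, shuffledLen,
                data.length - shuffledLen);
        return out;
    }
}
```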
It looks reasonable to me, but I'm not that familiar with the code. However, I think you should also check the return value from Inflater.inflate(), because it's documented to return zero in certain cases. Maybe just throw an HdfFilterException if it reads 0 bytes but Inflater.finished() == false?
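A minimal sketch of the suggested guard (this is not jHDF's actual DeflatePipelineFilter; it throws IllegalStateException rather than jHDF's HdfFilterException to stay self-contained):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class InflateGuardDemo {
    // Decompresses a deflate stream, bailing out if inflate() stalls
    // (returns 0 before the stream is finished) instead of looping forever.
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int read = inflater.inflate(buffer);
            if (read == 0 && !inflater.finished()) {
                // Stalled: needs more input or a preset dictionary.
                throw new IllegalStateException(
                        "Inflater read 0 bytes but stream is not finished");
            }
            baos.write(buffer, 0, read);
        }
        inflater.end();
        return baos.toByteArray();
    }

    // Helper to produce a valid deflate stream for a round-trip check.
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            baos.write(buffer, 0, n);
        }
        deflater.end();
        return baos.toByteArray();
    }
}
```

On a well-formed chunk the guard never fires; on a truncated or empty input it converts the infinite loop into an immediate, diagnosable failure.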
Incidentally, I don't understand why they would even use a shuffle filter. It's output from this commercial product: https://www.mscsoftware.com/product/msc-nastran |
I applied your patch to my fork, built, deployed, and ran against the few h5 files I currently have, and it now loads them successfully. It's also about twice as fast as the HDF5_Java library, so that's nice. |
#318 Add handling in shuffle filter for mismatched length
Fix for this has been merged. Will make a release soon to include this. Thanks. |
Released v0.6.4 https://github.com/jamesmudd/jhdf/releases/tag/v0.6.4 containing this fix. |
There is a problem in DeflatePipelineFilter.java:
while (!inflater.finished()) {
    int read = inflater.inflate(buffer);  // if read == 0, the loop spins forever
    ...
}
This loop will never exit in cases where read == 0. When I try to load a particular compressed HDF5 file, I quickly run into this case, and in a debugger I can see that inflater.needsInput() == true. Does this mean that compressedData is not self-contained, and we need bytes from the next block? Does this mean that the whole scheme is broken where Inflater is only scoped to decode a single Chunk? Or maybe it's just that it really is finished, but the native code never set finished=true?
I'm going to try simply exiting the loop when read == 0 and inflater.needsInput() to see what effect that has, but I suspect it's just incorrect to assume that you can decompress a chunk at a time. It may be that the Inflater needs to be allocated once and used for the whole DataSet, so that it can resume decompression using leftovers from the previous call as it processes Chunks.