
Dataset maximum dimension sizes #283

Merged: 8 commits into jamesmudd:master on Jul 9, 2021
Conversation

obermeier (Contributor)

Hi,

I did some experiments and realized that I get an exception if a dataset's maximum dimension size is larger than the maximum int value. Some of my files have much larger dimension sizes...

As far as I can see from the stack trace, a long value is converted to an int. For this reason I propose changing the max dimension variable to long.

What do you think about this? In this PR the API is still the same as before, with just one extension in case somebody wants the value as a long.

Exception in thread "main" io.jhdf.exceptions.HdfException: Failed to load children of group '//' at address '96'
	at io.jhdf.GroupImpl.getChild(GroupImpl.java:265)
	at io.jhdf.GroupImpl.getByPath(GroupImpl.java:274)
	at io.jhdf.GroupImpl.getDatasetByPath(GroupImpl.java:294)
	at io.jhdf.HdfFile.getDatasetByPath(HdfFile.java:357)
	at com.seeburger.research.seamless.analytics.Hdf5LocalBatchAggregations.readServoData(Hdf5LocalBatchAggregations.java:124)
	at com.seeburger.research.seamless.analytics.hdf5.examples.ServoMain.main(ServoMain.java:19)
Caused by: io.jhdf.exceptions.HdfException: Failed to read object header at address: 77206
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:118)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:78)
	at io.jhdf.ObjectHeader.readObjectHeader(ObjectHeader.java:355)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.createNode(GroupImpl.java:153)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.createOldStyleGroup(GroupImpl.java:131)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.initialize(GroupImpl.java:59)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.initialize(GroupImpl.java:44)
	at org.apache.commons.lang3.concurrent.LazyInitializer.get(LazyInitializer.java:106)
	at io.jhdf.GroupImpl.getChild(GroupImpl.java:262)
	... 5 more
Caused by: java.lang.ArithmeticException: integer overflow
	at java.base/java.lang.Math.toIntExact(Math.java:1071)
	at io.jhdf.Utils.readBytesAsUnsignedInt(Utils.java:129)
	at io.jhdf.object.message.DataSpace.<init>(DataSpace.java:63)
	at io.jhdf.object.message.DataSpace.readDataSpace(DataSpace.java:83)
	at io.jhdf.object.message.DataSpaceMessage.<init>(DataSpaceMessage.java:37)
	at io.jhdf.object.message.Message.readMessage(Message.java:91)
	at io.jhdf.object.message.Message.readObjectHeaderV1Message(Message.java:54)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.readMessages(ObjectHeader.java:124)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.readMessages(ObjectHeader.java:132)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:113)
	... 13 more
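
For reference, a minimal standalone sketch of the failure mode (the size below is a made-up value, not read from my file): HDF5 typically stores maximum dimension sizes as 8-byte unsigned integers, so a value above Integer.MAX_VALUE is legal in the file but fails the narrowing shown in the trace:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // HDF5 typically stores maximum dimension sizes as 8-byte unsigned
        // integers, so a value above Integer.MAX_VALUE (2^31 - 1) is legal
        // in the file but cannot be narrowed to a Java int.
        long maxDimensionSize = 10_000_000_000L; // hypothetical max size

        // The same narrowing appears in the trace above, inside
        // Utils.readBytesAsUnsignedInt -> Math.toIntExact:
        int narrowed = Math.toIntExact(maxDimensionSize); // throws ArithmeticException: integer overflow
        System.out.println(narrowed); // never reached
    }
}
```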

@jamesmudd (Owner) commented Jul 8, 2021

Thanks a lot for looking at jhdf and opening a PR. I think this change is along the right lines. I did originally have dimensions and max size as long[]; I only changed this because Java arrays can only be int-indexed, so having dimensions larger than that would currently stop jhdf opening the dataset anyway. But reconsidering this now, I think that was wrong. It would be better to have maxSizes as long[] (and dimensions too), and then the getData method should throw if the dimensions are larger than the int max (see the sketch below). I think this would fix your issue and allow files with larger dimensions to be opened. Larger datasets would then become readable once slicing (hyperslabs) is implemented.

So to get your PR merged:

  • Could you add a test case, ideally with a small file containing a dataset with a large max size?
  • Just modify the API to return long[].

With this implemented, can you open the dataset you want? Or are the actual dimensions also too large? Could you attach an example file?
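
Roughly the shape I have in mind, as a sketch only (the class and field names are illustrative, not jhdf's actual internals): store the sizes as long[] so large files can at least be opened, and fail only when an oversized dataset is actually materialised as a Java array.

```java
import io.jhdf.exceptions.HdfException;

// Sketch of the proposed behaviour, not the final implementation.
class DatasetSketch {
    private final long[] dimensions; // long[] instead of int[]

    DatasetSketch(long[] dimensions) {
        this.dimensions = dimensions;
    }

    Object getData() {
        for (long dim : dimensions) {
            if (dim > Integer.MAX_VALUE) {
                // Java arrays are int-indexed, so data this large can only
                // be read once slicing (hyperslabs) is implemented.
                throw new HdfException("Dimension " + dim + " exceeds Integer.MAX_VALUE");
            }
        }
        return new Object(); // stand-in for the real read path
    }
}
```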

@obermeier (Contributor, Author)

I updated the Dataset API to long[] getMaxSize() and added a test case.

Changing int[] getDimensions() to long[] leads to many required changes... I tried it and converted it at some points, but I was not sure whether that is always a good solution. Is it OK to change just getMaxSize() for now?
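
For illustration, a small usage sketch of the updated method (the file name and dataset path are placeholders):

```java
import java.io.File;

import io.jhdf.HdfFile;
import io.jhdf.api.Dataset;

public class MaxSizeExample {
    public static void main(String[] args) {
        // Placeholder path; any file with a large max dimension size works.
        try (HdfFile hdfFile = new HdfFile(new File("large_max_size.hdf5"))) {
            Dataset dataset = hdfFile.getDatasetByPath("/data");
            // With the updated API the value no longer overflows an int.
            long[] maxSize = dataset.getMaxSize();
            for (long size : maxSize) {
                System.out.println("max dimension size: " + size);
            }
        }
    }
}
```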

@jamesmudd (Owner)

Yes, I think this sounds great. You're right, changing getDimensions will be a little harder; it's probably good for another change. I will take a better look soon and merge this. Thanks!

@jamesmudd (Owner)

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@sonarqubecloud (bot) commented Jul 9, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: 100.0%
  • Duplication: 0.0%

jamesmudd merged commit 287566e into jamesmudd:master on Jul 9, 2021
@jamesmudd (Owner)

Thanks a lot, I will work towards a release soon.

@obermeier (Contributor, Author)

> Yes, I think this sounds great. You're right, changing getDimensions will be a little harder; it's probably good for another change. I will take a better look soon and merge this. Thanks!

> Thanks a lot, I will work towards a release soon.

Thank you for the quick reactions and this great project!!
