
Dataset maximum dimension sizes #283

Merged: 8 commits into jamesmudd:master on Jul 9, 2021
Conversation

obermeier (Contributor)

Hi,

I did some experiments and realized that I get an exception if a dataset's maximum dimension size is larger than the maximum int value. Some of my files have much larger dimension sizes...

As far as I can see from the stack trace, a long value is converted to an int. For this reason I propose changing the max dimension variable to long.

What do you think about this? In this PR the API is still the same as before, with just one extension in case somebody wants the value as a long.

Exception in thread "main" io.jhdf.exceptions.HdfException: Failed to load children of group '//' at address '96'
	at io.jhdf.GroupImpl.getChild(GroupImpl.java:265)
	at io.jhdf.GroupImpl.getByPath(GroupImpl.java:274)
	at io.jhdf.GroupImpl.getDatasetByPath(GroupImpl.java:294)
	at io.jhdf.HdfFile.getDatasetByPath(HdfFile.java:357)
	at com.seeburger.research.seamless.analytics.Hdf5LocalBatchAggregations.readServoData(Hdf5LocalBatchAggregations.java:124)
	at com.seeburger.research.seamless.analytics.hdf5.examples.ServoMain.main(ServoMain.java:19)
Caused by: io.jhdf.exceptions.HdfException: Failed to read object header at address: 77206
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:118)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:78)
	at io.jhdf.ObjectHeader.readObjectHeader(ObjectHeader.java:355)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.createNode(GroupImpl.java:153)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.createOldStyleGroup(GroupImpl.java:131)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.initialize(GroupImpl.java:59)
	at io.jhdf.GroupImpl$ChildrenLazyInitializer.initialize(GroupImpl.java:44)
	at org.apache.commons.lang3.concurrent.LazyInitializer.get(LazyInitializer.java:106)
	at io.jhdf.GroupImpl.getChild(GroupImpl.java:262)
	... 5 more
Caused by: java.lang.ArithmeticException: integer overflow
	at java.base/java.lang.Math.toIntExact(Math.java:1071)
	at io.jhdf.Utils.readBytesAsUnsignedInt(Utils.java:129)
	at io.jhdf.object.message.DataSpace.<init>(DataSpace.java:63)
	at io.jhdf.object.message.DataSpace.readDataSpace(DataSpace.java:83)
	at io.jhdf.object.message.DataSpaceMessage.<init>(DataSpaceMessage.java:37)
	at io.jhdf.object.message.Message.readMessage(Message.java:91)
	at io.jhdf.object.message.Message.readObjectHeaderV1Message(Message.java:54)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.readMessages(ObjectHeader.java:124)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.readMessages(ObjectHeader.java:132)
	at io.jhdf.ObjectHeader$ObjectHeaderV1.<init>(ObjectHeader.java:113)
	... 13 more
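
For reference, a minimal standalone sketch of the failure mode (the size below is a made-up value, not read from my file): HDF5 typically stores maximum dimension sizes as 8-byte unsigned integers, so a value above Integer.MAX_VALUE is legal in the file but fails the narrowing shown in the trace:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // HDF5 typically stores maximum dimension sizes as 8-byte unsigned
        // integers, so a value above Integer.MAX_VALUE (2^31 - 1) is legal
        // in the file but cannot be narrowed to a Java int.
        long maxDimensionSize = 10_000_000_000L; // hypothetical max size

        // The same narrowing appears in the trace above, inside
        // Utils.readBytesAsUnsignedInt -> Math.toIntExact:
        int narrowed = Math.toIntExact(maxDimensionSize); // throws ArithmeticException: integer overflow
        System.out.println(narrowed); // never reached
    }
}
```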

@jamesmudd (Owner) commented Jul 8, 2021

Thanks a lot for looking at jhdf and opening a PR. I think this change is along the right lines. I did originally have dimensions and max size as long[]; I only changed this because Java arrays can only be int-indexed, so having dimensions larger than that would currently stop jhdf opening the dataset anyway. But reconsidering this now, I think that was wrong. It would be better to have maxSizes as long[] (and dimensions too), and then the getData method should throw if the dimensions are larger than the int max (see the sketch below). I think this would fix your issue and allow files with larger dimensions to be opened. Larger datasets would then become readable once slicing (hyperslabs) is implemented.

So to get your PR merged:

  • Could you add a test case, ideally with a small file containing a dataset with a large max size?
  • Just modify the API to return long[].

With this implemented, can you open the dataset you want? Or are the actual dimensions also too large? Could you attach an example file?
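
Roughly the shape I have in mind, as a sketch only (the class and field names are illustrative, not jhdf's actual internals): store the sizes as long[] so large files can at least be opened, and fail only when an oversized dataset is actually materialised as a Java array.

```java
import io.jhdf.exceptions.HdfException;

// Sketch of the proposed behaviour, not the final implementation.
class DatasetSketch {
    private final long[] dimensions; // long[] instead of int[]

    DatasetSketch(long[] dimensions) {
        this.dimensions = dimensions;
    }

    Object getData() {
        for (long dim : dimensions) {
            if (dim > Integer.MAX_VALUE) {
                // Java arrays are int-indexed, so data this large can only
                // be read once slicing (hyperslabs) is implemented.
                throw new HdfException("Dimension " + dim + " exceeds Integer.MAX_VALUE");
            }
        }
        return new Object(); // stand-in for the real read path
    }
}
```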

@obermeier (Contributor, Author)

I updated the Dataset API to long[] getMaxSize() and added a test case.

Changing int[] getDimensions() to long[] leads to many required changes... I tried it and converted it at some points, but I was not sure whether that is always a good solution. Is it OK to change just getMaxSize() for now?
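
For illustration, a small usage sketch of the updated method (the file name and dataset path are placeholders):

```java
import java.io.File;

import io.jhdf.HdfFile;
import io.jhdf.api.Dataset;

public class MaxSizeExample {
    public static void main(String[] args) {
        // Placeholder path; any file with a large max dimension size works.
        try (HdfFile hdfFile = new HdfFile(new File("large_max_size.hdf5"))) {
            Dataset dataset = hdfFile.getDatasetByPath("/data");
            // With the updated API the value no longer overflows an int.
            long[] maxSize = dataset.getMaxSize();
            for (long size : maxSize) {
                System.out.println("max dimension size: " + size);
            }
        }
    }
}
```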

@jamesmudd (Owner)

Yes, I think this sounds great. You're right, changing getDimensions will be a little harder; it's probably good for another change. I will take a better look soon and merge this. Thanks!

@jamesmudd (Owner)

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@sonarqubecloud (bot) commented Jul 9, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: 100.0%
  • Duplication: 0.0%

jamesmudd merged commit 287566e into jamesmudd:master on Jul 9, 2021
@jamesmudd (Owner)

Thanks a lot, I will work towards a release soon.

@obermeier (Contributor, Author)

> Yes, I think this sounds great. You're right, changing getDimensions will be a little harder; it's probably good for another change. I will take a better look soon and merge this. Thanks!

> Thanks a lot, I will work towards a release soon.

Thank you for the quick reactions and this great project!!
