
Fails for groups with order tracking #46

Closed
woutdenolf opened this issue Feb 15, 2021 · 5 comments · Fixed by #47

Comments

@woutdenolf
Collaborator

h5py: 2.10.0, HDF5: 1.10.4

import h5py
import pyfive

filename = "test.h5"
n = 9  # n = 8 passes, n = 9 fails
with h5py.File(filename, mode="w", track_order=True) as f:
    for i in range(n):
        f.create_group(str(i))
with pyfive.File(filename) as f:
    assert len(f.keys()) == n

Perhaps related h5py issue: h5py/h5py#1385

@woutdenolf
Collaborator Author

woutdenolf commented Feb 15, 2021

I have no idea how h5py finds the top-level groups when more than 8 groups are added with order tracking enabled on the root group. The v2 B-tree addresses in LINK_INFO are 0xffffffffffffffff (the undefined address), so there are no links there either. GROUP_INFO contains no links, and the only remaining message is NIL. No other data object messages are present. Very weird.

Root dataobjects (9 top-level groups)

Superblock version: 0
Root dataobjects:
{'access_time': 1613422645,
 'birth_time': 1613422645,
 'change_time': 1613422645,
 'flags': 44,
 'modification_time': 1613422645,
 'signature': b'OHDR',
 'size_of_chunk_0': 224,
 'version': 2}
 LINK_INFO (flags:0):
 GROUP_INFO (flags:1):
   b'\x00\x00'
 NIL (flags:0):
   b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
ROOT LINKS: {}

Root dataobjects (8 top-level groups)

Superblock version: 0
Root dataobjects:
{'access_time': 1613422597,
 'birth_time': 1613422597,
 'change_time': 1613422597,
 'flags': 44,
 'modification_time': 1613422597,
 'signature': b'OHDR',
 'size_of_chunk_0': 224,
 'version': 2}
 LINK_INFO (flags:0):
 GROUP_INFO (flags:1):
   b'\x00\x00'
 OBJECT_CONTINUATION (flags:0):
   b'\x9b\x14\x00\x00\x00\x00\x00\x00<\x00\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x01\x00\x00\x00\x00\x00\x00\x00\x011\x1b\x04\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x02\x00\x00\x00\x00\x00\x00\x00\x012\xdb\x06\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x03\x00\x00\x00\x00\x00\x00\x00\x013\x9b\t\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x04\x00\x00\x00\x00\x00\x00\x00\x014[\x0c\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x05\x00\x00\x00\x00\x00\x00\x00\x015\x1b\x0f\x00\x00\x00\x00\x00\x00'
 OBJECT_CONTINUATION (flags:0):
   b'\x97\x17\x00\x00\x00\x00\x00\x00"\x00\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x00\x00\x00\x00\x00\x00\x00\x00\x010[\x01\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x06\x00\x00\x00\x00\x00\x00\x00\x016\xdb\x11\x00\x00\x00\x00\x00\x00'
 LINK (flags:0):
   b'\x01\x04\x07\x00\x00\x00\x00\x00\x00\x00\x017\xd7\x14\x00\x00\x00\x00\x00\x00'
ROOT LINKS: {'1': 1051, '2': 1755, '3': 2459, '4': 3163, '5': 3867, '0': 347, '6': 4571, '7': 5335}
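The LINK messages in the 8-group dump can be decoded by hand to check that they carry the root links. The sketch below follows the Link message (type 0x0006) layout from the HDF5 File Format Specification, assuming 8-byte offsets (as in the superblock version 0 files above) and handling hard links only; `decode_link_message` and `Link` are hypothetical names, not part of pyfive.

```python
import struct
from typing import NamedTuple, Optional


class Link(NamedTuple):
    name: str
    creation_order: Optional[int]
    address: int


def decode_link_message(data: bytes, size_of_offsets: int = 8) -> Link:
    """Decode a version 1 Link message (hard links only, 8-byte offsets assumed)."""
    version, flags = data[0], data[1]
    if version != 1:
        raise ValueError(f"unsupported Link message version {version}")
    pos = 2
    link_type = 0  # an absent link type field means a hard link
    if flags & 0x08:  # bit 3: link type field present
        link_type = data[pos]
        pos += 1
    creation_order = None
    if flags & 0x04:  # bit 2: 8-byte creation order field present
        (creation_order,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if flags & 0x10:  # bit 4: link name character set field present (skipped here)
        pos += 1
    # Bits 0-1 give the width of the name-length field: 1, 2, 4 or 8 bytes.
    length_size = 1 << (flags & 0x03)
    name_length = int.from_bytes(data[pos:pos + length_size], "little")
    pos += length_size
    name = data[pos:pos + name_length].decode("utf-8")
    pos += name_length
    if link_type != 0:
        raise NotImplementedError("only hard links are handled in this sketch")
    # For a hard link, the link information is the object header address.
    address = int.from_bytes(data[pos:pos + size_of_offsets], "little")
    return Link(name, creation_order, address)


if __name__ == "__main__":
    raw = b'\x01\x04\x01\x00\x00\x00\x00\x00\x00\x00\x011\x1b\x04\x00\x00\x00\x00\x00\x00'
    print(decode_link_message(raw))  # Link(name='1', creation_order=1, address=1051)
```

The decoded address matches the `'1': 1051` entry in ROOT LINKS. Because each message also carries a creation order, sorting by `creation_order` recovers the insertion order even though the LINK messages for `0` and `6`/`7` sit in a later continuation block.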

@jjhelmus
Owner

HDF5 appears to use "new style" groups when track_order=True and there are more than 8 groups. Information about these groups appears to be stored in the Link Info (type 0x0002) and Group Info (type 0x000A, decimal 10) messages, which pyfive does not parse but can see:

from pprint import pprint
import h5py
import pyfive

filename = "test.h5"
n = 9
with h5py.File(filename, mode="w", track_order=True) as f:
    for i in range(n):
        f.create_group(str(i))

with pyfive.File(filename) as f:
    pprint(f._dataobjects.msgs)
[OrderedDict([('type', 2),
              ('size', 34),
              ('flags', 0),
              ('offset_to_message', 6)]),
 OrderedDict([('type', 10),
              ('size', 2),
              ('flags', 1),
              ('offset_to_message', 46)]),
 OrderedDict([('type', 0),
              ('size', 170),
              ('flags', 0),
              ('offset_to_message', 54)])]
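To make message listings like this easier to read, the numeric types can be mapped to names and used to guess how the group stores its links. The sketch below uses a subset of the header message type IDs from the HDF5 File Format Specification; the helper names and the classification rule are assumptions for illustration, and the input is copied from the output above.

```python
# Subset of header message type IDs from the HDF5 File Format Specification.
MESSAGE_TYPE_NAMES = {
    0x0000: "NIL",
    0x0002: "LINK_INFO",
    0x0006: "LINK",
    0x000A: "GROUP_INFO",
    0x0010: "OBJECT_CONTINUATION",
    0x0011: "SYMBOL_TABLE",
}


def describe_messages(msgs):
    """Return (name, size) pairs for message dicts shaped like pyfive's msgs entries."""
    return [
        (MESSAGE_TYPE_NAMES.get(m["type"], f"UNKNOWN({m['type']:#06x})"), m["size"])
        for m in msgs
    ]


def link_storage(msgs):
    """Guess how a group stores its links from the message types present (heuristic)."""
    types = {m["type"] for m in msgs}
    if 0x0011 in types:
        return "old-style (symbol table)"
    if 0x0006 in types:
        return "compact (LINK messages in the object header)"
    if 0x0002 in types:
        return "dense or empty (follow the LINK_INFO addresses)"
    return "unknown"


if __name__ == "__main__":
    # Message headers copied from the pprint output for the 9-group file.
    msgs = [
        {"type": 2, "size": 34, "flags": 0, "offset_to_message": 6},
        {"type": 10, "size": 2, "flags": 1, "offset_to_message": 46},
        {"type": 0, "size": 170, "flags": 0, "offset_to_message": 54},
    ]
    print(describe_messages(msgs))  # [('LINK_INFO', 34), ('GROUP_INFO', 2), ('NIL', 170)]
    print(link_storage(msgs))  # dense or empty (follow the LINK_INFO addresses)
```

A new-style group with LINK_INFO but no LINK messages either has no links or keeps them in dense storage, which is the case that needs the fractal heap and v2 B-tree.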

@woutdenolf
Collaborator Author

Yes, I thought so too, so I tried reading the Link Info message data, but the fractal heap and B-tree addresses are 0xffffffffffffffff (the undefined address), which means they are not used. I'll reinvestigate; maybe I missed something.
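With order tracking enabled, the Link Info message is not a fixed-size record: its flags byte decides whether an 8-byte maximum creation index precedes the addresses and whether a third address (the creation-order B-tree) follows them. With both flag bits set, the message is 1 + 1 + 8 + 3 × 8 = 34 bytes, which matches the size of 34 reported for the type 2 message above. The sketch below, written from the HDF5 File Format Specification and assuming 8-byte offsets, reads the fields in that order and maps the undefined address to `None`; `parse_link_info` and `LinkInfo` are hypothetical names, not pyfive code. A reader that ignores the flags takes the addresses from the wrong offsets.

```python
import struct
from typing import NamedTuple, Optional

# HDF5 marks unused addresses with all bits set (8-byte offsets assumed here).
UNDEFINED_ADDRESS = 0xFFFFFFFFFFFFFFFF


class LinkInfo(NamedTuple):
    max_creation_index: Optional[int]
    fractal_heap_address: Optional[int]
    name_btree_address: Optional[int]
    creation_order_btree_address: Optional[int]


def _addr(value: int) -> Optional[int]:
    """Map the undefined address to None."""
    return None if value == UNDEFINED_ADDRESS else value


def parse_link_info(data: bytes) -> LinkInfo:
    """Parse a version 0 Link Info message (type 0x0002), assuming 8-byte offsets."""
    version, flags = data[0], data[1]
    if version != 0:
        raise ValueError(f"unsupported Link Info version {version}")
    pos = 2
    max_creation_index = None
    if flags & 0x01:  # bit 0: creation order tracked, 8-byte max index present
        (max_creation_index,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    heap_addr, name_btree_addr = struct.unpack_from("<QQ", data, pos)
    pos += 16
    order_btree_addr = None
    if flags & 0x02:  # bit 1: creation order indexed, extra B-tree address present
        (raw_order_addr,) = struct.unpack_from("<Q", data, pos)
        order_btree_addr = _addr(raw_order_addr)
    return LinkInfo(max_creation_index, _addr(heap_addr),
                    _addr(name_btree_addr), order_btree_addr)
```

A message with flags 0 and undefined addresses (18 bytes) parses to all `None` fields, which is what an empty new-style group looks like.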

@jjhelmus
Owner

jjhelmus commented Feb 22, 2021

I did some more exploring of a test file with nine groups in the root group and order tracking enabled. It seems the HDF5 library stores these as "new style" groups with the links stored "densely". The Link Info message contains the addresses of a fractal heap and a v2 B-tree. The v2 B-tree records store a hash of each link name together with a heap ID, which locates the corresponding Link message in the fractal heap. These Link messages contain the group names and the addresses of their data objects.
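For dense storage, the v2 B-tree that indexes link names holds type 5 records (a 4-byte hash of the link name followed by a 7-byte fractal heap ID), and the creation-order index, present when creation order is indexed, holds type 6 records (an 8-byte creation order followed by the same kind of heap ID). The first byte of a heap ID encodes its version (bits 6-7) and type (bits 4-5: 0 managed, 1 huge, 2 tiny). The sketch below decodes these raw records following the HDF5 File Format Specification; all names are hypothetical, and resolving a heap ID to the Link message bytes inside the fractal heap is not shown.

```python
import struct
from typing import Iterable, List, NamedTuple

HEAP_ID_SIZE = 7  # heap ID width in type 5 and type 6 records
HEAP_ID_TYPES = {0: "managed", 1: "huge", 2: "tiny"}


class NameRecord(NamedTuple):  # type 5: link name index
    name_hash: int
    heap_id: bytes


class OrderRecord(NamedTuple):  # type 6: creation order index
    creation_order: int
    heap_id: bytes


def decode_name_record(raw: bytes) -> NameRecord:
    """Decode an 11-byte type 5 record: name hash, then heap ID."""
    (name_hash,) = struct.unpack_from("<I", raw, 0)
    return NameRecord(name_hash, raw[4:4 + HEAP_ID_SIZE])


def decode_order_record(raw: bytes) -> OrderRecord:
    """Decode a 15-byte type 6 record: creation order, then heap ID."""
    (order,) = struct.unpack_from("<Q", raw, 0)
    return OrderRecord(order, raw[8:8 + HEAP_ID_SIZE])


def heap_id_kind(heap_id: bytes) -> str:
    """Return the heap ID type stored in bits 4-5 of its first byte."""
    version = heap_id[0] >> 6
    if version != 0:
        raise ValueError(f"unsupported heap ID version {version}")
    return HEAP_ID_TYPES.get((heap_id[0] >> 4) & 0x03, "reserved")


def links_in_creation_order(records: Iterable[OrderRecord]) -> List[OrderRecord]:
    """Sort type 6 records so links can be visited in insertion order."""
    return sorted(records, key=lambda r: r.creation_order)
```

For `track_order=True` iteration, walking the type 6 records in creation order and fetching each heap object would yield the Link messages in insertion order.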

I hope to have some time in the next week or so to add preliminary support for reading these densely packed new style groups to pyfive.

@woutdenolf
Collaborator Author

I will try to implement the v2 BTree (at least the node types that I need).
