
Support HDF5 in tensorflow-io #174

Closed
yongtang opened this issue Apr 4, 2019 · 39 comments · Fixed by #681, #704, #708 or #709
Labels
feature New feature request

Comments

@yongtang
Member

yongtang commented Apr 4, 2019

The following is from tensorflow/tensorflow#27510:

I am currently using HDF5 files (.h5 or .hdf5) to store my data, a format frequently used in scientific research. (See #2089 for a similar but different request, which makes the case for an HDF5 interface nicely.) It is very convenient and widely used; MATLAB, for example, uses HDF5 for its large files. Indeed, it is much more convenient to use than TensorFlow's TFRecord format.

However, TensorFlow has no native support for HDF5 files in the tf.data.Dataset API, which is supposed to be the new API for all data loading. Currently, I am using tf.py_function to load my data, for the simple reason that a tf Dataset always runs in graph mode and hence cannot give out the values of the files that I want it to read.

Moreover, I have found that reading an HDF5 file this way DRAMATICALLY slows down data I/O, for reasons unknown to me. When I used the tf.keras.utils.Sequence API to read HDF5 files, without the supposed optimizations that TensorFlow makes, an operation that previously took hours took just a few seconds. (I suspect that tf.defun somehow got tangled up in this. I am not sure why, but when I removed some lines the code sped up, though it was still much slower than even a single-threaded for loop.)

Therefore, I would like to propose creating a new API in tf Dataset for HDF5 files. It could be called HDF5Dataset, similar to TFRecordDataset or CSVDataset.

Moreover, this would allow TensorFlow to make I/O optimizations by reading through HDF5's C++ API instead of h5py, which has many limitations and pitfalls that new users might not be aware of.

For example, most builds of h5py cannot do multiprocessing. Also, most people do not know how to chunk their data slices, even though this can make a 5-fold difference in read/write speed (see the sketch below).
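For illustration, a minimal h5py sketch of the chunking point; the file name, dataset name, and shapes are made up:

import h5py
import numpy as np

# Chunking along the slice axis: reading one slice touches a single chunk
# instead of scanning the whole dataset.
data = np.random.rand(64, 512, 512).astype(np.float32)
with h5py.File("scan.h5", "w") as f:
    f.create_dataset("slices", data=data, chunks=(1, 512, 512))

with h5py.File("scan.h5", "r") as f:
    one_slice = f["slices"][10]  # reads only the chunk holding slice 10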

I believe that adding this API would make Tensorflow much friendlier to scientific calculation.

Will this change the current API? How?
This would add a new Dataset type in tf.data.Dataset, or a new method/function for making a dataset from an HDF5 file of an arbitrary format. This may require some low-level integration with the HDF5 format.

Who will benefit with this feature?
People in medical imaging, video datasets, astronomy, or any other field with very large datasets, which are often stored in HDF5. See the HDF Group's website for information on its utility. Also people who don't want to go through the difficult process of making TFRecord files.

Any Other info.
Perhaps integrating aspects of h5py will make the process easier.

@yongtang
Member Author

yongtang commented Apr 4, 2019

/cc @veritas9872

@yongtang yongtang added the enhancement Enhancement request label Apr 4, 2019
@captain-pool

captain-pool commented May 2, 2019

Hey @yongtang, @veritas9872 is this issue still open? Can I work on this?

@yongtang
Member Author

yongtang commented May 2, 2019

@captain-pool Definitely! Let me know if you need any help. 👍

@captain-pool

> @captain-pool Definitely! Let me know if you need any help. 👍

Thanks @yongtang. Can you point me to some resources for writing Bazel BUILD files? I don't have much experience with them.

@yongtang
Member Author

yongtang commented May 2, 2019

@captain-pool Bazel has a pretty steep learning curve... The Dataset implementation pattern is also not exactly straightforward. I have been trying to simplify the pattern for adding new ops to tensorflow-io. At the moment I think TextDataset is the easiest pattern to follow. Its BUILD file is also easier to understand.

In TextDataset, you only need to look at one function, ReadRecord:
https://github.com/tensorflow/io/blob/master/tensorflow_io/text/kernels/text_input.cc#L24

That is pretty much all you need to implement if you want to add a new Dataset op.

The BUILD file for TextDataset is also simple enough I think.

@captain-pool

captain-pool commented May 3, 2019

Thanks for your response @yongtang, I've been looking through this:
https://www.tensorflow.org/guide/extend/formats
However, I've been wondering how to include the HDF5 library in the BUILD file. For HDF5 we need to download the library; how can I include this external library in the Bazel build?

I'm downloading the HDF5 source code and building it:
https://www.hdfgroup.org/downloads/hdf5/source-code/
I'm using the examples as a reference.

@captain-pool

Hey @yongtang, I did some digging and found a Bazel BUILD file for HDF5 already in use in tensorflow/models:

https://github.com/tensorflow/models/blob/58deb0599f10dc5b33570103339fb7fa5bb876c3/research/vid2depth/third_party/hdf5.BUILD#L1

I think reusing this BUILD file and the accompanying WORKSPACE file would be good enough for this job.

@yongtang
Member Author

yongtang commented May 4, 2019

@captain-pool Yes, let's just reuse the one that already exists. I think cc_library(name = "hdf5", ...) and its dependencies might be all we need to link against hdf5. A rough sketch of the wiring is below.
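For reference, a rough sketch of how the external dependency could be wired into the WORKSPACE; the URL, checksum, and version are placeholders to fill in, and the build_file label assumes the hdf5.BUILD from tensorflow/models is vendored under third_party:

# WORKSPACE -- a sketch; urls/sha256/strip_prefix are placeholders.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "hdf5",
    build_file = "//third_party:hdf5.BUILD",  # e.g. the file from tensorflow/models
    urls = ["https://..."],  # an HDF5 source release tarball
    sha256 = "...",
    strip_prefix = "hdf5-1.10.5",  # illustrative version
)

# A kernel's BUILD target can then depend on it:
# deps = ["@hdf5//:hdf5"]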

@veritas9872

@captain-pool @yongtang I found some documentation from PyTables, which also uses the HDF5 library and has its own set of optimizations. I hope this helps.

@yongtang
Member Author

@veritas9872 With PR #236 merged, HDF5 support is almost done (partially supported, with a couple of data types). I will take a look and see whether #217 or a new PR could finish the HDF5 support.

@veritas9872

@yongtang Hello. I was curious to know whether the current implementation of HDF5 for TF is compatible with common features of HDF5, such as compression filters, checksums, and chunking.

I am also curious about whether multi-processing or multi-threading would be implemented.

For example, I have found that chunking data for each slice makes reading data 5x faster, as unnecessary data is not read in.

I am not familiar with how #236 or #217 is implemented, and I was wondering whether the implementation irons out these complexities and optimizes HDF5 for people unfamiliar with the format.

@yongtang yongtang added feature New feature request and removed enhancement Enhancement request labels Jun 2, 2019
@yongtang
Member Author

yongtang commented Jun 2, 2019

@veritas9872 I added another PR #266 which fixes a few issues.

The HDF5 support in TF-IO is implemented through the tf.data pipeline, so the biggest advantage is that you can easily feed data to tf.keras for training and inference (with a few map operations). tf.data itself also supports distribution strategies, so many features are available already. A sketch of this flow is shown below.
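For illustration, a minimal sketch of that flow, using the tfio.IODataset API that appears later in this thread; the file name, dataset names, and the compiled model are assumptions:

import tensorflow as tf
import tensorflow_io as tfio

# Stream HDF5 datasets into tf.keras through the tf.data pipeline.
images = tfio.IODataset.from_hdf5("train.h5", dataset="/images")
labels = tfio.IODataset.from_hdf5("train.h5", dataset="/labels")

ds = (tf.data.Dataset.zip((images, labels))
      .map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
      .batch(32)
      .prefetch(tf.data.experimental.AUTOTUNE))

model.fit(ds, epochs=10)  # assumes a compiled `model` already exists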

The HDF5 implementation is based on HDF5's C++ library, so I would assume checksums and chunking are already in place.

That said, the HDF5 format itself has a big scope, and some of its features and data types are not really compatible with TensorFlow's tensor types.

If you have a few sample files you work with, it might be easier for us to check those files and make sure they are compatible with the implementation in tensorflow-io.

@alexwal

alexwal commented Jun 2, 2019

Here's an h5 file (90 MB) containing a sequence of MRI images (spatial and frequency domain) with a couple of different data types:

https://drive.google.com/file/d/1OBsTnmS2KX3GcJumRD3w0Yj_DHh_CbnG/view?usp=sharing

@yongtang
Member Author

yongtang commented Jun 3, 2019

Thanks @alexwal. I took a look at the sample file and think reconstruction_esc and reconstruction_rss are supported.

The kspace is a compound type; I think the original intention was to use it for complex data.

It is not difficult to convert kspace to a complex tensor in TensorFlow. Though since HDF5 does not support complex types natively, we may need to come up with a clean way to expose this from an API perspective.

@veritas9872

veritas9872 commented Jun 6, 2019

@yongtang By coincidence, it happens that I was working on the same dataset (the fastMRI dataset) when I requested this feature.

That is also why I asked about compression filters and chunking (they accelerate data reading a lot).

I think h5py is a good reference for the Python API, since it is the go-to library for HDF5 files. It supports the complex number type, and it is the library that was used to create that particular dataset.

In fact, most HDF5 files will have been created with h5py (or maybe PyTables). I don't know anyone who uses the raw C API, so I think it would be a good idea to support the features in h5py.

MATLAB also has an HDF5 API (see here for details). However, it is much more limited than h5py, and most people just use the automatic storage of .mat files. So if the features in h5py are supported, all features in MATLAB will also be supported.

@yongtang
Member Author

@veritas9872 @alexwal Sorry to get back to you late; I have been trying to get TF 2.0's Dataset in place since last week. I will take a look at the HDF5 issues and get back soon.

@CaptainDuke
Contributor

I ran into a problem when reading HDF5 files compressed with gzip:

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: external/hdf5/src/H5Dio.c line 199 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: external/hdf5/src/H5Dio.c line 601 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: external/hdf5/src/H5Dchunk.c line 2259 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: external/hdf5/src/H5Dchunk.c line 3624 in H5D__chunk_lock(): data pipeline read failed
    major: Dataset
    minor: Filter operation failed
  #004: external/hdf5/src/H5Z.c line 1301 in H5Z_pipeline(): required filter 'deflate' is not registered
    major: Data filters
    minor: Read failed

@alexwal

alexwal commented Jun 19, 2019

@CaptainDuke is this related to TF-IO?

@CaptainDuke
Contributor

> @CaptainDuke is this related to TF-IO?

@alexwal Yes. When I use TF-IO to read a normal HDF5 file, it works. However, when the HDF5 file is compressed with gzip, TF-IO fails and throws this error.

@veritas9872

@CaptainDuke May I ask whether there is also a problem with the shuffle filter? I use gzip level 1 with the shuffle filter all the time, because it gives the best compression for floating-point and complex numbers. I am curious whether shuffle filters work in TF-IO as well (a sketch of these writer-side settings is below).
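For context, a minimal h5py sketch of the writer-side settings in question; shapes and names are illustrative:

import h5py
import numpy as np

data = np.random.rand(100, 256, 256).astype(np.float32)
with h5py.File("volume.h5", "w") as f:
    f.create_dataset(
        "images",
        data=data,
        chunks=(1, 256, 256),  # one chunk per slice
        compression="gzip",
        compression_opts=1,    # gzip level 1
        shuffle=True,          # byte-shuffle filter applied before compression
    )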

@zaccharieramzi

zaccharieramzi commented Nov 30, 2019

Hi, I also wanted to work on the fastMRI dataset, and so far I have been using Sequences. I now want to switch to tf datasets, and was wondering whether someone had an example of a tf dataset working on HDF5 files.

Indeed, when using tfio in the following way: tfio.IODataset.from_hdf5(filename=filenames[0], dataset='reconstruction_esc')

I get the following error: InvalidArgumentError: unsupported data type: 216172782113784248 [Op:IO>HDF5ReadableInit]

@aspratyush

aspratyush commented Dec 11, 2019

Hit a similar issue to the one @zaccharieramzi reported. tfio version = 0.10.0.
Traceback below:

2019-12-11 13:45:44.296754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6718 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-12-11 13:45:44.321259: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at io_interface.h:99 : Invalid argument: unsupported data type: 216172782113784248
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-3-ac7dfca1311d> in <module>
----> 1 tf_store = tfio.IOTensor.from_hdf5(filename)

~/workspace/venv-tf3.6/lib/python3.6/site-packages/tensorflow_io/core/python/ops/io_tensor.py in from_hdf5(cls, filename, **kwargs)
    393     """
    394     with tf.name_scope(kwargs.get("name", "IOFromHDF5")):
--> 395       return hdf5_io_tensor_ops.HDF5IOTensor(filename, internal=True)
    396 
    397   @classmethod

~/workspace/venv-tf3.6/lib/python3.6/site-packages/tensorflow_io/core/python/ops/hdf5_io_tensor_ops.py in __init__(self, filename, internal)
     37           filename,
     38           container=scope,
---> 39           shared_name="%s/%s" % (filename, uuid.uuid4().hex))
     40       columns = [column.decode() for column in columns.numpy().tolist()]
     41       elements = []

<string> in io_hdf5_readable_init(input, container, shared_name, name)

<string> in io_hdf5_readable_init_eager_fallback(input, container, shared_name, name, ctx)

~/workspace/venv-tf3.6/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

~/workspace/venv-tf3.6/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: unsupported data type: 216172782113784248 [Op:IO>HDF5ReadableInit]

@yongtang
Member Author

@zaccharieramzi @aspratyush Do you have a sample file I could take a look at?

@aspratyush

aspratyush commented Dec 11, 2019

@yongtang here you go (zip with the h5 causing the issue): test.zip

@yongtang
Member Author

@aspratyush Added PR #681, which covers all common types (including string, which caused the issue).

@zaccharieramzi

With a file from the fastMRI database (I cannot attach it, as it's too big even zipped), when I run the following code (with the latest tfio nightly, pip install --no-deps --upgrade tensorflow-io-nightly), Python crashes with a segfault:

import tensorflow_io as tfio
f = 'file1000002.h5'
tfio.IODataset.from_hdf5(f, 'reconstruction_esc')

I can send you the file via mail if you want.

(Sorry for the late reply; I have been focusing on other projects.)

@yongtang yongtang reopened this Dec 19, 2019
@yongtang
Member Author

@zaccharieramzi Please send it to me through email, or a link to download. You can find my email in the git logs.

@yongtang
Member Author

@zaccharieramzi I could not reproduce the segfault issue; I suspect it is related to a version mismatch between tf and tfio. Are you using tensorflow-io-nightly with TF 2.0?

On another note, the complex data type was not supported in tensorflow-io. Supporting complex types is a little tricky, as there is no native complex type in HDF5, only a commonly used H5T_COMPOUND type with 'r' and 'i' as the names of the real and imaginary fields. The field names could be something else, depending on the program (e.g., h5py) that generated the h5 file.

I have added support for complex64 and complex128 in PR #704 with 'r' and 'i'. Other names could be added later if they are commonly used. (A sketch of the compound layout is below.)
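For reference, a minimal h5py sketch of the compound layout in question; file and dataset names are illustrative:

import h5py
import numpy as np

with h5py.File("sample.h5", "w") as f:
    # h5py writes complex64 as an H5T_COMPOUND with fields 'r' and 'i'.
    f.create_dataset("kspace", data=np.zeros((4, 4), dtype=np.complex64))

with h5py.File("sample.h5", "r") as f:
    # h5py maps the compound back to a NumPy complex dtype on read.
    print(f["kspace"].dtype)  # complex64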

@zaccharieramzi

No, just tensorflow 2.0.
You will find attached a picture showing the chain of events I am facing.
This is done on my machine, without GPUs.

[attached screenshot: segfault]

@yongtang
Member Author

@zaccharieramzi PR #704 has been merged and a new nightly build is available for Linux:
https://pypi.org/project/tensorflow-io-nightly/0.11.0.dev2383/#files

I think your issue should be fixed with nightly 0.11.0.dev2383. You do need to use TensorFlow 2.1rc1 with this nightly version.

I will close this issue, but feel free to reopen it if the issue persists.

@zaccharieramzi

Updating to this nightly version and to tf 2.1rc1 did fix the segfault issue.
However, it now doesn't find the dataset:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-1-feafaa9cf8f7> in <module>
      1 import tensorflow_io as tfio
      2 f = 'file1000002.h5'
----> 3 tfio.IODataset.from_hdf5(f, 'reconstruction_esc')

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/io_dataset.py in from_hdf5(cls, filename, dataset, **kwargs)
    202     with tf.name_scope(kwargs.get("name", "IOFromHDF5")):
    203       return hdf5_dataset_ops.HDF5IODataset(
--> 204           filename, dataset, internal=True)
    205 
    206   @classmethod

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/hdf5_dataset_ops.py in __init__(self, filename, dataset, internal)
     39           container=scope,
     40           shared_name="%s/%s" % (filename, uuid.uuid4().hex))
---> 41       shape, dtype = core_ops.io_hdf5_readable_spec(resource, dataset)
     42       dtype = tf.as_dtype(dtype.numpy())
     43 

<string> in io_hdf5_readable_spec(input, component, name)

<string> in io_hdf5_readable_spec_eager_fallback(input, component, name, ctx)

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: dataset reconstruction_esc not found [Op:IO>HDF5ReadableSpec]

@yongtang
Member Author

@zaccharieramzi A '/' prefix is needed, as the HDF5 dataset namespace can be hierarchical, e.g., /foo/bar/x/y/z.

The following should work (with '/' prefix):

tfio.IODataset.from_hdf5(f, '/reconstruction_esc')

@zaccharieramzi

zaccharieramzi commented Dec 20, 2019

Ah, I see, okay, thanks! It does work now, although not as I expected.

Indeed, in my case (the fastMRI dataset), each HDF5 file is not a dataset but rather a single example. Therefore, I think I need tfio.IOTensor.from_hdf5 rather than IODataset, and then to call map on a list of files.

However, when I do:

tfio.IOTensor.from_hdf5(f)

I get the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-23-c15d8aa6b442> in <module>
----> 1 tfio.IOTensor.from_hdf5(f)

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/io_tensor.py in from_hdf5(cls, filename, **kwargs)
    372     """
    373     with tf.name_scope(kwargs.get("name", "IOFromHDF5")):
--> 374       return hdf5_io_tensor_ops.HDF5IOTensor(filename, internal=True)
    375 
    376   @classmethod

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/hdf5_io_tensor_ops.py in __init__(self, filename, internal)
     67         function = _HDF5IOTensorFunction(
     68             core_ops.io_hdf5_readable_read,
---> 69             resource, column, shape, dtype)
     70         elements.append(
     71             io_tensor_ops.BaseIOTensor(

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/hdf5_io_tensor_ops.py in __init__(self, function, resource, component, shape, dtype)
     30     self._resource = resource
     31     self._component = component
---> 32     self._length = shape[0]
     33     self._shape = tf.TensorShape([None]).concatenate(shape[1:])
     34     self._dtype = dtype

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_shape.py in __getitem__(self, key)
    861       else:
    862         if self._v2_behavior:
--> 863           return self._dims[key].value
    864         else:
    865           return self._dims[key]

IndexError: list index out of range

From debugging, I can see that it comes from the '/ismrmrd_header' column, which doesn't have a proper shape. There is probably a need to handle this kind of situation, where the shape isn't well-formed (see the sketch below).

Sorry for my misunderstanding of how IODataset would work for the HDF5 file.
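For reference, a minimal h5py sketch of the failing case; the dataset name follows the fastMRI files, and the contents are illustrative:

import h5py

with h5py.File("scalar.h5", "w") as f:
    # A scalar (0-d) dataset, like the XML header in fastMRI files.
    f.create_dataset("ismrmrd_header", data=b"<ismrmrdHeader>...</ismrmrdHeader>")

with h5py.File("scalar.h5", "r") as f:
    print(f["ismrmrd_header"].shape)  # () -- empty, so shape[0] raises IndexError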

@yongtang
Member Author

@zaccharieramzi The issue is the scalar case, which was not taken into consideration before. Let me reopen this issue.

@yongtang yongtang reopened this Dec 20, 2019
@zaccharieramzi

Cool! Do you need help fixing this? (If it's pure Python, I can do it.)

@yongtang
Member Author

@zaccharieramzi It is a little involved, in C++. I have added PR #708 for scalar support in HDF5.

@zaccharieramzi

Thanks @yongtang, now in eager mode everything works just fine!

However, at some point, to get the values of the columns, you used .numpy(), which makes it impossible to use in graph mode, if I am not mistaken. For instance, since I want to use it in a dataset, I need the tensor to be created inside a graph.

See the following minimal failing example (you need the file I sent you in your current directory):

import tensorflow as tf
import tensorflow_io as tfio
print(tf.__version__, tfio.__version__)

files_ds = tf.data.Dataset.list_files('./*.h5', seed=0)
hdf5_ds = files_ds.map(tfio.IOTensor.from_hdf5)

Gives the following error:

2.1.0-rc1 0.11.0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-e201524a078e> in <module>
      4 
      5 files_ds = tf.data.Dataset.list_files('./*.h5', seed=0)
----> 6 hdf5_ds = files_ds.map(tfio.IOTensor.from_hdf5)

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in map(self, map_func, num_parallel_calls)
   1587     """
   1588     if num_parallel_calls is None:
-> 1589       return MapDataset(self, map_func, preserve_cardinality=True)
   1590     else:
   1591       return ParallelMapDataset(

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in __init__(self, input_dataset, map_func, use_inter_op_parallelism, preserve_cardinality, use_legacy_function)
   3887         self._transformation_name(),
   3888         dataset=input_dataset,
-> 3889         use_legacy_function=use_legacy_function)
   3890     variant_tensor = gen_dataset_ops.map_dataset(
   3891         input_dataset._variant_tensor,  # pylint: disable=protected-access

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
   3146       with tracking.resource_tracker_scope(resource_tracker):
   3147         # TODO(b/141462134): Switch to using garbage collection.
-> 3148         self._function = wrapper_fn._get_concrete_function_internal()
   3149 
   3150         if add_to_graph:

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _get_concrete_function_internal(self, *args, **kwargs)
   2393     """Bypasses error checking when getting a graph function."""
   2394     graph_function = self._get_concrete_function_internal_garbage_collected(
-> 2395         *args, **kwargs)
   2396     # We're returning this concrete function to someone, and they may keep a
   2397     # reference to the FuncGraph without keeping a reference to the

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2387       args, kwargs = None, None
   2388     with self._lock:
-> 2389       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2390     return graph_function
   2391 

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   2701 
   2702       self._function_cache.missed.add(call_context_key)
-> 2703       graph_function = self._create_graph_function(args, kwargs)
   2704       self._function_cache.primary[cache_key] = graph_function
   2705       return graph_function, args, kwargs

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   2591             arg_names=arg_names,
   2592             override_flat_arg_shapes=override_flat_arg_shapes,
-> 2593             capture_by_value=self._capture_by_value),
   2594         self._function_attributes,
   2595         # Tell the ConcreteFunction to clean up its graph once it goes out of

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    976                                           converted_func)
    977 
--> 978       func_outputs = python_func(*func_args, **func_kwargs)
    979 
    980       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in wrapper_fn(*args)
   3139           attributes=defun_kwargs)
   3140       def wrapper_fn(*args):  # pylint: disable=missing-docstring
-> 3141         ret = _wrapper_helper(*args)
   3142         ret = structure.to_tensor_list(self._output_structure, ret)
   3143         return [ops.convert_to_tensor(t) for t in ret]

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in _wrapper_helper(*args)
   3081         nested_args = (nested_args,)
   3082 
-> 3083       ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
   3084       # If `func` returns a list of tensors, `nest.flatten()` and
   3085       # `ops.convert_to_tensor()` would conspire to attempt to stack

~/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    235       except Exception as e:  # pylint:disable=broad-except
    236         if hasattr(e, 'ag_error_metadata'):
--> 237           raise e.ag_error_metadata.to_exception(e)
    238         else:
    239           raise

AttributeError: in converted code:

    /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/io_tensor.py:374 from_hdf5  *
        return hdf5_io_tensor_ops.HDF5IOTensor(filename, internal=True)
    /home/zaccharie/workspace/fastmri-reproducible-benchmark/venv/lib/python3.6/site-packages/tensorflow_io/core/python/ops/hdf5_io_tensor_ops.py:60 __init__
        columns = [column.decode() for column in columns.numpy().tolist()]

    AttributeError: 'Tensor' object has no attribute 'numpy'

Do you know if there is a way to decode the column in a graph-friendly way?

@yongtang
Member Author

@zaccharieramzi Some additional work is needed in order to support graph mode. I have created a new issue #710 to track this support.
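In the meantime, a possible interim workaround (a sketch under assumptions: the column name and float32 dtype are taken from the fastMRI files, and column access via the IOTensor __call__/.to_tensor() API behaves as in tfio 0.11) is to fall back to tf.py_function, which the original request also mentioned:

import tensorflow as tf
import tensorflow_io as tfio

def load_recon(path):
    def _eager_load(p):
        # Runs eagerly inside tf.py_function, so .numpy() is available.
        t = tfio.IOTensor.from_hdf5(p.numpy().decode())
        return t("/reconstruction_esc").to_tensor()
    return tf.py_function(_eager_load, [path], tf.float32)

files_ds = tf.data.Dataset.list_files("./*.h5", seed=0)
recon_ds = files_ds.map(load_recon)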

@tkolarik

tkolarik commented Jun 9, 2020

@yongtang I am unable to do anything meaningful with this using h5 files formatted for TASSEL variants.
