HDF5DataLayer source now takes list of filenames, loads one at a time. #203
LOG(INFO) << "Number of files: " << num_files;

// Load the first HDF5 file and initialize the counter.
// TODO: make this a function, since we also call it in Forward (in .cu also)
Good point; it would be cleaner to have a function that loads an HDF5 file and reads the data and label from it. From line 51 until 58
In other parts of the code Similar for top_label Also using
@sergeyk I like the layer, and the current tests pass. An example of a simple multi-class dataset would be great.
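The refactor discussed above (one load helper shared by setup and the forward pass) can be sketched as follows. This is a minimal illustration, not the layer's actual code: `LoadedFile`, `Loader`, `FileCyclingSource`, and `NextLabel` are hypothetical names, and the loader is injected as a callback so the sketch runs without the HDF5 library. It assumes every file holds at least one row.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the contents of one loaded HDF5 file:
// row-major data plus one label per row.
struct LoadedFile {
  std::vector<float> data;
  std::vector<float> label;
  std::size_t rows = 0;
};

// Hypothetical loader signature. A real layer would read the file with the
// HDF5 API here; the callback keeps this sketch self-contained.
using Loader = std::function<LoadedFile(const std::string&)>;

class FileCyclingSource {
 public:
  FileCyclingSource(std::vector<std::string> files, Loader loader)
      : files_(std::move(files)), loader_(std::move(loader)) {
    assert(!files_.empty());
    LoadCurrentFile();  // the helper is used once at setup time...
  }

  // Returns the label of the next row, moving on to the next file
  // (wrapping around to the first) once the current one is exhausted.
  float NextLabel() {
    if (current_row_ == current_.rows) {
      current_file_ = (current_file_ + 1) % files_.size();
      LoadCurrentFile();  // ...and again whenever a file runs out
    }
    return current_.label[current_row_++];
  }

  std::size_t current_file() const { return current_file_; }

 private:
  // Single place that loads a file and resets the row counter.
  void LoadCurrentFile() {
    current_ = loader_(files_[current_file_]);
    current_row_ = 0;
  }

  std::vector<std::string> files_;
  Loader loader_;
  LoadedFile current_;
  std::size_t current_file_ = 0;
  std::size_t current_row_ = 0;
};
```

Keeping the load-and-reset logic in one private helper means the CPU and GPU forward paths cannot drift apart in how they switch files.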
@sergeyk I think you could easily expand hdf5_data_layer to take 3D or 4D matrices. It seems to me that overwriting status in
@sergio multi-dim data was enabled by #217, which I merged into dev. I'll rebase this PR on dev now. As for overwriting status, that's the way the HDF5 examples do it, e.g. http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/hdf5-examples/1_8/C/H5D/h5ex_d_shuffle.c
The other comments are good; I will incorporate them all in the next commit.
@sergio good for review again
@sergeyk Thanks for taking care of the comments. However, now looking at the code, it seems that the function that
And change
At first I agreed with you about Without the switch to blobs, I don't agree that Unless you can further justify why
Using Blobs will not automatically mean that the data would reside in GPU, even in GPU mode. Here is when the data moves from CPU to GPU, So it will be safe for you to use a Blob to store the data read it by Regarding simple OOP, I think in some cases one can twist classes to behave as imperative code and global variables. It would be different if The other reason to make
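The point about lazy transfers can be illustrated with a toy model. This is a sketch under stated assumptions, not Caffe's implementation: `ToySyncedBuffer` is a hypothetical class, its "device" buffer is just a second host vector so the code runs without CUDA, and the head-state names are chosen to mirror the idea of tracking where the freshest copy lives.

```cpp
#include <cstddef>
#include <vector>

// Toy model of lazily synchronized storage (hypothetical, for illustration).
// The "GPU" side is simulated with a second host vector.
class ToySyncedBuffer {
 public:
  enum Head { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

  explicit ToySyncedBuffer(std::size_t n) : cpu_(n, 0.f), gpu_(n, 0.f) {}

  // Read-only host access: syncs if needed, does not invalidate the device copy.
  const float* cpu_data() {
    ToCpu();
    return cpu_.data();
  }

  // Writable host access: marks the host copy as the only fresh one.
  float* mutable_cpu_data() {
    ToCpu();
    head_ = HEAD_AT_CPU;
    return cpu_.data();
  }

  // Read-only device access: this is the only place a host-to-device
  // copy happens, and only when the host copy is newer.
  const float* gpu_data() {
    ToGpu();
    return gpu_.data();
  }

  Head head() const { return head_; }
  int host_to_device_copies() const { return h2d_copies_; }

 private:
  void ToCpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_CPU;
    } else if (head_ == HEAD_AT_GPU) {
      cpu_ = gpu_;  // simulated device-to-host copy
      head_ = SYNCED;
    }
  }

  void ToGpu() {
    if (head_ == UNINITIALIZED) {
      head_ = HEAD_AT_GPU;
    } else if (head_ == HEAD_AT_CPU) {
      gpu_ = cpu_;  // simulated host-to-device copy
      ++h2d_copies_;
      head_ = SYNCED;
    }
  }

  std::vector<float> cpu_;
  std::vector<float> gpu_;
  Head head_ = UNINITIALIZED;
  int h2d_copies_ = 0;
};
```

In this model, filling the buffer on the host costs no transfer by itself; the copy happens only when device data is requested. Asking for a mutable host pointer when only reading forces another copy on the next device access, whereas the read-only accessor leaves both sides in sync.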
Okay, your proposed refactor makes sense then. Stand by for update.
@sguada check it out.
current_row_ = 0;
}

CUDA_CHECK(cudaMemcpy(
&(*top)[0]->mutable_gpu_data()[i * data_count],
&(data_.get()[current_row_ * data_count]),
&data_blob_.mutable_cpu_data()[current_row_ * data_count],
data_blob_ shouldn't be mutable here. You are just reading from it. Just use
&data_blob_.cpu_data()[current_row_ * data_count]
@sergeyk nice job! All tests pass, but make lint complains about test_hdf5data_layer.cpp, io.cpp, vision_layers.hpp and io.hpp. Once you address my latest comments and fix the lint complaints in the files you have changed, it will be ready to merge. You may want to look at #120 and add a shuffle_files step after you reach the end of the list, but it is not necessary.
@sergeyk There are still several lint errors
Do you really need to read the same files 100 times to test HDF5DataLayerTest? It produces so many logs that the test status becomes hard to read.
Now good.
@sergeyk good job!!
HDF5DataLayer source now takes list of filenames, loads one at a time.
Also test for loading GZIP-compressed HDF5 files. With sparse data, this flies!