Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add external library data layer #3955

Closed

Conversation

sasagalic-MSFT
Copy link

Currently there are a number of data layers available in Caffe supporting different input formats. However, there is always one more format which is optimal for the given training.
One cannot expect Caffe to support all the formats, but fairly close solution would be to implement extensible architecture which enables easy integration of external formats with Caffe.
In this change external library data layer is implemented. This data layer expects external library (Shared Object on Linux or DLL on Windows) to provide data for Caffe network.
Using this data layer has several benefits:

  • Decouples data source from Caffe. New formats are easily consumable and do not require changes in Caffe codebase
  • Enables in-memory synthetic data generation
  • Enables multiple top blobs (current prefetching data layers only support 2 tops, which is a limitation for some trainings)
  • Opens possibility to implement some of the existing data layers (LMDB, LevelDB etc.) as external libraries to reduce number of dependencies Caffe has

This change is implemented with compatibility in mind. All existing data layers maintain the same behaviour. External library adapters are provided for Windows and Linux to localize platform specific code.

In this change external library data layer is implemented. This data layer is
data layer that pulls actual input data from external library which is compliant
with interface declared in external_lib_data_source.hpp.
External library data layer is employed by specifying "ExternalLibData" as layer
type in network proto file as well as library data parameters. Path to external
library, external library data source factory method name and external library
parameters are parameters that need to be specified. Since external library
implementation is platform dependent, appropriate interface is added to avoid
spreading platform dependence throughout Caffe codebase. Interface
implementation is added for Windows (dynamic link library) and Linux (shared
object library).
@willyd
Copy link
Contributor

willyd commented May 19, 2016

Could this be made even more general? Allowing not only plugin-like data layers, but also compute layers?

@zer0n zer0n deleted the bvlc_external_lib_data_layer branch July 22, 2016 04:50
@willyd willyd mentioned this pull request Feb 1, 2017
Copy link
Contributor

@hgaiser hgaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of loading layers from external libraries. I would also like to extend this to compute layers (as proposed in #5243). That would solve the large number of incompatible caffe forks that currently exist for different networks.

See the inline comments for some feedback on the implementation.

#ifndef EXTARNAL_LIB_DATA_SOURCE_H_
#define EXTARNAL_LIB_DATA_SOURCE_H_

class IExternalLibDataSource;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caffe doesn't use prefixes for interface classes right now. For example: Layer is an interface and it's simply called Layer. I think it makes sense to maintain the same naming style, so not include the prefix. Same goes for the other interface classes.

* @brief Returns library object that abstracts away library management
* dependent on platform.
*/
boost::shared_ptr<IExternalLib> GetDataSourceLibraryWrapper(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If C++11 is allowed, a std::unique_ptr would be nicer here. There is still a single owner at this point, so it models the ownership more accurately, and it can be implicitly converted to a shared_ptr (boost and std). It could even be released into any lifetime management scheme you can think of.

class ExternalLibDataLayer : public BasePrefetchingDataLayer<Dtype> {
public:
explicit ExternalLibDataLayer(const LayerParameter& param);
virtual ~ExternalLibDataLayer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a class template, shouldn't the function definitions be included here? It seems like this will result in linking errors for some values for Dtype.

/**
* @brief Defines interface for data source.
*/
class IExternalLibDataSource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you create additional interfaces? Caffe already has an interface for data layers: BaseDataLayer<Dtype>. Why not let the loaded module create instances of that class instead of the new interfaces? I'd expect that would save a lot of code overhead. On top of that I think it would also simplify the process for supporting compute layers.

class IExternalLib {
public:
virtual ~IExternalLib() {}
virtual IExternalLibDataSource* GetDataSource() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if the class for loading symbols/functions from a library could be generalized (possibly through templating) to support loading functions with any signature. Then it could be reused for loading compute layers as well.

@shelhamer
Copy link
Member

Closing in favor of the more general approach in #5294. Thanks @sasagalic-MSFT for your concern about how to handle all the different varieties of data out there.

@shelhamer shelhamer closed this Apr 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants