Add domain adaptation example and gradient reversal layer #4031
Conversation
Incomplete review for now. I would appreciate it if someone else could jump in and review this PR.
class GradientReversal(Layer):
    def __init__(self, l, **kwargs):
Use a variable with a complete name, not l.
fixed
        return input_shape

    def get_config(self):
        config = {'lambda': self.l}
The entries in config should match the arguments in __init__.
fixed
        if K._BACKEND == 'theano':
            self.op = K.ReverseGradient(self.l)
        elif K._BACKEND == 'tensorflow':
            self.op = K.ReverseGradientBuilder()
The interface should be the same for Theano and TensorFlow.
fixed
It looks like you are including in the PR many changes that are unrelated to the PR. This makes the PR difficult to review. Please rebase from master.
Force-pushed from f7c65ca to 262f9d0
@fchollet done
@fchollet Any further feedback on this PR?
@@ -983,7 +983,8 @@ def _standardize_user_data(self, x, y,
        sample_weights = [standardize_weights(ref, sw, cw, mode)
                          for (ref, sw, cw, mode)
                          in zip(y, sample_weights, class_weights, self.sample_weight_modes)]
-       check_array_lengths(x, y, sample_weights)
+       if check_batch_dim:
+           check_array_lengths(x, y, sample_weights)
I don't get this. Can you clarify what you are trying to do?
In the training process for domain adaptation, the input batch contains batch_size/2 samples from the source domain and batch_size/2 samples from the target domain. Per the model (see the picture added to the PR), the full batch is used in the left branch for domain classification, whereas the first half of the batch is sliced out for feature extraction in the right branch (source domain only).
The purpose of this change is to bypass a sanity check in Keras by propagating a kwarg from train_on_batch, allowing unequal batch lengths for the X and y arguments.
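For context, a hedged sketch of what such a call might look like under this PR's proposed train_on_batch signature (the array names and shapes are illustrative, not taken from the example script):

# Hypothetical call, assuming the check_batch_dim kwarg added in this PR.
# X_batch: (batch_size, ...) - source samples in the first half, target samples in the second.
# y_class: (batch_size / 2, nb_classes) - labels for the source half only.
# y_domain: (batch_size, 2) - source-vs-target labels for the full batch.
model.train_on_batch(X_batch,
                     {'classifier_output': y_class, 'domain_output': y_domain},
                     check_batch_dim=False)  # skip the equal-length sanity check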
@@ -1215,6 +1216,7 @@ def train_on_batch(self, x, y,
                from this class during training.
                This can be useful to tell the model to "pay more attention" to
                samples from an under-represented class.
+           check_batch_dim: Whether to check batch dimensions for sanity.
Description not consistent with actual behavior...
Tried to make this clearer (as above).
class ReverseGradient(theano.Op):
    '''Flips the sign of incoming gradient during training.'''
    view_map = {0: [0]}
This probably doesn't do what you think it does. Do not use global class attributes, especially not ones that are pointers.
Fixed
        self.trainable_weights = []

    def call(self, x, mask=None):
        return self.op(x)
If all you do with ReverseGradient is call it, why should it be a class? Everything in the backend is a function.
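As an aside, one way to express gradient reversal as a single backend-agnostic function is the stop-gradient trick; a rough sketch, assuming K.stop_gradient is available in both backends (the function name is illustrative):

from keras import backend as K

def reverse_gradient(x, hp_lambda):
    # Forward pass: the two terms sum to x, so this is the identity.
    # Backward pass: only the first term has a gradient, so incoming
    # gradients are scaled by -hp_lambda.
    return -hp_lambda * x + K.stop_gradient((1. + hp_lambda) * x)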
@@ -1924,3 +1925,24 @@ def ctc_decode(y_pred, input_length, greedy=True, beam_width=100,
                     for st in decoded]
    return (decoded_dense, log_prob)


+class ReverseGradient(object):
This should definitely be a function, taking two arguments.
I can clarify this a bit since I wrote the original here. The grad reversal layer overrides the gradient of identity with an expression that has a hyperparam lambda. Since different instances can have different lambdas, we need to register a new gradient op for each instance of the grad reversal layer. This is accomplished by assigning each new gradient op a unique name via num_calls. The only reason this is a callable class is to avoid num_calls being a global.
A couple comments on the implementation here:
- It doesn't make sense for the lambda hyperparam to be an arg to __init__ here - it defeats the entire purpose of implementing this as a class.
- The only reason this is a class is to organize num_calls, so the class should be instantiated once right here to make an op function, and the class itself need not be exported from the module, e.g. https://github.com/pumpikano/tf-dann/blob/master/flip_gradient.py#L22 (see the sketch below). Alternately, num_calls can be a global or captured in a closure - this is a style decision that is up to @fchollet.
Hope this helps.
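For reference, the pattern in the linked flip_gradient.py looks roughly like the sketch below (assuming TF 1.x graph-mode APIs such as tf.RegisterGradient and gradient_override_map; this is not the exact code in the PR):

import tensorflow as tf

class FlipGradientBuilder(object):
    '''Callable that wraps tf.identity with a per-call reversed gradient.'''
    def __init__(self):
        self.num_calls = 0

    def __call__(self, x, hp_lambda=1.0):
        # Register a uniquely named gradient so each call can capture
        # its own hp_lambda.
        grad_name = 'FlipGradient%d' % self.num_calls

        @tf.RegisterGradient(grad_name)
        def _flip_gradients(op, grad):
            return [-hp_lambda * grad]

        g = tf.get_default_graph()
        with g.gradient_override_map({'Identity': grad_name}):
            y = tf.identity(x)

        self.num_calls += 1
        return y

# Instantiated once at module level, so callers just use it like a function.
flip_gradient = FlipGradientBuilder()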
Another option: generate a unique gradient name with e.g. system time and forget num_calls altogether.
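A minimal sketch of that alternative (using uuid rather than the system time, to avoid collisions; illustrative only):

import uuid
# A gradient name that is unique per call, so no shared counter is needed:
grad_name = 'ReverseGradient_' + uuid.uuid4().hex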
@@ -1319,6 +1319,7 @@ def switch(condition, then_expression, else_expression):
        condition: scalar tensor.
        then_expression: TensorFlow operation.
        else_expression: TensorFlow operation.
+       lazy: Unused (compatibility with Theano backend)
Is this argument really indispensable then?
Not sure what you mean, but the purpose here was to ensure that we don't get a TypeError upon switching implementations and using the lazy kwarg to switch.
Hey @pumpikano @fchollet, thanks for the feedback. I've tried to simplify this PR to fit more into the ethos of 'everything in the backend is a function'. Let me know what you think.
Does this have the potential to get merged soon? :-) It would be a super helpful feature, I think!
# When building the DANN model, route the first half of the batch (source examples)
# to the label classifier, and route the full batch (half source, half target)
# to the domain classifier.
net = Lambda(lambda x: K.switch(K.learning_phase(), x[:int(batch_size / 2), :], x, lazy=True),
Do we need those switch operations (and the related core modification) and lazy to do gradient reversal training? How about just making two models: a source model which includes both classifier_output and domain_output, and a target model which only includes domain_output. After that, train the source model with only source batch data and the target model with only target batch data in each epoch, i.e. call train_on_batch twice, once for the source model and once for the target model.
This would reduce the Keras core modification that starts with this new 'switch' operation, and also make the trained model independent of the training batch size when it is evaluated.
Keras allows you to supply train methods with sample weights. So, I think it is possible just to pass a dictionary of sample weights. For classifier_output it should be the vector
classifier_output_w = np.ones(batch_size)
classifier_output_w[batch_size//2:] = 0
and for domain_output the sample weights are just
domain_output_w = np.ones(batch_size)
Actually, I am trying to train a net (not for MNIST, though) with this approach and it seems to work.
@rykov8 How can I control each output's weight for each sample?
Do you mean training like below?
metrics = dann_model.train_on_batch(
    ..., sample_weight=classifier_output_w
)
metrics = dann_model.train_on_batch(
    ..., sample_weight=domain_output_w
)
But in this way, the model sees twice as much source data as target data.
@calanchue No, actually, I mean that we can avoid this Lambda layer (in this example script): build a model exactly like it is constructed here, but without this layer. Then the idea is the following:
we generate a batch that has 1/2 labelled data and 1/2 auxiliary data (used for adaptation). Moreover, we generate classification labels that have 1/2 true labels and 1/2 fictive labels (they don't actually matter), and also domain labels to denote whether a sample comes from the labelled data or from the auxiliary data. Last but not least are the sample weights described above. Then we just pass this data to any train method of our model. Some pseudo-code (not a working sample, I didn't test it, just to explain the idea):
def generator(X, labels, aux_data):
    # not a real generator, outputs the same batch, just to show the idea
    while True:
        train_data = X[:batch_size].copy()  # copy so the source array is not modified in place
        train_data[batch_size//2:] = aux_data[:batch_size//2]
        y = np.zeros(batch_size)
        y[:batch_size//2] = labels[:batch_size//2]
        domain_y = np.zeros(batch_size)
        domain_y[:batch_size//2] = 1
        classifier_output_w = np.ones(batch_size)
        classifier_output_w[batch_size//2:] = 0
        domain_output_w = np.ones(batch_size)
        yield ({'main_input': train_data},
               {'classifier_output': y,
                'domain_output': domain_y},
               {'classifier_output': classifier_output_w,
                'domain_output': domain_output_w})
...
x, y, sample_weight = next(generator(X, labels, aux_data))
model.train_on_batch(x, y, sample_weight=sample_weight)
I use the model.fit_generator method in my experiments, but I think it is not very important. Here we can see that the fictive labels don't influence the main classifier because of the sample weights and, moreover, we train the main classifier and the domain classifier in one forward-backward pass without any changes to the Keras core or other stuff.
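For illustration, a hedged sketch of how the generator above could be plugged into fit_generator (Keras 1.x-style arguments; model, X, labels and aux_data are the names from the pseudo-code):

model.fit_generator(generator(X, labels, aux_data),
                    samples_per_epoch=X.shape[0],
                    nb_epoch=10)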
I have tested it with train_on_batch and it works. Thank you! :+1:
You're welcome! Hope it will help to merge this PR into the main branch. I believe that the gradient reversal layer and the domain adaptation method that uses it could be quite useful for the community.
Got some error while running dann.py (using the Theano backend).
Maybe hp_lambda should be a parameter of __init__(), not call(). I have tested your older version 262f9d0 and it doesn't have the problem.
I made a modified version of dann and got a more accurate result. If it is good enough, we can reduce the core modification to Keras.
There were some mistakes. Ignore 'Two fit' and please see 'Two fit dann'.
l = 2. / (1. + np.exp(-10. * p)) - 1
lr = 0.01 / (1. + 10 * p)**0.75
hp_lambda = l
builder.opt.lr = lr
I'm not sure, but it seems that here you just override hp_lambda and builder.opt.lr to become float numbers, but don't change the real gradient multiplier and learning rate in the graph. I believe that you need to use K.set_value() as it is done in the LearningRateScheduler callback.
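For illustration, a rough sketch of that suggestion, assuming hp_lambda was created as a backend variable (e.g. K.variable(1.0)) that the reversal op reads; the schedule variables here are hypothetical:

import numpy as np
from keras import backend as K

p = float(step) / num_steps              # training progress in [0, 1]
l = 2. / (1. + np.exp(-10. * p)) - 1     # gradient reversal multiplier schedule
lr = 0.01 / (1. + 10 * p) ** 0.75        # learning rate schedule

# Update the actual graph variables instead of rebinding Python floats:
K.set_value(hp_lambda, l)
K.set_value(model.optimizer.lr, lr)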
Thanks for pointing this out! Yes, these changes do not affect the learning rate and hp_lambda. I tried with Keras 2 (TensorFlow). In order to change the hp_lambda and learning rate parameters, I need to use K.set_value.
Is there any update on this PR? It seems like the issues are more about the example than the layer/backend implementations. Probably we could first have the layer and backend and then discuss how to deal with the examples.
Closing outdated PR. If you still care about the content of the PR, please submit a new PR to
Add an example implementing the 'Domain-Adversarial Training of Neural Networks' paper (https://arxiv.org/abs/1505.07818).
This allows domain adaptation in an unsupervised manner by forcing the net to learn features that are domain-invariant between the training and target domains, using the concept of gradient reversal.
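For readers skimming the thread, a minimal sketch of how the proposed layer would typically be wired into a DANN-style model (hypothetical usage with Keras 1.x functional-API names; the GradientReversal constructor argument is an assumption):

from keras.layers import Input, Dense
from keras.models import Model
# GradientReversal is the layer proposed in this PR (import path assumed).

inputs = Input(shape=(784,))
features = Dense(128, activation='relu')(inputs)

# Label predictor: trained on source samples only.
classifier_output = Dense(10, activation='softmax',
                          name='classifier_output')(features)

# Domain classifier behind the gradient reversal layer: the forward pass is
# the identity, the backward pass multiplies gradients by -hp_lambda, pushing
# the feature extractor toward domain-invariant features.
reversed_features = GradientReversal(hp_lambda=1.0)(features)
domain_output = Dense(2, activation='softmax',
                      name='domain_output')(reversed_features)

model = Model(input=inputs, output=[classifier_output, domain_output])
model.compile(optimizer='sgd',
              loss={'classifier_output': 'categorical_crossentropy',
                    'domain_output': 'categorical_crossentropy'})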
Further
Credits:
- a sketch of the implementation (in TF) and utility functions.
- for the Theano implementation (op) for gradient reversal.