The domain classifier loss is not decreasing. #12

Open
zhongpeixiang opened this issue Mar 10, 2023 · 8 comments

@zhongpeixiang

[Image: training curves, including the domain classifier loss]

As shown in the image above, the domain classifier loss stays almost constant throughout training. I use a ViT as the feature extractor, a linear layer as the label classifier, and a two-layer MLP as the domain classifier.

What are the possible causes, and what do typical domain classifier loss curves look like?

Thanks
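For readers unfamiliar with the layout being described: below is a minimal PyTorch sketch of such a setup, with the domain head sitting behind a gradient reversal layer (GRL). The layer sizes and names are illustrative assumptions, not taken from this repository.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -alpha in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.alpha, None

class DANN(nn.Module):
    """Shared feature extractor, a linear label head, and a 2-layer MLP domain head behind a GRL."""
    def __init__(self, feature_extractor, feat_dim, num_classes):
        super().__init__()
        self.features = feature_extractor             # e.g. a ViT backbone returning (B, feat_dim)
        self.label_head = nn.Linear(feat_dim, num_classes)
        self.domain_head = nn.Sequential(             # binary output: source vs. target
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, x, alpha=1.0):
        f = self.features(x)
        return self.label_head(f), self.domain_head(GradReverse.apply(f, alpha))
```

Because of the GRL, the domain loss is minimized with respect to the domain head's own weights but maximized with respect to the feature extractor, which is why a flat domain loss can mean either that the adversarial game has balanced out or that nothing is being learned at all.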

@cs-mshah

cs-mshah commented Apr 3, 2023

I am facing this issue as well. The DANN domain loss converges very quickly. Here are the plots from this repository without any changes to the code: wandb-DANN. In another experiment I used a resnet18 with a domain classifier similar to resnet's classifier head, and there too the loss almost instantly stabilised around 0.7. Is there a good repository that clearly explains how DANN works and gives a practical, working example of stable training?

@taotaowang97479

I had the same problem. With the code and dataset provided by the author, training works fine: as training progresses, the domain classifier loss stabilizes at 0.65-0.67 (training loss shown in the figure below).
[Image: train_loss]
However, with my own network and the same training procedure, the domain classifier loss sits at 0.69 from the very beginning, which suggests the domain classifier is not learning anything and is simply assigning source or target with 50% probability.
Can anyone figure out how to fix this?
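A quick sanity check on the 0.69 figure (an editorial note, not from the thread): 0.69 is essentially ln 2, the cross-entropy of a binary classifier that always predicts 50/50, so a domain loss pinned there from the start does indicate chance-level predictions.

```python
import math
import torch
import torch.nn.functional as F

# Cross-entropy of a 2-way domain classifier that always outputs 50/50.
logits = torch.zeros(8, 2)                      # equal logits -> probability 0.5 for each domain
domains = torch.randint(0, 2, (8,))             # arbitrary source/target labels
print(F.cross_entropy(logits, domains).item())  # ~0.6931
print(math.log(2))                              # 0.6931..., the chance-level baseline
```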

@cs-mshah

I have since shifted to a more robust codebase: https://github.com/thuml/Transfer-Learning-Library

@taotaowang97479

So is it the code itself? I went to that codebase and looked at its DANN implementation, and I didn't see a big difference in how it is written.

@JialingRichard

Does this mean the domain classifier didn't learn anything at all? Did you solve the problem, for example by finding another repo whose code works better?

@taotaowang97479

Yes, the domain classifier has not learned anything. I have since abandoned my project, as it is not an issue with the code itself. In my experience, hyperparameter tuning does not help much (I tried random search over hyperparameters). I believe the key to successful training is whether the dataset matches the network architecture: if the data is too challenging, the network is unlikely to capture the domain shift; if the architecture is overly complex, the adversarial training easily becomes unbalanced. Either factor can leave the domain classifier making essentially random predictions.

@JialingRichard

Hi, I just ran a training with this code this week. I found it helps to keep alpha (the weight of the GRL layer) very small at first, so the domain classifier becomes strong enough, and then gradually scale alpha up so the feature extractor starts 'cheating'. As a result, the domain classifier accuracy on source and target is very high at first, then drops, and finally both settle very close to 0.5, as in the picture below (the red and purple lines are the two domain classifier accuracies). This gives better performance overall; in this repo's code the change of alpha is handled by a mathematical schedule. Hope this is useful for training DANN.
[Image: label and domain classifier accuracy curves]
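For reference, the ramp-up described above is usually implemented with the schedule from the original DANN paper, which this repository's "math function" appears to follow; a minimal sketch (variable names are my own):

```python
import numpy as np

def grl_alpha(step, total_steps, gamma=10.0):
    """Ramp the GRL weight alpha smoothly from 0 to 1 as training progresses."""
    p = step / total_steps                         # training progress in [0, 1]
    return 2.0 / (1.0 + np.exp(-gamma * p)) - 1.0

# alpha is near 0 early on (the domain classifier trains freely), then approaches 1.
for step in (0, 100, 500, 1000):
    print(step, round(grl_alpha(step, total_steps=1000), 3))
```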

@taotaowang97479

Hi, your accuracy curves match the loss in the image I shared for this DANN project. In DANN applications in my field, alpha is sometimes fixed throughout training, while in others, as in this project, it gradually increases with the number of training iterations. When I previously ran random search over hyperparameters I always kept alpha fixed; now I am trying to adjust alpha dynamically. Thanks for the reminder.
