Fine tuning this model #56
@Meight I've managed to successfully fine-tune models from this repo: https://github.com/qubvel/segmentation_models, maybe you can use them as well.
Thank you for the reply! Although I spent so much time on this for no useful result, I'm kind of glad to learn it's not just a stupid mistake I kept missing. I should have said in my initial post that I came across that story of broken batch normalization (which is kind of crazy to be honest, but that's another debate), but I wasn't so sure, since this issue hasn't occurred in other Keras models I've tried to fine-tune in the past. That could definitely be at least one of this model's problems, though. I discovered the repository you linked only a few days ago and I still have to adapt it to our workflow. I'm glad to learn you managed to fine-tune those models. On a side note not related to the current repo, I noticed the models are implemented using […]. I would suggest you update the README of this repository to tell people that, as far as we know, the proposed implementation couldn't be trained or fine-tuned and is only valid for inference for now. Hopefully we can spare people a lot of wasted time if they're not willing to troubleshoot it themselves. I'll submit a pull request for that, if you like. Thank you again for the reply!
@Meight @bonlime
@pluniak I tried both cases, and each time the results were ridiculously poor. I grabbed a native TensorFlow version of DeepLab v3+, used the exact same preprocessing, and quickly got results close to those of the paper. My conclusion is that there was definitely something wrong with the model in this repo, but I stopped wasting time investigating it as soon as @bonlime confirmed he had been having similar issues. Besides, the state of the art for semantic segmentation has evolved quite significantly since this model was published, and there exist other alternatives that perform about equally well. There was virtually no benefit for my research in investing more time in this.
@Meight May I ask which other models that have evolved since then performed equally well for you? I'm especially interested in models available in Keras or TF.
Hi, I was able to fine-tune this network from pre-trained weights a few months ago. I did nothing special, just loaded the model with the pre-trained Pascal VOC weights and hit train. The only thing in my case is that the number of classes is 120, so the last layer is definitely different. Other than that, the network trains and smoothly converges with great performance.
@rdiazgar Can you show me some of your results? I am intending to fine-tune this repo's model but was hesitant after reading the author's README.
@trungpham2606, sorry but I'm afraid I can't show you any results, as they are currently submitted to a conference and hence I must keep them confidential. What I meant to say with my post is that I certainly had no problems using this network with pretrained weights and fine-tuning it for a different dataset (KITTI). In my case, I just loaded the DeepLab model with the 'pascal_voc' weights and a different number of categories to classify (120 labels). Then I simply followed standard Keras training with a custom data generator to feed the network, and opted for a small learning rate (1e-3), except for the last layer, which had a learning rate 10x larger (1e-2). This was my fine-tuning strategy, and it has certainly worked without any problems so far. I was also surprised to recently see the README section claiming it can't be fine-tuned. Perhaps they refer to other strategies for fine-tuning, like freezing all but the last layers. I can only say that in my experience, I have not encountered any problems using this network, either training from scratch or fine-tuning from pre-trained weights. Raul
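For anyone trying to reproduce this setup, here is a minimal sketch of the loading step, assuming this repo's Deeplabv3 factory from model.py; the input size is a placeholder, and the 10x learning rate on the last layer is not shown since stock Keras has no per-layer learning rates:

```python
from model import Deeplabv3  # this repo's model.py

# Loading 'pascal_voc' weights with a different class count: the mismatched
# last layer is expected to be re-initialized rather than loaded.
deeplab = Deeplabv3(weights='pascal_voc',
                    input_shape=(512, 512, 3),
                    classes=120,
                    backbone='xception')
```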
@rdiazgar Oh. First, thank you for your quick response. I will try to fine-tune this model according to your pipeline.
Downgrading from TensorFlow 1.11/1.12 to 1.10 might solve the problem, as might not using tf.keras.
@Meight I am having the same problem. I can tune it a little bit, but the accuracy is very bad (less than 0.5) even after a good amount of training time. Did you manage to get at least more than 0.5 accuracy?
@hfurkanbozkurt This is about the range I was able to reach too (~0.47-0.48). When fine-tuning the pure TF implementation I have now, I was able to reach results close to those of the paper. I have no clue what was wrong with my pipeline when I tried this Keras implementation, since it works flawlessly with the TF implementation with no modification whatsoever. Seeing some people comment here that they could fine-tune it successfully baffles me, since there also seem to be many people who haven't been able to, and I spent about three weeks on this and probably checked every single line of code ten times. This will remain a mystery as far as I'm concerned... Good luck if you keep working on it!
I was able to successfully retrain on my custom dataset from the pre-loaded weights (I haven't tried fine-tuning the decoder only). After combing through the issues here, here is a list of changes I made: […]
I hope this helps someone! |
@kritiyer Can you provide some images of the results you got? :3
@trungpham2606 I'm working with medical image data, so I'm not comfortable posting the images here, but I promise it's working! I did have to use the datumbox Keras fork for frozen BatchNormalization layers (trainable=False) to work properly and give decent results: https://github.com/datumbox/keras/tree/fork/keras2.2.4
@bonlime @Meight Hello, just to make it totally clear: is it possible to train this model end-to-end (without any frozen layers) from the VOC dataset weights initialization? If not, do you have any idea why this is happening? @kritiyer @rdiazgar Can you please share some results in terms of mIoU, and elaborate a bit more on the steps you followed to train the model? E.g., why should someone not use ImageDataGenerator()?
Hi @wave-transmitter, Yes, it is possible to train this model end-to-end without any frozen layers. I have successfully used this model with both the mobilenetv2 and xception backbones, from scratch, from the pascal-voc weights, and even from the cityscapes weights (see #67). The dataset that I used to train my model is not Pascal VOC, but KITTI. Unfortunately, I cannot share any results as of now because my work is under a conference confidentiality policy. I will certainly post some results when the conference proceedings become public. In my personal case, I instantiated the model with or without the pre-trained weights, never froze a layer, and trained the model via a custom image data generator that feeds the images (normalized by 1./255) and their corresponding ground truth values. I did not use the ImageDataGenerator available in Keras, but I see no reason why this should be the problem. Best,
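Rdiazgar's generator isn't shown, but a minimal sketch of a custom generator along those lines might look like this (the one-hot step assumes a categorical cross-entropy loss; names are mine, not his code):

```python
import numpy as np
from keras.utils import to_categorical

def data_generator(images, masks, batch_size, n_classes):
    """Yield batches of ([0,1]-normalized images, one-hot masks) forever."""
    n = len(images)
    while True:
        idx = np.random.choice(n, size=batch_size, replace=False)
        x = np.stack([images[i] for i in idx]).astype(np.float32) / 255.0
        y = np.stack([to_categorical(masks[i], n_classes) for i in idx])
        yield x, y
```

In Keras 2.2.x, such a generator would be consumed by model.fit_generator.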
@wave-transmitter Hello, I also successfully trained using both MobileNet and Xception (from the Pascal weights), and was able to fine-tune the decoder as well as train from scratch with frozen batch normalization layers (I don't have enough GPU memory to train the batch normalization layers). So far the best Dice score I got for a binary classification problem is 0.97. I used an ImageDataGenerator to feed in my data because it was too large to load in memory, but if I used any of the data augmentation arguments (rotate, shear, flip, etc.) I got garbage results, and I'm not sure why. I listed the steps I took to train in my comments above. I'm using tensorflow-gpu 1.10 and keras 2.2.4 (datumbox fork, linked above).
Hi @rdiazgar, would you mind also sharing which optimizer and loss function you were using? Thanks in advance!
Hi @Licini, sure. I simply used SGD with momentum=0.9 and a learning rate of 0.001. The loss is cross-entropy.
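In Keras 2.2.x terms, and assuming a softmax activation has been added on top of the model's logits output, that would be roughly:

```python
from keras.optimizers import SGD

model.compile(optimizer=SGD(lr=1e-3, momentum=0.9),
              loss='categorical_crossentropy')
```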
Thank you both for your detailed answers. @rdiazgar Is it possible to share your model's accuracy in terms of IoU? No need to share inference results. Moreover, for how many epochs did you train the model end-to-end, and which batch size did you select? @kritiyer Can you please also let us know about your choices regarding the optimizer, the learning rate and the batch size? Similarly, for how many epochs did you train your model?
Truth be told, I am not using this model for semantic segmentation, so I don't have any quantitative measure of intersection over union; I am training this model for monocular depth estimation. I trained the model for about 30 epochs with a batch size of 4, which is about 300k iterations on the KITTI training set. The input images are random crops of 375x513 pixels. Raul
@rdiazgar Thanks for sharing! I was able to retrain a simple two-class version using mobilenetv2, no frozen layers, and it worked pretty well. For anyone who's interested: I was using binary cross-entropy, with one object class and a background class. My dataset was about 8k images without any augmentation. It trained for 10 epochs with a batch size of 8. I don't have any IoU measurements yet, but it at least looks good to my eyes.
Sure @pluniak, I was using keras 2.2.4 with tensorflow-gpu 1.8.0.
I used keras 2.2.4, with tensorflow-gpu 1.9.0 on one machine and 1.12.0 on another.
I have also successfully fine-tuned this model. I did nothing special: TF 1.13.1 (GPU), Keras 2.2.4, binary_crossentropy, Adamax (default params), labels of shape (height, width, no_classes). Keras ImageDataGenerator and class_weights also work. I passed in numpy arrays. It converges quickly with reasonable performance. @kritiyer @rdiazgar @Licini
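A sketch of that recipe as I read it (shapes and epoch count are placeholders, a sigmoid-style output is assumed for binary_crossentropy, and the class_weights part is left out of the sketch):

```python
from keras.optimizers import Adamax

model.compile(optimizer=Adamax(),  # default parameters, as described above
              loss='binary_crossentropy')

# x_train: (n, height, width, 3); y_train: (n, height, width, no_classes)
model.fit(x_train, y_train, batch_size=4, epochs=20)
```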
Hi @pluniak, Regarding the lack of activation in the last layer, I believe this is just for convenience. For instance, if you want to classify your pixels via a softmax function, all that function does is turn the raw output logit values into a probability distribution. However, this is only useful from a training point of view, because those probabilities are used for computing the loss function (e.g., cross-entropy). At test time, you only care about which output logit has the highest value (argmax), and you don't need to call softmax to find that. Plus, by not using softmax at test time, you save some computation, because exponentials and logarithms are quite expensive operations. If you check Keras' docs and code, you'll see that most of the loss functions defined have an optional parameter named 'from_logits' that takes exactly this into account: when True, the loss applies a softmax before computing the loss; when False, it assumes the network's last layer already includes a softmax. Best
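In Keras 2.2.x this flag lives on the backend function, so a custom loss that consumes raw logits can be written as, for example:

```python
import keras.backend as K

def crossentropy_from_logits(y_true, y_pred):
    # y_pred holds raw logits; the softmax is applied inside the loss
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True)

model.compile(optimizer='sgd', loss=crossentropy_from_logits)
```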
@rdiazgar Thanks. Makes sense :-)
@rdiazgar Hello, I want to ask you about the pre-processing part. Did you normalize the images to [-1, 1] or to some other range?
Hi, the range [0, 1] worked out best for me. You're right, [-1, 1] got me worse results.
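For reference, the two normalizations being compared (function names are mine):

```python
import numpy as np

def to_unit_range(img):
    """Scale a uint8 image to [0, 1]."""
    return img.astype(np.float32) / 255.0

def to_symmetric_range(img):
    """Scale a uint8 image to [-1, 1], as in the original DeepLab preprocessing."""
    return img.astype(np.float32) / 127.5 - 1.0
```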
@rdiazgar
I never used the train.py script of this repo. I have my own training script, and I simply instantiate the DeepLabv3+ model. The 'pascal_voc' weights are simply a model checkpoint that has learnt to segment images from the Pascal VOC dataset. You can also use weights='cityscapes' to start your training from a pre-trained checkpoint oriented to autonomous driving.
@rdiazgar Oh, thanks rdiazgar. I will try.
@rauldiaz
If I understand your question right, you're asking what shape your ground truth labels should be, right? That depends on what the loss function needs. For instance, sparse_categorical_crossentropy expects the labels to be simply the number associated with each class, while categorical_crossentropy expects the labels to be one-hot encoded vectors for each class. In a segmentation scenario like this, if you are using categorical_crossentropy as a loss function, the shape of your labels should be (batch_size, image_height, image_width, classes). If you choose the sparse loss version, the shape should be (batch_size, image_height, image_width, 1).
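Concretely, assuming a batch of 4 masks, 512x512 inputs and 21 classes:

```python
import numpy as np
from keras.utils import to_categorical

# Integer masks, one class id per pixel: for sparse_categorical_crossentropy.
sparse_labels = np.random.randint(0, 21, size=(4, 512, 512, 1))

# One-hot masks, one channel per class: for categorical_crossentropy.
onehot_labels = to_categorical(sparse_labels[..., 0], num_classes=21)
print(onehot_labels.shape)  # (4, 512, 512, 21)
```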
Hi, has anyone managed to fine-tune on VOC successfully?
First epoch result:
BN depends on the data, so when the data distribution changes, the BN parameters should change too; whether to freeze them or not is not the real question. If the data is changing, as in transfer learning, I believe the BN layers should not be frozen, so that they can catch the distribution of the new data. So I think fine-tuning might need the data and data augmentation to be exactly the same as in the first-stage training.
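For anyone who wants to experiment with this, a typical way to toggle BN freezing in Keras (a generic sketch, not code from this thread):

```python
from keras.layers import BatchNormalization

for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True  # False would freeze the BN scale/offset parameters

# Recompile so the new trainable flags take effect.
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```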
May I ask which paper you want to reproduce?
I am not sure about the effect of these changes on the final results.
@rauldiaz Hello! Sorry for digging up such an old thread, but I'm having really bad results with the mobilenetv2 version and the cityscapes weights. The Xception backbone works amazingly well, but mobilenetv2 returns a very blurry segmentation. The dataset I'm using is KITTI-360. The only thing I modified in the model was line 172 (#in_channels = inputs.shape[-1].value # inputs._keras_shape[-1]), because it was causing an error while creating the model (the change was suggested in issue #125). Did you go through something similar?
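If I read that edit right, the change from issue #125 amounts to roughly this:

```python
# model.py, around line 172 (as suggested in issue #125):
# in_channels = inputs.shape[-1].value   # original line, fails where Dimension has no .value
in_channels = inputs._keras_shape[-1]    # replacement that avoids the .value attribute
```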
Hey @Meight, would you please share the steps you followed in fine-tuning the official repo of DeepLabV3+? I was tuning it for two classes (background + one foreground), but after some iterations all of my image pixels start to acquire a single value (1 in my case).
@Thunder003 Hi, I have the same problem. Did you solve it?
@GivenAward Please don't use this repo, I stopped supporting it a very long time ago. Also, it uses very old versions of Keras/TF; better to switch to PyTorch and use segmentation models from there.
Has anyone been able to successfully fine-tune this model at all, say from an Xception backbone pretrained only on ImageNet?
After three weeks of tweaking and exploring, a good dozen different loss functions, and many more runs with a wide range of hyperparameters (including around those of the original paper), I still can't get the model to even overfit on a small batch from the raw Pascal VOC dataset. Consequently, I haven't even been able to reproduce the original paper's results by fine-tuning this repo's model so far.
I triple-checked and unit-tested my preprocessing pipeline, which in turn is just copy/pasted from the original repo, and here's the kind of results I get during the training phase:
(The bottom-right picture is just the argmax over all classes.)
The model does converge toward the same loss value when using pixelwise cross-entropy with logits (I tried all the possible variations of that, whether by adding a softmax activation to the model or by using TF's native function tf.nn.softmax_cross_entropy_with_logits_v2) with different hyperparameters, but it doesn't even begin to perform proper segmentation. I've also tried using @bonlime's cost function as shared in this reply, and several variations of soft Dice loss, but the results aren't any better. Plotting the different feature maps shows I've successfully loaded the weights of Xception pretrained on ImageNet (the model can totally discriminate objects across images), so this is not the problem.
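For reference, a sketch of the native TF 1.x variant mentioned above (labels one-hot per pixel, logits the raw network output):

```python
import tensorflow as tf

def pixelwise_crossentropy_with_logits(labels, logits):
    # Per-pixel softmax cross-entropy, averaged over batch and spatial dims.
    per_pixel = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)
    return tf.reduce_mean(per_pixel)
```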
I'm starting to seriously doubt this model is actually trainable or tunable as-is, so I'd be curious to hear whether anyone got it to train before I dive into its detailed implementation.