Keras training and validation accuracy always converges to 0.5 #13006

Closed
Randryn0 opened this issue Jun 25, 2019 · 12 comments
Assignees
Labels
type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited.

Comments

@Randryn0

Randryn0 commented Jun 25, 2019

System information

  • Have I written custom code (as opposed to using example directory): yes
  • OS Platform and Distribution: Linux ubuntu 4.15.0-51-generic #16~18.04.1-Ubuntu SMP
  • TensorFlow backend: yes
  • TensorFlow version: 1.13.1
  • Keras version: 2.2.4
  • Python version: 3.6.8

I have used a model (provided here) that trains on two categories of pictures and then tries to classify them.
Furthermore, I force the network to use the same seeds when training so as to get comparable results. I also explicitly create and close the TensorFlow session, as I have read that not doing so may cause problems.

Describe the current behaviour
Most of the time the training and validation accuracy converge to around 0.5 and the loss stays at exactly the same value for every epoch. I can run the same code several times in a row, without changing it, and get this problem; only rarely do I get a run where the neural network trains properly and reaches 0.9 or higher accuracy for both training and validation.

Epoch 1/5
20/20 [==============================] - 5s 255ms/step - loss: 8.0572 - acc: 0.4994 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 2/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 3/5
20/20 [==============================] - 5s 251ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 4/5
20/20 [==============================] - 5s 250ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000
Epoch 5/5
20/20 [==============================] - 5s 252ms/step - loss: 8.0590 - acc: 0.5000 - val_loss: 8.0590 - val_acc: 0.5000

Describe the expected behaviour
For the same seeds, runs should produce at least very similar results, i.e. they should not get stuck around 0.5 accuracy.

Code to reproduce the issue

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import keras
from keras import backend as K

import tensorflow as tf

import time
import matplotlib.pyplot as plt
import sys
import os
import numpy
import random as rn

os.environ['PYTHONHASHSEED'] = '0'
numpy.random.seed(11)
rn.seed(11)  
tf.set_random_seed(11)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(100, 100, 3)))  # input_shape must match the generators' target_size below
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))


optimizer = keras.optimizers.Adam(lr=0.001)  # assign the optimizer so the lr=0.001 setting is actually used
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

train_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()

train_generator = train_datagen.flow_from_directory('train', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)
validation_generator = test_datagen.flow_from_directory('validation', target_size=(100, 100), batch_size=16, class_mode='binary', shuffle=False)

history = model.fit_generator(train_generator, steps_per_epoch=320 // 16, epochs=5, validation_data=validation_generator, validation_steps=80 // 16)

K.clear_session()

[Edit]
I've tried a different tutorial and I get the same result. The only changes I made to the code were to change the image dimensions to the size of my images, change the image directories, turn off image augmentation, and change the number of images.

About half of my runs produced output such as this

Epoch 1/10
100/100 [==============================] - 11s 115ms/step - loss: 7.9406 - acc: 0.5006 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 2/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 3/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 4/10
100/100 [==============================] - 11s 111ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 6/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 7/10
100/100 [==============================] - 11s 110ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 8/10
100/100 [==============================] - 11s 109ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 9/10
100/100 [==============================] - 11s 106ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000
Epoch 10/10
100/100 [==============================] - 11s 107ms/step - loss: 7.9712 - acc: 0.5000 - val_loss: 7.9712 - val_acc: 0.5000

while the other half produced

Epoch 1/10
100/100 [==============================] - 11s 114ms/step - loss: 6.7217 - acc: 0.5731 - val_loss: 4.1994 - val_acc: 0.7362
Epoch 2/10
100/100 [==============================] - 11s 110ms/step - loss: 3.4067 - acc: 0.6759 - val_loss: 0.3655 - val_acc: 0.8662
Epoch 3/10
100/100 [==============================] - 11s 109ms/step - loss: 0.4138 - acc: 0.8434 - val_loss: 0.3426 - val_acc: 0.8612
Epoch 4/10
100/100 [==============================] - 11s 110ms/step - loss: 0.3220 - acc: 0.8831 - val_loss: 0.3140 - val_acc: 0.8788
Epoch 5/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2494 - acc: 0.9106 - val_loss: 0.3999 - val_acc: 0.8575
Epoch 6/10
100/100 [==============================] - 11s 110ms/step - loss: 0.2244 - acc: 0.9250 - val_loss: 0.3218 - val_acc: 0.8900
Epoch 7/10                                                                                                                                                                              
100/100 [==============================] - 11s 108ms/step - loss: 0.1907 - acc: 0.9344 - val_loss: 0.6445 - val_acc: 0.8375                                                             
Epoch 8/10                                                                                                                                                                              
100/100 [==============================] - 11s 108ms/step - loss: 0.1743 - acc: 0.9409 - val_loss: 0.4450 - val_acc: 0.8738                                                             
Epoch 9/10                                                                                                                                                                              
100/100 [==============================] - 11s 110ms/step - loss: 0.1382 - acc: 0.9503 - val_loss: 0.4937 - val_acc: 0.8738                                                             
Epoch 10/10                                                                                                                                                                             
100/100 [==============================] - 11s 110ms/step - loss: 0.1301 - acc: 0.9550 - val_loss: 0.4955 - val_acc: 0.8950    

(the numbers were not always exactly the same).

I thought that maybe I had bad images, so I tried using images from other tutorials (like this), but I got the same results: about half or more of the runs are useless because they converge to 0.5 accuracy. What's going on?

@jvishnuvardhan
Contributor

@Randryn0 can you provide standalone code to reproduce the issue? The above code as posted is not working; I found a typo and undefined variables. Thanks!

@jvishnuvardhan jvishnuvardhan self-assigned this Jun 28, 2019
@jvishnuvardhan jvishnuvardhan added the type:support User is asking for help / asking an implementation question. Stackoverflow would be better suited. label Jun 28, 2019
@santhoshBjeeffy

Hello Randryn0,

I am also facing the same issue while doing sentiment analysis on movie review data.
Has the above issue been resolved for you?
If so, please let us know the solution.

@jvishnuvardhan
Contributor

Closing this issue. Please open a new issue with standalone code to reproduce the problem; resolution will be faster with standalone code. Also, this looks more like a support issue. If you strongly think this is a bug, then post it here; otherwise, post it on Stack Overflow. Thanks!

@MessyPaste

I also have this problem.

@NikhilKothari

I have the same problem!

@irudnyts

One has to change the last layer

model.add(Dense(1))
model.add(Activation('sigmoid'))

to

model.add(Dense(2))
model.add(Activation('softmax'))

When train_datagen.flow_from_directory generates batches with the default class_mode='categorical', it one-hot encodes the labels, returning a vector of length two instead of a scalar.
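
For reference, a minimal sketch of that variant (the single convolutional block, image size, and directory name are illustrative placeholders, not the exact model from this issue):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation
from keras.preprocessing.image import ImageDataGenerator

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(2))                 # two output units, one per class
model.add(Activation('softmax'))

# One-hot labels (class_mode='categorical', the default) pair with categorical_crossentropy.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

train_generator = ImageDataGenerator().flow_from_directory(
    'train', target_size=(100, 100), batch_size=16, class_mode='categorical')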

@peter0201yu

I also have this problem. Does anyone have a solution yet? I tried different optimizers and learning rates as well but the accuracy always converges to 0.5.

@peter0201yu

I also set the last layer to have two nodes with activation='softmax' but it still doesn't work.

@khatbahusain

khatbahusain commented Jan 29, 2021

When using gen_train.flow_from_directory('../data/train', shuffle=True, class_mode='binary')
with activation='sigmoid'

be sure to add class_mode='binary' to gen_train.flow_from_directory
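
A minimal sketch of that pairing (the validation directory path below is a placeholder assumed for illustration):

from keras.preprocessing.image import ImageDataGenerator

gen_train = ImageDataGenerator()
gen_valid = ImageDataGenerator()

# class_mode='binary' yields scalar 0/1 labels, which is what a single
# sigmoid output trained with binary_crossentropy expects.
train_generator = gen_train.flow_from_directory(
    '../data/train', target_size=(100, 100), batch_size=16,
    class_mode='binary', shuffle=True)
validation_generator = gen_valid.flow_from_directory(
    '../data/validation', target_size=(100, 100), batch_size=16,
    class_mode='binary')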

@rbhambriiit

@khatbahusain : Thanks for the tip.

Changing the class_mode to 'binary' helped me. The default is 'categorical'.

@EricChu98

When using gen_train.flow_from_directory('../data/train', shuffle=True, class_mode='binary') with activation='sigmoid'

be sure to add class_mode='binary' to gen_train.flow_from_directory

OMG! You are my lifesaver!!! I was stuck with this problem for a week!

@yasasvy

yasasvy commented Feb 20, 2023

When using gen_train.flow_from_directory('../data/train', shuffle=True, class_mode='binary') with activation='sigmoid'
be sure to add class_mode='binary' to gen_train.flow_from_directory

OMG! You are my lifesaver!!! I was stuck with this problem for a week!

Still, I am getting a val_acc of 0.5 constantly. Can anyone confirm: when using sigmoid (last-layer activation), binary_crossentropy (loss), and binary (class_mode), how many output neurons should I use in the last layer for classifying an image into one of two classes (two sub-folders inside flow_from_directory)?
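
For what it's worth, with sigmoid, binary_crossentropy, and class_mode='binary' the consistent choice is a single output neuron. A minimal sketch (the convolutional block and image size are placeholders):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
# One neuron: sigmoid outputs a single probability for the positive class,
# matching the scalar labels produced by class_mode='binary'.
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])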
