Is there a way in Keras to apply different weights to a cost function? #2115
Comments
Similar: #2121 |
You could use class_weight. |
class_weight applies a weight to all data belonging to the class; the weight should instead depend on the misclassification. |
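(For reference, a minimal sketch of how class_weight is passed, assuming a compiled model and hypothetical per-class weights — note it weights by the true class only, so it cannot express per-misclassification costs:)

```python
# one weight per *true* class, applied regardless of what the model predicts
class_weight = {0: 1.0, 1: 2.5, 2: 1.0}
model.fit(X_train, Y_train, class_weight=class_weight)
```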
You are absolutely right, I'm sorry I misunderstood your question. I will try to come back with something tomorrow using |
Ok, so I had the time to quickly test it. So if you want to pass constants included in the cost function, just build a new function with partial:

```python
'''Train a simple deep NN on the MNIST dataset.
Get to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''
from __future__ import print_function
from functools import partial
from itertools import product

import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils
import keras.backend as K


# Custom loss function with misclassification costs
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # note: the Keras 1.x backend signature is categorical_crossentropy(output, target)
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

# cost matrix indexed as weights[true_class, predicted_class]
w_array = np.ones((10, 10))
w_array[1, 7] = 1.2
w_array[7, 1] = 1.2

# pass the cost matrix into the loss via partial
ncce = partial(w_categorical_crossentropy, weights=w_array)

batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss=ncce, optimizer=rms)

model.fit(X_train, Y_train,
          batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test,
                       show_accuracy=True, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
```
|
Wow, that's nice. Thanks for the detailed answer! |
Try to test it on a toy example to verify that it actually works. If it's what you are looking for, feel free to close the issue! |
Well, I am stuck, I can't make it run in my model, it says:
This is the model I am using:
|
Sure, sorry, I was using Theano functionalities. I replaced the following line in my previous example: `y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))` It should do the trick! |
Sounds like the way to go - I was using TensorFlow as the backend. I'll tell you if it works as soon as possible. Thanks! |
I still get an error:
I've tried your first reply under the Theano backend and it works, though. |
Ok, I was not sure about how that would behave. Try: `y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))` |
I get more or less the same:
It seems like it cannot get the shape of y_pred as an integer, right? |
Mm, ok, I will take a look at it today and work directly with tensors to try to find a way to have it work properly for both backends. |
Hi there, I tried something like that:
I think it will do it. |
The latter only works for non-recurrent networks, but this code works for RNNs following the same idea. It only works for TensorFlow, though. I couldn't find a way to reshape a tensor the way we want with the Keras backend:
|
My bad, just replacing tf.expand_dims with K.expand_dims worked for me:
The last line is necessary for the TensorBoard callback to work, thanks!! |
Is the Mar 31 solution for @ayalalazaro above still recommended as of v1.2? (I noticed @tboquet's comment: Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.) My problem is binary classification where true-positive accuracy is more important and some false negatives are acceptable. Would I need the approach above to achieve that objective? I tried |
Just a small detail about the
|
Hello, I am trying to implement this in TensorFlow. I am confused as to what `partial` refers to - I do not see it defined anywhere in this thread, and get an error as output... Thanks |
@jerpint It's available from `functools`:

```python
import functools
ncce = functools.partial(w_categorical_crossentropy, weights=np.ones((10, 10)))
```
|
I am trying to incorporate @curiale's implementation
These are the variables' shapes inside
Frankly I am lost in |
Hmm, I'm sorry but I don't quite understand: what does this |
@recluze Sorry for the confusion. Let me clarify: The model is an image segmentation network with output |
@enikkari this error can be resolved by adding another line after:
as follows:
|
Also, to prevent the case where a row in y_pred has more than one maximal value (so that y_pred_max_mat gets more than one 1 per row), the code:

```python
y_pred_max = K.max(y_pred, axis=1)
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
y_pred_max_mat = K.equal(y_pred, y_pred_max)
```

can be replaced with more robust (and intuitive) code:

```python
y_pred_arg_max = K.argmax(y_pred)
y_pred_max_mat = K.one_hot(y_pred_arg_max, num_classes=y_pred.shape[1])
```

Another added value of this is that it no longer requires following up with a cast to float (K.equal returns booleans; K.one_hot does not). |
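(Putting that suggestion together with the earlier loss function gives the following sketch - my consolidation, not code taken verbatim from this thread; the weights argument is the same cost matrix as before:)

```python
from itertools import product
import keras.backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    # one-hot of the arg-max prediction; robust even when a row has tied maxima
    y_pred_max_mat = K.one_hot(K.argmax(y_pred), nb_cl)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t]
    # argument order is (target, output) in Keras 2 / tf.keras;
    # the older Keras 1.x backend used (output, target)
    return K.categorical_crossentropy(y_true, y_pred) * final_mask
```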
Adding to the above:

```python
import tensorflow.keras.backend as K
from tensorflow.keras.losses import CategoricalCrossentropy


class WeightedCategoricalCrossentropy(CategoricalCrossentropy):

    def __init__(self, cost_mat, name='weighted_categorical_crossentropy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().__call__(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )


def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)

    y_pred.shape.assert_has_rank(2)
    y_pred.shape[1:].assert_is_compatible_with(num_classes)
    y_pred.shape.assert_is_compatible_with(y_true.shape)

    y_pred = K.one_hot(K.argmax(y_pred), num_classes)

    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)

    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])

    return sample_weights_n
```

Usage:

```python
model.compile(loss=WeightedCategoricalCrossentropy(cost_matrix), ...)
```

Similarly, this can be applied for the accuracy metric:

```python
from tensorflow.keras.metrics import CategoricalAccuracy


class WeightedCategoricalAccuracy(CategoricalAccuracy):

    def __init__(self, cost_mat, name='weighted_categorical_accuracy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def update_state(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().update_state(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )
```

Usage:

```python
model.compile(metrics=[WeightedCategoricalAccuracy(cost_matrix), ...], ...)
```
|
In addition to the w_array given by @tboquet in the post above, how do I construct the cost_matrix? Can somebody help, please? |
@GalAvineri |
@eliadl I'm getting an unexpected keyword argument 'sample_weight' error.
@dest-dir Please post a StackOverflow question with your code, and share the link here. I'll try assist there. |
@eliadl how do I insert the cost matrix into another custom loss, like focal loss? `class FocalLoss(tf.keras.losses.Loss):
|
@damhurmuller Please post a StackOverflow question with your code, and share the link here. I'll try assist there. |
For semantic segmentation, with:
Usage:
|
@mendi80 Please, is your function right? |
@dest-dir, @eliadl The sample weight problem seems to be solved by changing the magic method __call__ to call. I also modified the return of call to multiply the output of super().call(y_t, y_p) by the return value of get_sample_weights. @eliadl - I think your approach, from what I understood, was to overwrite/overload rather than access the categorical crossentropy call method and pass in sample_weight as an expected parameter of this call; however, I couldn't figure out why this worked for you and not for us. (And, frankly, my Python knowledge isn't really up to figuring this out!) I utilised @SpikingNeuron's class code in order to get this working. I also changed the weights argument from a positional argument to a named argument as part of trying to get the model loading working.

The loss class therefore became:

```python
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.python.keras.utils import losses_utils  # import path may vary by TF version


class weighted_categorical_crossentropy(tf.keras.losses.CategoricalCrossentropy):

    def __init__(
        self,
        *,
        weights,
        from_logits=False,
        label_smoothing=0,
        reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
        name='categorical_crossentropy',
    ):
        super().__init__(
            from_logits, label_smoothing, reduction, name=f"weighted_{name}"
        )
        self.weights = weights

    def call(self, y_true, y_pred):
        return super().call(y_true, y_pred) * get_sample_weights(y_true, y_pred, self.weights)

    def get_config(self):
        return {'weights': self.weights}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    cost_m = K.cast(cost_m, 'float32')

    y_pred.shape.assert_has_rank(2)
    assert y_pred.shape[1] == num_classes
    y_pred.shape.assert_is_compatible_with(y_true.shape)

    y_pred = K.one_hot(K.argmax(y_pred), num_classes)

    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)

    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])

    return sample_weights_n
```

Note the inclusion of:

```python
    def get_config(self):
        return {'weights': self.weights}

    @classmethod
    def from_config(cls, config):
        return cls(**config)
```

This is necessary in order for the custom loss function to be registered with Keras for model saving:

```python
tf.keras.losses.weighted_categorical_crossentropy = weighted_categorical_crossentropy
```

Usage:

```python
model.compile(
    optimizer='adam',
    loss={'output': weighted_categorical_crossentropy(weights=cost_matrix)}
)
```

Saving:

```python
model.save(filepath, save_format='tf')
```

Loading:

```python
model = tf.keras.models.load_model(
    filepath,
    compile=True,
    custom_objects={
        'weighted_categorical_crossentropy': weighted_categorical_crossentropy(weights=cost_matrix)
    }
)
```

Feedback welcome. |
```python
def get_config(self):
    # dict.update() returns None, so build the config dict first and then return it
    config = super().get_config().copy()
    config.update({'weights': self.weights})
    return config
```
|
@eliadl - Thanks; SO Question |
@dest-dir as @PhilAlton found, the problem was
should have been this:
|
Hello, does anyone know how to do this for sparse categorical crossentropy?
Hello, thank you for this awesome thread. I have a small question though, I am trying to implement this solution in In other words, is Thank you in advance.
|
I guess I found the answer, I have seen the documentation of
I will just do it on the logits then?
Yeah, I have the same question on this.
With the tf.keras implementation I would propose a more vectorized approach (avoiding the for loop):
You can modify the above to fit your needs, but it worked for me with an example weight matrix for cost-sensitive learning, where certain mispredictions are more or less important than others.
|
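(The vectorized code itself is not shown above; as a rough sketch of the idea - my illustration, not the commenter's original code - the Python loop can be replaced by a cost-matrix lookup with tf.gather_nd:)

```python
import tensorflow as tf

def weighted_categorical_crossentropy(cost_mat):
    """Sketch: scale each sample's loss by cost_mat[true_class, predicted_class]."""
    cost_mat = tf.constant(cost_mat, dtype=tf.float32)

    def loss(y_true, y_pred):
        true_idx = tf.argmax(y_true, axis=-1)   # true class per sample
        pred_idx = tf.argmax(y_pred, axis=-1)   # predicted class per sample
        # look up the misclassification weight for each (true, predicted) pair
        weights = tf.gather_nd(cost_mat, tf.stack([true_idx, pred_idx], axis=-1))
        return tf.keras.losses.categorical_crossentropy(y_true, y_pred) * weights

    return loss

# e.g. model.compile(loss=weighted_categorical_crossentropy(w_array), optimizer='adam')
```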
Hello. How can we implement this for Sparse Categorical Cross Entropy? |
I would also like to know how to implement this for SparseCategoricalCrossEntropy |
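(No answer for the sparse case appears in this thread; as one hedged possibility - my own sketch, untested here - the same cost-matrix lookup works with integer labels by reading the true class directly instead of arg-maxing a one-hot vector:)

```python
import tensorflow as tf

def weighted_sparse_categorical_crossentropy(cost_mat):
    """Sketch for integer labels: scale each sample's loss by cost_mat[true, predicted]."""
    cost_mat = tf.constant(cost_mat, dtype=tf.float32)

    def loss(y_true, y_pred):
        true_idx = tf.cast(tf.reshape(y_true, [-1]), tf.int64)   # integer class labels
        pred_idx = tf.argmax(y_pred, axis=-1)                    # predicted classes
        weights = tf.gather_nd(cost_mat, tf.stack([true_idx, pred_idx], axis=-1))
        return tf.keras.losses.sparse_categorical_crossentropy(true_idx, y_pred) * weights

    return loss
```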
Has anyone verified that the code above works? If so, can they share a minimal working example? @isaranto, have you verified that your vectorized version of the original method works on the MNIST network example as given? I put a high weight on the misclassification that naturally seems to be highest when running the dense neural network given by @tboquet, and the results are not intuitive; that is, the number of misclassifications does not decrease. I've compared the confusion matrices of results using weights w[7,9] = 1, 1.1, 1.2, 1.5, 1.7, 2, ... 10, ... 100, and one would expect the number of misclassifications on [7,9] to decrease as the weight increases, but there doesn't seem to be a consistent pattern; if anything, in about 7 out of 30 runs the misclassification count for [7,9] increases dramatically (like from 20 to 386). So I tried negative numbers, and that did have the immediate effect of decreasing the misclassification rates. However, using negative numbers isn't consistent with any of the above discussion. Here's the code I've used - it's long, so I posted a link to my public Google Colab notebook: https://github.com/RachelRamirez/misclassification_matrix/blob/main/w%5B7%2C9%5D%3D100_Misclassification_Cost_Matrix_Example.ipynb This is the output of one of the worst confusion matrices (run 14) using w[7,9]=100. It seems like it's rewarding the misclassification instead of the reverse.
|
Hey @RachelRamirez, it has been 2 years since I wrote that comment, so I don't remember it all that well. |
Thank you for the quick reply. I have played with lots of weights, and all of the numbers seem to reward misclassifications until I use a negative weight, which isn't consistent with the comments above. I wish I could comment on a more recent thread, but this seems to be the only issue that addresses misclassifications and is continually referenced in all the other Keras threads. |
Hi @RachelRamirez - I verified this class https://stackoverflow.com/a/61963004 extensively at the time... Hopefully it provides a starting point for your specific problem? |
@PhilAlton Thanks! I verified that your process works in line with how I expected it to, using the MNIST example. If I raise the cost of a misclassification, the resulting costly misclassification goes down. I still wish Keras would make this an easier feature to implement. |
Yep @RachelRamirez - I remember this being a real pain at the time! Tbh, we might be massively overcomplicating this... Loss functions do take a "sample_weights" argument, but it's not well documented (imo). It wasn't 100% clear to me if this was equivalent to class weights, plus I only discovered this when I had my own implementation working... |
@PhilAlton Loss functions support a sample_weight argument, but it weights each sample individually rather than by its (true class, predicted class) combination. That's basically why we needed this #2115 (comment).
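(For contrast, a minimal sketch of plain per-sample weighting - variable names hypothetical - where each example gets one fixed weight up front, independent of the prediction; the cost-matrix classes above instead derive the weight at training time from the true/predicted pair:)

```python
import numpy as np

# one static weight per training example, independent of the model's predictions
per_example_weights = np.where(y_train == 7, 2.0, 1.0)  # hypothetical weighting rule
model.fit(X_train, Y_train, sample_weight=per_example_weights)
```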
@eliadl - ah yes, it's all coming back to me now! @RachelRamirez - if you were sufficiently motivated, you could raise a pull request to get this included... Not something I've done before! (Imbalanced problems are very common, though accessing this via call is clearly TF/Keras' preferred approach, e.g. https://keras.io/examples/structured_data/imbalanced_classification/ - though it's not intuitive that the weights should be passed through model.fit.) |
Hi there,
I am trying to implement a classification problem with three classes: 0, 1 and 2. I would like to fine-tune my cost function so that misclassification is weighted somehow. In particular, predicting 1 instead of 2 should give twice the cost of predicting 0. Written as a table, it should be something like this:
Costs:

                Predicted:
             0    | 1    | 2
         __________________________
Actual 0 |   0    | 0.25 | 0.25
       1 |   0.25 | 0    | 0.5
       2 |   0.25 | 0.5  | 0
I really like the Keras framework; it would be nice if it were possible to implement this without having to dig into TensorFlow or Theano code.
Thanks
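(As an illustration added for clarity - not part of the original issue - that table can be written as a NumPy cost matrix indexed as cost[true_class, predicted_class] and passed to any of the weighted-loss implementations above. Note that in practice the diagonal is usually kept at 1, as in w_array earlier, so correctly classified samples still contribute to the loss:)

```python
import numpy as np

# rows = actual class, columns = predicted class (values taken from the table above)
cost_matrix = np.array([
    [0.0,  0.25, 0.25],   # actual 0
    [0.25, 0.0,  0.5 ],   # actual 1
    [0.25, 0.5,  0.0 ],   # actual 2
])
```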