Multi-Target Models that accept NA values in the target variables. #57

FirpoMarberry · 2023-09-12T20:44:22Z

This is a really exciting package. Thanks for putting everything together here. I was wondering about multi-target models where we don't know the outcomes of all of the targets.

Suppose some process exists where we want to both model the probability that an agent attempts an event as well as the result of the event. The agent would only attempt the event if they expected a good outcome, so we'd want to model the correlation between the likelihood of an attempt and the quality of the outcome conditional on an attempt.

However, if an attempt is not made, we wouldn't know its outcome, so it would be NA. Right now, including these observations results in an error. Would it be possible to leave such observations out of the loss for the missing target, while still using them for the targets that are observed?

This is fairly easy to implement with a neural net in Keras via a custom loss function (like the one below), but XGB has performed much better on my particular problem than any neural net I've built.

import tensorflow as tf

def binary_cross_entropy_nan(target, output):
    # Expects probabilities in `output`; NaN entries in `target` are dropped.
    index = ~tf.math.is_nan(target)
    target = tf.boolean_mask(target, index)
    output = tf.boolean_mask(output, index)

    # Clip away from 0 and 1 so the logit below stays finite.
    epsilon = tf.cast(tf.keras.backend.epsilon(), output.dtype)
    output = tf.clip_by_value(output, epsilon, 1 - epsilon)
    # Convert probabilities back to logits.
    output = tf.math.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

Thank you!

StatMixedML (Maintainer) · 2023-09-13T07:35:09Z

Hi @FirpoMarberry and thanks for your interest in the project.

Using XGBoostLSS to model multi-targets of different types is something that I am very interested in, though I haven't really looked into it.

What you are describing is conceptually feasible within the framework, as long as the loss is twice differentiable. XGBoostLSS builds one model for each target/parameter and then calculates the joint loss based on the loss function, from which gradients and hessians are derived. Concerning the NAs: if it is working for a NN-type of model then it is also working for XGBoostLSS. You mention that you want to model the

> correlation between the likelihood of an attempt and the quality of the outcome conditional on an attempt

I am not sure if the above cross-entropy loss would also explicitly model dependencies between the targets, since it does not have a "dependency parameter" that models the correlation as a function of covariates.

Since the framework is probabilistic in nature, is there a distributional assumption/density we can sample from?

Thanks for your suggestion.

1 reply

FirpoMarberry Sep 18, 2023
Author

Thanks for the quick response here. Sorry for my delay.

> Concerning the NAs: if it is working for a NN-type of model then it is also working for XGBoostLSS.

When I have NAs in some but not all of the target variables, I get this error:

xgboost.core.XGBoostError: [18:07:26] ../src/data/data.cc:461: Check failed: valid: Label contains NaN, infinity or a value too large.

Maybe I'd have to adjust the loss function, but I'd want to be able to include the dependency parameter you developed. It looks like the NLL loss function uses torch.nansum, which should allow for NAs in the target. So possibly this is an issue with base XGB not allowing NaNs in the target?

> I am not sure if the above cross-entropy loss would also explicitly model dependencies between the targets, since it does not have a "dependency parameter" that models the correlation as a function of covariates.

That's true. My bad. I was just using that as an example of how to handle NAs in the loss function. When I've built something similar with neural nets, I didn't explicitly model the dependencies between the targets; I just had the targets share almost all of the network's layers, so that they draw on similar features from the data.

> Since the framework is probabilistic in nature, is there a distributional assumption/density we can sample from?

In theory, we could assume some kind of MVN. The targets would be a mix of binomial and continuous variables, but we could apply some kind of logit transformation to the binomial targets if needed.

Thank you again for all your help!

StatMixedML (Maintainer) · 2023-09-19T08:30:50Z

> When I have NAs in some but not all of the target variables, I get this error: xgboost.core.XGBoostError: [18:07:26] ../src/data/data.cc:461: Check failed: valid: Label contains NaN, infinity or a value too large. Maybe I'd have to adjust the loss function, but I'd want to be able to include the dependency parameter you developed. It looks like the NLL loss function uses torch.nansum, which should allow for NAs in the target. So possibly this is an issue with base XGB not allowing NaNs in the target?

When I wrote "Concerning the NAs: if it is working for a NN-type of model then it is also working for XGBoostLSS", I meant that, given a proper way to deal with NAs and a loss function to train on, XGBoostLSS is able to estimate all parameters. Hence, we first need a family.py that sets up the distribution and the corresponding log_prob function. We can then also apply masking to the NA values so that the gradients and hessians are not influenced by the NAs. Since the base XGBoost does not allow NAs in the target, we need to replace them with any constant on which we then later apply the mask.
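The fill-then-mask idea can be sketched in PyTorch as below. This is only an illustration under stated assumptions, not the XGBoostLSS implementation: the function name `masked_nll_grad_hess` and the `fill_value` argument are hypothetical, and a univariate Normal stands in for whatever distribution a real family.py would define. NaN targets are replaced by a constant so that log_prob stays finite, and the per-observation log-likelihood is then multiplied by the mask, so the masked rows end up with zero gradient and zero hessian.

```python
import torch
from torch.distributions import Normal


def masked_nll_grad_hess(loc, log_scale, target, fill_value=0.0):
    """Masked Normal NLL with per-observation gradients and diagonal hessians.

    Hypothetical helper for illustration only; not part of XGBoostLSS.
    """
    # True where the target is observed, False where it is NaN.
    mask = ~torch.isnan(target)
    # Replace NaNs by an arbitrary constant so that log_prob stays finite.
    filled = torch.where(mask, target, torch.full_like(target, fill_value))

    loc = loc.clone().requires_grad_(True)
    log_scale = log_scale.clone().requires_grad_(True)

    log_prob = Normal(loc, log_scale.exp()).log_prob(filled)
    # Masked observations contribute exactly zero to the loss.
    nll = -(log_prob * mask.to(log_prob.dtype)).sum()

    params = [loc, log_scale]
    grads = torch.autograd.grad(nll, params, create_graph=True)
    # Each observation only depends on its own parameters, so differentiating
    # the summed gradient gives the diagonal of the hessian.
    hess = [
        torch.autograd.grad(g.sum(), p, retain_graph=True)[0]
        for g, p in zip(grads, params)
    ]
    return nll.detach(), [g.detach() for g in grads], hess


# Usage: the second observation has a missing target.
loc = torch.zeros(3)
log_scale = torch.zeros(3)
target = torch.tensor([1.0, float("nan"), -2.0])
nll, grads, hess = masked_nll_grad_hess(loc, log_scale, target)
print(grads[0], hess[0])
```

Because the label array handed to XGBoost itself must not contain NaN, the same constant fill would also have to be applied to the labels before training, with the mask recomputed or stored separately.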

> In theory, we could assume some kind of MVN. The targets would be a mix of binomial and continuous variables, but we could apply some kind of logit transformation to the binomial targets if needed.

Is there any PyTorch, TensorFlow or Python implementation that we can use? I'm not sure we should use an MVN for a discrete (even after transformation) variable.

1 reply

FirpoMarberry Sep 26, 2023
Author

> We can then also apply masking to the NA values so that the gradients and hessians are not influenced by the NAs. Since the base XGBoost does not allow NAs in the target, we need to replace them with any constant on which we then later apply the mask.

This makes sense. Thank you. Would this have to be done within a family.py script?

> Is there any PyTorch, TensorFlow or Python implementation that we can use? I'm not sure we should use an MVN for a discrete (even after transformation) variable.

I'm not familiar with any implementation for this. I've been trying to modify MVN.py to do this, even if it's imperfect to use an MVN for a discrete variable. Admittedly, my experience with PyTorch is pretty limited, which is making this take a bit longer than I'd like.

StatMixedML (Maintainer) · 2023-09-27T13:21:05Z

> This makes sense. Thank you. Would this have to be done within a family.py script?

Yes, this essentially has to be done within the distribution_utils.py that we need to create along with the new MVN.py.

> I'm not familiar with any implementation for this. I've been trying to modify MVN.py to do this, even if it's imperfect to use an MVN for a discrete variable. Admittedly, my experience with PyTorch is pretty limited, which is making this take a bit longer than I'd like.

Have you tried looking into copulas?
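To make the copula suggestion concrete, here is a small simulation sketch of the attempt/outcome setting from the original question. It assumes a Gaussian copula with a Bernoulli margin for the attempt and a Normal margin for the outcome; the function name `simulate_attempts` and all parameter values are made up for illustration, and nothing here is taken from XGBoostLSS. The outcome is set to NaN whenever no attempt is made, which reproduces the partially observed targets discussed in this thread.

```python
from statistics import NormalDist

import numpy as np


def simulate_attempts(n, p_attempt=0.3, rho=0.6, mu=0.0, sigma=1.0, seed=0):
    """Simulate (attempt, outcome) pairs linked by a Gaussian copula.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Latent bivariate standard normal with correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Bernoulli margin: attempt when the first latent variable exceeds the
    # threshold that gives P(attempt) = p_attempt.
    threshold = NormalDist().inv_cdf(1.0 - p_attempt)
    attempt = (z[:, 0] > threshold).astype(float)

    # Normal margin for the outcome, observed only if an attempt was made.
    outcome = mu + sigma * z[:, 1]
    outcome = np.where(attempt == 1.0, outcome, np.nan)
    return np.column_stack([attempt, outcome])


y = simulate_attempts(1000)
print(y[:5])
print("attempt rate:", y[:, 0].mean())
print("mean observed outcome:", np.nanmean(y[:, 1]))
```

With a positive rho, attempts are made more often when the outcome would have been good, so the mean of the observed outcomes lies above mu. That selection effect is the dependency a joint model would need to capture.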
