Why do the wild-type predictions differ depending on which mutation sites are set, even though the same zero-shot model is used for prediction? When the "positions" parameter is set to "V39", the wild type in the result table, which corresponds to "V", has a value of -2.5876. When the "positions" parameter is set to "V39 D40", the value corresponding to "VD" in the result table is -5.0146. Why is the wild-type output different between the two settings, and which one should be used as the wild-type prediction value for comparability?
@BKRBH sorry this is late. I don't check this repo often anymore. Please tag me and I'll be quicker.
It's because the probabilities are not normalized. When you set "V39", the model returns the log probability for "V" at position 39; it does not consider any other positions. When you set "V39 D40", the model sums the log probabilities for "V" at position 39 and "D" at position 40.
The other positions are ignored because MLDE was made for evaluating combinatorial landscapes/libraries. Getting the internal consistency you are looking for would mean summing the log probabilities over all positions, which is a waste of compute when you know from the beginning that the positions outside the library are all constants. As a tradeoff, though, this means that you can only compare scores within the same combinatorial space, not across runs that use different "positions" settings.
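To make the arithmetic concrete, here is a minimal sketch of summing log probabilities over only the requested positions. It is not MLDE's actual code: the `score` helper and the `logp` table are hypothetical, the D40 value is just -5.0146 minus -2.5876, and the A39 value is made up for illustration. It also assumes that each per-position term stays the same between the two runs, which may not hold exactly for the real model.

```python
import math

# Hypothetical per-position log probabilities, keyed by (position, residue).
# (39, "V") is the value reported in the question for "V39".
# (40, "D") is -5.0146 - (-2.5876), assuming the V39 term is unchanged.
# (39, "A") is a made-up number for an illustrative variant.
logp = {
    (39, "V"): -2.5876,
    (40, "D"): -2.4270,
    (39, "A"): -3.2000,
}

def score(combo):
    """Sum log probabilities over only the positions in the combination."""
    return sum(logp[site] for site in combo)

# Wild-type score depends on which positions are included in the library.
wt_single = score([(39, "V")])             # positions = "V39"
wt_double = score([(39, "V"), (40, "D")])  # positions = "V39 D40"

# A variant is only comparable to the wild type from the same library.
mut_single = score([(39, "A")])
mut_double = score([(39, "A"), (40, "D")])

print(round(wt_single, 4), round(wt_double, 4))
print(round(mut_single - wt_single, 4), round(mut_double - wt_double, 4))
```

Under these assumptions the raw wild-type scores differ between the two settings, but the difference between a variant and the wild type scored over the same positions does not. So the wild-type value to use is the one from the same run as the variants you are comparing it against.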