Feature Request: Pseudocounts for drop-out variants #43
This question has come up before but, as you said, it is not supported by Enrich2. I'll try to explain the reasoning behind not calculating these scores and suggest a possible workaround.
If you are using ratio-based scores, a pseudocount might perform well. The problems arise with regression-based scores that use many time points. If a variant drops out early, does it make sense to calculate a strongly negative score based on the regression line crossing the x-axis where the dropout happens? What if the variant drops out in the middle of the experiment and is then seen again at a later time point (due to sampling effects)? The log-linear fit will be very poor and potentially misleading. We could not find a general solution to these issues, and did not have sufficient test data to approach the problem at the time, so we chose to filter out these variants.
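To make the ratio-based case concrete, here is a minimal Python sketch of a wild-type-normalized log-ratio score for a single selection step. The function name and the default pseudocount of 0.5 are illustrative assumptions, and the formula is a generic log ratio, not necessarily Enrich2's exact implementation. Without a pseudocount, a variant with zero reads after selection has no finite score; with one, it gets a strongly negative score whose magnitude depends on the pseudocount chosen.

```python
import math


def log_ratio_score(var_input, var_selected, wt_input, wt_selected, pseudocount=0.5):
    """Wild-type-normalized log ratio with a pseudocount added to every count.

    Hypothetical helper for illustration; not claimed to match Enrich2's formula.
    """
    v_in, v_sel = var_input + pseudocount, var_selected + pseudocount
    w_in, w_sel = wt_input + pseudocount, wt_selected + pseudocount
    # log(variant / wild type) after selection minus the same ratio in the input.
    # With pseudocount=0 and a dropout (var_selected == 0), math.log raises ValueError.
    return math.log(v_sel / w_sel) - math.log(v_in / w_in)
```

For example, `log_ratio_score(200, 0, 10000, 12000)` returns a finite negative score, and lowering `pseudocount` to 0.1 makes it more negative, so the pseudocount value directly sets how harshly a dropout is penalized.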
Enrich2 is no longer under active development, but I have added this feature request to the successor project. If you would like to add a pseudocount, my suggestion is:
Please let me know if you need extra assistance getting this set up. There are some example notebooks in the documentation that show how to open the HDF5 files, but the code may be out of date.
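If you do want to experiment, a minimal pandas sketch follows. It is one possible approach, not necessarily the one suggested in this thread. It assumes PyTables is installed for HDF5 access and that you have already loaded a variants-by-time-points counts table into a DataFrame; the column names (`c_0`, `c_1`, ...), the `min_input` threshold, and both function names are hypothetical, and no particular Enrich2 HDF5 key layout is assumed (list the keys and inspect them first). The gating follows the idea raised in the original question: only variants well represented in the input sample receive a pseudocount.

```python
import pandas as pd


def list_hdf5_keys(path):
    """Return the keys of an HDF5 file written by pandas (needs PyTables installed)."""
    with pd.HDFStore(path, mode="r") as store:
        return store.keys()


def add_gated_pseudocount(counts, input_col, pseudocount=0.5, min_input=10):
    """Add a pseudocount to every time point of variants well represented in the input.

    `counts` is a variants-by-time-points DataFrame of raw read counts. Rows whose
    input count is below `min_input` are returned unchanged, so variants that were
    never convincingly present are still filtered out downstream.
    """
    out = counts.astype(float)  # float copy so NaN and fractional counts fit
    eligible = out[input_col].fillna(0) >= min_input
    # Treat missing values at later time points as zero observed reads, then add
    # the pseudocount to all time points of the eligible variants.
    out.loc[eligible] = out.loc[eligible].fillna(0) + pseudocount
    return out
```

Adding the pseudocount to every time point of an eligible variant, rather than only to its zero counts, avoids a jump between observed and imputed values; whether that is the right choice for your data is a judgment call.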
Thanks for the reply, Alan!
Hi,
I'm running Enrich2 on a selection MAVE and have noticed that I can't get scores for some poorly performing variants because they tend to drop out at later time points during the selection. My PI wondered whether we could alleviate that by introducing pseudocounts, only for variants that were clearly present in the initial sample and then declined. We have 3-4 time points plus initial samples and are scoring with WLS regression.
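The regression case described here can be sketched in the same spirit. The hypothetical `wls_slope` function below fits log(variant / wild type) against time by weighted least squares, using as weights the inverse of a Poisson-style variance, 1/c_v + 1/c_wt, computed after adding the pseudocount. That weighting is an assumption for illustration, not Enrich2's exact weighting or time scaling. Its purpose is to show that, for a variant that drops out, the fitted slope shifts with the pseudocount value.

```python
import numpy as np


def wls_slope(times, var_counts, wt_counts, pseudocount=0.5):
    """Weighted least-squares slope of log(variant / wild type) against time.

    Hypothetical sketch: weights are 1 / (1/c_v + 1/c_wt) after adding the
    pseudocount, which is not claimed to reproduce Enrich2's implementation.
    """
    t = np.asarray(times, dtype=float)
    v = np.asarray(var_counts, dtype=float) + pseudocount
    w = np.asarray(wt_counts, dtype=float) + pseudocount
    y = np.log(v / w)
    weights = 1.0 / (1.0 / v + 1.0 / w)
    # Intercept + slope design matrix; scaling rows by sqrt(weight) turns
    # ordinary least squares into weighted least squares.
    X = np.column_stack([np.ones_like(t), t])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]
```

With made-up counts such as `wls_slope([0, 1, 2, 3], [300, 120, 0, 0], [5000, 5200, 5100, 5300])`, changing the pseudocount from 0.5 to 0.05 changes the slope noticeably. Here the smaller pseudocount gives the shallower slope, because it both lowers the dropout points and shrinks their weights, so the direction of the effect is not obvious in advance.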
Do you know whether this is done at all for MAVE data, and if not, what the objections are?
And is this something you would consider adding to Enrich2?