Running stability selection on your data
Once you have found the SNPs that have a significant correlation with microbiome abundances, you need to determine which taxa (or other covariates) most consistently and robustly contribute to the correlation.
Lasso regression is sensitive to small variations in the covariates, so it is common to use a resampling method such as stability selection to choose the relevant covariates. (For more details, see the supplemental methods.)
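To make the idea concrete, here is a minimal pure-Python sketch of the textbook form of stability selection with the Lasso, as described by Meinshausen and Bühlmann. It fits the Lasso on many random half-samples over a grid of α values running from a fraction of α_max up to α_max. Each covariate is scored by its highest selection frequency across the grid. This is an illustration only. hominid's actual procedure is described in the supplemental methods and may differ, for example in the number of resamples, the α grid, or the use of randomized penalty weights. The names `center`, `alpha_max`, `lasso_cd` and `stability_scores` and all default values are hypothetical.

```python
import random

def center(X, y):
    """Subtract column means from X and the mean from y (so no intercept is needed)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_bar for v in y]

def alpha_max(X, y):
    """Smallest alpha at which every Lasso coefficient is zero (X and y centered)."""
    n = len(X)
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(len(X[0])))

def lasso_cd(X, y, alpha, n_iter=200, tol=1e-8):
    """Coordinate descent for min_b (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xb; b starts at zero
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            # Soft-thresholding is the exact minimizer along coordinate j.
            if rho > alpha:
                new = (rho - alpha) / col_sq[j]
            elif rho < -alpha:
                new = (rho + alpha) / col_sq[j]
            else:
                new = 0.0
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def stability_scores(X, y, alpha_min_frac=0.3, n_alphas=10, n_resamples=100, seed=0):
    """Per covariate: max over the alpha grid of its selection frequency on half-samples.
    n_alphas must be at least 2."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    a_max = alpha_max(*center(X, y))
    # Fixed grid from alpha_min_frac * alpha_max up to alpha_max (inclusive).
    alphas = [a_max * (alpha_min_frac + (1 - alpha_min_frac) * k / (n_alphas - 1))
              for k in range(n_alphas)]
    counts = [[0] * p for _ in alphas]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), n // 2)  # random half of the samples
        Xs, ys = center([X[i] for i in idx], [y[i] for i in idx])
        for k, alpha in enumerate(alphas):
            b = lasso_cd(Xs, ys, alpha)
            for j in range(p):
                if b[j] != 0.0:
                    counts[k][j] += 1
    return [max(row[j] for row in counts) / n_resamples for j in range(p)]

# Toy data: y depends on covariate 0 only; covariates 1 and 2 are pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [3 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]
scores = stability_scores(X, y, alpha_min_frac=0.3, n_resamples=50)
print([round(s, 2) for s in scores])
```

α_max is the smallest penalty at which all coefficients are zero. Expressing the grid as fractions of α_max is why a lower bound such as 0.3 or 0.5 is meaningful regardless of the scale of the data. A covariate that is picked up on almost every half-sample gets a score near 1, while one that enters only by chance stays near 0.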
hominid_stability_selection runs on a single processor.
hominid_stability_selection skips SNPs whose 95% confidence interval for R² includes zero.
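A sketch of this skip rule follows. It assumes the interval is a percentile interval computed from a set of resampled R² estimates, for example one per cross-validation or bootstrap replicate. How hominid actually produces those estimates and the interval is not shown here, and the function names are hypothetical. Because R² measured on held-out data can be negative, an interval that straddles zero means the SNP's predictive value cannot be distinguished from none.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def ci_includes_zero(rsq_samples, level=95.0):
    """True if the central `level`% percentile interval of the R^2 samples contains 0."""
    s = sorted(rsq_samples)
    tail = (100.0 - level) / 2.0  # 2.5 on each side for a 95% interval
    lo, hi = percentile(s, tail), percentile(s, 100.0 - tail)
    return lo <= 0.0 <= hi

# Hypothetical R^2 replicates for two SNPs.
print(ci_includes_zero([0.10, 0.12, 0.15, 0.20, 0.25]))    # interval entirely above zero
print(ci_includes_zero([-0.05, -0.01, 0.02, 0.04, 0.08]))  # interval straddles zero
```

The interpolation rule matches the usual "linear" percentile definition. With only a handful of replicates, the interval bounds sit very close to the smallest and largest values.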
The command-line arguments are all required and are expected in this order:
- Output file from hominid, with unprocessed SNPs removed.
  - SNPs that were not processed by hominid are those that have "NA" in columns 7 to 21 (starting with column "rsq_mean" and ending with "cv_kurtosis"). Delete the rows corresponding to these SNPs.
- OTU/taxon table: Use the same input file that was used with hominid.
- Output file name
- Lowest α coefficient: During stability selection, the Lasso tuning parameter α is varied between 0.3 α_max and α_max. To change the range to, say, 0.5 α_max to α_max, set this argument to 0.5.
- Transformation of the input abundance data: Use the same value that was used with hominid.
- Number of SNPs to test: To run on all input SNPs, set this value to -1.
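Removing the unprocessed SNPs for the first argument can be scripted. Here is a minimal sketch using Python's standard `csv` module. It assumes the hominid output is tab-separated with a single header row, and it drops a row only when every one of columns 7 to 21 is "NA". The function name `drop_unprocessed` and the placeholder column names in the demo header are hypothetical.

```python
import csv
import io

# 0-based indices of 1-based columns 7..21 ("rsq_mean" through "cv_kurtosis").
NA_COLS = range(6, 21)

def drop_unprocessed(in_handle, out_handle, delimiter="\t"):
    """Copy rows from in_handle to out_handle, skipping SNPs with NA in all of columns 7-21."""
    reader = csv.reader(in_handle, delimiter=delimiter)
    writer = csv.writer(out_handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(next(reader))  # keep the header row
    kept = dropped = 0
    for row in reader:
        if all(row[i] == "NA" for i in NA_COLS):
            dropped += 1
        else:
            writer.writerow(row)
            kept += 1
    return kept, dropped

# Demo with an in-memory file; the header names other than rsq_mean/cv_kurtosis are made up.
header = (["snp"] + [f"info{i}" for i in range(2, 7)] + ["rsq_mean"]
          + [f"stat{i}" for i in range(8, 21)] + ["cv_kurtosis"])
demo_in = io.StringIO(
    "\t".join(header) + "\n"
    + "\t".join(["rs1"] + ["x"] * 5 + ["0.2"] * 15) + "\n"
    + "\t".join(["rs2"] + ["x"] * 5 + ["NA"] * 15) + "\n"
)
demo_out = io.StringIO()
kept, dropped = drop_unprocessed(demo_in, demo_out)
print(kept, dropped)
```

To filter a real file, open the hominid output for reading, open the destination with `newline=""` for writing, and pass both handles to the function. The file names are up to you.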
For a sample hominid_stability_selection command, see test_stability_selection.sh.
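If you drive the tool from a Python pipeline, a small helper can keep the positional arguments in the order listed above. This sketch assumes the program is on your PATH under the name hominid_stability_selection. All file names and the transformation placeholder "SAME_AS_HOMINID" are hypothetical; substitute the transformation value you passed to hominid. It only builds and prints the command unless you uncomment the last line.

```python
import shlex

def stability_selection_argv(hominid_output, otu_table, output_file,
                             alpha_min_frac, transform, n_snps=-1):
    """Return the argument list in the order hominid_stability_selection expects."""
    return [
        "hominid_stability_selection",
        hominid_output,       # hominid output with unprocessed SNPs removed
        otu_table,            # same OTU/taxon table that was used with hominid
        output_file,          # where the stability scores are written
        str(alpha_min_frac),  # lowest alpha, as a fraction of alpha_max
        transform,            # same transformation value that was used with hominid
        str(n_snps),          # -1 means test all input SNPs
    ]

argv = stability_selection_argv(
    "hominid_output.filtered.txt", "otu_table.txt", "stability_output.txt",
    alpha_min_frac=0.3, transform="SAME_AS_HOMINID",
)
print(shlex.join(argv))
# import subprocess; subprocess.run(argv, check=True)  # uncomment to actually run it
```

Passing a list rather than a single string avoids shell-quoting problems with file names that contain spaces.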
hominid_stability_selection takes the file produced by hominid and adds extra columns to the right side of the table. Each added column is named after one of the input taxa/covariates, and its values are the stability scores for that taxon/covariate.
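Because the scores are appended after the original hominid columns, you can rank taxa for a SNP by slicing off those columns. This is a minimal sketch that assumes you know how many columns the hominid file had before stability selection, for example from its header. The function name `top_taxa` and the column names in the example are hypothetical.

```python
def top_taxa(header, row, n_hominid_cols, k=5):
    """Return the k (taxon, score) pairs with the highest stability scores for one SNP row.
    Columns from index n_hominid_cols onward are taken to be the appended stability scores."""
    names = header[n_hominid_cols:]
    scores = [float(v) for v in row[n_hominid_cols:]]
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy example: two original columns followed by three appended taxon columns.
header = ["snp", "rsq_mean", "OTU_1", "OTU_2", "OTU_3"]
row = ["rs1", "0.2", "0.10", "0.85", "0.40"]
best = top_taxa(header, row, n_hominid_cols=2, k=2)
print(best)
```

In practice you would apply this to every row of the tab-separated output, for example with `csv.reader(..., delimiter="\t")`, and keep only the taxa whose scores exceed a threshold you choose.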