diff --git a/README.Rmd b/README.Rmd
index 162bb28..eadfca4 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -5,14 +5,31 @@ output: github_document
 ```{r setup, include=FALSE}
 knitr::opts_chunk$set(echo = TRUE)
 ```
- `happi`: a **H**ierarchical **Ap**proach to **P**angenomics **I**nference
-## What is `happi`?
+# HAPPI
+
+
+
+a **H**ierarchical **Ap**proach to **P**angenomics **I**nference
+
+
 `happi` is a method for modeling gene presence in pangenomics that leverages information about genome quality to improve inference. `happi` models the association between an experimental condition and gene presence where the **experimental condition** is the **primary predictor** of interest and **gene presence** is the **outcome** while incorporating user-chosen information on genome quality metrics (e.g. mean coverage, contamination, completion, etc...). You might be interested in using `happi` to conduct your pangenomics hypothesis testing if you work with fragmented genomes such as metagenome assembled genomes (MAGs). `happi` is currently distributed as an `R` package and can be installed using the instructions below.
-## Where does `happi` fit into my workflow?
+If you are a **microbial ecologist** or **bioinformatician**, some of the things that you may like about `happi` include
+
+- **AMY TO LIST ITEM 1**
+
+- **AMY TO LIST ITEM 2**
+
+If you are a **statistician**, things you may like about `happi` include
+
+- **AMY TO LIST ITEM 1**
+
+- **AMY TO LIST ITEM 2**
+
+### Where does `happi` fit into my workflow?
 If you're new to shotgun metagenomics we understand that things can feel overwhelming! On top of all the tools and names floating around you're probably wondering where does `happi` fit into the vast suite of bioinformatics tools for metagenomics data and how can you use it in your work?
 `happi` can be used *after* you have assembled, binned, annotated, and refined your genomes or metagenome-assembled genomes (MAGs) and as such it can be used with any bioinformatics workflow that conducts assembly, binning, annotation, and refinement.
@@ -43,6 +60,28 @@ functions through the `R` interactive session. You can follow the vignettes by r
 ```
 utils::browseVignettes(package = "happi")
 ```
+
+The following example shows how to call the main `happi()` function in `R`:
+
+```
+happi_results <- happi(outcome = presence_vector, covariate = x_matrix, quality_var = quality_vector)
+happi_results$summary
+```
+
+where `presence_vector` is a length-n vector indicating the presence/absence (coded as 0 or 1) of the target gene, `x_matrix` is an n x p design matrix for the predictors of interest, and `quality_vector` is a length-n vector indicating the quality of each genome.
+
+To use `happi`'s nonparametric permutation testing approach, either run the `happi()` function above with the extra argument `run_npLRT = TRUE` or pass the `happi` results object to the function `happi::npLRT()`, as shown below.
+
+```
+perm_test_result <- npLRT(happi_results,
+                          change_threshold = 0.1,
+                          spline_df = 3,
+                          nstarts = 1,
+                          epsilon = 0,
+                          firth = TRUE,
+                          method = "splines")
+```
+
 An example snakemake workflow of `happi`'s usage has been made available under the `workflows/` folder of this github directory. To run the example workflow you'll need to install snakemake. We recommend creating a conda environment with your snakemake installation:
 ```
@@ -68,13 +107,12 @@ The Snakefile is customizable for your own input data and parameters. Please ref
 ## How do I export data from anvi'o for use in `happi`?
-
 ## Citation
-If you use `happi` please cite our work:
+If you use `happi` please cite our work:
-An open-access preprint is available [here](https://www.biorxiv.org/content/10.1101/2022.04.26.489591v1.full).
+Trinh P, Clausen DS, Willis AD. happi: a hierarchical approach to pangenomics inference. Genome Biology. 2023;24(1):214-214. [Available here](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03040-6)
-## Issues/Requests
+## Issues/Requests
 If you have any issues using our software or further questions please submit an issue [here](https://github.com/statdivlab/happi/issues).
diff --git a/README.md b/README.md
index c288d2f..a05e39a 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,13 @@
- `happi`: a
-**H**ierarchical **Ap**proach to **P**angenomics **I**nference
+# HAPPI
-## What is `happi`?
+
+
+a **H**ierarchical **Ap**proach to **P**angenomics **I**nference
+
+
 `happi` is a method for modeling gene presence in pangenomics that
 leverages information about genome quality to improve inference. `happi`
@@ -19,7 +22,20 @@ hypothesis testing if you work with fragmented genomes such as metagenome
 assembled genomes (MAGs). `happi` is currently distributed as an `R`
 package and can be installed using the instructions below.
-## Where does `happi` fit into my workflow?
+If you are a **microbial ecologist** or **bioinformatician**, some of
+the things that you may like about `happi` include
+
+- **AMY TO LIST ITEM 1**
+
+- **AMY TO LIST ITEM 2**
+
+If you are a **statistician**, things you may like about `happi` include
+
+- **AMY TO LIST ITEM 1**
+
+- **AMY TO LIST ITEM 2**
+
+### Where does `happi` fit into my workflow?
 If you’re new to shotgun metagenomics we understand that things can feel
 overwhelming! On top of all the tools and names floating around you’re
@@ -76,6 +92,30 @@ the vignettes by running the following code in `R`:
     utils::browseVignettes(package = "happi")
+The following example shows how to call the main `happi()` function in
+`R`:
+
+    happi_results <- happi(outcome = presence_vector, covariate = x_matrix, quality_var = quality_vector)
+    happi_results$summary
+
+where `presence_vector` is a length-n vector indicating the
+presence/absence (coded as 0 or 1) of the target gene, `x_matrix` is an
+n x p design matrix for the predictors of interest, and `quality_vector`
+is a length-n vector indicating the quality of each genome.
+
+To use `happi`’s nonparametric permutation testing approach, either run
+the `happi()` function above with the extra argument `run_npLRT = TRUE`
+or pass the `happi` results object to the function `happi::npLRT()`, as
+shown below.
+
+    perm_test_result <- npLRT(happi_results,
+                              change_threshold = 0.1,
+                              spline_df = 3,
+                              nstarts = 1,
+                              epsilon = 0,
+                              firth = TRUE,
+                              method = "splines")
+
 An example snakemake workflow of `happi`’s usage has been made available
 under the `workflows/` folder of this github directory. To run the
 example workflow you’ll need to install snakemake. We recommend creating
@@ -106,8 +146,9 @@ Please refer to the sample data files that have been provided in
 If you use `happi` please cite our work:
-An open-access preprint is available
-[here](https://www.biorxiv.org/content/10.1101/2022.04.26.489591v1.full).
+Trinh P, Clausen DS, Willis AD. happi: a hierarchical approach to
+pangenomics inference. Genome Biology. 2023;24(1):214-214. [Available
+here](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03040-6)
 ## Issues/Requests