diff --git a/404.html b/404.html index 0e09ce1e..57ef8288 100644 --- a/404.html +++ b/404.html @@ -20,7 +20,7 @@
- +Site built with pkgdown 2.0.9.
+Site built with pkgdown 2.1.1.
vignettes/comparing_results.Rmd
+ Source: vignettes/comparing_results.Rmd
comparing_results.Rmd
vignettes/intro_vignette.Rmd
+ Source: vignettes/intro_vignette.Rmd
intro_vignette.Rmd
By default, cluster_enriched_terms()
performs
-hierarchical clustering of the terms (using \(1 - \kappa\) as the distance metric).
-Iterating over \(2,3,...n\) clusters
-(where \(n\) is the number of terms),
-cluster_enriched_terms()
determines the optimal number of
-clusters by maximizing the average silhouette width, partitions the data
-into this optimal number of clusters and returns a data frame with
-cluster assignments.
cluster_enriched_terms()
+determines the optimal number of clusters by maximizing the average
+silhouette width, partitions the data into this optimal number of
+clusters and returns a data frame with cluster assignments.
example_pathfindR_output_clustered <- cluster_enriched_terms(example_pathfindR_output, plot_dend = FALSE, plot_clusters_graph = FALSE)
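The selection of the number of clusters described in this passage can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy kappa matrix, the choice of average linkage and the helper `avg_silhouette()` are all assumptions made for illustration.

```r
# Toy kappa matrix for 6 terms forming two tight groups (illustrative values)
term_names <- paste0("term", 1:6)
kappa_mat <- matrix(0.05, nrow = 6, ncol = 6,
                    dimnames = list(term_names, term_names))
kappa_mat[1:3, 1:3] <- 0.8
kappa_mat[4:6, 4:6] <- 0.8
diag(kappa_mat) <- 1

# Distance between terms is 1 - kappa
dist_mat <- 1 - kappa_mat
hc <- hclust(as.dist(dist_mat), method = "average")  # linkage is an assumption

# Hypothetical helper: average silhouette width of a partition (base R only)
avg_silhouette <- function(d, clusters) {
  widths <- vapply(seq_along(clusters), function(i) {
    own <- clusters == clusters[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton clusters get width 0 by convention
    a <- mean(d[i, own])      # mean distance to the term's own cluster
    others <- setdiff(unique(clusters), clusters[i])
    # mean distance to the nearest other cluster
    b <- min(vapply(others, function(k) mean(d[i, clusters == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Try k = 2, 3, ..., n and keep the k with the largest average silhouette width
k_candidates <- 2:nrow(dist_mat)
avg_widths <- vapply(k_candidates,
                     function(k) avg_silhouette(dist_mat, cutree(hc, k = k)),
                     numeric(1))
best_k <- k_candidates[which.max(avg_widths)]
cluster_assignments <- cutree(hc, k = best_k)
```

In this toy case the two groups are well separated, so the two-cluster partition should give the largest average silhouette width.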
@@ -1200,9 +1202,7 @@Analysis with Custom Gene Sets - -
- + - - +vignettes/manual_execution.Rmd
+ Source: vignettes/manual_execution.Rmd
manual_execution.Rmd
- + - - +vignettes/non_hs_analysis.Rmd
+ Source: vignettes/non_hs_analysis.Rmd
non_hs_analysis.Rmd
- + - - +vignettes/obtain_data.Rmd
+ Source: vignettes/obtain_data.Rmd
obtain_data.Rmd
- + - - +vignettes/visualization_vignette.Rmd
+ Source: vignettes/visualization_vignette.Rmd
visualization_vignette.Rmd
By default the node sizes are plotted proportional to the number of
genes a term contains (num_genes
). To adjust node sizes
-using the \(-log_{10}\)(lowest p
-values), set node_size = "p_val"
:
node_size = "p_val"
:
term_gene_graph(example_pathfindR_output, num_terms = 3, node_size = "p_val")
See ?term_gene_graph
for more details.
get_kegg_gsets()
which now returns KEGG IDs so that the user can convert the returned identifiers using a more appropriate tool (e.g. BioMart) should they wish
color_kegg_pathway()
function using ggkegg
to create colored KEGG pathway ggplot objects (instead of using KEGGREST
to obtain the colored PNG files, which no longer works; #169)
visualize_hsa_KEGG
function to visualize_KEGG_diagram()
to reflect that it is now able to handle KEGG pathway enrichment results from any organism
visualize_terms()
, visualize_term_interactions()
and visualize_KEGG_diagram()
functions so that they now return a list of ggplot objects (named by term ID)get_kegg_gsets()
function to also use ggkegg
for fetching genes per pathway datamagick
, KEGGgraph
and KEGGREST
+get_biogrid_pin()
function so that it can now determine the latest version and download/process it from BioGRID (via setting release = "latest"
, which is now the default behavior)UpSet_plot()
plot function regarding the interaction with ggupset
package that was discovered in a reverse dependency check for ggplot2 3.5.0
(#189)score_terms()
(#186)term_gene_graph()
+create_HTML_report()
so run_pathfindR()
once again generates HTML reportsdisable_parallel
argument in active_snw_enrichment_wrapper()
to be able to disable parallel runs via foreach
+foreach
wasn’t loading pathfindR
(#164)create_HTML_report()
so run_pathfindR()
no longer generates a HTML reportdir_for_report
argument in the internal function create_HTML_report()
to fix test issues on CRANseedForRandom
argument in active_snw_search()
to ensure reproducibility. By default, in run_pathfindR()
, a seed is set for each iteration to produce reproducible results (#108)run_pathfindR()
+run_pathfindR()
+run_pathfindR()
is now to run in a temporary directory. The user can still set output_dir
to run in a specified directory and also produce HTML reportshierarchical_term_clustering()
, update the sequence of number of clusters for which silhouette width is calculated for choosing the optimal number of clusters. This should speed up the function for cases with a large number of enriched termsreturn_pin_path()
where the PIN was not properly read (#157)input_processing()
so that an alias that is not already present is selectedscale_vals
) in color_kegg_pathway()
, the default is now scale_vals=TRUE
+term_gene_heatmap()
function so that legend title is shown and can be customizedterm_gene_heatmap()
function so that coloring is proper when no change values are provided in genes_df
+sort_terms_by_p
argument to the term_gene_heatmap()
function to enable sorting of terms by ‘lowest_p’vertex.label.cex
and vertex.size.scaling
arguments to cluster_graph_vis()
+show_legend
argument to visualize_term_interactions()
to toggle the legendcolor_kegg_pathway()
+color_kegg_pathway()
the default value for normalize_vals
is now FALSE
+get_kegg_gsets()
where empty result was returned for some organisms due to an error in parsing (#72)repel = TRUE
in term_gene_graph()
and combined_results_graph()
for better visualization of labelsenrichment_chart()
(#75)visualize_term_interactions()
+get_biogrid_pin()
where the download method was set to wget
(now set to auto
, per #83)get_biogrid_pin()
(if tab3 is available for the chosen release, otherwise tab2 format is used)get_biogrid_pin()
to ‘4.4.200’get_kegg_gsets()
, improved parsing of KEGG term descriptions so that no description is duplicated (#87)score_terms()
, if using descriptions, the ID is now appended for (any) duplicated term descriptions (#87)obtain_colored_url()
, swapped bg_color
with fg_color
due to an issue with KEGGREST
+term_gene_heatmap()
(#95)get_biogrid_pin()
, the “download.file.method” from global options is usedcombined_results_graph()
raises an error if there are no common terms in the combined data framerun_pathfindR()
, the default iterations
was set back to 10 (the default for all other v1.x)run_pathfindR()
, as “GR” (the default active subnetwork search method) provides nearly identical results in each iteration, the default iterations
is set to 1get_biogrid_pin()
as BioGRID updated the URL for downloadvisualize_term_interactions()
where the file name was too long, it was causing an error on Windows. Limited to 100 characters (#58)check_java_version()
where java version 14 could not be parsed (#49)combined_results_graph()
where gene nodes were not colored correctly (#55)pathfindR.data
for storing pathfindR datavisualize_active_subnetworks()
for visualizing graphs of active subnetworkscombine_pathfindR_results()
and combined_results_graph()
for comparison of 2 pathfindR results and term-gene graph of the combined results, respectivelyget_pin_file()
for obtaining organism-specific PIN data (only from BioGRID for now)get_gene_sets_list()
for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDBterm_gene_heatmap()
to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If genes_df
is provided, colors of the tiles indicate the change valuesUpSet_plot()
to create UpSet plots of enriched termscell_markers_gsets
and cell_markers_descriptions
+parallel::makeCluster()
in run_pathfindR()
(#45)download_kegg_png()
(#37, @rix133)RA_comparison_output
of pathfindR results on another RA-related dataset (GSE84074)visualize_hsa_KEGG()
, fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept)visualize_hsa_KEGG()
, implemented a tryCatch to avoid any issues when KEGGREST::color.pathway.by.objects()
might fail (#28)visualize_hsa_KEGG()
, now limiting the number of genes passed on to KEGGREST::color.pathway.by.objects()
to < 60 (because the KEGG API now limits the number?)term_gene_heatmap()
(i.e. when genes_df
is not provided) to binary colored heatmap (by default, “green” and “red”, controlled by low
and high
) by up-/down- regulation statusget_pin_file()
and get_gene_sets_list()
and fixed a minor issue in the vignette (#46)create_kappa_matrix()
when chance
is 1, the metric is turned into 0class(.) == *
in cluster_graph_vis()
+max_to_plot
to visualize_hsa_KEGG()
and to run_pathfindR()
. This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented not to slow down the runtime of run_pathfindR()
as downloading the PNG files is slow.
enriched_terms.Rmd
+DESCRIPTION
was updatedannotate_pathway_DEGs()
, calculate_pw_scores()
, cluster_pathways()
, fuzzy_pw_clustering()
, hierarchical_pw_clustering()
, visualize_pw_interactions()
and visualize_pws()
were renamed to annotate_term_DEGs()
, score_terms()
, cluster_enriched_terms()
, fuzzy_term_clustering()
, hierarchical_term_clustering()
, visualize_term_interactions()
and visualize_terms()
respectivelyenriched_pathways.Rmd
was renamed to enriched_terms.Rmd
+term_gene_graph()
, which creates a graph of enriched terms - involved genesenrichment()
and enrichment_analyses()
to get enrichment results fasterfetch_gene_set()
for obtaining gene set data more easilymin_gset_size
, max_gset_size
in fetch_gene_set()
and run_pathfindR()
)gaCrossover
during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)testthat
+create_kappa_matrix()
)mmu_kegg_genes
& mmu_kegg_descriptions
: mmu KEGG gene sets datamyeloma_input
& myeloma_output
: example mmu input and output datasig_gene_thr
in subnetwork filtering via filterActiveSnws()
now serves as the threshold proportion of significant genes in the active subnetwork, e.g. if there are 100 significant genes and sig_gene_thr = 0.03
, subnetworks that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis
pathview
dependency by implementing colored pathway diagram visualization function using KEGGREST
and KEGGgraph
+hierarchical_term_clustering()
, redefined the distance measure as 1 - kappa statistic
+cluster_graph_vis()
(during the calculations for additional node colors)cluster_graph_vis()
+active_snw_search()
, unnecessary warnings during active subnetwork search were removedenrichment_chart()
, supplying fuzzy clustered results no longer raises an errorinput_testing()
and input_processing()
to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)enrichment_chart()
, ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integersenrichment_chart()
, added the arguments num_bubbles
(default is 4) to control number of bubbles displayed in the legend and even_breaks
(default is TRUE
) to indicate if even increments of breaks are requiredterm_gene_graph()
(create the igraph object as an undirected graph for better auto layout)visualize_term_interactions()
. The legend no longer displays “Non-input Active Snw. Genes” if they were not providedhuman_genes
in run_pathfindR()
and input_processing()
was renamed as convert2alias
+top_terms
to enrichment_chart()
, controlling the number of top enriched terms to plot (default is 10)
run_pathfindR
into individual functions: active_snw_search
, enrichment_analyses
, summarize_enrichment_results
, annotate_pathway_DEGs
, visualize_pws
.pathmap
as visualize_hsa_KEGG
, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the input_processing
function assigns a change value of 100 to all).
visualize_pw_interactions
, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.create_kappa_matrix
, hierarchical_pw_clustering
, fuzzy_pw_clustering
and cluster_pathways
.cluster_graph_vis
for visualizing graph diagrams of clustering results.score_quan_thr
and sig_gene_thr
for run_pathfindR
were not being utilized.run_pathfindR
, added a message at the end of the run, reporting the number of enriched pathways.
run_pathfindR
now creates a variable org_dir
that is the “path/to/original/working/directory”. org_dir
is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where, if a function stopped with an error, the directory was changed to “..”, i.e. the parent directory. This change was adopted so that the user is returned to the original working directory even if they supply a nested output folder (output_dir
, e.g. “./ALL_RESULTS/RESULT_A”).input_processing
, added the argument human_genes
to only perform alias symbol conversion when human gene symbols are provided. - Updated the Rmd files used to create the report HTML filesGO-All
, all annotations in the GO database (BP+MF+CC)pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks
to reflect the new functionalities.plot_scores
, added the argument label_cases
to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles
which allows the user to change the default “Case” and “Control” headers. Also added the arguments low
and high
used to change the low and high end colors of the scoring color gradient.plot_scores
, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)parseActiveSnwSearch
, replaced score_thr
by score_quan_thr
. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.parseActiveSnwSearch
, increased sig_gene_thr
from 2 to 10 as we observed in most of the cases, this resulted in faster runs with comparable results.choose_clusters
, added the argument p_val_threshold
to be used as p value threshold for filtering the enriched pathways prior to clustering.run_pathfindR
. For this, the gene_sets
argument should be set to “Custom” and custom_genes
and custom_pathways
should be provided.calculate_pw_scores
where if there was one DEG, subsetting the experiment matrix failedcalculate_pw_scores
. If there is none, the pathway is skipped.calculate_pw_scores
, if cases
are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in cases
. This way, “up” pathways are grouped together, same for “down” pathways.calculate_pwd
, if a pathway has perfect overlap with other pathways, the correlation value is now set to 1 instead of NA.
choose_clusters
, if result_df
has fewer than 3 pathways, do not perform clustering.
run_pathfindR
checks whether the output directory (output_dir
) already exists and if it exists, now appends “(1)” to output_dir
and displays a warning message. This was implemented to prevent writing over existing results.run_pathfindR
, recursive creation for the output directory (output_dir
) is now supported.run_pathfindR
, if no pathways are found, the function returns an empty data frame instead of raising an error.Implemented the (per subject) pathway scoring function calculate_pw_scores
and the function to plot the heatmap of pathway scores per subject plot_scores
.
Added the auto
parameter to choose_clusters
. When auto == TRUE
(default), the function chooses the optimal number of clusters k
automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.
Added the Fold_Enrichment
column to the resulting data frame of enrichment
, and as a corollary to the resulting data frame of run_pathfindR
.
Added the option bubble
to plot a bubble chart displaying the enrichment results in run_pathfindR
using the helper function enrichment_chart
. To plot the bubble chart set bubble = TRUE
in run_pathfindR
or use enrichment_chart(your_result_df)
.
Added the parameter silent_option
to run_pathfindR
. When silent_option == TRUE
(default), the console outputs during active subnetwork search are printed to a file named “console_out.txt”. If silent_option == FALSE
, the output is printed on the screen. Default was set to TRUE
because multiple console outputs are simultaneously printed when running in parallel.
Added the list_active_snw_genes
parameter to run_pathfindR
. When list_active_snw_genes == TRUE
, the function adds the column non_DEG_Active_Snw_Genes
, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.
Added the data RA_clustered
, which is the example output of the clustering workflow.
In the function, run_pathfindR
added the option to specify the argument output_dir
which specifies the directory to be created under the current working directory for storing the result HTML files. output_dir
is “pathfindR_Results” by default.
run_pathfindR
now checks whether the output directory (output_dir
) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.
genes_table.html
now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.
gene_sets
option in run_pathfindR
to choose between different gene sets. Available gene sets are KEGG
, Reactome
, BioCarta
and Gene Ontology gene sets (GO-BP
, GO-CC
and GO-MF
)cluster_pathways
automatically recognizes the ID type and chooses the gene sets accordinglyinput_processing
+input_processing
, genes for which no interactions are found in the PIN are now removed before active subnetwork searchinput_processing
+run_pathfindR
returns to the user’s working directory.R/visualization.R
+ Source: R/visualization.R
UpSet_plot.Rd
A dataframe of pathfindR results that must contain the following columns:
Description of the enriched term (necessary if use_description = TRUE
)
the input data that was used with run_pathfindR
.
It must be a data frame with 3 columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (optional)
The change values in this data frame are used to color the affected genes
Number of top enriched terms to use while creating the plot. Set to NULL
to use
all enriched terms (default = 10)
the option for producing the plot. Options include 'heatmap', 'boxplot' and 'barplot'. (default = 'heatmap')
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
a string indicating the color of 'low' values in the coloring gradient (default = 'green')
a string indicating the color of 'mid' values in the coloring gradient (default = 'black')
a string indicating the color of 'high' values in the coloring gradient (default = 'red')
additional arguments for input_processing
(used if
genes_df
is provided)
UpSet plots are plots of the intersections of sets as a matrix. This +
UpSet plots are plots of the intersections of sets as a matrix. This
function creates a ggplot object of an UpSet plot where the x-axis is the
UpSet plot of intersections of enriched terms. By default (i.e.
method = 'heatmap'
) the main plot is a heatmap of genes at the
@@ -192,16 +192,16 @@
R/utility.R
+ Source: R/utility.R
active_snw_enrichment_wrapper.Rd
processed input data frame
path/to/PIN/file
list for gene sets
adjusted-p value threshold used when filtering enrichment results (default = 0.05)
boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = FALSE
)
correction method to be used for adjusting p-values. (default = 'bonferroni')
algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').
boolean to indicate whether to disable parallel runs
via foreach
(default = FALSE)
if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)
number of iterations for active subnetwork search and enrichment analyses (Default = 10)
optional argument for specifying the number of processes used by foreach. If not specified, the function determines this automatically (Default = NULL. Gets set to 1 for Genetic Algorithm)
active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2
Initial temperature for SA (default = 1.0)
Final temperature for SA (default = 0.01)
Iteration number for SA (default = 10000)
Population size for GA (default = 400)
Iteration number for GA (default = 200)
Number of threads to be used in GA (default = 5)
Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)
For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)
Sets max depth in greedy search, 0 for no limit (default = 1)
Search depth in greedy search (default = 1)
Overlap threshold for results of greedy search (default = 0.5)
Number of subnetworks to be presented in the results (default = 1000)
boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because @@ -239,9 +241,7 @@
Data frame of combined pathfindR enrichment results
+Data frame of combined pathfindR enrichment results
R/active_snw_search.R
+ Source: R/active_snw_search.R
active_snw_search.Rd
input the input data that active subnetwork search uses. The input must be a data frame containing at least these 2 columns:
Gene Symbol
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
name for active subnetwork search output data without file extension (default = 'active_snws')
(previously created) directory for a parallel run iteration. Used in the wrapper function (see ?run_pathfindR) (Default = NULL)
active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2
algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').
seed for reproducibility while running the java modules (applies for GR and SA)
boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because during parallel runs, the console messages get disorderly printed.
if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)
For SA and GA, probability of adding a gene in initial solution (default = 0.1)
Initial temperature for SA (default = 1.0)
Final temperature for SA (default = 0.01)
Iteration number for SA (default = 10000)
Population size for GA (default = 400)
Iteration number for GA (default = 200)
Number of threads to be used in GA (default = 5)
Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)
For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)
Sets max depth in greedy search, 0 for no limit (default = 1)
Search depth in greedy search (default = 1)
Overlap threshold for results of greedy search (default = 0.5)
Number of subnetworks to be presented in the results (default = 1000)
A list of genes in every identified active subnetwork that has a score greater than +
A list of genes in every identified active subnetwork that has a score greater than the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.
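The two filters described for the returned subnetworks can be sketched as follows. This is a minimal illustration, not the package's code: `filter_subnetworks()`, the toy gene names and the scores are hypothetical, and the sketch assumes that `scores` is in the same order as `subnetworks` and that the minimum gene count is compared without rounding.

```r
# Toy inputs (all names and values are illustrative)
sig_genes <- paste0("g", 1:100)               # 100 significant input genes
subnetworks <- list(
  snw1 = c(paste0("g", 1:5), "x1", "x2"),     # 5 significant genes
  snw2 = c("g6", "g7", "x3"),                 # 2 significant genes
  snw3 = c(paste0("g", 8:11), "x4")           # 4 significant genes
)
scores <- c(snw1 = 9.1, snw2 = 8.7, snw3 = 2.3)

# Hypothetical helper combining the score quantile and significant gene filters
filter_subnetworks <- function(subnetworks, scores, sig_genes,
                               score_quan_thr = 0.8, sig_gene_thr = 0.02) {
  # proportion -> minimum gene count, never below 2
  min_sig <- max(length(sig_genes) * sig_gene_thr, 2)
  n_sig <- vapply(subnetworks, function(g) sum(g %in% sig_genes), integer(1))
  pass_score <- if (score_quan_thr == -1) {
    rep(TRUE, length(scores))                 # -1 disables the score filter
  } else {
    scores > quantile(scores, score_quan_thr)
  }
  subnetworks[pass_score & n_sig >= min_sig]
}

kept_default <- filter_subnetworks(subnetworks, scores, sig_genes,
                                   sig_gene_thr = 0.03)
kept_no_score_filter <- filter_subnetworks(subnetworks, scores, sig_genes,
                                           score_quan_thr = -1,
                                           sig_gene_thr = 0.03)
```

Setting `score_quan_thr = -1` leaves only the significant gene filter in effect, which matches the "-1 for not filtering" option in the argument description.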
R/utility.R
+ Source: R/utility.R
annotate_term_genes.Rd
data frame of enrichment results. The only must-have column is 'ID'.
input data processed via input_processing
List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)
The original data frame with two additional columns:
The original data frame with two additional columns:
the up-regulated genes in the input involved in the given term's gene set, comma-separated
character vector containing the output of 'java -version'. If
NULL, result of fetch_java_version
is used (default = NULL)
only parses and checks whether the java version is >= 1.8
+only parses and checks whether the java version is >= 1.8
data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if use_description = TRUE
) or 'ID'
(if use_description = FALSE
), 'Down_regulated', and 'Up_regulated'.
@@ -103,29 +105,29 @@
Either 'hierarchical' or 'fuzzy'. Details of clustering are
provided in the corresponding functions hierarchical_term_clustering
,
and fuzzy_term_clustering
boolean value indicating whether or not to plot the graph diagram of clustering results (default = TRUE)
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)
additional arguments for hierarchical_term_clustering
,
fuzzy_term_clustering
and cluster_graph_vis
.
See documentation of these functions for more details.
a data frame of clustering results. For 'hierarchical', the cluster +
a data frame of clustering results. For 'hierarchical', the cluster assignments (Cluster) and whether the term is representative of its cluster (Status) are added as columns. For 'fuzzy', terms that are in multiple clusters are provided for each cluster. The cluster assignments (Cluster) @@ -177,16 +177,16 @@
R/clustering.R
+ Source: R/clustering.R
cluster_graph_vis.Rd
clustering result (either a matrix obtained via
`fuzzy_term_clustering` or a vector obtained via `hierarchical_term_clustering`)
matrix of kappa statistics (output of create_kappa_matrix
)
data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if use_description = TRUE
) or 'ID'
(if use_description = FALSE
), 'Down_regulated', and 'Up_regulated'.
@@ -114,29 +116,27 @@
threshold for kappa statistics, defining strong relation (default = 0.35)
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)
scaling factor for the node size (default = 2.5)
Plots a graph diagram of clustering results. Each node is an enriched term +
Plots a graph diagram of clustering results. Each node is an enriched term from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness of the edges between nodes corresponds to the kappa statistic between the two terms. Color of each node corresponds to distinct clusters. For fuzzy @@ -145,9 +145,9 @@
if (FALSE) {
+ if (FALSE) { # \dontrun{
cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
-}
+} # }
R/visualization.R
+ Source: R/visualization.R
color_kegg_pathway.Rd
hsa KEGG pathway id (e.g. hsa05012)
vector of change values, names should be hsa KEGG gene ids
should change values be scaled? (default = TRUE
)
low, middle and high color values for coloring the pathway nodes
(default = NULL
). If node_cols=NULL
, the low, middle and high color
are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
@@ -114,26 +116,24 @@
input_processing
), only one color ('#F38F18' if NULL) is used.
the default position of legends ("none", "left", "right", "bottom", "top", "inside")
a ggplot object containing the colored KEGG pathway diagram visualization
+a ggplot object containing the colored KEGG pathway diagram visualization
R/comparison.R
+ Source: R/comparison.R
combine_pathfindR_results.Rd
Data frame of combined pathfindR enrichment results. Columns are:
Data frame of combined pathfindR enrichment results. Columns are:
ID of the enriched term
combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
-#> You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms
+#> You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms
Data frame of combined pathfindR enrichment results
the vector of selected terms for creating the graph
(either IDs or term descriptions). If set to 'common'
, all of the
common terms are used. (default = 'common')
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
The type of layout to create (see ggraph
for details. Default = 'stress')
Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')
a ggraph
object containing the combined term-gene graph.
+
a ggraph
object containing the combined term-gene graph.
Each node corresponds to an enriched term (orange if common, different shades of blue otherwise),
an up-regulated gene (green), a down-regulated gene (red) or
a conflicting (i.e. up in one analysis, down in the other or vice versa) gene
@@ -155,16 +155,16 @@
/path/to/output/dir
+/path/to/output/dir
R/utility.R
+ Source: R/utility.R
create_HTML_report.Rd
the input data that pathfindR uses. The input must be a data frame with three columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (OPTIONAL)
processed input data frame
final pathfindR result data frame
directory to render the report in
R/clustering.R
+ Source: R/clustering.R
create_kappa_matrix.Rd
data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if use_description = TRUE
) or 'ID'
(if use_description = FALSE
), 'Down_regulated', and 'Up_regulated'.
@@ -100,12 +102,12 @@
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)
a matrix of kappa statistics between each term in the +
a matrix of kappa statistics between each term in the enrichment results.
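As a rough illustration of what such a matrix holds, the sketch below computes Cohen's kappa between the gene memberships of each pair of terms. The helper `kappa_between()` and the toy gene sets are hypothetical, and the package's own calculation may differ in detail; the only behavior taken from the changelog is that the statistic is set to 0 when chance agreement equals 1.

```r
# Hypothetical helper: Cohen's kappa between two terms' gene memberships
kappa_between <- function(genes_a, genes_b, all_genes) {
  in_a <- all_genes %in% genes_a
  in_b <- all_genes %in% genes_b
  observed <- mean(in_a == in_b)                 # observed agreement
  chance <- mean(in_a) * mean(in_b) +            # agreement expected by chance
    mean(!in_a) * mean(!in_b)
  if (isTRUE(all.equal(chance, 1))) return(0)    # avoid division by zero
  (observed - chance) / (1 - chance)
}

# Toy enriched terms and their genes (illustrative only)
term_genes <- list(T1 = c("a", "b", "c"),
                   T2 = c("a", "b", "c"),
                   T3 = c("d", "e"))
all_genes <- unique(unlist(term_genes))

# Pairwise kappa statistics between all terms
kappa_mat <- sapply(term_genes, function(x) {
  sapply(term_genes, function(y) kappa_between(x, y, all_genes))
})
```

Terms with identical gene sets get a kappa of 1, while terms with no genes in common get a negative value, so `1 - kappa_mat` can serve as a distance for clustering.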
R/enrichment.R
+ Source: R/enrichment.R
enrichment.Rd
The set of gene symbols to be used for enrichment analysis. In the scope of this package, these are genes that were identified for an active subnetwork
List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)
Vector that contains term descriptions for the gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)
correction method to be used for adjusting p-values. (default = 'bonferroni')
adjusted-p value threshold used when filtering enrichment results (default = 0.05)
vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search
vector of background genes. In the scope of this package,
the background genes are taken as all genes in the PIN
(see enrichment_analyses
)
A data frame that contains enrichment results
+A data frame that contains enrichment results
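The adjustment and filtering step described by the 'bonferroni' and 0.05 defaults can be sketched with base R's `p.adjust`. The p-values and term names below are made up for illustration, and whether the package filters with `<=` or `<` is an assumption here.

```r
# Made-up raw p-values for three hypothetical terms
p_vals <- c(term_a = 0.001, term_b = 0.02, term_c = 0.04)

# Bonferroni adjustment multiplies each p-value by the number of tests (capped at 1)
adj_p <- stats::p.adjust(p_vals, method = "bonferroni")

# Keep terms whose adjusted p-value passes the threshold (assumed: <= 0.05)
enriched <- names(adj_p)[adj_p <= 0.05]
enriched
```

With three tests, only `term_a` (adjusted p = 0.003) survives the threshold.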
R/enrichment.R
+ Source: R/enrichment.R
enrichment_analyses.Rd
a list of subnetwork genes (i.e., vectors of genes for each subnetwork)
vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)
Vector that contains term descriptions for the gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)
correction method to be used for adjusting p-values. (default = 'bonferroni')
adjusted-p value threshold used when filtering enrichment results (default = 0.05)
boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = FALSE
)
a dataframe of combined enrichment results. Columns are:
a dataframe of combined enrichment results. Columns are:
ID of the enriched term
R/visualization.R
+ Source: R/visualization.R
enrichment_chart.Rd
a data frame that must contain the following columns:
Description of the enriched term
number of top terms (according to the 'lowest_p' column)
to plot (default = 10). If plot_by_cluster = TRUE
, selects the top
top_terms
terms per each cluster. Set top_terms = NULL
to plot
@@ -127,18 +129,18 @@
boolean value indicating whether or not to group the
enriched terms by cluster (works if result_df
contains a
'Cluster' column).
number of sizes displayed in the legend # genes
(Default = 4)
whether or not to set even breaks for the number of sizes
displayed in the legend # genes
. If TRUE
(default), sets
equal breaks and the number of displayed bubbles may be different than the
@@ -148,9 +150,7 @@
a ggplot2
object containing the bubble chart.
+
a ggplot2
object containing the bubble chart.
The x-axis corresponds to fold enrichment values while the y-axis indicates
the enriched terms. Size of the bubble indicates the number of significant
genes in the given enriched term. Color indicates the -log10(lowest-p) value.
@@ -177,16 +177,16 @@
Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
@@ -104,21 +106,21 @@
minimum number of genes a term must contain (default = 10)
maximum number of genes a term must contain (default = 300)
a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. Names should correspond to the IDs of the custom terms.
A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.
a list containing 2 elements
a list containing 2 elements
list of vectors of genes contained in each term
character vector containing the output of 'java -version'
+character vector containing the output of 'java -version'
R/active_snw_search.R
+ Source: R/active_snw_search.R
filterActiveSnws.Rd
path to the output of an Active Subnetwork Search
vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search
active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
@@ -116,9 +118,7 @@
A list containing subnetworks
: a list of genes in every
+
A list containing subnetworks
: a list of genes in every
active subnetwork that has a score greater than the score_quan_thr
th
quantile and that contains at least sig_gene_thr
of significant genes
and scores
the score of each filtered active subnetwork
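The two filters described for this page (a score quantile and a minimum count of significant genes with a floor of 2) can be sketched as follows. `filter_snws_sketch` is a hypothetical stand-in that works on in-memory lists rather than on a search output file, and it is not the package's implementation.

```r
# Hypothetical sketch of the two filters, operating on in-memory objects.
filter_snws_sketch <- function(subnetworks, scores, sig_genes,
                               score_quan_thr = 0.8, sig_gene_thr = 0.02) {
  keep <- rep(TRUE, length(subnetworks))
  # score filter: skipped when score_quan_thr is -1
  if (score_quan_thr != -1) {
    cutoff <- unname(stats::quantile(scores, score_quan_thr))
    keep <- keep & scores > cutoff
  }
  # proportion-based minimum number of significant genes, never below 2
  min_sig <- max(2, length(sig_genes) * sig_gene_thr)
  n_sig <- vapply(subnetworks, function(g) sum(g %in% sig_genes), integer(1))
  keep <- keep & n_sig >= min_sig
  list(subnetworks = subnetworks[keep], scores = scores[keep])
}

snws <- list(c("A", "B", "C"), c("A", "X"), c("Y", "Z"), c("A", "B"))
res <- filter_snws_sketch(snws, scores = c(10, 8, 9, 1),
                          sig_genes = c("A", "B", "C"), score_quan_thr = -1)
length(res$subnetworks)
```

With only three significant genes the proportional threshold (3 x 0.02) falls below 2, so the floor of 2 applies and the subnetworks containing one or no significant gene are dropped.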
R/clustering.R
+ Source: R/clustering.R
fuzzy_term_clustering.Rd
matrix of kappa statistics (output of create_kappa_matrix
)
data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if use_description = TRUE
) or 'ID'
(if use_description = FALSE
), 'Down_regulated', and 'Up_regulated'.
@@ -105,21 +107,19 @@
threshold for kappa statistics, defining strong relation (default = 0.35)
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
a boolean matrix of cluster assignments. Each row corresponds to an +
a boolean matrix of cluster assignments. Each row corresponds to an enriched term, each column corresponds to a cluster.
if (FALSE) {
+ if (FALSE) { # \dontrun{
fuzzy_term_clustering(kappa_mat, enrichment_res)
fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45)
-}
+} # }
R/data_generation.R
+ Source: R/data_generation.R
get_biogrid_pin.Rd
organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc. See https://wiki.thebiogrid.org/doku.php/statistics for a full list of available organisms (default = 'Homo_sapiens')
the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file
the requested BioGRID release (default = 'latest')
the path of the file in which the PIN data was saved. If +
the path of the file in which the PIN data was saved. If
path2pin
was not supplied by the user, the PIN data is saved in a
temporary file
R/data_generation.R
+ Source: R/data_generation.R
get_gene_sets_list.Rd
As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')
(Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list of all available organisms, see https://www.genome.jp/kegg/catalog/org_list.html
(Used for 'MSigDB' only) species name, such as Homo sapiens, Mus musculus, etc.
See msigdbr_show_species
for all the species available in
the msigdbr package (default = 'Homo sapiens')
(Used for 'MSigDB' only) collection. i.e., H, C1, C2, C3, C4, C5, C6, C7.
(Used for 'MSigDB' only) sub-collection, such as CGP, MIR, BP, etc. (default = NULL, i.e. list all gene sets in collection)
A list containing 2 elements:
gene_sets - A list containing the genes involved in each gene set
A list containing 2 elements:
gene_sets - A list containing the genes involved in each gene set
descriptions - A named vector containing the descriptions for each gene set
. For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list of all available KEGG organisms, see https://www.genome.jp/kegg/catalog/org_list.html.
@@ -143,16 +143,16 @@
R/data_generation.R
+ Source: R/data_generation.R
get_kegg_gsets.Rd
KEGG organism code for the selected organism. For a full list of all available organisms, see https://www.genome.jp/kegg/catalog/org_list.html
list containing 2 elements:
gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway
list containing 2 elements:
gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway
descriptions - A named vector containing the descriptions for each KEGG pathway
R/data_generation.R
+ Source: R/data_generation.R
get_mgsigdb_gsets.Rd
species name, such as Homo sapiens, Mus musculus, etc.
See msigdbr_show_species
for all the species available in
the msigdbr package
collection. i.e., H, C1, C2, C3, C4, C5, C6, C7.
sub-collection, such as CGP, BP, etc. (default = NULL, i.e. list all gene sets in collection)
Retrieves the MSigDB gene sets and returns a list containing 2 elements:
gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets
Retrieves the MSigDB gene sets and returns a list containing 2 elements:
gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets
descriptions - A named vector containing the descriptions for each selected MSigDB gene set
R/data_generation.R
+ Source: R/data_generation.R
get_pin_file.Rd
As of this version, this function is implemented to get data from 'BioGRID' only. This argument (and this wrapper function) was implemented for future utility
organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc. See https://wiki.thebiogrid.org/doku.php/statistics for a full list of available organisms (default = 'Homo_sapiens')
the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file
additional arguments for get_biogrid_pin
the path of the file in which the PIN data was saved. If +
the path of the file in which the PIN data was saved. If
path2pin
was not supplied by the user, the PIN data is saved in a
temporary file
if (FALSE) {
+ if (FALSE) { # \dontrun{
pin_path <- get_pin_file()
-}
+} # }
R/data_generation.R
+ Source: R/data_generation.R
get_reactome_gsets.Rd
Gets the latest Reactome pathways gene sets in gmt format. Parses the +
Gets the latest Reactome pathways gene sets in gmt format. Parses the gmt file and returns a list containing 2 elements:
gene_sets - A list containing the genes involved in each Reactome pathway
descriptions - A named vector containing the descriptions for each Reactome pathway
R/data_generation.R
+ Source: R/data_generation.R
gset_list_from_gmt.Rd
list containing 2 elements:
gene_sets - A list containing the genes involved in each gene set
list containing 2 elements:
gene_sets - A list containing the genes involved in each gene set
descriptions - A named vector containing the descriptions for each gene set
R/clustering.R
+ Source: R/clustering.R
hierarchical_term_clustering.Rd
matrix of kappa statistics (output of create_kappa_matrix
)
data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if use_description = TRUE
) or 'ID'
(if use_description = FALSE
), 'Down_regulated', and 'Up_regulated'.
@@ -108,37 +110,35 @@
number of clusters to be formed (default = NULL
).
If NULL
, the optimal number of clusters is determined as the number
which yields the highest average silhouette width.
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
the agglomeration method to be used
(default = 'average', see hclust
)
boolean to indicate whether to plot the kappa statistics clustering heatmap or not (default = FALSE)
boolean to indicate whether to plot the clustering dendrogram partitioned into the optimal number of clusters (default = TRUE)
a vector of clusters for each enriched term in the enrichment results.
+a vector of clusters for each enriched term in the enrichment results.
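The default behaviour for `num_clusters = NULL` (choosing the cluster count with the highest average silhouette width) can be sketched with `hclust` and the `cluster` package. `pick_k_sketch` is a hypothetical helper; it assumes `1 - kappa` as the distance and only tries 2 to n - 1 clusters, since `cluster::silhouette` is not defined outside that range.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: choose the number of clusters that maximizes the
# average silhouette width of a hierarchical clustering on 1 - kappa.
pick_k_sketch <- function(kappa_mat, method = "average") {
  d <- stats::as.dist(1 - kappa_mat)
  tree <- stats::hclust(d, method = method)
  ks <- 2:(nrow(kappa_mat) - 1)
  avg_sil <- vapply(ks, function(k) {
    sil <- cluster::silhouette(stats::cutree(tree, k = k), d)
    mean(sil[, "sil_width"])
  }, numeric(1))
  ks[which.max(avg_sil)]
}

# Toy kappa matrix: two blocks of three strongly related terms each
toy_kappa <- matrix(0, nrow = 6, ncol = 6)
toy_kappa[1:3, 1:3] <- 0.9
toy_kappa[4:6, 4:6] <- 0.9
diag(toy_kappa) <- 1
best_k <- pick_k_sketch(toy_kappa)
best_k
```

For the toy matrix, splitting into the two blocks gives the highest average silhouette width, so two clusters are selected.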
if (FALSE) {
+ if (FALSE) { # \dontrun{
hierarchical_term_clustering(kappa_mat, enrichment_res)
hierarchical_term_clustering(kappa_mat, enrichment_res, method = 'complete')
-}
+} # }
R/enrichment.R
+ Source: R/enrichment.R
hyperg_test.Rd
the p-value as determined using the hypergeometric distribution.
+the p-value as determined using the hypergeometric distribution.
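A hypergeometric enrichment p-value of this kind can be sketched with base R's `phyper`. The helper `hyperg_p_sketch` and its argument names are hypothetical and do not mirror the signature of `hyperg_test`; it assumes the p-value is the upper-tail probability of observing at least the seen overlap.

```r
# Hypothetical helper (not pathfindR's hyperg_test): probability of observing
# at least the seen overlap between a term and the chosen genes by chance.
hyperg_p_sketch <- function(term_genes, chosen_genes, background_genes) {
  term_genes <- intersect(term_genes, background_genes)
  chosen_genes <- intersect(chosen_genes, background_genes)
  k <- length(intersect(term_genes, chosen_genes))  # observed overlap
  m <- length(term_genes)                           # "successes" in background
  n <- length(background_genes) - m                 # "failures" in background
  q <- length(chosen_genes)                         # number of genes drawn
  # P(X >= k) is the upper tail starting at k, hence k - 1 with lower.tail = FALSE
  stats::phyper(k - 1, m, n, q, lower.tail = FALSE)
}

background <- paste0("G", 1:20)
p_val <- hyperg_p_sketch(background[1:5], background[1:5], background)
p_val
```

In this toy case all five chosen genes fall in a five-gene term out of a background of 20, so the p-value equals 1 / choose(20, 5).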
the input data that pathfindR uses. The input must be a data frame with three columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (OPTIONAL)
the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.
This function first filters the input so that all p values are less +
This function first filters the input so that all p values are less than or equal to the threshold. Next, gene symbols that are not found in the PIN are identified. If aliases of these gene symbols are found in the PIN, the symbols are converted to the corresponding aliases. The
@@ -173,16 +173,16 @@
the input data that pathfindR uses. The input must be a data frame with three columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (OPTIONAL)
the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)
Only checks if the input and the threshold follows the required +
Only checks whether the input and the threshold follow the required specifications.
TRUE if x is a valid color, otherwise FALSE
+TRUE if x is a valid color, otherwise FALSE
R/pathfindr.R
+ Source: R/pathfindr.R
pathfindr.Rd
R/scoring.R
+ Source: R/scoring.R
plot_scores.Rd
Matrix of agglomerated enriched term scores per sample. Columns are samples, rows are enriched terms
(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)
Boolean value to indicate whether or not to label the samples in the heatmap plot (default = TRUE)
Naming of the 'Case' group (as in cases
) (default = 'Case')
Naming of the 'Control' group (default = 'Control')
a string indicating the color of 'low' values in the coloring gradient (default = 'green')
a string indicating the color of 'mid' values in the coloring gradient (default = 'black')
a string indicating the color of 'high' values in the coloring gradient (default = 'red')
A `ggplot2` object containing the heatmap plot. x-axis indicates +
A `ggplot2` object containing the heatmap plot. x-axis indicates
the samples. y-axis indicates the enriched terms. 'Score' indicates the
score of the term in a given sample. If cases
are provided, the plot is
divided into 2 facets, named by case_title
and control_title
.
R/data_generation.R
+ Source: R/data_generation.R
process_pin.Rd
processed PIN data frame (removes self-interactions and +
processed PIN data frame (removes self-interactions and duplicated interactions)
R/utility.R
+ Source: R/utility.R
return_pin_path.Rd
The absolute path to chosen PIN.
+The absolute path to chosen PIN.
if (FALSE) {
+ if (FALSE) { # \dontrun{
pin_path <- return_pin_path('GeneMania')
-}
+} # }
R/core.R
+ Source: R/core.R
run_pathfindr.Rd
the input data that pathfindR uses. The input must be a data frame with three columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (OPTIONAL)
Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
@@ -119,73 +121,71 @@
minimum number of genes a term must contain (default = 10)
maximum number of genes a term must contain (default = 300)
a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. Names should correspond to the IDs of the custom terms.
A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)
adjusted-p value threshold used when filtering enrichment results (default = 0.05)
boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.
boolean value. If TRUE, a bubble chart displaying the enrichment results is plotted. (default = TRUE)
the directory to be created where the output and intermediate
files are saved (default = NULL
, a temporary directory is used)
boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = FALSE
)
additional arguments for active_snw_enrichment_wrapper
Data frame of pathfindR enrichment results. Columns are:
Data frame of pathfindR enrichment results. Columns are:
ID of the enriched term
output_dir
/results.html' under the current working directory.
-
-
By default, a bubble chart of the top 10 enrichment results is plotted. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Sizes of the bubbles indicate the number of significant genes in the given terms.
@@ -263,9 +261,9 @@
if (FALSE) {
+ if (FALSE) { # \dontrun{
run_pathfindR(example_pathfindR_input)
-}
+} # }
R/scoring.R
+ Source: R/scoring.R
score_terms.Rd
a data frame that must contain the 3 columns below:
Description of the enriched term (necessary if use_description = TRUE
)
the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.
(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
Boolean value to indicate whether or not to draw the heatmap plot of the scores. (default = TRUE)
Additional arguments for plot_scores
for aesthetics
of the heatmap plot
Matrix of agglomerated scores of each enriched term per sample. +
Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. Optionally, displays a heatmap of this matrix.
For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples,
@@ -193,16 +193,16 @@
R/utility.R
+ Source: R/utility.R
single_iter_wrapper.Rd
current iteration index (default = NULL
)
vector of directories for parallel runs
processed input data frame
path/to/PIN/file
active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2
algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').
boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because, during parallel runs, the console messages are printed out of order.
if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)
For SA and GA, probability of adding a gene in initial solution (default = 0.1)
Initial temperature for SA (default = 1.0)
Final temperature for SA (default = 0.01)
Iteration number for SA (default = 10000)
Population size for GA (default = 400)
Iteration number for GA (default = 200)
Number of threads to be used in GA (default = 5)
Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)
For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)
Sets max depth in greedy search, 0 for no limit (default = 1)
Search depth in greedy search (default = 1)
Overlap threshold for results of greedy search (default = 0.5)
Number of subnetworks to be presented in the results (default = 1000)
list for gene sets
correction method to be used for adjusting p-values. (default = 'bonferroni')
adjusted-p value threshold used when filtering enrichment results (default = 0.05)
boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = FALSE
)
Data frame of enrichment results using active subnetwork search results
+Data frame of enrichment results using active subnetwork search results
R/enrichment.R
+ Source: R/enrichment.R
summarize_enrichment_results.Rd
a dataframe of combined enrichment results. Columns are:
ID of the enriched term
boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = FALSE
)
a dataframe of summarized enrichment results (over multiple iterations). Columns are:
a dataframe of summarized enrichment results (over multiple iterations). Columns are:
ID of the enriched term
if (FALSE) {
+ if (FALSE) { # \dontrun{
summarize_enrichment_results(enrichment_res)
-}
+} # }
A dataframe of pathfindR results that must contain the following columns:
Description of the enriched term (necessary if use_description = TRUE
)
Number of top enriched terms to use while creating the graph. Set to NULL
to use
all enriched terms (default = 10, i.e. top 10 terms)
The type of layout to create (see ggraph
for details. Default = 'stress')
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')
vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)
a ggraph
object containing the term-gene graph.
+
a ggraph
object containing the term-gene graph.
Each node corresponds to an enriched term (beige), an up-regulated gene (green)
or a down-regulated gene (red). An edge between a term and a gene indicates
that the given term involves the gene. Size of a term node is proportional
@@ -179,16 +179,16 @@
R/visualization.R
+ Source: R/visualization.R
term_gene_heatmap.Rd
A dataframe of pathfindR results that must contain the following columns:
Description of the enriched term (necessary if use_description = TRUE
)
the input data that was used with run_pathfindR
.
It must be a data frame with 3 columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (optional)
The change values in this data frame are used to color the affected genes
Number of top enriched terms to use while creating the plot. Set to NULL
to use
all enriched terms (default = 10)
Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = FALSE
)
a string indicating the color of 'low' values in the coloring gradient (default = 'green')
a string indicating the color of 'mid' values in the coloring gradient (default = 'black')
a string indicating the color of 'high' values in the coloring gradient (default = 'red')
legend title (default = 'change')
boolean to indicate whether to sort terms by 'lowest_p'
(TRUE
) or by number of genes (FALSE
) (default = FALSE
)
additional arguments for input_processing
(used if
genes_df
is provided)
a ggplot2 object of a heatmap where rows are enriched terms and +
a ggplot2 object of a heatmap where rows are enriched terms and
columns are involved input genes. If genes_df
is provided, colors of
the tiles indicate the change values.
R/visualization.R
+ Source: R/visualization.R
visualize_KEGG_diagram.Rd
KEGG ids of pathways to be colored and visualized
input data processed via input_processing
should change values be scaled? (default = TRUE
)
low, middle and high color values for coloring the pathway nodes
(default = NULL
). If node_cols=NULL
, the low, middle and high color
are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
@@ -114,16 +116,14 @@
input_processing
), only one color ('#F38F18' if NULL) is used.
the default position of legends ("none", "left", "right", "bottom", "top", "inside")
Creates colored visualizations of the enriched human KEGG pathways +
Creates colored visualizations of the enriched human KEGG pathways and returns them as a list of ggplot objects, named by Term ID.
if (FALSE) {
+ if (FALSE) { # \dontrun{
input_processed <- data.frame(
GENE = c("PKLR", "GPI", "CREB1", "INS"),
CHANGE = c(1.5, -2, 3, 5)
)
gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
-}
+} # }
R/active_snw_search.R
+ Source: R/active_snw_search.R
visualize_active_subnetworks.Rd
path to the output of an Active Subnetwork Search
the input data that was used with run_pathfindR
.
It must be a data frame with 3 columns:
Gene Symbol (Gene Symbol)
Change value, e.g. log(fold change) (optional)
The change values in this data frame are used to color the affected genes
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
number of top subnetworks to be visualized (leave blank if you want to visualize all subnetworks)
The type of layout to create (see ggraph
for details. Default = 'stress')
active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2
additional arguments for input_processing
a list of ggplot objects of graph visualizations of identified active +
a list of ggplot objects of graph visualizations of identified active subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes and yellows are non-input genes
R/visualization.R
+ Source: R/visualization.R
visualize_term_interactions.Rd
Data frame of enrichment results. Must-have columns are: 'Term_Description', 'Up_regulated' and 'Down_regulated'
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
Boolean to indicate whether to display the legend (TRUE
)
or not (FALSE
) (default: TRUE
)
list of ggplot objects (named by Term ID) visualizing the interactions of genes involved +
list of ggplot objects (named by Term ID) visualizing the interactions of genes involved
in the given enriched terms (annotated in the result_df
) in the PIN used
for enrichment analysis (specified by pin_name_path
).
if (FALSE) {
+ if (FALSE) { # \dontrun{
result_df <- example_pathfindR_output[1:2, ]
gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
-}
+} # }
R/visualization.R
+ Source: R/visualization.R
visualize_terms.Rd
Data frame of enrichment results. Must-have columns for
KEGG human pathway diagrams (is_KEGG_result = TRUE
) are: 'ID' and 'Term_Description'.
Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and
'Down_regulated'
input data processed via input_processing
,
not necessary when is_KEGG_result = FALSE
boolean to indicate whether KEGG gene sets were used for
enrichment analysis or not (default = TRUE
)
Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
additional arguments for visualize_KEGG_diagram
(used
when is_KEGG_result = TRUE
) or visualize_term_interactions
(used when is_KEGG_result = FALSE
)
Depending on the argument is_KEGG_result
, creates visualization of
- interactions of genes involved in the list of enriched terms in
result_df
. Returns a list of ggplot objects named by Term ID.
Depending on the argument is_KEGG_result
, creates visualization of
+ interactions of genes involved in the list of enriched terms in
+ result_df
. Returns a list of ggplot objects named by Term ID.
if (FALSE) {
+ if (FALSE) { # \dontrun{
input_processed <- data.frame(
GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
CHANGE = c(1.5, -2, 3, 5)
@@ -159,7 +158,7 @@ Examples
gg_list <- visualize_terms(result_df, input_processed)
gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct')
-}
+} # }