From a2b9329d86f43c8305e5b9f2e4cccd29c4270d81 Mon Sep 17 00:00:00 2001 From: egeulgen Date: Mon, 14 Oct 2024 09:57:54 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20egeulgen?= =?UTF-8?q?/pathfindR@5b647763b2774ee5cb7e0487ca68c6586854df7b=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 404.html | 18 +- CODE_OF_CONDUCT.html | 18 +- CONTRIBUTING.html | 18 +- LICENSE-text.html | 18 +- LICENSE.html | 18 +- articles/comparing_results.html | 21 +- .../figure-html/compare_graph1-1.png | Bin 409874 -> 410074 bytes articles/index.html | 18 +- articles/intro_vignette.html | 44 +- articles/manual_execution.html | 27 +- articles/non_hs_analysis.html | 27 +- articles/obtain_data.html | 27 +- articles/visualization_vignette.html | 32 +- .../figure-html/hmap-1.png | Bin 96700 -> 96801 bytes .../figure-html/term_gene1-1.png | Bin 226972 -> 226823 bytes .../figure-html/upset1-1.png | Bin 180025 -> 180024 bytes authors.html | 22 +- index.html | 18 +- news/index.html | 422 ++++++++++++++++-- pkgdown.yml | 5 +- reference/UpSet_plot-1.png | Bin 185162 -> 186347 bytes reference/UpSet_plot.html | 44 +- reference/active_snw_enrichment_wrapper.html | 78 ++-- reference/active_snw_search.html | 72 +-- reference/annotate_term_genes.html | 32 +- reference/check_java_version.html | 28 +- reference/cluster_enriched_terms-1.png | Bin 29772 -> 29706 bytes reference/cluster_enriched_terms.html | 38 +- reference/cluster_graph_vis.html | 44 +- reference/color_kegg_pathway.html | 40 +- reference/combine_pathfindR_results-1.png | Bin 413934 -> 411367 bytes reference/combine_pathfindR_results.html | 34 +- reference/combined_results_graph.html | 36 +- reference/configure_output_dir.html | 28 +- reference/create_HTML_report.html | 30 +- reference/create_kappa_matrix.html | 32 +- reference/enrichment.html | 40 +- reference/enrichment_analyses.html | 42 +- reference/enrichment_chart.html | 36 +- 
reference/fetch_gene_set.html | 36 +- reference/fetch_java_version.html | 24 +- reference/filterActiveSnws.html | 34 +- reference/fuzzy_term_clustering.html | 38 +- reference/get_biogrid_pin.html | 32 +- reference/get_gene_sets_list.html | 36 +- reference/get_kegg_gsets.html | 28 +- reference/get_mgsigdb_gsets.html | 32 +- reference/get_pin_file.html | 38 +- reference/get_reactome_gsets.html | 24 +- reference/gset_list_from_gmt.html | 30 +- reference/hierarchical_term_clustering.html | 44 +- reference/hyperg_test.html | 32 +- reference/index.html | 20 +- reference/input_processing.html | 34 +- reference/input_testing.html | 30 +- reference/isColor.html | 28 +- reference/pathfindR-package.html | 8 + reference/pathfindr.html | 20 +- reference/plot_scores.html | 42 +- reference/process_pin.html | 28 +- reference/return_pin_path.html | 32 +- reference/run_pathfindr.html | 60 ++- reference/score_terms.html | 40 +- reference/single_iter_wrapper.html | 78 ++-- reference/summarize_enrichment_results.html | 34 +- reference/term_gene_graph.html | 38 +- reference/term_gene_heatmap-1.png | Bin 53421 -> 53971 bytes reference/term_gene_heatmap.html | 46 +- reference/visualize_KEGG_diagram.html | 40 +- reference/visualize_active_subnetworks.html | 42 +- reference/visualize_term_interactions.html | 36 +- reference/visualize_terms.html | 45 +- sitemap.xml | 288 +++--------- 73 files changed, 1512 insertions(+), 1312 deletions(-) create mode 100644 reference/pathfindR-package.html diff --git a/404.html b/404.html index 0e09ce1e..57ef8288 100644 --- a/404.html +++ b/404.html @@ -20,7 +20,7 @@ - +
- +
@@ -95,7 +95,7 @@ - +
@@ -123,17 +123,17 @@

Page not found (404)

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -133,16 +133,16 @@

Attribution -

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -188,16 +188,16 @@

Attribution -

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -95,16 +95,16 @@

License

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -99,16 +99,16 @@

MIT License

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,7 +95,7 @@ - +
@@ -104,7 +103,7 @@

Comparing Two pathfindR Results

- Source: vignettes/comparing_results.Rmd + Source: vignettes/comparing_results.Rmd
@@ -182,17 +181,17 @@

Comparing Two pathfindR Results

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -102,16 +102,16 @@

All vignettes

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,7 +95,7 @@ - +
@@ -104,9 +103,9 @@

Introduction to pathfindR

Ege Ulgen

-

2024-05-04

+

2024-10-14

- Source: vignettes/intro_vignette.Rmd + Source: vignettes/intro_vignette.Rmd
@@ -587,13 +586,16 @@

Clustering Enriched TermsHierarchical Clustering

By default, cluster_enriched_terms() performs -hierarchical clustering of the terms (using \(1 - \kappa\) as the distance metric). -Iterating over \(2,3,...n\) clusters -(where \(n\) is the number of terms), -cluster_enriched_terms() determines the optimal number of -clusters by maximizing the average silhouette width, partitions the data -into this optimal number of clusters and returns a data frame with -cluster assignments.

+hierarchical clustering of the terms (using +\(1 - \kappa\) +as the distance metric). Iterating over +\(2,3,...n\) +clusters (where +\(n\) +is the number of terms), cluster_enriched_terms() +determines the optimal number of clusters by maximizing the average +silhouette width, partitions the data into this optimal number of +clusters and returns a data frame with cluster assignments.

 example_pathfindR_output_clustered <- cluster_enriched_terms(example_pathfindR_output, plot_dend = FALSE, plot_clusters_graph = FALSE)
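The selection of the number of clusters described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: `kappa_mat` is a made-up kappa matrix, average linkage is an arbitrary choice, and the search stops at \(n - 1\) clusters because silhouette widths are not defined when every term is its own cluster.

```r
# Sketch: pick the number of clusters that maximizes the average silhouette width.
# `kappa_mat` is a hypothetical symmetric matrix of pairwise kappa statistics.
library(cluster)  # provides silhouette(); a recommended package shipped with R

terms <- paste0("term", 1:6)
kappa_mat <- matrix(0.05, nrow = 6, ncol = 6, dimnames = list(terms, terms))
kappa_mat[1:3, 1:3] <- 0.8  # terms 1-3 share most of their genes
kappa_mat[4:6, 4:6] <- 0.7  # terms 4-6 share most of their genes
diag(kappa_mat) <- 1

# distance is 1 - kappa
dist_mat <- as.dist(1 - kappa_mat)
hc <- hclust(dist_mat, method = "average")  # linkage method is an assumption

# average silhouette width for k = 2, ..., n - 1
k_values <- 2:(length(terms) - 1)
avg_sil <- sapply(k_values, function(k) {
  sil <- silhouette(cutree(hc, k = k), dist_mat)
  mean(sil[, "sil_width"])
})

# partition the terms into the best-scoring number of clusters
best_k <- k_values[which.max(avg_sil)]
clusters <- cutree(hc, k = best_k)
data.frame(Term = names(clusters), Cluster = unname(clusters))
```

With this toy matrix the two blocks of similar terms are recovered as two clusters.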
@@ -1200,9 +1202,7 @@ 

Analysis with Custom Gene Sets - -

+
@@ -1215,17 +1215,17 @@

Analysis with Custom Gene Sets

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,7 +95,7 @@ - +
@@ -105,9 +104,9 @@

Step-by-Step Execution of the pathfindR Enrichment Workflow

Ege Ulgen

-

2024-05-04

+

2024-10-14

- Source: vignettes/manual_execution.Rmd + Source: vignettes/manual_execution.Rmd
@@ -313,9 +312,7 @@

Visualizations - -

+ @@ -328,17 +325,17 @@

Visualizations

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,7 +95,7 @@ - +
@@ -105,9 +104,9 @@

pathfindR Analysis for non-Homo-sapiens organisms

Ege Ulgen

-

2024-05-04

+

2024-10-14

- Source: vignettes/non_hs_analysis.Rmd + Source: vignettes/non_hs_analysis.Rmd
@@ -844,9 +843,7 @@

Built-in Mus musculus Data - -

+ @@ -859,17 +856,17 @@

Built-in Mus musculus Data

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,16 +95,16 @@ - +
@@ -230,9 +229,7 @@

MSigDB Gene Sets - -

+
@@ -245,17 +242,17 @@

MSigDB Gene Sets

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - - +
- +
@@ -96,7 +95,7 @@ - +
@@ -104,9 +103,9 @@

Visualization of pathfindR Enrichment Results

-

2024-05-04

+

2024-10-14

- Source: vignettes/visualization_vignette.Rmd + Source: vignettes/visualization_vignette.Rmd
@@ -253,8 +252,9 @@

By default the node sizes are plotted proportional to the number of genes a term contains (num_genes). To adjust node sizes -using the \(-log_{10}\)(lowest p -values), set node_size = "p_val":

+using the +\(-log_{10}\)(lowest +p values), set node_size = "p_val":

 term_gene_graph(example_pathfindR_output, num_terms = 3, node_size = "p_val")

See ?term_gene_graph for more details.

@@ -300,9 +300,7 @@

+

@@ -315,17 +313,17 @@

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -77,7 +77,7 @@

Authors and Citation

- +
  • Ege Ulgen. Maintainer, copyright holder.

    @@ -90,7 +90,7 @@

    Authors and Citation

    Citation

    - Source: inst/CITATION + Source: inst/CITATION
    @@ -117,16 +117,16 @@

    Citation

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + Changelog • pathfindR - -
+ +
- +
-
+ + +
+ +
+
+ +
+

Minor Changes and Bug Fixes

+
  • fixed a bug regarding KEGG gene set fetching: removed the conversion functionality in get_kegg_gsets() which now returns KEGG IDs so that the user can convert the returned identifiers using a more appropriate tool (e.g. BioMart) should they wish
  • +
+
+
+ +
+

Major Changes

+
  • implemented a new color_kegg_pathway() function using ggkegg to create colored KEGG pathway ggplot objects (instead of using KEGGREST to obtain the colored PNG files, which no longer works #169)
  • +
  • renamed the visualize_hsa_KEGG function to visualize_KEGG_diagram() to reflect that it can now handle KEGG pathway enrichment results from any organism
  • +
  • updated the visualize_terms(), visualize_term_interactions() and visualize_KEGG_diagram() functions so that they now return a list of ggplot objects (named by term ID)
  • +
  • updated the get_kegg_gsets() function to also use ggkegg for fetching genes per pathway data
  • +
  • removed unneeded dependencies: magick, KEGGgraph and KEGGREST +
  • +
+
+

Minor Changes and Bug Fixes

+
  • updated the get_biogrid_pin() function so that it can now determine the latest version and download/process it from BioGRID (via setting release = "latest", which is now the default behavior)
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
  • fixed a bug in the UpSet_plot() function regarding the interaction with the ggupset package that was discovered in a reverse dependency check for ggplot2 3.5.0 (#189)
  • +
  • fixed gene symbol case mismatch issue in score_terms() (#186)
  • +
  • applied enhancement suggestion from #184 to enable scale fill manual for term_gene_graph() +
  • +
+
+
+ +
+

Major Changes

+
+
+
+ +
+

Minor Changes and Bug Fixes

+
  • added the disable_parallel argument in active_snw_enrichment_wrapper() to be able to disable parallel runs via foreach +
  • +
  • fixed the issue encountered on CentOS where foreach wasn’t loading pathfindR (#164)
  • +
  • fixed a CRAN error due to a package documentation issue (#172)
  • +
  • performed some refactoring and updated/improved all tests
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Minor Changes and Bug Fixes

+
  • added the dir_for_report argument in the internal function create_HTML_report() to fix test issues on CRAN
  • +
+
+
+ +
+

Major Changes

+
  • updated the Java active subnetwork search component and added the seedForRandom argument in active_snw_search() to ensure reproducibility. By default, run_pathfindR() sets a seed for each iteration to produce reproducible results (#108)
  • +
  • as the example input/output data were renamed for convenience in ‘pathfindR.data’ v2.0, ‘pathfindR’ now depends on pathfindR.data (>= 2.0)
  • +
  • refactored/simplified run_pathfindR() +
  • +
  • visualization of enriched term diagrams is no longer part of run_pathfindR() +
  • +
  • default behavior of run_pathfindR() is now to run in a temporary directory. The user can still set output_dir to run in a specified directory and also produce HTML reports
  • +
  • in hierarchical_term_clustering(), updated the sequence of cluster numbers for which silhouette width is calculated when choosing the optimal number of clusters. This should speed up the function for cases with a large number of enriched terms
  • +
  • updated the relevant vignettes to reflect the implemented changes
  • +
+
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Minor Changes and Bug Fixes

+
  • updated the alias selection function within input_processing() so that an alias that is not already present is selected
  • +
  • updated the min-max scaling (controlled by scale_vals) in color_kegg_pathway(), the default is now scale_vals=TRUE +
  • +
  • updated the term_gene_heatmap() function so that legend title is shown and can be customized
  • +
  • updated the term_gene_heatmap() function so that coloring is proper when no change values are provided in genes_df +
  • +
  • added the sort_terms_by_p argument to the term_gene_heatmap() function to enable sorting of terms by ‘lowest_p’
  • +
  • in visualization functions, made coloring of up-/down-regulated genes consistent (#126)
  • +
  • added the vertex.label.cex and vertex.size.scaling arguments to cluster_graph_vis() +
  • +
  • added the show_legend argument to visualize_term_interactions() to toggle the legend
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Major Changes

+
  • fixed an issue in get_kegg_gsets() where an empty result was returned for some organisms due to an error in parsing (#72)
  • +
+
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Major Changes

+
  • In run_pathfindR(), the default iterations was set back to 10 (the default for all other v1.x)
  • +
+
+ +
+

Major Changes

+
  • In run_pathfindR(), as “GR” (the default active subnetwork search method) provides nearly identical results in each iteration, the default iterations is set to 1
  • +
  • added the column ‘support’ (the proportion of active subnetworks leading to enrichment over all subnetworks) in the output
  • +
  • updated the download URL in get_biogrid_pin() as BioGRID updated the URL for download
  • +
+
+

Minor Changes and Bug Fixes

+
  • changed old argument in the “Step-by-Step Execution of the pathfindR Enrichment Workflow” vignette
  • +
  • fixed an issue in visualize_term_interactions() where an overly long file name caused an error on Windows. File names are now limited to 100 characters (#58)
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Major Changes

+
  • created separate package pathfindR.data for storing pathfindR data
  • +
  • added the function visualize_active_subnetworks() for visualizing graphs of active subnetworks
  • +
  • added the new vignette “Comparing Two pathfindR Results” that briefly describes how different pathfindR results can be compared
  • +
  • added the functions combine_pathfindR_results() and combined_results_graph() for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively
  • +
  • added the function get_pin_file() for obtaining organism-specific PIN data (only from BioGRID for now)
  • +
  • added the function get_gene_sets_list() for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB
  • +
  • added the function term_gene_heatmap() to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If genes_df is provided, colors of the tiles indicate the change values
  • +
  • added the function UpSet_plot() to create UpSet plots of enriched terms
  • +
  • added the human cell markers gene sets data cell_markers_gsets and cell_markers_descriptions +
  • +
+
+

Minor Changes and Bug Fixes

+
  • fixed an issue regarding parallel::makeCluster() in run_pathfindR() (#45)
  • +
  • fixed save-related issue in download_kegg_png() (#37, @rix133)
  • +
  • added the output data RA_comparison_output of pathfindR results on another RA-related dataset (GSE84074)
  • +
  • in visualize_hsa_KEGG(), fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept)
  • +
  • in visualize_hsa_KEGG(), implemented a tryCatch to avoid any issues when KEGGREST::color.pathway.by.objects() might fail (#28)
  • +
  • in visualize_hsa_KEGG(), now limiting the number of genes passed on to KEGGREST::color.pathway.by.objects() to < 60 (because the KEGG API now limits the number?)
  • +
  • changed default visualization in term_gene_heatmap() (i.e. when genes_df is not provided) to binary colored heatmap (by default, “green” and “red”, controlled by low and high) by up-/down- regulation status
  • +
  • updated the vignette “pathfindR Analysis for non-Homo-sapiens organisms” to reflect new data generation functions get_pin_file() and get_gene_sets_list() and fixed a minor issue in the vignette (#46)
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
+
+
+ +
+

Major Changes

+
  • Fixed error in DESCRIPTION: the Java version in SystemRequirements was corrected to “Java (>= 8.0)”
  • +
  • The Java version is now checked
  • +
+
+

Minor Changes and Bug Fixes

+
  • Fixed behavior: when no input genes are present in the enriched hsa KEGG pathway, visualization of the pathway is now skipped
  • +
  • Added the argument max_to_plot to visualize_hsa_KEGG() and to run_pathfindR(). This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented not to slow down the runtime of run_pathfindR() as downloading the png files is slow.
  • +
  • Fixed links to visualizations in enriched_ters.Rmd +
  • +
+
+
+ +
+

Major Changes

+
  • Replaced most occurrences of “pathway” with “term”. This was adopted because “term” reflects the utility of the package better. The enrichment and clustering approaches work with any kind of gene set data (be it pathway gene sets, gene ontology gene sets, motif gene sets etc.) Accordingly: +
  • +
  • Added the visualization function term_gene_graph(), which creates a graph of enriched terms - involved genes
  • +
  • Made changes in enrichment() and enrichment_analyses() to get enrichment results faster
  • +
  • Added the function fetch_gene_set() for obtaining gene set data more easily
  • +
  • Terms in gene sets can now be filtered according to the number of genes a term contains (controlled by min_gset_size, max_gset_size in fetch_gene_set() and run_pathfindR())
  • +
  • Added the argument gaCrossover during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)
  • +
  • Added unit tests using testthat +
  • +
  • Updated all gene sets data
  • +
  • Updated all RA example data
  • +
  • The vignettes were updated
  • +
  • Updated all PIN data
  • +
  • Improved speed of kappa matrix calculation (create_kappa_matrix())
  • +
  • Added vignette for non-Homo-sapiens organisms
  • +
  • Added Mus musculus (mmu) data: +
    • +mmu_kegg_genes & mmu_kegg_descriptions: mmu KEGG gene sets data
    • +
    • mmu STRING PIN
    • +
    • +myeloma_input & myeloma_output: example mmu input and output data
    • +
  • +
  • Added the STRING PIN (combined score >= 400)
  • +
  • The argument sig_gene_thr in subnetwork filtering via filterActiveSnws() now serves as the threshold proportion of significant genes in the active subnetwork. e.g., if there are 100 significant genes and sig_gene_thr = 0.03, subnetworks that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis
  • +
  • Removed pathview dependency by implementing colored pathway diagram visualization function using KEGGREST and KEGGgraph +
  • +
+
+
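The sig_gene_thr rule in the list above can be illustrated with a small sketch. The helper name `keep_subnetwork()` is hypothetical, and the floor of 2 genes is taken from the argument documentation of active_snw_enrichment_wrapper() rather than from this entry. How the package handles fractional thresholds is not stated, so the product is compared directly here.

```r
# Sketch: decide whether an active subnetwork passes the significant-gene filter.
# keep_subnetwork() is a hypothetical helper, not a pathfindR function.
keep_subnetwork <- function(n_sig_in_snw, n_sig_total, sig_gene_thr = 0.02) {
  # threshold is a proportion of all significant genes, never below 2 genes
  threshold <- max(2, n_sig_total * sig_gene_thr)
  n_sig_in_snw >= threshold
}

keep_subnetwork(3, 100, 0.03)  # TRUE: 100 x 0.03 = 3 genes required
keep_subnetwork(2, 100, 0.03)  # FALSE
keep_subnetwork(2, 50, 0.01)   # TRUE: 50 x 0.01 = 0.5 is raised to 2
keep_subnetwork(1, 50, 0.01)   # FALSE
```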

Minor Changes and Bug Fixes

+
  • In hierarchical_term_clustering(), redefined the distance measure as 1 - kappa statistic +
  • +
  • Fixed minor issue in cluster_graph_vis() (during the calculations for additional node colors)
  • +
  • Removed title from graph visualization of hierarchical clustering in cluster_graph_vis() +
  • +
  • In active_snw_search(), unnecessary warnings during active subnetwork search were removed
  • +
  • Fixed minor issue in enrichment_chart(), supplying fuzzy clustered results no longer raises an error
  • +
  • Added new checks in input_testing() and input_processing() to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)
  • +
  • Fixed minor issue in enrichment_chart(), ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers
  • +
  • In enrichment_chart(), added the arguments num_bubbles (default is 4) to control number of bubbles displayed in the legend and even_breaks (default is TRUE) to indicate if even increments of breaks are required
  • +
  • Updated the logo
  • +
  • Minor fix in term_gene_graph() (create the igraph object as an undirected graph for better auto layout)
  • +
  • Minor fix in visualize_term_interactions(). The legend no longer displays “Non-input Active Snw. Genes” if they were not provided
  • +
  • The argument human_genes in run_pathfindR() and input_processing() was renamed as convert2alias +
  • +
  • The gene symbols in the input data frame, the PIN and the gene sets are now turned into uppercase (for obtaining the best overlap)
  • +
  • Added the argument top_terms to enrichment_chart(), controlling the number of top enriched terms to plot (default is 10)
  • +
  • Other minor bug/error fixes
  • +
+
+
+ +
+

Major Changes

+
  • Separated the steps of the function run_pathfindR into individual functions: active_snw_search, enrichment_analyses, summarize_enrichment_results, annotate_pathway_DEGs, visualize_pws.
  • +
  • renamed the function pathmap as visualize_hsa_KEGG, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the input_processing function, assigns a change value of 100 to all).
  • +
  • Created the new visualization function visualize_pw_interactions, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
  • +
  • Added new vignette, describing the step-by-step execution of the pathfindR workflow
  • +
  • Changed clustering metric to kappa statistic, created the new clustering related functions create_kappa_matrix, hierarchical_pw_clustering, fuzzy_pw_clustering and cluster_pathways.
  • +
  • Implemented the new function cluster_graph_vis for visualizing graph diagrams of clustering results.
  • +
+
+

Minor Changes and Bug Fixes

+
  • Fixed the bug where the arguments score_quan_thr and sig_gene_thr for run_pathfindR were not being utilized.
  • +
  • in run_pathfindR, added a message at the end of the run, reporting the number of enriched pathways.
  • +
  • the function run_pathfindR now creates a variable org_dir that is the “path/to/original/working/directory”. org_dir is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where if a function stopped with an error the directory was changed to “..”, i.e. the parent directory. This change was adopted so that the user is returned to the original working directory if they supply a nested output folder (output_dir, e.g. “./ALL_RESULTS/RESULT_A”).
  • +
  • in input_processing, added the argument human_genes to only perform alias symbol conversion when human gene symbols are provided. - Updated the Rmd files used to create the report HTML files
  • +
  • Added the data for GO-All, all annotations in the GO database (BP+MF+CC)
  • +
  • Updated the vignette pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks to reflect the new functionalities.
  • +
+
+
+ +
+

Minor Changes and Bug Fixes

+
  • in the function plot_scores, added the argument label_cases to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument case_control_titles which allows the user to change the default “Case” and “Control” headers. Also added the arguments low and high used to change the low and high end colors of the scoring color gradient.
  • +
  • in the function plot_scores, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)
  • +
  • minor change in parseActiveSnwSearch, replaced score_thr by score_quan_thr. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
  • +
  • minor change in parseActiveSnwSearch, increased sig_gene_thr from 2 to 10 as we observed in most of the cases, this resulted in faster runs with comparable results.
  • +
  • in choose_clusters, added the argument p_val_threshold to be used as p value threshold for filtering the enriched pathways prior to clustering.
  • +
+
+
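The score_quan_thr filter mentioned in the list above can be sketched as follows. This is a hedged illustration: `filter_by_score_quantile()` is a hypothetical helper, the scores are invented, and keeping scores greater than or equal to the quantile (rather than strictly greater) is an assumption. The -1 switch for disabling the filter follows the argument documentation.

```r
# Sketch: filter active subnetwork scores by a quantile of their own distribution
# instead of a fixed empirical score cutoff.
filter_by_score_quantile <- function(scores, score_quan_thr = 0.8) {
  # -1 disables the filter
  if (score_quan_thr == -1) {
    return(scores)
  }
  stopifnot(score_quan_thr >= 0, score_quan_thr <= 1)
  cutoff <- quantile(scores, probs = score_quan_thr, names = FALSE)
  scores[scores >= cutoff]
}

# made-up subnetwork scores
snw_scores <- c(1.2, 3.4, 0.7, 5.1, 2.8, 4.0, 0.3, 2.2, 3.9, 1.5)

filter_by_score_quantile(snw_scores)                       # keeps the top-scoring subnetworks
filter_by_score_quantile(snw_scores, score_quan_thr = -1)  # keeps everything
```

Because the cutoff is derived from the current run's scores, the same setting adapts to inputs whose score ranges differ.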
+ +
+

Major Changes

+
  • fixed issue related to the package pathview. ## Minor Changes and Bug Fixes
  • +
  • in the function choose_clusters, added option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap.
  • +
+
+
+ +
+

Major Changes

+
  • Added the option to specify a custom gene set when using run_pathfindR. For this, the gene_sets argument should be set to “Custom” and custom_genes and custom_pathways should be provided.
  • +
+
+

Minor Changes and Bug Fixes

+
  • fixed minor bug in calculate_pw_scores where if there was one DEG, subsetting the experiment matrix failed
  • +
  • added if condition to check if there were DEGs in calculate_pw_scores. If there is none, the pathway is skipped.
  • +
  • in calculate_pw_scores, if cases are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in cases. This way, “up” pathways are grouped together, same for “down” pathways.
  • +
  • in calculate_pwd, if a pathway has perfect overlap with other pathways, the correlation value is set to 1 instead of NA.
  • +
  • in choose_clusters, if result_df has fewer than 3 pathways, clustering is not performed.
  • +
  • +run_pathfindR checks whether the output directory (output_dir) already exists and if it exists, now appends “(1)” to output_dir and displays a warning message. This was implemented to prevent writing over existing results.
  • +
  • in run_pathfindR, recursive creation for the output directory (output_dir) is now supported.
  • +
  • in run_pathfindR, if no pathways are found, the function returns an empty data frame instead of raising an error.
  • +
+
+
+ +
+

Major Changes

+
  • Implemented the (per subject) pathway scoring function calculate_pw_scores and the function to plot the heatmap of pathway scores per subject plot_scores.

  • +
  • Added the auto parameter to choose_clusters. When auto == TRUE (default), the function chooses the optimal number of clusters k automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.

  • +
  • Added the Fold_Enrichment column to the resulting data frame of enrichment, and as a corollary to the resulting data frame of run_pathfindR.

  • +
  • Added the option bubble to plot a bubble chart displaying the enrichment results in run_pathfindR using the helper function enrichment_chart. To plot the bubble chart set bubble = TRUE in run_pathfindR or use enrichment_chart(your_result_df).

  • +
+
+

Minor Changes and Bug Fixes

+
  • Added the parameter silent_option to run_pathfindR. When silent_option == TRUE (default), the console outputs during active subnetwork search are printed to a file named “console_out.txt”. If silent_option == FALSE, the output is printed on the screen. Default was set to TRUE because multiple console outputs are simultaneously printed when running in parallel.

  • +
  • Added the list_active_snw_genes parameter to run_pathfindR. When list_active_snw_genes == TRUE, the function adds the column non_DEG_Active_Snw_Genes, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.

  • +
  • Added the data RA_clustered, which is the example output of the clustering workflow.

  • +
  • In the function run_pathfindR, added the argument output_dir, which specifies the directory to be created under the current working directory for storing the result HTML files. output_dir is “pathfindR_Results” by default.

  • +
  • run_pathfindR now checks whether the output directory (output_dir) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.

  • +
  • genes_table.html now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.

  • +
+
+
+ +
+

Major changes

+
  • Added the gene_sets option in run_pathfindR to choose between different gene sets. Available gene sets are KEGG, Reactome, BioCarta and Gene Ontology gene sets (GO-BP, GO-CC and GO-MF)
  • +
  • +cluster_pathways automatically recognizes the ID type and chooses the gene sets accordingly
  • +
+
+

Minor Changes and Bug Fixes

+
  • Fixed issue regarding p values < 1e-13. No active subnetworks were found when there were p values < 1e-13. These are now changed to 1e-13 in the function input_processing +
  • +
  • In input_processing, genes for which no interactions are found in the PIN are now removed before active subnetwork search
  • +
  • Duplicated gene symbols no longer raise an error. If there are duplicated symbols, the lowest p value is chosen for each gene symbol in the function input_processing +
  • +
  • To prevent the formation of nested folders, by default and on errors, the function run_pathfindR returns to the user’s working directory.
  • +
  • Citation information is now provided for our BioRxiv pre-print
  • +
+
+
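The two input_processing fixes in the list above (the 1e-13 floor for p values and the handling of duplicated gene symbols) can be sketched in base R. The column names and the order of the two steps are assumptions made for illustration. The package's actual code may differ.

```r
# Sketch: clamp very small p values and resolve duplicated gene symbols.
# Column names (Gene_symbol, p_val) are hypothetical.
input_df <- data.frame(
  Gene_symbol = c("TP53", "BRCA1", "TP53", "MYC"),
  p_val = c(1e-20, 0.003, 0.01, 0.04),
  stringsAsFactors = FALSE
)

# p values below 1e-13 are raised to 1e-13
input_df$p_val <- pmax(input_df$p_val, 1e-13)

# for duplicated symbols, keep the row with the lowest p value
input_df <- input_df[order(input_df$p_val), ]
input_df <- input_df[!duplicated(input_df$Gene_symbol), ]
input_df
```

Sorting by p value first means `duplicated()` drops the weaker entry for each repeated symbol.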
+ + +
-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -98,7 +98,9 @@

Create UpSet Plot of Enriched Terms

Arguments

-
result_df
+ + +
result_df

A dataframe of pathfindR results that must contain the following columns:

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

@@ -118,7 +120,7 @@

Arguments

-
genes_df
+
genes_df

the input data that was used with run_pathfindR. It must be a data frame with 3 columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (optional)

  3. @@ -126,43 +128,41 @@

    Arguments

The change values in this data frame are used to color the affected genes

-
num_terms
+
num_terms

Number of top enriched terms to use while creating the plot. Set to NULL to use all enriched terms (default = 10)

-
method
+
method

the option for producing the plot. Options include 'heatmap', 'boxplot' and 'barplot'. (default = 'heatmap')

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
low
+
low

a string indicating the color of 'low' values in the coloring gradient (default = 'green')

-
mid
+
mid

a string indicating the color of 'mid' values in the coloring gradient (default = 'black')

-
high
+
high

a string indicating the color of 'high' values in the coloring gradient (default = 'red')

-
...
+
...

additional arguments for input_processing (used if genes_df is provided)

Value

- - -

UpSet plots are plots of the intersections of sets as a matrix. This +

UpSet plots are plots of the intersections of sets as a matrix. This function creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (i.e. method = 'heatmap') the main plot is a heatmap of genes at the @@ -192,16 +192,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -115,122 +115,124 @@

Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Itera

Arguments

-
input_processed
+ + +
input_processed

processed input data frame

-
pin_path
+
pin_path

path/to/PIN/file

-
gset_list
+
gset_list

list for gene sets

-
enrichment_threshold
+
enrichment_threshold

adjusted-p value threshold used when filtering enrichment results (default = 0.05)

-
list_active_snw_genes
+
list_active_snw_genes

boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = FALSE)

-
adj_method
+
adj_method

correction method to be used for adjusting p-values. (default = 'bonferroni')

-
search_method
+
search_method

algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').

-
disable_parallel
+
disable_parallel

boolean to indicate whether to disable parallel runs via foreach (default = FALSE)

-
use_all_positives
+
use_all_positives

if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)

-
iterations
+
iterations

number of iterations for active subnetwork search and enrichment analyses (Default = 10)

-
n_processes
+
n_processes

optional argument for specifying the number of processes used by foreach. If not specified, the function determines this automatically (default = NULL; set to 1 for the genetic algorithm)

-
score_quan_thr
+
score_quan_thr

active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)

-
sig_gene_thr
+
sig_gene_thr

threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2

-
saTemp0
+
saTemp0

Initial temperature for SA (default = 1.0)

-
saTemp1
+
saTemp1

Final temperature for SA (default = 0.01)

-
saIter
+
saIter

Iteration number for SA (default = 10000)

-
gaPop
+
gaPop

Population size for GA (default = 400)

-
gaIter
+
gaIter

Iteration number for GA (default = 200)

-
gaThread
+
gaThread

Number of threads to be used in GA (default = 5)

-
gaCrossover
+
gaCrossover

Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)

-
gaMut
+
gaMut

For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)

-
grMaxDepth
+
grMaxDepth

Sets max depth in greedy search, 0 for no limit (default = 1)

-
grSearchDepth
+
grSearchDepth

Search depth in greedy search (default = 1)

-
grOverlap
+
grOverlap

Overlap threshold for results of greedy search (default = 0.5)

-
grSubNum
+
grSubNum

Number of subnetworks to be presented in the results (default = 1000)

-
silent_option
+
silent_option

boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because @@ -239,9 +241,7 @@

Arguments

Value

- - -

Data frame of combined pathfindR enrichment results

+

Data frame of combined pathfindR enrichment results

@@ -256,16 +256,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -112,7 +112,9 @@

Perform Active Subnetwork Search

Arguments

-
input_for_search
+ + +

input the input data that active subnetwork search uses. The input must be a data frame containing at least these 2 columns:

GENE

Gene Symbol

@@ -124,113 +126,111 @@

Arguments

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
snws_file
+
snws_file

name for active subnetwork search output data without file extension (default = 'active_snws')

-
dir_for_parallel_run
+
dir_for_parallel_run

(previously created) directory for a parallel run iteration. Used in the wrapper function (see ?run_pathfindR) (Default = NULL)

-
score_quan_thr
+
score_quan_thr

active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)

-
sig_gene_thr
+
sig_gene_thr

threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2

-
search_method
+
search_method

algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').

-
seedForRandom
+
seedForRandom

seed for reproducibility while running the java modules (applies for GR and SA)

-
silent_option
+
silent_option

boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because, during parallel runs, console messages are printed out of order.

-
use_all_positives
+
use_all_positives

if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)

-
geneInitProbs
+
geneInitProbs

For SA and GA, probability of adding a gene in initial solution (default = 0.1)

-
saTemp0
+
saTemp0

Initial temperature for SA (default = 1.0)

-
saTemp1
+
saTemp1

Final temperature for SA (default = 0.01)

-
saIter
+
saIter

Iteration number for SA (default = 10000)

-
gaPop
+
gaPop

Population size for GA (default = 400)

-
gaIter
+
gaIter

Iteration number for GA (default = 200)

-
gaThread
+
gaThread

Number of threads to be used in GA (default = 5)

-
gaCrossover
+
gaCrossover

Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)

-
gaMut
+
gaMut

For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)

-
grMaxDepth
+
grMaxDepth

Sets max depth in greedy search, 0 for no limit (default = 1)

-
grSearchDepth
+
grSearchDepth

Search depth in greedy search (default = 1)

-
grOverlap
+
grOverlap

Overlap threshold for results of greedy search (default = 0.5)

-
grSubNum
+
grSubNum

Number of subnetworks to be presented in the results (default = 1000)

Value

- - -

A list of genes in every identified active subnetwork that has a score greater than +

A list of genes in every identified active subnetwork that has a score greater than the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.

@@ -262,16 +262,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -92,25 +92,25 @@

Annotate the Affected Genes in the Provided Enriched Terms

Arguments

-
result_df
+ + +
result_df

data frame of enrichment results. The only must-have column is 'ID'.

-
input_processed
+
input_processed

input data processed via input_processing

-
genes_by_term
+
genes_by_term

List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)

Value

- - -

The original data frame with two additional columns:

Up_regulated
+

The original data frame with two additional columns:

Up_regulated

the up-regulated genes in the input involved in the given term's gene set, comma-separated

Down_regulated
@@ -143,16 +143,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,16 +88,16 @@

Check Java Version

Arguments

-
version
+ + +
version

character vector containing the output of 'java -version'. If NULL, result of fetch_java_version is used (default = NULL)

Value

- - -

only parses and checks whether the java version is >= 1.8

+

only parses and checks whether the java version is >= 1.8

Details

@@ -116,16 +116,16 @@

Details

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -95,7 +95,9 @@

Cluster Enriched Terms

Arguments

-
enrichment_res
+ + +
enrichment_res

data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. @@ -103,29 +105,29 @@

Arguments

provided.

-
method
+
method

Either 'hierarchical' or 'fuzzy'. Details of clustering are provided in the corresponding functions hierarchical_term_clustering, and fuzzy_term_clustering

-
plot_clusters_graph
+
plot_clusters_graph

boolean value indicating whether or not to plot the graph diagram of clustering results (default = TRUE)

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
use_active_snw_genes
+
use_active_snw_genes

boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)

-
...
+
...

additional arguments for hierarchical_term_clustering, fuzzy_term_clustering and cluster_graph_vis. See documentation of these functions for more details.

@@ -133,9 +135,7 @@

Arguments

Value

- - -

a data frame of clustering results. For 'hierarchical', the cluster +

a data frame of clustering results. For 'hierarchical', the cluster assignments (Cluster) and whether the term is representative of its cluster (Status) is added as columns. For 'fuzzy', terms that are in multiple clusters are provided for each cluster. The cluster assignments (Cluster) @@ -177,16 +177,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -96,17 +96,19 @@

Graph Visualization of Clustered Enriched Terms

Arguments

-
clu_obj
+ + +
clu_obj

clustering result (either a matrix obtained via `fuzzy_term_clustering` or a vector obtained via `hierarchical_term_clustering`)

-
kappa_mat
+
kappa_mat

matrix of kappa statistics (output of create_kappa_matrix)

-
enrichment_res
+
enrichment_res

data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. @@ -114,29 +116,27 @@

Arguments

provided.

-
kappa_threshold
+
kappa_threshold

threshold for kappa statistics, defining strong relation (default = 0.35)

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
vertex.label.cex
+
vertex.label.cex

font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)

-
vertex.size.scaling
+
vertex.size.scaling

scaling factor for the node size (default = 2.5)

Value

- - -

Plots a graph diagram of clustering results. Each node is an enriched term +

Plots a graph diagram of clustering results. Each node is an enriched term from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness of the edges between nodes correspond to the kappa statistic between the two terms. Color of each node corresponds to distinct clusters. For fuzzy @@ -145,9 +145,9 @@

Value

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
-}
+} # }
 
@@ -162,16 +162,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -94,19 +94,21 @@

Color hsa KEGG pathway

Arguments

-
pw_id
+ + +
pw_id

hsa KEGG pathway id (e.g. hsa05012)

-
change_vec
+
change_vec

vector of change values, names should be hsa KEGG gene ids

-
scale_vals
+
scale_vals

should change values be scaled? (default = TRUE)

-
node_cols
+
node_cols

low, middle and high color values for coloring the pathway nodes (default = NULL). If node_cols=NULL, the low, middle and high color are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no @@ -114,26 +116,24 @@

Arguments

input_processing), only one color ('#F38F18' if NULL) is used.

-
legend.position
+
legend.position

the default position of legends ("none", "left", "right", "bottom", "top", "inside")

Value

- - -

a ggplot object containing the colored KEGG pathway diagram visualization

+

a ggplot object containing the colored KEGG pathway diagram visualization

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 pw_id <- 'hsa00010'
 change_vec <- c(-2, 4, 6)
 names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229')
 result <- pathfindR:::color_kegg_pathway(pw_id, change_vec)
-}
+} # }
 
@@ -148,16 +148,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,24 +88,24 @@

Combine 2 pathfindR Results

Arguments

-
result_A
+ + +
result_A

data frame of first pathfindR enrichment results

-
result_B
+
result_B

data frame of second pathfindR enrichment results

-
plot_common
+
plot_common

boolean to indicate whether or not to plot the term-gene graph of the common terms (default=TRUE)

Value

- - -

Data frame of combined pathfindR enrichment results. Columns are:

ID
+

Data frame of combined pathfindR enrichment results. Columns are:

ID

ID of the enriched term

Term_Description
@@ -160,8 +160,8 @@

Value

Examples

combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
-#> You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms
 
+#> You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms
 
@@ -176,16 +176,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -94,35 +94,35 @@

Combined Results Graph

Arguments

-
combined_df
+ + +
combined_df

Data frame of combined pathfindR enrichment results

-
selected_terms
+
selected_terms

the vector of selected terms for creating the graph (either IDs or term descriptions). If set to 'common', all of the common terms are used. (default = 'common')

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
layout
+
layout

The type of layout to create (see ggraph for details. Default = 'stress')

-
node_size
+
node_size

Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')

Value

- - -

a ggraph object containing the combined term-gene graph. +

a ggraph object containing the combined term-gene graph. Each node corresponds to an enriched term (orange if common, different shades of blue otherwise), an up-regulated gene (green), a down-regulated gene (red) or a conflicting (i.e. up in one analysis, down in the other or vice versa) gene @@ -155,16 +155,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,16 +88,16 @@

Configure Output Directory Name

Arguments

-
output_dir
+ + +
output_dir

the directory to be created where the output and intermediate files are saved (default = NULL, a temporary directory is used)

Value

- - -

/path/to/output/dir

+

/path/to/output/dir

@@ -112,16 +112,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,7 +88,9 @@

Create HTML Report of pathfindR Results

Arguments

-
input
+ + +
input

the input data that pathfindR uses. The input must be a data frame with three columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (OPTIONAL)

  3. @@ -96,15 +98,15 @@

    Arguments

-
input_processed
+
input_processed

processed input data frame

-
final_res
+
final_res

final pathfindR result data frame

-
dir_for_report
+
dir_for_report

directory to render the report in

@@ -121,16 +123,16 @@

Arguments

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -92,7 +92,9 @@

Create Kappa Statistics Matrix

Arguments

-
enrichment_res
+ + +
enrichment_res

data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. @@ -100,12 +102,12 @@

Arguments

provided.

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
use_active_snw_genes
+
use_active_snw_genes

boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)

@@ -113,9 +115,7 @@

Arguments

Value

- - -

a matrix of kappa statistics between each term in the +

a matrix of kappa statistics between each term in the enrichment results.
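As a rough illustration of what one entry of such a matrix measures, the sketch below computes Cohen's kappa between two terms from their gene memberships. The gene universe and term contents are hypothetical, and this is not claimed to be the exact computation in `create_kappa_matrix`, which may define the gene universe differently.

```r
# Hypothetical gene universe and two hypothetical terms (illustrative data only)
all_genes <- c("A", "B", "C", "D", "E", "F")
term1_genes <- c("A", "B", "C")
term2_genes <- c("B", "C", "D")

# Binary membership of every gene in each term
in_term1 <- all_genes %in% term1_genes
in_term2 <- all_genes %in% term2_genes

# Observed agreement: genes that are in both terms or in neither
observed <- mean(in_term1 == in_term2)

# Agreement expected by chance, from the marginal membership rates
chance <- mean(in_term1) * mean(in_term2) + mean(!in_term1) * mean(!in_term2)

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible
kappa_val <- (observed - chance) / (1 - chance)
```

Under these assumptions, a kappa at or above `kappa_threshold` (default 0.35 in the clustering functions) would be read as a strong relation between the two terms.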

@@ -141,16 +141,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -96,38 +96,40 @@

Perform Enrichment Analysis for a Single Gene Set

Arguments

-
input_genes
+ + +
input_genes

The set of gene symbols to be used for enrichment analysis. In the scope of this package, these are genes that were identified for an active subnetwork

-
genes_by_term
+
genes_by_term

List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)

-
term_descriptions
+
term_descriptions

Vector that contains term descriptions for the gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)

-
adj_method
+
adj_method

correction method to be used for adjusting p-values. (default = 'bonferroni')

-
enrichment_threshold
+
enrichment_threshold

adjusted-p value threshold used when filtering enrichment results (default = 0.05)

-
sig_genes_vec
+
sig_genes_vec

vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search

-
background_genes
+
background_genes

vector of background genes. In the scope of this package, the background genes are taken as all genes in the PIN (see enrichment_analyses)

@@ -135,9 +137,7 @@

Arguments

Value

- - -

A data frame that contains enrichment results

+

A data frame that contains enrichment results
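The interplay of `adj_method` and `enrichment_threshold` can be sketched with base R's `stats::p.adjust`. The raw p-values and term names below are made up, and the use of `<=` for the cut-off is an assumption rather than a statement about the package's internals.

```r
# Hypothetical raw p-values for three terms (illustrative data only)
raw_p <- c(term_a = 0.001, term_b = 0.02, term_c = 0.04)

adj_method <- "bonferroni"      # documented default
enrichment_threshold <- 0.05    # documented default

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
adj_p <- stats::p.adjust(raw_p, method = adj_method)

# Keep only the terms whose adjusted p-value passes the threshold
# (assumption: inclusive comparison)
significant_terms <- names(adj_p)[adj_p <= enrichment_threshold]
```

Here only `term_a` survives, which shows how a conservative correction such as Bonferroni can remove terms that look significant before adjustment.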

See also

@@ -174,16 +174,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -97,42 +97,44 @@

Perform Enrichment Analyses on the Input Subnetworks

Arguments

-
snws
+ + +
snws

a list of subnetwork genes (i.e., vectors of genes for each subnetwork)

-
sig_genes_vec
+
sig_genes_vec

vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
genes_by_term
+
genes_by_term

List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)

-
term_descriptions
+
term_descriptions

Vector that contains term descriptions for the gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)

-
adj_method
+
adj_method

correction method to be used for adjusting p-values. (default = 'bonferroni')

-
enrichment_threshold
+
enrichment_threshold

adjusted-p value threshold used when filtering enrichment results (default = 0.05)

-
list_active_snw_genes
+
list_active_snw_genes

boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = FALSE)

@@ -140,9 +142,7 @@

Arguments

Value

- - -

a dataframe of combined enrichment results. Columns are:

ID
+

a dataframe of combined enrichment results. Columns are:

ID

ID of the enriched term

Term_Description
@@ -191,16 +191,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -96,7 +96,9 @@

Create Bubble Chart of Enrichment Results

Arguments

-
result_df
+ + +
result_df

a data frame that must contain the following columns:

Term_Description

Description of the enriched term

@@ -119,7 +121,7 @@

Arguments

-
top_terms
+
top_terms

number of top terms (according to the 'lowest_p' column) to plot (default = 10). If plot_by_cluster = TRUE, selects the top top_terms terms per each cluster. Set top_terms = NULL to plot @@ -127,18 +129,18 @@

Arguments

all terms are plotted.

-
plot_by_cluster
+
plot_by_cluster

boolean value indicating whether or not to group the enriched terms by cluster (works if result_df contains a 'Cluster' column).

-
num_bubbles
+
num_bubbles

number of sizes displayed in the legend # genes (Default = 4)

-
even_breaks
+
even_breaks

whether or not to set even breaks for the number of sizes displayed in the legend # genes. If TRUE (default), sets equal breaks and the number of displayed bubbles may be different than the @@ -148,9 +150,7 @@

Arguments

Value

- - -

a ggplot2 object containing the bubble chart. +

a ggplot2 object containing the bubble chart. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Size of the bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. @@ -177,16 +177,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -96,7 +96,9 @@

Fetch Gene Set Objects

Arguments

-
gene_sets
+ + +
gene_sets

Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'. @@ -104,21 +106,21 @@

Arguments

must be specified. (Default = 'KEGG')

-
min_gset_size
+
min_gset_size

minimum number of genes a term must contain (default = 10)

-
max_gset_size
+
max_gset_size

maximum number of genes a term may contain (default = 300)

-
custom_genes
+
custom_genes

a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. Names should correspond to the IDs of the custom terms.

-
custom_descriptions
+
custom_descriptions

A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.

@@ -126,9 +128,7 @@

Arguments

Value

- - -

a list containing 2 elements

genes_by_term
+

a list containing 2 elements

genes_by_term

list of vectors of genes contained in each term

term_descriptions
@@ -155,16 +155,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,9 +88,7 @@

Obtain Java Version

Value

- - -

character vector containing the output of 'java -version'

+

character vector containing the output of 'java -version'

Details

@@ -109,16 +107,16 @@

Details

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -93,21 +93,23 @@

Parse Active Subnetwork Search Output File and Filter the Subnetworks

Arguments

-
active_snw_path
+ + +
active_snw_path

path to the output of an Active Subnetwork Search

-
sig_genes_vec
+
sig_genes_vec

vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search

-
score_quan_thr
+
score_quan_thr

active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)

-
sig_gene_thr
+
sig_gene_thr

threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number @@ -116,9 +118,7 @@

Arguments

Value

- - -

A list containing subnetworks: a list of genes in every +

A list containing subnetworks (a list of genes in every active subnetwork that has a score greater than the score_quan_thr-th quantile and that contains at least a sig_gene_thr proportion of significant genes) and scores (the score of each filtered active subnetwork)
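The two filters described above can be sketched as follows. All scores and gene lists are hypothetical, and this mirrors only the documented behaviour (quantile cut-off, minimum count floored at 2), not necessarily how `filterActiveSnws` is implemented.

```r
# Hypothetical subnetwork scores and gene lists (illustrative data only)
snw_scores <- c(3.1, 2.4, 1.8, 0.9, 0.5)
snw_genes <- list(
  c("A", "B", "C"),
  c("A", "D"),
  c("E", "F", "G"),
  c("H"),
  c("B", "I")
)
sig_genes_vec <- c("A", "B", "E", "F")

score_quan_thr <- 0.8   # documented default
sig_gene_thr <- 0.02    # documented default

# Score filter: keep subnetworks above the chosen quantile (-1 disables it)
if (score_quan_thr == -1) {
  keep_score <- rep(TRUE, length(snw_scores))
} else {
  cutoff <- stats::quantile(snw_scores, score_quan_thr, names = FALSE)
  keep_score <- snw_scores > cutoff
}

# Minimum number of significant genes, never allowed to drop below 2
min_sig_genes <- max(2, sig_gene_thr * length(sig_genes_vec))

# Count the significant genes in each subnetwork
n_sig <- vapply(snw_genes, function(g) sum(g %in% sig_genes_vec), integer(1))

keep <- keep_score & n_sig >= min_sig_genes
filtered <- list(subnetworks = snw_genes[keep], scores = snw_scores[keep])
```

With these inputs the quantile cut-off is 2.54, so only the first subnetwork passes both filters.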

@@ -153,16 +153,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -93,11 +93,13 @@

Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms

Arguments

-
kappa_mat
+ + +
kappa_mat

matrix of kappa statistics (output of create_kappa_matrix)

-
enrichment_res
+
enrichment_res

data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. @@ -105,21 +107,19 @@

Arguments

provided.

-
kappa_threshold
+
kappa_threshold

threshold for kappa statistics, defining strong relation (default = 0.35)

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

Value

- - -

a boolean matrix of cluster assignments. Each row corresponds to an +

a boolean matrix of cluster assignments. Each row corresponds to an enriched term, each column corresponds to a cluster.

@@ -132,10 +132,10 @@

Details

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 fuzzy_term_clustering(kappa_mat, enrichment_res)
 fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45)
-}
+} # }
 
@@ -150,16 +150,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,27 +88,27 @@

Retrieve the Requested Release of Organism-specific BioGRID PIN

Arguments

-
org
+ + +
org

organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc. See https://wiki.thebiogrid.org/doku.php/statistics for a full list of available organisms (default = 'Homo_sapiens')

-
path2pin
+
path2pin

the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file

-
release
+
release

the requested BioGRID release (default = 'latest')

Value

- - -

the path of the file in which the PIN data was saved. If +

the path of the file in which the PIN data was saved. If path2pin was not supplied by the user, the PIN data is saved in a temporary file

@@ -125,16 +125,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -94,35 +94,35 @@

Retrieve Organism-specific Gene Sets List

Arguments

-
source
+ + +
source

As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')

-
org_code
+
org_code

(Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list of all available organisms, see https://www.genome.jp/kegg/catalog/org_list.html

-
species
+
species

(Used for 'MSigDB' only) species name, such as Homo sapiens, Mus musculus, etc. See msigdbr_show_species for all the species available in the msigdbr package (default = 'Homo sapiens')

-
collection
+
collection

(Used for 'MSigDB' only) collection. i.e., H, C1, C2, C3, C4, C5, C6, C7.

-
subcollection
+
subcollection

(Used for 'MSigDB' only) sub-collection, such as CGP, MIR, BP, etc. (default = NULL, i.e. list all gene sets in collection)

Value

- - -

A list containing 2 elements:

  • gene_sets - A list containing the genes involved in each gene set

  • +

    A list containing 2 elements:

    • gene_sets - A list containing the genes involved in each gene set

    • descriptions - A named vector containing the descriptions for each gene set

    . For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list of all available KEGG organisms, see https://www.genome.jp/kegg/catalog/org_list.html. @@ -143,16 +143,16 @@

    Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,16 +88,16 @@

Retrieve Organism-specific KEGG Pathway Gene Sets

Arguments

-
org_code
+ + +
org_code

KEGG organism code for the selected organism. For a full list of all available organisms, see https://www.genome.jp/kegg/catalog/org_list.html

Value

- - -

list containing 2 elements:

  • gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway

  • +

    list containing 2 elements:

    • gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway

    • descriptions - A named vector containing the descriptions for each KEGG pathway

@@ -113,16 +113,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,26 +88,26 @@

Retrieve Organism-specific MSigDB Gene Sets

Arguments

-
species
+ + +
species

species name, such as Homo sapiens, Mus musculus, etc. See msigdbr_show_species for all the species available in the msigdbr package

-
collection
+
collection

collection. i.e., H, C1, C2, C3, C4, C5, C6, C7.

-
subcollection
+
subcollection

sub-collection, such as CGP, BP, etc. (default = NULL, i.e. list all gene sets in collection)

Value

- - -

Retrieves the MSigDB gene sets and returns a list containing 2 elements:

  • gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets

  • +

    Retrieves the MSigDB gene sets and returns a list containing 2 elements:

    • gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets

    • descriptions - A named vector containing the descriptions for each selected MSigDB gene set

@@ -133,16 +133,16 @@

Details

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,42 +88,42 @@

Retrieve Organism-specific PIN data

Arguments

-
source
+ + +
source

As of this version, this function is implemented to get data from 'BioGRID' only. This argument (and this wrapper function) was implemented for future utility

-
org
+
org

organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc. See https://wiki.thebiogrid.org/doku.php/statistics for a full list of available organisms (default = 'Homo_sapiens')

-
path2pin
+
path2pin

the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file

-
...
+
...

additional arguments for get_biogrid_pin

Value

- - -

the path of the file in which the PIN data was saved. If +

the path of the file in which the PIN data was saved. If path2pin was not supplied by the user, the PIN data is saved in a temporary file

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 pin_path <- get_pin_file()
-}
+} # }
 
@@ -138,16 +138,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,9 +88,7 @@

Retrieve Reactome Pathway Gene Sets

Value

- - -

Gets the latest Reactome pathways gene sets in gmt format. Parses the +

Gets the latest Reactome pathways gene sets in gmt format. Parses the gmt file and returns a list containing 2 elements:

  • gene_sets - A list containing the genes involved in each Reactome pathway

  • descriptions - A named vector containing the descriptions for each Reactome pathway

@@ -107,16 +105,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,19 +88,19 @@

Retrieve Gene Sets from GMT-format File

Arguments

-
path2gmt
+ + +
path2gmt

path to the gmt file

-
descriptions_idx
+
descriptions_idx

index for descriptions (default = 2)

Value

- - -

list containing 2 elements:

  • gene_sets - A list containing the genes involved in each gene set

  • +

    list containing 2 elements:

    • gene_sets - A list containing the genes involved in each gene set

    • descriptions - A named vector containing the descriptions for each gene set
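For orientation, a GMT file stores one gene set per line as tab-separated fields: an identifier, a description, then the genes. The sketch below parses a tiny hypothetical file into the two-element structure described above; it is a minimal illustration that assumes the description sits in field 2 (matching the default `descriptions_idx`), not the package's own parser.

```r
# Write a tiny, hypothetical GMT file to a temporary location
gmt_lines <- c(
  "SET1\tFirst set\tA\tB\tC",
  "SET2\tSecond set\tD\tE"
)
path2gmt <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, path2gmt)

descriptions_idx <- 2  # documented default

# Split every line into its tab-separated fields
fields <- strsplit(readLines(path2gmt), "\t", fixed = TRUE)

# Field 1 is the gene set ID; the description sits at descriptions_idx
ids <- vapply(fields, `[`, character(1), 1)
descriptions <- stats::setNames(
  vapply(fields, `[`, character(1), descriptions_idx),
  ids
)

# Everything after the first two fields is treated as the gene list
gene_sets <- stats::setNames(lapply(fields, function(x) x[-(1:2)]), ids)

parsed <- list(gene_sets = gene_sets, descriptions = descriptions)
```

Both elements are named by gene set ID, so `parsed$gene_sets[["SET1"]]` and `parsed$descriptions[["SET1"]]` refer to the same set.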

@@ -116,16 +116,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -96,11 +96,13 @@

Hierarchical Clustering of Enriched Terms

Arguments

-
kappa_mat
+ + +
kappa_mat

matrix of kappa statistics (output of create_kappa_matrix)

-
enrichment_res
+
enrichment_res

data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if use_description = TRUE) or 'ID' (if use_description = FALSE), 'Down_regulated', and 'Up_regulated'. @@ -108,37 +110,35 @@

Arguments

provided.

-
num_clusters
+
num_clusters

number of clusters to be formed (default = NULL). If NULL, the optimal number of clusters is determined as the number which yields the highest average silhouette width.

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
clu_method
+
clu_method

the agglomeration method to be used (default = 'average', see hclust)

-
plot_hmap
+
plot_hmap

boolean to indicate whether to plot the kappa statistics clustering heatmap or not (default = FALSE)

-
plot_dend
+
plot_dend

boolean to indicate whether to plot the clustering dendrogram partitioned into the optimal number of clusters (default = TRUE)

Value

- - -

a vector of clusters for each enriched term in the enrichment results.

+

a vector of clusters for each enriched term in the enrichment results.
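The selection of the number of clusters by highest average silhouette width (the `num_clusters = NULL` behaviour described in the arguments) can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: the kappa matrix is made up, `1 - kappa` is assumed as the dissimilarity, and `cluster::silhouette` is used for the silhouette widths.

```r
library(cluster)  # provides silhouette(); a recommended package shipped with most R installs

# Toy symmetric kappa matrix for five terms (made-up values)
terms <- paste0("term", 1:5)
kappa_mat <- matrix(
  c(1.0, 0.8, 0.7, 0.1, 0.0,
    0.8, 1.0, 0.6, 0.2, 0.1,
    0.7, 0.6, 1.0, 0.1, 0.2,
    0.1, 0.2, 0.1, 1.0, 0.9,
    0.0, 0.1, 0.2, 0.9, 1.0),
  nrow = 5, dimnames = list(terms, terms)
)

# Assumption: dissimilarity between two terms is 1 - kappa
term_dist <- as.dist(1 - kappa_mat)
hc <- hclust(term_dist, method = "average")

# Average silhouette width for each candidate number of clusters
candidate_k <- 2:(nrow(kappa_mat) - 1)
avg_sil <- vapply(candidate_k, function(k) {
  sil <- silhouette(cutree(hc, k = k), term_dist)
  mean(sil[, "sil_width"])
}, numeric(1))

# Keep the k with the highest average silhouette width
best_k <- candidate_k[which.max(avg_sil)]
clusters <- cutree(hc, k = best_k)
clusters
```

In this toy matrix the first three terms agree strongly with each other, as do the last two, so the two-cluster partition scores best.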

Details

@@ -153,10 +153,10 @@

Details

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 hierarchical_term_clustering(kappa_mat, enrichment_res)
 hierarchical_term_clustering(kappa_mat, enrichment_res, clu_method = 'complete')
-}
+} # }
 
@@ -171,16 +171,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,24 +88,24 @@

Hypergeometric Distribution-based Hypothesis Testing

Arguments

-
term_genes
+ + +
term_genes

vector of genes in the selected term gene set

-
chosen_genes
+
chosen_genes

vector containing the set of input genes

-
background_genes
+
background_genes

vector of background genes (i.e. universal set of genes in the experiment)

Value

- - -

the p-value as determined using the hypergeometric distribution.

+

the p-value as determined using the hypergeometric distribution.
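As a rough illustration of such a test, the over-representation p-value can be computed with `stats::phyper` from the three documented inputs. The function name `hyperg_p_sketch` and the toy gene vectors are hypothetical; the sketch assumes the p-value is the upper tail, i.e. the probability of drawing at least the observed number of term genes.

```r
# Hypothetical sketch (not the package function): upper-tail hypergeometric
# p-value for observing at least the seen number of term genes among the
# chosen genes, given a background gene universe.
hyperg_p_sketch <- function(term_genes, chosen_genes, background_genes) {
  term_genes_selected <- sum(chosen_genes %in% term_genes)       # successes drawn
  term_genes_in_pool <- sum(term_genes %in% background_genes)    # successes in the pool
  non_term_genes_in_pool <- length(background_genes) - term_genes_in_pool
  num_selected_genes <- length(chosen_genes)                      # number of draws

  # P(X >= k) = 1 - P(X <= k - 1)
  stats::phyper(term_genes_selected - 1, term_genes_in_pool,
                non_term_genes_in_pool, num_selected_genes,
                lower.tail = FALSE)
}

# Toy data: 20 background genes, 5 in the term, 4 chosen (3 of them in the term)
background_genes <- paste0("g", 1:20)
term_genes <- paste0("g", 1:5)
chosen_genes <- c("g1", "g2", "g3", "g10")

p_val <- hyperg_p_sketch(term_genes, chosen_genes, background_genes)
p_val  # equals 155 / 4845, i.e. (C(5,3) * C(15,1) + C(5,4)) / C(20,4)
```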

Details

@@ -137,16 +137,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + Package index • pathfindR - +
- +
@@ -305,16 +305,16 @@

Misc. functions
-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -93,7 +93,9 @@

Process Input

Arguments

-
input
+ + +
input

the input data that pathfindR uses. The input must be a data frame with three columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (OPTIONAL)

  3. @@ -101,18 +103,18 @@

    Arguments

-
p_val_threshold
+
p_val_threshold

the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
convert2alias
+
convert2alias

boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE). IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.

@@ -120,9 +122,7 @@

Arguments

Value

- - -

This function first filters the input so that all p values are less +

This function first filters the input so that all p values are less than or equal to the threshold. Next, gene symbols that are not found in the PIN are identified. If aliases of these gene symbols are found in the PIN, the symbols are converted to the corresponding aliases. The @@ -173,16 +173,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,7 +88,9 @@

Input Testing

Arguments

-
input
+ + +
input

the input data that pathfindR uses. The input must be a data frame with three columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (OPTIONAL)

  3. @@ -96,16 +98,14 @@

    Arguments

-
p_val_threshold
+
p_val_threshold

the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)

Value

- - -

Only checks if the input and the threshold follows the required +

Only checks if the input and the threshold follow the required specifications.

@@ -133,16 +133,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,15 +88,15 @@

Check if value is a valid color

Arguments

-
x
+ + +
x

value

Value

- - -

TRUE if x is a valid color, otherwise FALSE

+

TRUE if x is a valid color, otherwise FALSE
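One common way to perform such a check in base R is to ask `grDevices::col2rgb` to convert the value and treat an error as "not a color". The helper name `is_color_sketch` is hypothetical, and this is only a sketch of the idea, not necessarily how the package implements it.

```r
# Hypothetical helper: TRUE if 'x' can be converted to an RGB triplet by
# grDevices::col2rgb(), FALSE if the conversion raises an error.
is_color_sketch <- function(x) {
  tryCatch(
    is.matrix(grDevices::col2rgb(x)),
    error = function(e) FALSE
  )
}

valid_name <- is_color_sketch("red")
valid_hex <- is_color_sketch("#F38F18")
invalid <- is_color_sketch("notacolor")
c(valid_name, valid_hex, invalid)
```

Note that `col2rgb` also accepts palette indices such as `1`, so a check built this way is somewhat permissive.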

@@ -111,16 +111,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -138,16 +138,16 @@

Author

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -97,46 +97,46 @@

Plot the Heatmap of Score Matrix of Enriched Terms per Sample

Arguments

-
score_matrix
+ + +
score_matrix

Matrix of agglomerated enriched term scores per sample. Columns are samples, rows are enriched terms

-
cases
+
cases

(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)

-
label_samples
+
label_samples

Boolean value to indicate whether or not to label the samples in the heatmap plot (default = TRUE)

-
case_title
+
case_title

Naming of the 'Case' group (as in cases) (default = 'Case')

-
control_title
+
control_title

Naming of the 'Control' group (default = 'Control')

-
low
+
low

a string indicating the color of 'low' values in the coloring gradient (default = 'green')

-
mid
+
mid

a string indicating the color of 'mid' values in the coloring gradient (default = 'black')

-
high
+
high

a string indicating the color of 'high' values in the coloring gradient (default = 'red')

Value

- - -

A `ggplot2` object containing the heatmap plot. x-axis indicates +

A `ggplot2` object containing the heatmap plot. x-axis indicates the samples. y-axis indicates the enriched terms. 'Score' indicates the score of the term in a given sample. If cases are provided, the plot is divided into 2 facets, named by case_title and control_title.

@@ -164,16 +164,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,16 +88,16 @@

Process Data frame of Protein-protein Interactions

Arguments

-
pin_df
+ + +
pin_df

data frame of protein-protein interactions with 2 columns: 'Interactor_A' and 'Interactor_B'

Value

- - -

processed PIN data frame (removes self-interactions and +

processed PIN data frame (removes self-interactions and duplicated interactions)
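The two clean-up steps named above can be sketched in base R as follows. The toy interactions are made up, and the sketch assumes interactions are undirected, so that A-B and B-A count as duplicates; whether the package treats reversed pairs this way is an assumption here.

```r
# Toy PIN data frame with a self-interaction and duplicated pairs (made-up data)
pin_df <- data.frame(
  Interactor_A = c("TP53", "MDM2", "EGFR", "TP53", "BRCA1"),
  Interactor_B = c("MDM2", "TP53", "EGFR", "MDM2", "BARD1"),
  stringsAsFactors = FALSE
)

# Step 1: drop self-interactions (both interactors identical)
pin_df <- pin_df[pin_df$Interactor_A != pin_df$Interactor_B, ]

# Step 2: drop duplicated interactions.
# Assumption: interactions are undirected, so build an order-independent key.
pair_key <- paste(
  pmin(pin_df$Interactor_A, pin_df$Interactor_B),
  pmax(pin_df$Interactor_A, pin_df$Interactor_B),
  sep = "|"
)
processed <- pin_df[!duplicated(pair_key), ]
rownames(processed) <- NULL
processed
```

Only the first occurrence of each pair is kept, so the original orientation of that first row is preserved.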

@@ -113,16 +113,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -100,7 +100,9 @@

Return The Path to Given Protein-Protein Interaction Network (PIN)

Arguments

-
pin_name_path
+ + +
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

@@ -108,9 +110,7 @@

Arguments

Value

- - -

The absolute path to chosen PIN.

+

The absolute path to chosen PIN.

See also

@@ -120,9 +120,9 @@

See also

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 pin_path <- return_pin_path('GeneMania')
-}
+} # }
 
@@ -137,16 +137,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -103,7 +103,9 @@

Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workf

Arguments

-
input
+ + +
input

the input data that pathfindR uses. The input must be a data frame with three columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (OPTIONAL)

  3. @@ -111,7 +113,7 @@

    Arguments

-
gene_sets
+
gene_sets

Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'. @@ -119,73 +121,71 @@

Arguments

must be specified. (Default = 'KEGG')

-
min_gset_size
+
min_gset_size

minimum number of genes a term must contain (default = 10)

-
max_gset_size
+
max_gset_size

maximum number of genes a term may contain (default = 300)

-
custom_genes
+
custom_genes

a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. Names should correspond to the IDs of the custom terms.

-
custom_descriptions
+
custom_descriptions

A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
p_val_threshold
+
p_val_threshold

the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)

-
enrichment_threshold
+
enrichment_threshold

adjusted-p value threshold used when filtering enrichment results (default = 0.05)

-
convert2alias
+
convert2alias

boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE). IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.

-
plot_enrichment_chart
+
plot_enrichment_chart

boolean value. If TRUE, a bubble chart displaying the enrichment results is plotted. (default = TRUE)

-
output_dir
+
output_dir

the directory to be created where the output and intermediate files are saved (default = NULL, a temporary directory is used)

-
list_active_snw_genes
+
list_active_snw_genes

boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = FALSE)

-
...
+
...

additional arguments for active_snw_enrichment_wrapper

Value

- - -

Data frame of pathfindR enrichment results. Columns are:

ID
+

Data frame of pathfindR enrichment results. Columns are:

ID

ID of the enriched term

Term_Description
@@ -220,8 +220,6 @@

Value

results linked to the visualizations of the enriched terms in addition to the table of converted gene symbols. This report can be found in 'output_dir/results.html' under the current working directory.

- -

By default, a bubble chart of the top 10 enrichment results is plotted. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Sizes of the bubbles indicate the number of significant genes in the given terms. @@ -263,9 +261,9 @@

See also

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 run_pathfindR(example_pathfindR_input)
-}
+} # }
 
@@ -280,16 +278,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -95,7 +95,9 @@

Calculate Agglomerated Scores of Enriched Terms for Each Subject

Arguments

-
enrichment_table
+ + +
enrichment_table

a data frame that must contain the 3 columns below:

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

@@ -112,43 +114,41 @@

Arguments

-
exp_mat
+
exp_mat

the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.

-
cases
+
cases

(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
plot_hmap
+
plot_hmap

Boolean value to indicate whether or not to draw the heatmap plot of the scores. (default = TRUE)

-
...
+
...

Additional arguments for plot_scores for aesthetics of the heatmap plot

Value

- - -

Matrix of agglomerated scores of each enriched term per sample. +

Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. Optionally, displays a heatmap of this matrix.

Conceptual Background

- +

For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples, @@ -193,16 +193,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -115,119 +115,121 @@

Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteratio

Arguments

-
i
+ + +
i

current iteration index (default = NULL)

-
dirs
+
dirs

vector of directories for parallel runs

-
input_processed
+
input_processed

processed input data frame

-
pin_path
+
pin_path

path/to/PIN/file

-
score_quan_thr
+
score_quan_thr

active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)

-
sig_gene_thr
+
sig_gene_thr

threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2

-
search_method
+
search_method

algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').

-
silent_option
+
silent_option

boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because during parallel runs, the console messages are printed out of order.

-
use_all_positives
+
use_all_positives

if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)

-
geneInitProbs
+
geneInitProbs

For SA and GA, probability of adding a gene in initial solution (default = 0.1)

-
saTemp0
+
saTemp0

Initial temperature for SA (default = 1.0)

-
saTemp1
+
saTemp1

Final temperature for SA (default = 0.01)

-
saIter
+
saIter

Iteration number for SA (default = 10000)

-
gaPop
+
gaPop

Population size for GA (default = 400)

-
gaIter
+
gaIter

Iteration number for GA (default = 200)

-
gaThread
+
gaThread

Number of threads to be used in GA (default = 5)

-
gaCrossover
+
gaCrossover

Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)

-
gaMut
+
gaMut

For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)

-
grMaxDepth
+
grMaxDepth

Sets max depth in greedy search, 0 for no limit (default = 1)

-
grSearchDepth
+
grSearchDepth

Search depth in greedy search (default = 1)

-
grOverlap
+
grOverlap

Overlap threshold for results of greedy search (default = 0.5)

-
grSubNum
+
grSubNum

Number of subnetworks to be presented in the results (default = 1000)

-
gset_list
+
gset_list

list for gene sets

-
adj_method
+
adj_method

correction method to be used for adjusting p-values. (default = 'bonferroni')

-
enrichment_threshold
+
enrichment_threshold

adjusted-p value threshold used when filtering enrichment results (default = 0.05)

-
list_active_snw_genes
+
list_active_snw_genes

boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = FALSE)

@@ -235,9 +237,7 @@

Arguments

Value

- - -

Data frame of enrichment results using active subnetwork search results

+

Data frame of enrichment results using active subnetwork search results

@@ -252,16 +252,16 @@

Value

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,7 +88,9 @@

Summarize Enrichment Results

Arguments

-
enrichment_res
+ + +
enrichment_res

a dataframe of combined enrichment results. Columns are:

ID

ID of the enriched term

@@ -111,7 +113,7 @@

Arguments

-
list_active_snw_genes
+
list_active_snw_genes

boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = FALSE)

@@ -119,9 +121,7 @@

Arguments

Value

- - -

a dataframe of summarized enrichment results (over multiple iterations). Columns are:

ID
+

a dataframe of summarized enrichment results (over multiple iterations). Columns are:

ID

ID of the enriched term

Term_Description
@@ -150,9 +150,9 @@

Value

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 summarize_enrichment_results(enrichment_res)
-}
+} # }
 
@@ -167,16 +167,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -95,7 +95,9 @@

Create Term-Gene Graph

Arguments

-
result_df
+ + +
result_df

A dataframe of pathfindR results that must contain the following columns:

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

@@ -115,34 +117,32 @@

Arguments

-
num_terms
+
num_terms

Number of top enriched terms to use while creating the graph. Set to NULL to use all enriched terms (default = 10, i.e. top 10 terms)

-
layout
+
layout

The type of layout to create (see ggraph for details; default = 'stress')

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
node_size
+
node_size

Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')

-
node_colors
+
node_colors

vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)

Value

- - -

a ggraph object containing the term-gene graph. +

a ggraph object containing the term-gene graph. Each node corresponds to an enriched term (beige), an up-regulated gene (green) or a down-regulated gene (red). An edge between a term and a gene indicates that the given term involves the gene. Size of a term node is proportional @@ -179,16 +179,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -99,7 +99,9 @@

Create Terms by Genes Heatmap

Arguments

-
result_df
+ + +
result_df

A dataframe of pathfindR results that must contain the following columns:

Term_Description

Description of the enriched term (necessary if use_description = TRUE)

@@ -119,7 +121,7 @@

Arguments

-
genes_df
+
genes_df

the input data that was used with run_pathfindR. It must be a data frame with 3 columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (optional)

  3. @@ -127,47 +129,45 @@

    Arguments

The change values in this data frame are used to color the affected genes

-
num_terms
+
num_terms

Number of top enriched terms to use while creating the plot. Set to NULL to use all enriched terms (default = 10)

-
use_description
+
use_description

Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = FALSE)

-
low
+
low

a string indicating the color of 'low' values in the coloring gradient (default = 'green')

-
mid
+
mid

a string indicating the color of 'mid' values in the coloring gradient (default = 'black')

-
high
+
high

a string indicating the color of 'high' values in the coloring gradient (default = 'red')

-
legend_title
+
legend_title

legend title (default = 'change')

-
sort_terms_by_p
+
sort_terms_by_p

boolean to indicate whether to sort terms by 'lowest_p' (TRUE) or by number of genes (FALSE) (default = FALSE)

-
...
+
...

additional arguments for input_processing (used if genes_df is provided)

Value

- - -

a ggplot2 object of a heatmap where rows are enriched terms and +

a ggplot2 object of a heatmap where rows are enriched terms and columns are involved input genes. If genes_df is provided, colors of the tiles indicate the change values.

@@ -190,16 +190,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -94,19 +94,21 @@

Visualize Human KEGG Pathways

Arguments

-
kegg_pw_ids
+ + +
kegg_pw_ids

KEGG ids of pathways to be colored and visualized

-
input_processed
+
input_processed

input data processed via input_processing

-
scale_vals
+
scale_vals

should change values be scaled? (default = TRUE)

-
node_cols
+
node_cols

low, middle and high color values for coloring the pathway nodes (default = NULL). If node_cols=NULL, the low, middle and high color are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no @@ -114,16 +116,14 @@

Arguments

input_processing), only one color ('#F38F18' if NULL) is used.

-
legend.position
+
legend.position

the default position of legends ("none", "left", "right", "bottom", "top", "inside")

Value

- - -

Creates colored visualizations of the enriched human KEGG pathways +

Creates colored visualizations of the enriched human KEGG pathways and returns them as a list of ggplot objects, named by Term ID.

@@ -135,13 +135,13 @@

See also

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 input_processed <- data.frame(
   GENE = c("PKLR", "GPI", "CREB1", "INS"),
   CHANGE = c(1.5, -2, 3, 5)
 )
 gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
-}
+} # }
 
@@ -156,16 +156,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -97,11 +97,13 @@

Visualize Active Subnetworks

Arguments

-
active_snw_path
+ + +
active_snw_path

path to the output of an Active Subnetwork Search

-
genes_df
+
genes_df

the input data that was used with run_pathfindR. It must be a data frame with 3 columns:

  1. Gene Symbol (Gene Symbol)

  2. Change value, e.g. log(fold change) (optional)

  3. @@ -109,42 +111,40 @@

    Arguments

The change values in this data frame are used to color the affected genes

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
num_snws
+
num_snws

number of top subnetworks to be visualized (leave blank if you want to visualize all subnetworks)

-
layout
+
layout

The type of layout to create (see ggraph for details; default = 'stress')

-
score_quan_thr
+
score_quan_thr

active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)

-
sig_gene_thr
+
sig_gene_thr

threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2

-
...
+
...

additional arguments for input_processing

Value

- - -

a list of ggplot objects of graph visualizations of identified active +

a list of ggplot objects of graph visualizations of identified active subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes and yellows are non-input genes

@@ -182,16 +182,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -88,27 +88,27 @@

Visualize Interactions of Genes Involved in the Given Enriched Terms

Arguments

-
result_df
+ + +
result_df

Data frame of enrichment results. Must-have columns are: 'Term_Description', 'Up_regulated' and 'Down_regulated'

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
show_legend
+
show_legend

Boolean to indicate whether to display the legend (TRUE) or not (FALSE) (default: TRUE)

Value

- - -

list of ggplot objects (named by Term ID) visualizing the interactions of genes involved +

list of ggplot objects (named by Term ID) visualizing the interactions of genes involved in the given enriched terms (annotated in the result_df) in the PIN used for enrichment analysis (specified by pin_name_path).

@@ -131,10 +131,10 @@

See also

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 result_df <- example_pathfindR_output[1:2, ]
 gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
-}
+} # }
 
@@ -149,16 +149,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- + - +
- +
@@ -94,30 +94,32 @@

Create Diagrams for Enriched Terms

Arguments

-
result_df
+ + +
result_df

Data frame of enrichment results. Must-have columns for KEGG human pathway diagrams (is_KEGG_result = TRUE) are: 'ID' and 'Term_Description'. Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and 'Down_regulated'

-
input_processed
+
input_processed

input data processed via input_processing, not necessary when is_KEGG_result = FALSE

-
is_KEGG_result
+
is_KEGG_result

boolean to indicate whether KEGG gene sets were used for enrichment analysis or not (default = TRUE)

-
pin_name_path
+
pin_name_path

Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')

-
...
+
...

additional arguments for visualize_KEGG_diagram (used when is_KEGG_result = TRUE) or visualize_term_interactions (used when is_KEGG_result = FALSE)

@@ -125,12 +127,9 @@

Arguments

Value

- - -

Depending on the argument is_KEGG_result, creates visualization of - interactions of genes involved in the list of enriched terms in

-

-

result_df. Returns a list of ggplot objects named by Term ID.

+

Depending on the argument is_KEGG_result, creates visualization of + interactions of genes involved in the list of enriched terms in + result_df. Returns a list of ggplot objects named by Term ID.

Details

@@ -150,7 +149,7 @@

See also

Examples

-
if (FALSE) {
+    
if (FALSE) { # \dontrun{
 input_processed <- data.frame(
   GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
   CHANGE = c(1.5, -2, 3, 5)
@@ -159,7 +158,7 @@ 

Examples

gg_list <- visualize_terms(result_df, input_processed) gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct') -} +} # }
@@ -174,16 +173,16 @@

Examples

-

Site built with pkgdown 2.0.9.

+

Site built with pkgdown 2.1.1.

- +