Definition of background/query gene sets #478

ftucos · 2022-06-20T15:19:17Z


Hi, while comparing discrepancies between the results of Over-Representation Analysis (one-sided Fisher's exact test, a.k.a. hypergeometric test) performed with enricher() and with other web tools such as MSigDB, I realized there is an unaddressed ambiguity in how the genes in the query list (e.g. upregulated genes) and the genes in the universe/background are defined:

While other tools and general workshops suggest that k is the size of the complete query list and N is the size of the universe of measurable genes (e.g. the whole transcriptome for RNA-seq), clusterProfiler restricts the analysis to genes present in the annotation set in use. Specifying the universe parameter only intersects the supplied list of genes with the genes in the annotation set. That generally leads to larger p-values than the conventional approach would give. I feel that restricting the analysis to annotated genes is reasonable and more specific, but I think it is worth opening a discussion about it.
Which approach do you usually use/recommend?
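
To make the two conventions concrete, here is a minimal sketch in plain Python. It is not clusterProfiler's actual code; the gene names, the toy sets, and the helper `hypergeom_upper_tail` are all made up for illustration. It computes the one-sided hypergeometric p-value once with the full query and the full measurable universe, and once after intersecting both with the annotated genes.

```python
from math import comb

def hypergeom_upper_tail(x, k, M, N):
    """P(X >= x) when drawing k genes from N, of which M belong to the gene set.

    Hypothetical helper written for this example only.
    """
    upper = min(k, M)
    hits = sum(comb(M, i) * comb(N - M, k - i) for i in range(x, upper + 1))
    return hits / comb(N, k)

# Toy data (all assumptions): 20 measurable genes, 10 of them annotated
# in at least one gene set, and one gene set of 5 genes.
universe = {f"g{i}" for i in range(1, 21)}
annotated = {f"g{i}" for i in range(1, 11)}
gene_set = {f"g{i}" for i in range(1, 6)}
query = {"g1", "g2", "g3", "g6", "g7", "g11"}

# Conventional approach: k = full query list, N = all measurable genes.
p_conventional = hypergeom_upper_tail(
    x=len(query & gene_set),
    k=len(query & universe),
    M=len(gene_set & universe),
    N=len(universe),
)

# Restricted approach: query and background are both intersected
# with the annotated genes before testing.
background = universe & annotated
query_restricted = query & background
p_restricted = hypergeom_upper_tail(
    x=len(query_restricted & gene_set),
    k=len(query_restricted),
    M=len(gene_set & background),
    N=len(background),
)

print(f"conventional: {p_conventional:.4f}")
print(f"restricted:   {p_restricted:.4f}")
```

In this toy case the restricted background gives the larger p-value (0.5 versus about 0.13), but the direction depends on how many query genes are unannotated, so it should not be read as a general rule.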

