Hi, while comparing discrepancies between Over Representation Analysis (one-sided Fisher exact test, a.k.a. hypergeometric test) results from `enricher()` and from web tools such as MSigDB, I realized there is an unaddressed ambiguity in how the genes in the query list (e.g. upregulated genes) and the genes in the universe/background are defined:
While other tools and general workshops suggest that k is the complete query list and N is the universe of measurable genes (e.g. the whole transcriptome for RNA-seq), clusterProfiler restricts the analysis to genes present in the annotation set in use. Specifying the `universe` parameter only intersects the provided gene list with the genes in the annotation set. That generally leads to larger p-values than the conventional approach would give. I feel that restricting the analysis to annotated genes is reasonable and more specific, but I think it's worth opening a discussion about it.
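To make the difference concrete, here is a minimal Python sketch of the one-sided hypergeometric test under both definitions. All counts are hypothetical numbers chosen for illustration, `hypergeom_sf`, `M` (gene-set size) and `x` (overlap) are names I made up, and this is not how `enricher()` is implemented internally; it only mimics the effect of shrinking k and N to the annotated genes.

```python
from math import comb

def hypergeom_sf(x, N, M, k):
    """P(X >= x): chance of seeing at least x gene-set members when
    drawing k genes from a universe of N genes that contains M members."""
    upper = min(M, k)
    total = comb(N, k)
    hits = sum(comb(M, i) * comb(N - M, k - i) for i in range(x, upper + 1))
    return hits / total

# Hypothetical counts (assumptions, not real data)
M = 100   # genes in the gene set of interest
x = 10    # query genes that fall in that gene set

# Conventional approach: k = full query list, N = all measurable genes
p_conventional = hypergeom_sf(x, N=20000, M=M, k=500)

# Annotation-restricted approach: keep only genes present in the
# annotation collection, which shrinks both the query and the universe
p_restricted = hypergeom_sf(x, N=10000, M=M, k=300)

print(f"conventional: {p_conventional:.3g}")
print(f"restricted:   {p_restricted:.3g}")
```

With these made-up counts the restricted test gives the larger p-value, because the expected overlap rises from 2.5 to 3 genes while the observed overlap stays at 10.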
Which approach do you usually use/recommend?