Interpreting big-map outputs #18

flashton2003 · 2023-05-04T13:40:08Z

Hello,

I've run BiG-MAP to identify the gene clusters in some microbiome samples comparing health and disease.

There seem to be interesting patterns in my dataset of particular gene cluster types being associated with health or disease:

Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Individual gene clusters are statistically significantly associated with, e.g., health. For example, gb.KB291615.1.region001.GC_DNA..Entryname.acetate2butyrate..OS.Clostridium_celatum_DSM_1785_genomic_scaffold..SMASHregion.region001..NR.1 was associated with health.

But then, what about at the gene cluster type level? I could just test whether the counts in the screenshot above differ significantly, but that seems to discard a lot of information.

I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Any thoughts welcome!

Thanks,

Phil

I have two countries in my study, so I've narrowed down the list of hits by filtering for only pathways that are consistently associated with health/disease in both coun

HAugustijn · 2023-05-10T15:59:59Z

> Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Did you happen to look at the statistical methods offered in the analysis module (BiG-MAP.analyse)?

> I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Yes, the reads for similar cluster types with the same end products can be summed. Here is an example of how we applied this to create Fig. 3. An alternative is to modify the settings of the family module to create larger gene cluster families.
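For readers who want to try the summing idea outside BiG-MAP, here is a minimal sketch. It is not the code behind Fig. 3. It assumes the cluster type is the text between `..Entryname.` and the next `..` in the cluster ID (as in the ID quoted earlier in this thread), and it swaps in a simple two-sided permutation test on the difference in group means. The sample names, RPKM values, and the helper names `cluster_type`, `sum_by_type` and `permutation_test` are all hypothetical.

```python
import random
import re
from collections import defaultdict

# Hypothetical samples and made-up RPKM values, for illustration only.
samples = ["h1", "h2", "h3", "h4", "d1", "d2", "d3", "d4"]
condition = {s: ("health" if s.startswith("h") else "disease") for s in samples}
rpkm = {
    "gb.X1.region001.GC_DNA..Entryname.acetate2butyrate..OS.species_a..SMASHregion.region001..NR.1":
        dict(zip(samples, [4.0, 5.0, 6.0, 5.5, 1.0, 0.5, 1.5, 1.0])),
    "gb.X2.region003.GC_DNA..Entryname.acetate2butyrate..OS.species_b..SMASHregion.region003..NR.1":
        dict(zip(samples, [2.0, 1.0, 3.0, 2.5, 0.5, 0.0, 1.0, 0.5])),
    "gb.X3.region002.GC_DNA..Entryname.other_type..OS.species_c..SMASHregion.region002..NR.1":
        dict(zip(samples, [1.0, 2.0, 1.5, 1.0, 1.5, 1.0, 2.0, 1.5])),
}


def cluster_type(cluster_id):
    """Return the text between '..Entryname.' and the next '..' (assumed ID layout)."""
    match = re.search(r"\.\.Entryname\.(.+?)\.\.", cluster_id)
    return match.group(1) if match else "unknown"


def sum_by_type(table):
    """Sum RPKM values per cluster type and sample."""
    totals = defaultdict(lambda: defaultdict(float))
    for cluster_id, per_sample in table.items():
        ctype = cluster_type(cluster_id)
        for sample, value in per_sample.items():
            totals[ctype][sample] += value
    return totals


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one estimate, so p is never exactly 0


totals = sum_by_type(rpkm)
p_values = {}
for ctype, per_sample in totals.items():
    health = [v for s, v in per_sample.items() if condition[s] == "health"]
    disease = [v for s, v in per_sample.items() if condition[s] == "disease"]
    p_values[ctype] = permutation_test(health, disease)
    print(ctype, round(p_values[ctype], 4))
```

When many cluster types are tested this way, the resulting p-values would still need a multiple-testing correction.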

flashton2003 · 2023-05-10T18:50:52Z

Hi Hannah,

Yes, I ran BiG-MAP.analyse, but the output didn't really make sense to me. This is the Kruskal-Wallis CSV I got, but it doesn't seem to be grouped by condition (I've sub-sampled it, but the parts I deleted were the same kind of thing: MGCs vs samples). Perhaps I mis-specified something?

Acute_TyphivsControl_HealthySerosurvey_GC_kw.subsample.csv

This was the output run with:

python3 ~/programs/BiG-MAP/src/BiG-MAP.analyse.py --explore --compare -B clean/biom-results/BiG-MAP.mapcore.metacore.dec.biom -T metagenomic -M DiseaseStatus -g Acute_Typhi Control_HealthySerosurvey -O clean/analyse_output/

Thanks for the example!
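As a cross-check on a clusters-by-samples table like the one described above, the following sketch groups the sample columns by condition and runs a Kruskal-Wallis test per cluster, followed by a Benjamini-Hochberg adjustment. It is not BiG-MAP's implementation and makes no claim about how BiG-MAP.analyse computes its output. It is restricted to two groups, so the p-value comes from a chi-square distribution with one degree of freedom. The group names are taken from the command above; the sample names, cluster names, values, and helper functions are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical metadata (sample -> DiseaseStatus) and made-up RPKM values.
metadata = {
    "S1": "Control_HealthySerosurvey", "S2": "Control_HealthySerosurvey",
    "S3": "Control_HealthySerosurvey", "S4": "Control_HealthySerosurvey",
    "S5": "Acute_Typhi", "S6": "Acute_Typhi",
    "S7": "Acute_Typhi", "S8": "Acute_Typhi",
}
table = {
    "cluster_A": dict(zip(metadata, [6.0, 6.0, 9.0, 8.0, 1.5, 0.5, 2.5, 1.5])),
    "cluster_B": dict(zip(metadata, [1.5, 1.0, 2.0, 1.5, 1.0, 2.0, 1.5, 1.0])),
}


def rank_with_ties(values):
    """Return average ranks (1-based) and the sizes of the tied groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H with tie correction for two groups; chi-square p-value, df = 1."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks, tie_sizes = rank_with_ties(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    return h, math.erfc(math.sqrt(h / 2))  # chi-square survival function for df = 1


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


results = {}
for cluster, per_sample in table.items():
    groups = defaultdict(list)
    for sample, value in per_sample.items():
        groups[metadata[sample]].append(value)  # group columns by condition
    results[cluster] = kruskal_two_groups(
        groups["Acute_Typhi"], groups["Control_HealthySerosurvey"]
    )

adjusted = benjamini_hochberg([p for _, p in results.values()])
for (cluster, (h, p)), q in zip(results.items(), adjusted):
    print(cluster, round(h, 3), round(p, 4), round(q, 4))
```

With more than two conditions the same H statistic applies, but the p-value would need a chi-square distribution with k - 1 degrees of freedom, for which a statistics library is the more practical choice.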
