Interpreting big-map outputs #18

Open
flashton2003 opened this issue May 4, 2023 · 2 comments

@flashton2003
flashton2003 commented May 4, 2023

Hello,

I've run BiG-MAP to identify the gene clusters in some microbiome samples comparing health and disease.

There seem to be interesting patterns in my dataset of particular gene cluster types being associated with health or disease:

[Image: Screenshot 2023-05-04 at 15 33 11]

Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Each individual gene cluster counted there is statistically significantly associated with one condition, e.g. health. For example, gb.KB291615.1.region001.GC_DNA..Entryname.acetate2butyrate..OS.Clostridium_celatum_DSM_1785_genomic_scaffold..SMASHregion.region001..NR.1 was associated with health.

But then, what about at the gene cluster type level? I could just test whether the counts in the screenshot above are significant, but that seems to discard a lot of information.
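
One way to test the counts themselves is an exact two-sided binomial (sign) test on how many significant clusters of one type fall on the health side versus the disease side. The sketch below uses only the Python standard library. The counts are hypothetical placeholders (the numbers in the screenshot are not reproduced here), and the 50:50 null split is an assumption that ignores unequal group sizes and any non-independence between clusters from related genomes.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k under the null proportion p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= threshold))


# Hypothetical counts: 9 of 10 significant clusters of one type
# are associated with health and 1 with disease.
health_count, total = 9, 10
p_value = binomial_two_sided_p(health_count, total)
print(f"two-sided p = {p_value:.4f}")
```

Because this only uses the direction of each association, it throws away the per-sample abundances, so it is probably best treated as a coarse check.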

I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Any thoughts welcome!

Thanks,

Phil

I have two countries in my study, so I've narrowed down the list of hits by filtering for only pathways that are consistently associated with health/disease in both coun

@HAugustijn
Collaborator

> Do you have any suggestions on approaches to test whether these patterns are statistically significant?

Did you happen to look at the statistical methods offered in the analysis module (BiG-MAP.analyse)?

> I was also thinking about whether the RPKMs could be "summed" at the "gene cluster type" level (e.g. acetate2butyrate), and compared between health and disease.

Yes, the reads for similar cluster types with the same end products can be summed. Here is an example of how we applied this to create Fig. 3. An alternative is to modify the settings of the family module to create larger gene cluster families.
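
The summing described above can be sketched roughly as follows. This is not the code behind Fig. 3 and not part of BiG-MAP: the regular expression is inferred from the single cluster name quoted earlier in this thread, the second cluster name, the sample IDs and the RPKM values are made up, and a permutation test on the difference in group means is used purely as a simple standard-library stand-in for whichever test is preferred.

```python
import random
import re
from collections import defaultdict

# Assumption: the cluster type sits between "Entryname." and "..OS."
# as in the one cluster name quoted in this thread.
TYPE_PATTERN = re.compile(r"Entryname\.(.+?)\.\.OS\.")


def cluster_type(cluster_name):
    """Return the gene cluster type embedded in a cluster name."""
    match = TYPE_PATTERN.search(cluster_name)
    return match.group(1) if match else "unknown"


def sum_by_type(rpkm):
    """Collapse {cluster: {sample: rpkm}} into {type: {sample: summed rpkm}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for name, per_sample in rpkm.items():
        for sample, value in per_sample.items():
            totals[cluster_type(name)][sample] += value
    return {t: dict(s) for t, s in totals.items()}


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


REAL_NAME = (
    "gb.KB291615.1.region001.GC_DNA..Entryname.acetate2butyrate..OS."
    "Clostridium_celatum_DSM_1785_genomic_scaffold..SMASHregion.region001..NR.1"
)
# Hypothetical second cluster of the same type, invented for illustration.
FAKE_NAME = (
    "hypothetical.region002.GC_DNA..Entryname.acetate2butyrate..OS."
    "Made_up_organism..SMASHregion.region002..NR.1"
)

# Hypothetical RPKM values and sample labels.
rpkm = {
    REAL_NAME: {"s1": 4.0, "s2": 5.0, "s3": 1.0, "s4": 0.5},
    FAKE_NAME: {"s1": 2.0, "s2": 1.0, "s3": 0.5, "s4": 0.5},
}
groups = {"s1": "health", "s2": "health", "s3": "disease", "s4": "disease"}

summed = sum_by_type(rpkm)
for gc_type, per_sample in summed.items():
    health = [v for s, v in per_sample.items() if groups[s] == "health"]
    disease = [v for s, v in per_sample.items() if groups[s] == "disease"]
    p = permutation_p_value(health, disease)
    print(gc_type, per_sample, f"p = {p:.3f}")
```

With only four toy samples the test cannot reach a small p-value; the point is the shape of the workflow (parse type, sum per sample, compare groups), not the numbers.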

@flashton2003
Author

Hi Hannah,

Yes, I ran BiG-MAP.analyse, but the output didn't really make sense to me. This is the Kruskal-Wallis CSV I got, but it doesn't seem to be grouped by condition (I've sub-sampled it, but the parts I deleted were the same kind of thing: MGCs vs. samples). Perhaps I mis-specified something?

Acute_TyphivsControl_HealthySerosurvey_GC_kw.subsample.csv

This was the output of a run with:

python3 ~/programs/BiG-MAP/src/BiG-MAP.analyse.py --explore --compare -B clean/biom-results/BiG-MAP.mapcore.metacore.dec.biom -T metagenomic -M DiseaseStatus -g Acute_Typhi Control_HealthySerosurvey -O clean/analyse_output/
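
As a cross-check that is independent of BiG-MAP.analyse, a per-cluster comparison grouped by condition could look like the sketch below. It is not BiG-MAP's implementation and makes no claim about how its CSV is laid out. The table layout, cluster names, sample IDs and RPKM values are hypothetical; only the two group labels are taken from the `-g` argument above. The test is a two-group Kruskal-Wallis with tie correction, using the chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two groups; p uses chi-square with 1 degree of freedom."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[: len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a**2 / len(a) + rank_sum_b**2 / len(b)) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n) over tied values.
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t**3 - t for t in tie_sizes.values()) / (n**3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value identical, nothing to compare
    h = max(h / correction, 0.0)
    # Survival function of chi-square with 1 d.o.f. is erfc(sqrt(x / 2)).
    return h, erfc(sqrt(h / 2))


# Hypothetical RPKM table and sample-to-group metadata.
rpkm = {
    "cluster_A": {"s1": 0.1, "s2": 0.3, "s3": 0.2, "s4": 2.5, "s5": 3.1, "s6": 2.8},
    "cluster_B": {"s1": 1.0, "s2": 1.2, "s3": 0.9, "s4": 1.1, "s5": 1.0, "s6": 1.3},
}
metadata = {
    "s1": "Acute_Typhi", "s2": "Acute_Typhi", "s3": "Acute_Typhi",
    "s4": "Control_HealthySerosurvey", "s5": "Control_HealthySerosurvey",
    "s6": "Control_HealthySerosurvey",
}

results = {}
for cluster, per_sample in rpkm.items():
    cases = [v for s, v in per_sample.items() if metadata[s] == "Acute_Typhi"]
    controls = [v for s, v in per_sample.items() if metadata[s] == "Control_HealthySerosurvey"]
    results[cluster] = kruskal_wallis_two_groups(cases, controls)
    print(cluster, results[cluster])
```

With many clusters tested this way, the p-values would need a multiple-testing correction (for example Benjamini-Hochberg) before being interpreted.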

Thanks for the example!
