Constructing pan-genome #1252

Tonitsk8264 · 2023-12-07T15:11:12Z

Dear Developer.

I am using Minigraph-Cactus to build a wheat pan-genome from 24 wheat genomes, using sequences from specific regions of the same chromosome. However, in the resulting GFA file, only some of the wheat genomes are included as paths, not all of them.

I suspect this is due to the high level of divergence between the samples. I set the minIdentity parameter to 0.5 in the cactus_progressive_config.xml configuration file, but this did not give the result I expected. Could you advise which parameters in the configuration file I should change to better handle divergence between samples, so that the chromosome sequences of all wheat genomes are correctly included and the pan-genome covers every sample?

Thank you for your time and support.

Best regards.

W       Avent_RM271     0       Chr6N   0       37412338        >1>2>3>4>5>7>8>10>11>13>14>15>16>18>19>21>22>24>25>27>28>30>31>33>34>3>
W       Taest_CDCStanley        0       2A      0       33570816        >3>4>5>7>8>10>11>13>14>15>16>18>19>21>22>24>25>27>28>30>31>33>>
W       Taest_Jagger    0       2A      0       32615472        >3>4>5>7>8>10>11>13>14>15>16>18>19>21>22>24>25>27>28>30>31>33>34>36>37>
W       Taest_Mace      0       2A      0       33050520        >564491>564493>564494>564496>564497>564498>564499>564501>564502>564503>
W       Taest_Renan     0       chr2A   0       34099443        >2>3>4>5>6>8>9>11>12>14>16>17>19>20>22>23>25>26>28>29>31>32>34>35>37>3>
W       Taest_SYMattis  0       2A      0       31852674        >3>4>5>7>8>10>11>13>14>15>16>18>19>21>22>24>25>27>28>30>31>33>34>36>37
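
One way to confirm which samples actually made it into the graph is to collect the sample names from the W-lines (GFA 1.1 walk records, whose second field is the sample name) and compare them with the list of inputs. The following is a minimal sketch, not part of Minigraph-Cactus; the function names are hypothetical and the example walks are abbreviated from the lines above.

```python
from typing import Iterable, List, Set


def samples_with_walks(gfa_lines: Iterable[str]) -> Set[str]:
    """Return the sample names found in GFA W-lines.

    A W-line has the fields: W, sample, haplotype, sequence id,
    start, end, walk. Only the second field is used here.
    """
    samples = set()
    for line in gfa_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "W":
            samples.add(fields[1])
    return samples


def missing_samples(gfa_lines: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return expected sample names that have no W-line in the graph."""
    return sorted(set(expected) - samples_with_walks(gfa_lines))


# Abbreviated example records (walks shortened for illustration).
example_gfa = [
    "S\t1\tACGT",
    "W\tAvent_RM271\t0\tChr6N\t0\t37412338\t>1>2>3",
    "W\tTaest_Mace\t0\t2A\t0\t33050520\t>564491>564493",
]
expected_inputs = ["Avent_RM271", "Taest_Mace", "Tduru_Svevo"]

print(missing_samples(example_gfa, expected_inputs))
```

For a real file, pass an open file handle instead of the example list, for instance `missing_samples(open("graph.gfa"), names)`, where `graph.gfa` and `names` stand in for your own output file and input list.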

glennhickey · 2023-12-09T18:53:32Z

glennhickey
Dec 9, 2023
Maintainer

Minigraph doesn't work well at high divergences. Near the beginning of the log, you should be able to see the mash distances of all your genomes to the reference, and it will even give you a warning if any seem too high. Are you able to share this part of your log?

Tonitsk8264 · 2023-12-14T09:48:52Z

Tonitsk8264
Dec 14, 2023
Author

cactus-pangenome.log

Yes, some wheat genomes have higher mash distances from the reference. In this case, can we adjust the parameters to add these genomes to the pan-genome?

glennhickey · 2023-12-14T14:00:00Z

glennhickey
Dec 14, 2023
Maintainer

Yeah, there's supposed to be a warning for distances > 0.02 -- strange that it's not in your log. But anyway, 0.097 is way higher than minigraph-cactus is used to dealing with, and I don't think there are any parameters to change this.

You'd have to cut down your inputs to only genomes <0.02 from the reference, or you can make a tree (e.g. with mashtree) and properly align this data with Progressive Cactus. You can also try PGGB, which lets you map with more sensitive parameters, but if your final graph has a mutation at every position, you may struggle to use it for anything.

mash distance of Turar_G1812 (size = 24691619) to reference Avent_RM271 = 0.0974869
mash distance of Tmono_TA299 (size = 27091863) to reference Avent_RM271 = 0.0968463
mash distance of Taest_LongReachLancer (size = 25471049) to reference Avent_RM271 = 0.0968463
mash distance of Tmono_PI306540 (size = 27777195) to reference Avent_RM271 = 0.0949813
mash distance of Tduru_Svevo (size = 32440310) to reference Avent_RM271 = 0.0949813
mash distance of Taest_Aikang58 (size = 25900446) to reference Avent_RM271 = 0.0949813
mash distance of Taest_Fielder (size = 31607142) to reference Avent_RM271 = 0.0943778
mash distance of Taest_CDCLandmark (size = 26632610) to reference Avent_RM271 = 0.0937829
mash distance of Tmono_TA10622 (size = 28080119) to reference Avent_RM271 = 0.0926182
mash distance of Taest_Norin61 (size = 26012014) to reference Avent_RM271 = 0.0926182
mash distance of Taest_Kenong9204 (size = 30582808) to reference Avent_RM271 = 0.0926182
mash distance of Taest_Kariega (size = 30594703) to reference Avent_RM271 = 0.0914855
mash distance of Taest_Julius (size = 30246075) to reference Avent_RM271 = 0.0914855
mash distance of Taest_ChineseSpring (size = 29059918) to reference Avent_RM271 = 0.0914855
mash distance of Taest_ArinaLrFor (size = 27350083) to reference Avent_RM271 = 0.0914855
mash distance of Ttibe_Zang1817 (size = 29365020) to reference Avent_RM271 = 0.0898429
mash distance of Tspel_PI190962 (size = 30978064) to reference Avent_RM271 = 0.0893096
mash distance of Tdico_Zavitan (size = 30618420) to reference Avent_RM271 = 0.0857608
mash distance of Taest_Renan (size = 34099443) to reference Avent_RM271 = 0.00461146
mash distance of Taest_SYMattis (size = 31852674) to reference Avent_RM271 = 0.00157453
mash distance of Taest_Jagger (size = 32615472) to reference Avent_RM271 = 0.00128842
mash distance of Taest_CDCStanley (size = 33570816) to reference Avent_RM271 = 0.00105797
mash distance of Taest_Mace (size = 33050520) to reference Avent_RM271 = 0.0010072
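
To cut the inputs down as suggested, the mash distance lines can be parsed from the log and split at the 0.02 threshold mentioned above. This is a minimal sketch that assumes the log lines have exactly the format shown here; the function names are hypothetical, and the sample lines are copied from the log excerpt.

```python
import re
from typing import Dict, List, Tuple

# Matches lines such as:
# mash distance of Taest_Mace (size = 33050520) to reference Avent_RM271 = 0.0010072
MASH_LINE = re.compile(
    r"mash distance of (\S+) \(size = \d+\) to reference \S+ = ([0-9.eE+-]+)"
)


def parse_mash_distances(log_text: str) -> Dict[str, float]:
    """Map each genome name to its mash distance from the reference."""
    return {m.group(1): float(m.group(2)) for m in MASH_LINE.finditer(log_text)}


def split_by_threshold(
    distances: Dict[str, float], max_distance: float = 0.02
) -> Tuple[List[str], List[str]]:
    """Return (keep, drop): genomes below the threshold and the rest."""
    keep = sorted(n for n, d in distances.items() if d < max_distance)
    drop = sorted(n for n, d in distances.items() if d >= max_distance)
    return keep, drop


log_excerpt = """\
mash distance of Turar_G1812 (size = 24691619) to reference Avent_RM271 = 0.0974869
mash distance of Tdico_Zavitan (size = 30618420) to reference Avent_RM271 = 0.0857608
mash distance of Taest_Renan (size = 34099443) to reference Avent_RM271 = 0.00461146
mash distance of Taest_Mace (size = 33050520) to reference Avent_RM271 = 0.0010072
"""

keep, drop = split_by_threshold(parse_mash_distances(log_excerpt))
print("keep:", keep)
print("drop:", drop)
```

Applied to the full log above, only the five genomes with distances below 0.005 (Taest_Renan, Taest_SYMattis, Taest_Jagger, Taest_CDCStanley, Taest_Mace) would remain alongside the reference, which matches the samples that appear in the W-lines of the original post.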

Tonitsk8264 · 2023-12-14T16:39:30Z

Tonitsk8264
Dec 14, 2023
Author

Thanks for your reply and suggestions!

Tonitsk8264 · 2023-12-15T02:16:21Z

Tonitsk8264
Dec 15, 2023
Author

Sorry to bother you again, but I have another question about pan-genome construction. I hope you can help me figure it out:

Does minigraph-cactus support incremental construction? For instance, could I first build a pan-genome 'Pn' from n sequences and then add a new sequence 'x' to extend it from 'Pn' to 'Pn+1', instead of rebuilding the pan-genome from scratch with all n+1 sequences?

glennhickey · 2023-12-15T14:29:23Z

glennhickey
Dec 15, 2023
Maintainer

No. You can add genomes in minigraph but not minigraph-cactus.
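
For the minigraph-only case, adding a genome amounts to running minigraph again with the existing graph as the first input. The sketch below only assembles such a command line; it assumes the `-cxggs` graph-generation invocation described in the minigraph README, and the file names `Pn.gfa` and `x.fa` are hypothetical. Check the options against your installed minigraph version before relying on it. The result is a minigraph graph, not a Minigraph-Cactus one.

```python
from typing import List


def minigraph_add_command(
    base_graph: str, new_fastas: List[str], threads: int = 16
) -> List[str]:
    """Build an argument list that adds new assemblies to an existing graph.

    Assumption: minigraph accepts an existing GFA as its first input in
    graph-generation mode (-cxggs), as described in its README.
    """
    if not new_fastas:
        raise ValueError("at least one new assembly is required")
    return ["minigraph", "-cxggs", f"-t{threads}", base_graph, *new_fastas]


# Hypothetical file names: extend graph Pn with assembly x.
cmd = minigraph_add_command("Pn.gfa", ["x.fa"])
print(" ".join(cmd) + " > Pn_plus_1.gfa")
```

The argument list can be passed to `subprocess.run(cmd, stdout=out_file, check=True)` with `out_file` opened for writing, since minigraph writes the graph to standard output.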

1 reply

Tonitsk8264 Dec 15, 2023
Author

So that means I would have to build the pangenome from scratch using all n+1 genomes. Is that right?
