seqwish output with simulated data #69

Open
egoltsman opened this issue Dec 4, 2020 · 1 comment


@egoltsman

Hi Erik,
I am doing a precision/recall analysis on a simulated set of 13 samples, where each "sample" is a random mutant of a real-life plant chromosome. I introduced exactly 200 SVs per sample; the types are deletions, inversions, tandem duplications, and translocations, and the variant sizes are fixed at 500 bp and 10 kb. After using edyeet+seqwish to construct graphs from these sequences plus the original reference, I now have 14 graphs of increasing complexity and would like to see how well the variants can be "deconstructed" from them. So I took the GFA->vg route for each graph and used 'vg snarls' to get the bubbles out. It reports a lot more variants than I had introduced, even in a 2-sample graph. My suspicion is that edyeet misaligned some of the regions, and I want to try again with more stringent parameters. Do you think this is worth pursuing, or is edyeet not designed to handle this scenario?
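For concreteness, the precision/recall comparison described above could be sketched roughly as follows. This is only an illustration, not part of edyeet, seqwish, or vg: it assumes the simulated SVs and the bubbles reported by 'vg snarls' have already been converted into intervals on the reference, and the `SV` record, the `reciprocal_overlap` helper, and the 50% overlap threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical record for one variant, in reference coordinates."""
    sample: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # e.g. "DEL", "INV", "DUP", "TRA"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the intervals are disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def precision_recall(truth, calls, min_overlap=0.5):
    """Greedily match each call to at most one unmatched truth variant."""
    matched = set()  # indices of truth variants already claimed by a call
    tp = 0
    for call in calls:
        for i, t in enumerate(truth):
            if i in matched:
                continue
            if t.sample != call.sample or t.kind != call.kind:
                continue
            if reciprocal_overlap(t, call) >= min_overlap:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Made-up example data: two simulated SVs, three reported calls.
truth = [
    SV("s1", "chr1", 1000, 1500, "DEL"),
    SV("s1", "chr1", 5000, 15000, "INV"),
]
calls = [
    SV("s1", "chr1", 1010, 1490, "DEL"),    # matches the first truth SV
    SV("s1", "chr1", 20000, 20500, "DEL"),  # extra call
    SV("s1", "chr1", 30000, 30500, "DEL"),  # extra call
]
precision, recall = precision_recall(truth, calls)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Extra bubbles that match no simulated SV lower precision in this scheme, which is the symptom described above. Whether those extras come from misalignment or from how the bubbles are counted is not something the sketch can distinguish.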

Another question is about the GFA tags that seqwish puts in. Sorry if this is described in some obvious place, but what are the DP: and RC: tags for?

@egoltsman changed the title from "tags in output GFA" to "seqwish output with simulated data" on Dec 4, 2020
@ekg
Owner

ekg commented Dec 5, 2020 via email
