Error in generating .HWE.vcf file #5

Closed
OmidJa opened this issue Apr 24, 2019 · 4 comments

@OmidJa

OmidJa commented Apr 24, 2019

Dear Alana,
I hope you are doing well.
This is Omid Jafari. Previously I was working with the Stacks version 1 pipeline, and you helped me solve an error with your package by updating it. I have now generated a vcf file with Stacks version 2, in which the format of the ID column has changed, and I think the error is related to that. I should mention that my pipeline was reference-genome based.

Error: 'populations.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted

In the GBS_SNP_filter.txt file, I changed _.* to :.* on the last line. The pipeline then runs a bit further, but I run into another error:

There were 50 or more warnings (use warnings() to see the first 50)
[1] "Up to 1 out of 5 populations"
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  :
  length of 'dimnames' [2] not equal to array extent
Calls: unlist -> lapply -> FUN -> which -> Ops.data.frame -> matrix
In addition: Warning message:
Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `enframe(name = NULL)` instead.
This warning is displayed once per session.
Execution halted
ls: cannot access *pop.vcf: No such file or directory
Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: readr
Loading required package: stringr
Loading required package: tibble
Parsed with column specification:
cols(
  X1 = col_character()
)
Error: 'populations.recode.0.65_0.9.0.01_3.HWE.vcf' does not exist in current working directory ('/home/omid/Khazar-vcf-mehrshad/hwe').
Execution halted

I'll share my original vcf file, popmap.txt and GBS_SNP_filter.txt, and I would be very grateful if you could help me get past this error.

I should add that I think the error comes from my vcf file: when I run the package on some other .vcf files (generated with Stacks 2), with the change to the last line of GBS_SNP_filter.txt mentioned above, it runs smoothly. So I think there is something wrong with this particular vcf file.

Regards,
Omid

@laninsky
Owner

Hi Omid,

There were a couple of things working against you here (not least some bugs in my code, but more on that in a moment!).

The first issue I needed to solve involved your GBS_SNP_filter.txt file:
-- populations.snp.vcf is the name you have in the GBS_SNP_filter.txt file, but this is not the correct name - your vcf file is actually called populations.snps.vcf
-- And as you figured out yourself already, “_.*” was the separator for your ID column for the last run, but given the format has changed to having the locus separated from the SNP position by a colon, :.* needs to be on line 8 of GBS_SNP_filter.txt, rather than _.*
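
As a rough illustration of what the two patterns do to an ID, here is a minimal shell sketch. It is not taken from the GBS_SNP_filter code: the helper name strip_snp_position is hypothetical, the IDs are made-up examples, and it assumes that Stacks 1 writes the ID as locus_position, that Stacks 2 writes it as locus:position:strand, and that the pattern is used to strip the SNP position and leave the locus.

```shell
# Hypothetical helper (not part of GBS_SNP_filter): delete everything from
# the first match of the separator pattern onwards, leaving the locus name.
strip_snp_position() {
  printf '%s\n' "$1" | sed -E "s/$2//"
}

# Assumed Stacks 1 style ID (locus_position): prints 1234
strip_snp_position '1234_56' '_.*'

# Assumed Stacks 2 style ID (locus:position:strand): prints 1234
strip_snp_position '1234:56:+' ':.*'

# The old pattern finds no underscore in a Stacks 2 style ID, so the ID
# comes back unchanged: prints 1234:56:+
strip_snp_position '1234:56:+' '_.*'
```

With the old pattern every SNP in a Stacks 2 style file keeps its full ID, so under these assumptions SNPs from the same locus would no longer share a locus name.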

The second issue involved your popmap.txt file: this didn’t have the same sample names as are in the vcf file. All of these sample names in the vcf file have “-sorted” on the end, and this is missing from your popmap.txt file. I tweaked this by:
mv popmap.txt oldpopmap.txt
sed -r 's/(.)(-[0-9][0-9][0-9])(.*)/\1\2-sorted\3/g' oldpopmap.txt > popmap.txt
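
A mismatch like this can be spotted before running the pipeline by comparing the first column of popmap.txt with the sample names in the vcf header. The sketch below is only an illustration: the file contents and sample names are made up, the function names are hypothetical, and it assumes a tab-separated popmap with the sample name in the first column.

```shell
# Made-up example files in a temporary directory (not the files from this issue).
workdir="$(mktemp -d)"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tKZ-001-sorted\tKZ-002-sorted\n' > "$workdir/example.vcf"
printf 'KZ-001\tpopA\nKZ-002-sorted\tpopA\n' > "$workdir/popmap.txt"

# Sample names are columns 10 onwards of the #CHROM header line.
vcf_samples() {
  grep '^#CHROM' "$1" | head -n 1 | cut -f 10- | tr '\t' '\n' | sort
}

# Print popmap sample names (first column) that are absent from the vcf header.
popmap_names_missing_from_vcf() {
  cut -f 1 "$2" | sort > "$workdir/popmap_names.txt"
  vcf_samples "$1" > "$workdir/vcf_names.txt"
  comm -23 "$workdir/popmap_names.txt" "$workdir/vcf_names.txt"
}

# Prints KZ-001, the one popmap name without the "-sorted" suffix.
popmap_names_missing_from_vcf "$workdir/example.vcf" "$workdir/popmap.txt"
```

An empty result would mean every popmap name has a matching sample in the vcf file.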

After solving these issues, however, I then discovered a few bugs in the code left over from some of the bigger changes I made recently, so even if all your files had been correct the pipeline would probably have failed! (Sorry!) This should all be fixed now, and I’ve also placed the output files from running your files into the most recent dropbox folder you shared with me.

One thing I did notice is that it looks like the read depth for your SNPs is pretty low - is this something that you were expecting based on your sequencing run?
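
For a quick look at read depth, one option is to average the per-genotype DP values in the vcf file. This is a minimal sketch under stated assumptions, not how GBS_SNP_filter calculates depth: it assumes the FORMAT column contains a DP field, the example vcf is made up, and the function name mean_genotype_depth is hypothetical.

```shell
# Made-up two-SNP, two-sample vcf in a temporary directory.
workdir="$(mktemp -d)"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'chr1\t100\t1:10:+\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:4\t0/0:6\n'
  printf 'chr1\t200\t2:20:+\tC\tT\t.\tPASS\t.\tGT:DP\t./.:.\t1/1:2\n'
} > "$workdir/depth_example.vcf"

# Mean DP over all genotypes that report a depth (missing values are skipped).
mean_genotype_depth() {
  awk -F '\t' '
    /^#/ { next }                       # skip header lines
    {
      # Find the position of DP within the FORMAT column (column 9).
      n = split($9, keys, ":")
      dp = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DP") dp = i
      if (dp == 0) next                 # no DP field on this line
      # Sample columns start at column 10.
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        if (vals[dp] != "" && vals[dp] != ".") { total += vals[dp]; count++ }
      }
    }
    END { if (count > 0) printf "%.2f\n", total / count }
  ' "$1"
}

# Prints 4.00 for the example above: (4 + 6 + 2) / 3.
mean_genotype_depth "$workdir/depth_example.vcf"
```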

Thanks again for your patience!

Alana

P.S. I’ll leave this issue open for a week just in case you have any further issues and then close it if it is all good.

@OmidJa
Author

OmidJa commented Apr 25, 2019

Hi dear Alana,
First of all, I should apologize for the mistakes in the files I sent. I had many files and mistakenly sent the wrong one. The updated package now runs smoothly on my vcf file, without any errors. Thanks for all your great support.
Regarding the read depth: yes, you are right, and I should check the quality and do some filtering based on read depth, but generally speaking we didn't expect a high depth of coverage. Is there something in particular you are thinking of? I mean, do you have a suggestion for me?

Again I am really grateful for all your help.

Cheers,
Omid

@laninsky
Owner

Hi Omid,

No worries on the files - like I said, even if they had been correct, the code still had bugs in it! I was mostly asking about the read depth just because it appeared to be a fair bit lower than the previous run you had sent to me, and I just wanted to make sure that is what you'd expect (and not that the code was doing something additionally funny!), so no strong suggestions (other than if you are working with very low depth data but doing population genetics where you don't need to know individual genotypes, this kind of approach might be worth looking into: https://onlinelibrary.wiley.com/doi/full/10.1111/1755-0998.12990).

Anyhow, as the pipeline is now working correctly for you, I'll go ahead and close this issue.

Good luck with your downstream analysis!

Alana

@OmidJa
Author

OmidJa commented Apr 26, 2019

Hi Alana,
Well, actually this vcf file was generated by someone else without considering some options, so that is probably the reason for the observed differences, and I will regenerate it.
Thanks for sending the link to the paper and for your support.

Cheers,
Omid
