Vcf windows #61

Merged
merged 37 commits into from
Dec 2, 2024
9288a3e
check vcf parser
dramanica Oct 18, 2024
34019d9
Merge branch 'main' into vcf_windows
dramanica Oct 19, 2024
80f8c63
test vcf on windows
dramanica Oct 19, 2024
24004ef
load all
dramanica Oct 19, 2024
8732195
devtools
dramanica Oct 19, 2024
cb11a1d
fix tests
dramanica Oct 19, 2024
b5774f8
fix signedness warnings
dramanica Oct 19, 2024
bfac505
more diagnostics
dramanica Oct 19, 2024
64cf3af
additional tests
dramanica Oct 19, 2024
79d9e88
print out
dramanica Oct 19, 2024
d55762f
More diagnostics
dramanica Oct 19, 2024
f90b587
Update test_gen_tibble.R
dramanica Oct 19, 2024
7333951
rerun diagnostics
dramanica Oct 21, 2024
ba89655
Merge remote-tracking branch 'origin/vcf_windows' into vcf_windows
dramanica Oct 21, 2024
1413dd8
trying test message using info
eviecarter33 Oct 23, 2024
0385db2
adding test repeats to try and catch error
eviecarter33 Oct 23, 2024
788ad76
move failing test
eviecarter33 Oct 23, 2024
5bf64bd
Merge branch 'main' into vcf_windows
dramanica Nov 20, 2024
34c85fa
simplify messages when testing
dramanica Nov 20, 2024
f8b0291
new test
dramanica Nov 20, 2024
3968a04
fix cpp parser for mixed ploidy markers
dramanica Nov 21, 2024
91d144e
better vcf fix for mixed ploidy
dramanica Nov 21, 2024
5c3f70d
remove old parsing for :
dramanica Nov 21, 2024
5fe2ac1
make sure that max_ploidy is enough
dramanica Nov 21, 2024
3e546ea
clean up
dramanica Nov 21, 2024
c6e2f54
vcf haploid first marker test
eviecarter33 Nov 29, 2024
95c4f43
catch ploidy problems in vcf
dramanica Nov 29, 2024
7e5881e
Update documentation for vcf parsers, spell check error message
eviecarter33 Nov 29, 2024
f6de03f
Haplopid marker in middle of vcf test, and chr_int fix
eviecarter33 Nov 29, 2024
8632a55
haploid vcf updated
eviecarter33 Nov 29, 2024
eff9af5
Merge branch 'main' into vcf_windows
dramanica Nov 29, 2024
c107e80
typo
dramanica Nov 29, 2024
c5b6ef6
update casting to int for chromosome
dramanica Nov 29, 2024
5330b87
temp fix to using info from bigsnpr
dramanica Nov 29, 2024
5079cbb
fix loci_ld_clump
dramanica Nov 29, 2024
fe2e45e
small doc update
dramanica Dec 1, 2024
61e2661
improved clumping test
eviecarter33 Dec 2, 2024
12 changes: 11 additions & 1 deletion R/gen_tibble.R
@@ -3,7 +3,17 @@
 #' A `gen_tibble` stores genotypes for individuals in a tidy format. DESCRIBE
 #' here the format
 #'
-#' When loading packedancestry files, missing alleles will be converted from
+#'- *VCF* files: the fast `cpp` parser is used by default. Both `cpp` and `vcfR` parsers
+#' attempt to establish ploidy from the first variant; if that variant is found in a
+#' sex chromosome (or mtDNA), the parser will fail with 'Error: a genotype has more
+#' than max_ploidy alleles...'. To use the fast parser, change the order of variants

Review comment (member, author): Use 'To parse the vcf file, ' here, as either parser will then work, not just the fast one.

+#' so that the first chromosome is an autosome using a tool such as `vcftools`.
+#' Currently, only biallelic SNPs are supported. If haploid variants (e.g. sex
+#' chromosomes) are included in the vcf, they are not transformed into homozygous
+#' calls. Instead, reference alleles will be counted as 0 and alternative alleles
+#' will be counted as 1.
+#'
+#' - *packedancestry* files: When loading *packedancestry* files, missing alleles will be converted from
 #' 'X' to NA
 #'
 #' @param x can be:
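
The haploid behaviour documented above (a haploid call counts as 0 or 1 and is not doubled into a homozygous call) can be illustrated with a minimal C++ sketch. `count_alt_alleles` is a hypothetical helper written for this note, not the package's `countAlternateAlleles`. It assumes the GT value has already been isolated from the sample column, that sites are biallelic, and that a missing call is coded as `max_ploidy + 1`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the package's parser): count alternate alleles in a
// GT value such as "0/1", "1|1" or a haploid "1".
// Assumption: a missing call is returned as max_ploidy + 1.
int count_alt_alleles(const std::string &gt, int max_ploidy) {
  int alleles = 0;  // number of alleles seen so far
  int alt = 0;      // number of non-reference alleles seen so far
  std::size_t start = 0;
  while (start <= gt.size()) {
    // Alleles are separated by '/' (unphased) or '|' (phased).
    std::size_t end = gt.find_first_of("/|", start);
    if (end == std::string::npos) end = gt.size();
    const std::string allele = gt.substr(start, end - start);
    if (allele.empty() || allele == ".") return max_ploidy + 1;  // missing call
    ++alleles;
    if (alleles > max_ploidy) {
      throw std::runtime_error("a genotype has more than max_ploidy alleles");
    }
    if (allele != "0") ++alt;  // biallelic assumption: anything but "0" is ALT
    start = end + 1;
  }
  return alt;
}
```

With `max_ploidy = 2`, a haploid `"1"` gives a dosage of 1 rather than 2, and `"./."` gives the missing code 3. With `max_ploidy = 1`, a diploid `"0/1"` throws, which is the situation the new documentation warns about.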
2 changes: 1 addition & 1 deletion R/vcf_to_fbm_vcfR.R
@@ -157,7 +157,7 @@ poly_genotype_dosage <- function (x, max_ploidy){
 if (dosage<(max_ploidy+1)){
 return(as.raw(dosage))
 } else{
-stop("a genotype has more than max_ploidy alleles. We estimate max_plody from the first variant in the vcf file, make sure that variant is representative of ploidy (e.g. it is not on a sex chromosome).")
+stop("a genotype has more than max_ploidy alleles. We estimate max_ploidy from the first variant in the vcf file, make sure that variant is representative of ploidy (e.g. it is not on a sex chromosome).")
 }
 } else {
 return(as.raw(max_ploidy+1))
2 changes: 1 addition & 1 deletion src/vcf_parser.cpp
@@ -43,7 +43,7 @@ int countAlternateAlleles(const std::string &genotype, const size_t missingValue
 // Rcout<<pos_i<<"\t"<<missingValue<<std::endl;
 // }
 if (pos_i> ((missingValue-1)*2)){
-Rcpp::stop("a genotype has more than max_ploidy alleles. We estimate max_plody from the first variant in the vcf file, make sure that variant is representative of ploidy (e.g. it is not on a sex chromosome).");
+Rcpp::stop("a genotype has more than max_ploidy alleles. We estimate max_ploidy from the first variant in the vcf file, make sure that variant is representative of ploidy (e.g. it is not on a sex chromosome).");
 }

 //
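
The error message says that `max_ploidy` is estimated from the first variant. A rough sketch of why a haploid first variant then breaks later diploid calls is given below. All three function names are hypothetical and chosen for illustration; the sketch assumes each sample's GT value has already been extracted, and it is not a copy of the logic in `src/vcf_parser.cpp`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: ploidy implied by one GT value, i.e. the number of
// allele separators ('/' or '|') plus one. "0/1" has ploidy 2, "1" has ploidy 1.
int ploidy_from_gt(const std::string &gt) {
  const auto separators = std::count_if(
      gt.begin(), gt.end(), [](char c) { return c == '/' || c == '|'; });
  return static_cast<int>(separators) + 1;
}

// Hypothetical helper: largest ploidy among the samples of a single variant.
// Applied to the first variant, this plays the role of the max_ploidy estimate.
int max_ploidy_of_variant(const std::vector<std::string> &gts) {
  int max_ploidy = 0;
  for (const std::string &gt : gts) {
    max_ploidy = std::max(max_ploidy, ploidy_from_gt(gt));
  }
  return max_ploidy;
}

// Hypothetical helper: true when a later genotype carries more alleles than
// the estimate allows, which is the condition the error message reports.
bool exceeds_max_ploidy(const std::string &gt, int max_ploidy) {
  return ploidy_from_gt(gt) > max_ploidy;
}
```

If the first variant is haploid in every sample (for example `{"0", "1", "0"}`), the estimate is 1, so any later `"0/1"` exceeds it. Reordering the file so that an autosomal variant comes first gives an estimate of 2, under which both haploid and diploid calls fit.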
2 changes: 1 addition & 1 deletion tests/testthat/test_gen_tibble.R
@@ -638,7 +638,7 @@ test_that("vcf's with haploid markers first give errors",{
 # the cpp parser catches the problem
 expect_error(pop_a_vcf_gt_hap_cpp <- gen_tibble(vcf_path_haploid, quiet=TRUE,backingfile = tempfile(), parser="cpp"),
 "a genotype")
-# vcfR fails to raise an error
+# vcfR catches the problem
 expect_error(pop_a_vcf_gt_hap_vcfR <- gen_tibble(vcf_path_haploid, quiet=TRUE,backingfile = tempfile(), parser="vcfR"),
 "a genotype")
 })