Common issues
- Verify that your JSON configuration files are syntactically correct before invoking GenomicsDB tools
The GenomicsDB tools assume that the JSON files you provide are syntactically correct, so always validate them with a tool such as json_verify before invoking any GenomicsDB tool:
cat <json_file> | json_verify
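If json_verify (shipped with the YAJL tools) is not available, the standard Python json module can perform a similar syntax check. The following is a minimal sketch. The function names and the script invocation are hypothetical and are not part of GenomicsDB.

```python
import json
import sys
from typing import Optional


def check_json_text(text: str, name: str = "<string>") -> Optional[str]:
    """Return None if `text` is syntactically valid JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno locate the first character the parser could not accept.
        return f"{name}: line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_json_file(path: str) -> Optional[str]:
    """Validate one JSON configuration file (loader, vid mapping, callset mapping, ...)."""
    try:
        with open(path, encoding="utf-8") as handle:
            return check_json_text(handle.read(), path)
    except OSError as err:
        return f"{path}: cannot read file: {err.strerror}"


if __name__ == "__main__":
    # Hypothetical usage: python3 check_json.py loader.json vid_mapping_file.json callset_mapping_file.json
    errors = [msg for msg in map(check_json_file, sys.argv[1:]) if msg]
    for msg in errors:
        print(msg, file=sys.stderr)
    if errors:
        sys.exit(1)
```

Note that Python's json module is slightly more lenient than strict JSON. For example, it accepts NaN and Infinity and silently keeps the last of duplicate keys. A clean result here is therefore a useful first pass rather than a guarantee.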
- If you are using GenomicsDB version < 0.8.0, you cannot arbitrarily move or copy the TileDB workspace/array directory after loading. If you do copy or move the files to a different machine, ensure that the absolute path of the workspace/array directory is the same as on the machine where the data was loaded/imported. This limitation is removed in version 0.8.0.
- I have prepared my JSON files correctly, yet I get an exception with one of the following messages:
Could not open vid mapping file "~/<directory>/vid_mapping_file.json"
or
Could not open callsets file "~/<directory>/callset_mapping_file.json"
Shells such as bash and tcsh expand the character '~' to the user's home directory, but the file I/O system calls invoked by the TileDB/GenomicsDB executables/libraries do NOT. A path read from a JSON file never passes through a shell, so the '~' is taken literally. Hence, you must specify file paths without special characters that only shells interpret.
Correct examples:
- /home/<user>/<directory>/vid_mapping_file.json
- ../../<directory>/vid_mapping_file.json
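If you generate or pre-process your JSON files with Python, you can reproduce the shell's '~' expansion explicitly before handing the files to GenomicsDB. The following is a minimal sketch that makes this assumption. It walks a parsed JSON document, reports string values beginning with '~', and expands them with os.path.expanduser. The helper names are hypothetical, and the field names in the test data are used only for illustration.

```python
import os
from typing import Any, Iterator, Tuple


def find_tilde_paths(node: Any, where: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every JSON string value that starts with '~'."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_tilde_paths(value, f"{where}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_tilde_paths(value, f"{where}[{index}]")
    elif isinstance(node, str) and node.startswith("~"):
        yield where, node


def expand_tilde_paths(node: Any) -> Any:
    """Return a copy of `node` in which leading '~' in string values is expanded."""
    if isinstance(node, dict):
        return {key: expand_tilde_paths(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_tilde_paths(value) for value in node]
    if isinstance(node, str) and node.startswith("~"):
        # Same expansion a shell would perform for "~" or "~user" prefixes.
        return os.path.expanduser(node)
    return node
```

Reporting first and expanding only on request keeps the check usable as a linter. Alternatively, you can rewrite the file with absolute paths once and commit that version.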
- I get an exception with the following message:
Unhandled overlapping variants at columns <col1> and <col2> for row <row>
TileDB/GenomicsDB cannot deal with overlapping variants within a single sample; more details are documented on the "Overlapping variant calls in a sample" wiki page. Workarounds exist for overlapping deletions and for gVCF reference blocks (intervals with <NON_REF> as the only alternate allele). If the input VCF file contains overlapping variants that are neither deletions nor reference blocks, the above exception is thrown. Check whether bcftools can help you; see point 2 in the section on organizing your data.
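Before importing, it can help to locate overlapping records in a sample yourself. The following minimal sketch assumes that each sample's calls are already parsed into CHROM, POS, REF and an optional INFO/END value, and are sorted by position. A record is taken to cover POS through POS + len(REF) - 1, or through END when END is given. The sketch reports every record that starts inside an earlier one. The Call type and find_overlaps function are hypothetical. The sketch does not reproduce GenomicsDB's own rules for which overlaps (deletions, reference blocks) it can handle.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """One variant call of a single sample (hypothetical minimal representation)."""
    chrom: str
    pos: int                   # 1-based VCF POS
    ref: str                   # REF allele
    end: Optional[int] = None  # INFO/END if present, e.g. for gVCF reference blocks

    def last_base(self) -> int:
        # A record covers POS .. POS + len(REF) - 1 unless an explicit END is given.
        return self.end if self.end is not None else self.pos + len(self.ref) - 1


def find_overlaps(calls: Iterable[Call]) -> List[Tuple[Call, Call]]:
    """Return (earlier, later) pairs where `later` starts inside `earlier`.

    Assumes the calls are sorted by (chrom, pos), as in a sorted VCF.
    """
    overlaps = []
    furthest: Optional[Call] = None  # call reaching furthest right on the current contig
    for call in calls:
        if furthest is not None and call.chrom == furthest.chrom and call.pos <= furthest.last_base():
            overlaps.append((furthest, call))
        if (furthest is None or call.chrom != furthest.chrom
                or call.last_base() > furthest.last_base()):
            furthest = call
    return overlaps
```

Tracking the call that reaches furthest right, rather than just the previous call, also catches a short variant nested inside a long deletion or reference block.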
- I have both row_partitions and column_partitions in my loader JSON file and I get an exception message:
Cannot have both "row_partitions" and "column_partitions" simultaneously in the JSON file
A TileDB array in the context of GenomicsDB can be partitioned by rows or by columns, but not both simultaneously; see the "Dealing with multiple GenomicsDB partitions" wiki page for more information.
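The following minimal sketch shows how you could catch this before running the import, assuming you generate or inspect loader JSON files from Python. It only checks the two key names quoted in the error message above. The function names are hypothetical.

```python
import json
from typing import Optional


def partition_key(loader: dict) -> Optional[str]:
    """Return "row_partitions", "column_partitions" or None for a parsed loader JSON.

    Raises ValueError when both keys are present, reusing the GenomicsDB error text.
    """
    present = [key for key in ("row_partitions", "column_partitions") if key in loader]
    if len(present) == 2:
        raise ValueError('Cannot have both "row_partitions" and "column_partitions" '
                         "simultaneously in the JSON file")
    return present[0] if present else None


def partition_key_from_file(path: str) -> Optional[str]:
    """Load a loader JSON file from disk and apply partition_key to it."""
    with open(path, encoding="utf-8") as handle:
        return partition_key(json.load(handle))
```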
- I have set up all my JSON files correctly, but the import program finishes almost immediately without importing any data from my VCFs:
There could be many reasons, but here is a common issue we have seen users run into:
- Contig/chromosome names in the vid_mapping_file don't match those in the input VCFs: the contig/chromosome names in the VCF and in the vid mapping JSON file MUST match EXACTLY. For example, if the vid file has a contig named "1" (as per the 1000 Genomes naming convention) while the VCF has a contig named "chr1" (as per the UCSC convention), GenomicsDB will ignore all data corresponding to "chr1".
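One quick way to catch this is to compare the contig names declared in the VCF header (##contig=<ID=...> lines) with those in the vid mapping file. The following is a minimal sketch. It assumes the vid mapping stores contigs under a "contigs" key, either as an object keyed by contig name or as a list of objects with a "name" field; check this against the vid format of your GenomicsDB version. The function names are hypothetical. The sketch also suggests the "chr"-prefixed or unprefixed counterpart of a missing name when the vid file contains one.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches the contig ID in VCF header lines such as ##contig=<ID=chr1,length=248956422>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def vcf_header_contigs(header_lines: Iterable[str]) -> Set[str]:
    """Collect contig names declared in ##contig header lines."""
    names = set()
    for line in header_lines:
        match = CONTIG_ID.match(line)
        if match:
            names.add(match.group(1))
    return names


def vid_contigs(vid: dict) -> Set[str]:
    """Collect contig names from a parsed vid mapping (assumed "contigs" layouts)."""
    contigs = vid.get("contigs", {})
    if isinstance(contigs, dict):  # {"1": {...}, "2": {...}}
        return set(contigs)
    return {entry["name"] for entry in contigs}  # [{"name": "1", ...}, ...]


def unknown_contigs(vcf_names: Set[str], vid_names: Set[str]) -> Tuple[List[str], Dict[str, str]]:
    """Return VCF contigs missing from the vid mapping, plus chr-prefix rename hints."""
    missing = sorted(vcf_names - vid_names)
    hints = {}
    for name in missing:
        counterpart = name[3:] if name.startswith("chr") else "chr" + name
        if counterpart in vid_names:
            hints[name] = counterpart
    return missing, hints
```

Some VCFs omit ##contig header lines. For those, collect the CHROM values from the data lines instead.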
- I have set up all my JSON files correctly, but the import program doesn't load data for some of the samples:
There could be many reasons, but here are the common issues we have seen users run into:
- Incorrect value(s) of idx_in_file in the callset_mapping_file: row_idx is the globally unique TileDB row index corresponding to a given sample/CallSet, whereas idx_in_file specifies the index of the sample within a given VCF and is mostly useful for multi-sample VCFs. For single-sample VCFs, this field should be 0 (or omitted altogether).
- Incorrect partition bounds in the loader JSON or an incorrectly specified partition index on the command line: read the section on running the program in the import data wiki page, and re-check the partition bounds in your loader JSON.
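You can automate the callset mapping checks described above. The following is a minimal sketch. It assumes the callset mapping stores entries under "callsets", either as an object keyed by sample name or as a list of objects with a "sample_name" field. It also assumes each entry has "row_idx", an optional "idx_in_file" and a "filename" field; verify these against your GenomicsDB version. The sketch does not parse VCFs, so the caller supplies the number of sample columns per file through the hypothetical samples_per_file parameter. It flags row_idx values shared by several callsets and idx_in_file values outside the file's sample range.

```python
from collections import Counter
from typing import Dict, List


def callset_entries(mapping: dict) -> List[dict]:
    """Normalize "callsets" (object keyed by sample name, or list of objects) to a list."""
    callsets = mapping.get("callsets", {})
    if isinstance(callsets, dict):
        return [dict(entry, sample_name=name) for name, entry in callsets.items()]
    return list(callsets)


def check_callsets(mapping: dict, samples_per_file: Dict[str, int]) -> List[str]:
    """Return human-readable problems found in a parsed callset mapping."""
    problems = []
    entries = callset_entries(mapping)
    # row_idx must be globally unique across all callsets.
    for row_idx, count in sorted(Counter(e["row_idx"] for e in entries).items()):
        if count > 1:
            problems.append(f"row_idx {row_idx} is used by {count} callsets")
    # idx_in_file is the 0-based sample index within its VCF; treated as 0 when omitted (assumed default).
    for e in entries:
        idx = e.get("idx_in_file", 0)
        n_samples = samples_per_file.get(e.get("filename"))
        if n_samples is not None and not 0 <= idx < n_samples:
            problems.append(
                f"{e.get('sample_name')}: idx_in_file {idx} is out of range for "
                f"{e.get('filename')} ({n_samples} sample(s))"
            )
    return problems
```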
- I see an "Incorrect cell order found" error:
$ vcf2tiledb loader.json
terminate called after throwing an instance of 'VCF2TileDBException'
  what(): VCF2TileDBException : Incorrect cell order found - cells must be in column major order. Previous cell: [ 0, 114111 ] current cell: [ 0, 114111 ]
Aborted
The error occurs if alleles at the same position are split across multiple lines, for example:
chrX 114112 . TCT T 999 PASS . GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
chrX 114112 . TCT TTT 999 PASS . GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
The fix is to run bcftools norm as described here, which merges the alleles into a single line:
chrX 114112 . TCT T,TTT 999 PASS . GT:DP:GQ:MIN_DP:PL 0/2:0:0:0:0,0,0,0,0,0
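To find the offending lines before running vcf2tiledb, look for consecutive data lines with the same CHROM and POS. The following is a minimal sketch that assumes a sorted, tab-delimited VCF, optionally gzip/bgzip-compressed. The function names and script usage are hypothetical. The sketch only locates the problem. bcftools norm performs the actual merge; its -m +any option joins biallelic records at the same position into one multiallelic record.

```python
import gzip
import sys
from typing import Iterable, Iterator, Optional, Tuple


def repeated_positions(lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (line_number, CHROM, POS) for data lines that repeat the previous line's CHROM/POS."""
    previous: Optional[Tuple[str, int]] = None
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos = line.split("\t", 2)[:2]  # VCF columns are tab-separated
        key = (chrom, int(pos))
        if key == previous:
            yield line_number, chrom, int(pos)
        previous = key


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text (bgzip output is gzip-compatible)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Hypothetical usage: python3 find_split_positions.py sample1.vcf.gz sample2.vcf.gz
    for path in sys.argv[1:]:
        with open_vcf(path) as vcf:
            for line_number, chrom, pos in repeated_positions(vcf):
                print(f"{path}:{line_number}: another record at {chrom}:{pos}")
```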
- I see an error message:
Cannot open VCF/BCF file <path.vcf.gz>
even when the file and its index exist. What's going on?
If you are importing data from many files (>1000), you are likely hitting the per-process limit on the number of open files set on your machine(s). Find out how to increase the limit; in bash, ulimit -n shows the current soft limit.
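On Linux and macOS, a process can raise its own soft limit on open files up to the hard limit, and programs it starts afterwards inherit the new value. The following minimal sketch uses the standard resource module (Unix only). It assumes you launch the import from a Python wrapper; ensure_open_file_limit is a hypothetical helper. Raising the hard limit itself usually needs administrator changes, for example in /etc/security/limits.conf on many Linux systems.

```python
import resource


def ensure_open_file_limit(needed: int) -> int:
    """Raise this process's soft RLIMIT_NOFILE to at least `needed` if the hard limit allows.

    Returns the soft limit in effect afterwards; it may still be below `needed`
    when the hard limit is lower. Child processes started later inherit it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return soft  # already high enough
    target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

For example, you could call ensure_open_file_limit(number_of_vcfs + 64) before starting the import with subprocess.run(["vcf2tiledb", "loader.json"]). The extra 64 descriptors of headroom is an arbitrary choice. On macOS, setrlimit may reject values above the kernel's per-process maximum even when the hard limit reports as unlimited.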