This repository has been archived by the owner on Jan 3, 2023. It is now read-only.
Useful external tools
Karthik Gururaj edited this page Sep 8, 2016 · 7 revisions
Useful bcftools commands
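To join biallelic records at the same position into multiallelic records (with +any, SNPs and indels are merged into a single record):
bcftools norm -m +any [-O <output_format> -o <output>] <input_file>
The output format can be one of the following strings: "v" (uncompressed VCF), "z" (compressed VCF), "b" (compressed BCF) or "u" (uncompressed BCF). If nothing is specified, the default is uncompressed VCF. If the -o parameter is omitted, the output is written to stdout. The same -O and -o options apply to bcftools view.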
To convert a file to another format:
bcftools view [-O <output_format> -o <output>] <input_file>
When a compressed output format is chosen, the output file name should ideally end with the suffix ".vcf.gz" or ".bcf.gz".
bcftools index [-f] <file>
The above command will create a CSI index. To produce a tabix index:
bcftools index [-f] -t <file>
The -f parameter will cause bcftools to overwrite an existing index file.
NOTE: Older versions of bcftools required the user to pass -m0 instead of -t for creating a tabix index.
Sorting CSV files before an import
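You must have GNU coreutils installed on your system. The command below sorts a comma-separated file (-t,) numerically on the second field (-k2,2n), breaking ties numerically on the first field (-k1,1n). The optional -T parameter sets the directory used for temporary files, which helps when sorting large files.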
sort [-T <tmp_directory>] -t, -k2,2n -k1,1n -o <sorted_output.csv> <input.csv>
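To see the effect of the two sort keys, here is a small run on made-up data; the three fields and their values are hypothetical, and only the sort options come from the command above. LC_ALL=C is set so the result does not depend on the locale. Because of the n modifier, 1000 sorts after 150; a plain text comparison would place it first.

```shell
set -e
export LC_ALL=C   # fixed locale for reproducible comparisons

# Hypothetical input: three comma-separated fields per line.
printf '1,150,a\n0,150,b\n2,20,c\n1,1000,d\n' > input.csv

# Numeric sort on field 2, ties broken numerically on field 1.
sort -t, -k2,2n -k1,1n -o sorted_output.csv input.csv

cat sorted_output.csv
# 2,20,c
# 0,150,b
# 1,150,a
# 1,1000,d
```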
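If you have a list of sorted CSV files and wish to merge them into a single sorted CSV file (-m merges the inputs without re-sorting them, so each input must already be sorted with the same keys):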
sort [-T <tmp_directory>] -m -t, -k2,2n -k1,1n -o <sorted_output.csv> <input_csv_list>
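Since -m trusts its inputs to be sorted, it can be worth checking them first with sort -c, which exits with a non-zero status if a file is out of order. A minimal sketch on two hypothetical, already-sorted files (part1.csv and part2.csv are made-up names and contents):

```shell
set -e
export LC_ALL=C   # fixed locale for reproducible comparisons

# Two hypothetical inputs, each already sorted on field 2, then field 1.
printf '0,20,a\n1,150,b\n' > part1.csv
printf '2,20,c\n0,150,d\n' > part2.csv

# Verify each input is sorted with the same keys; set -e stops on failure.
for f in part1.csv part2.csv; do
  sort -c -t, -k2,2n -k1,1n "$f"
done

# Merge the sorted inputs into one sorted file.
sort -m -t, -k2,2n -k1,1n -o merged.csv part1.csv part2.csv

cat merged.csv
# 0,20,a
# 2,20,c
# 0,150,d
# 1,150,b
```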