Skip to content

SQL-like query language for the SAM/BAM file format

License

Notifications You must be signed in to change notification settings

maragkakislab/samql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

samql Go Report Card GoDoc

SQL-like query language for the SAM/BAM file format

Install

Download the latest executable:

Latest.

Otherwise, to install the Go library and executable:

go get github.com/maragkakislab/samql/...

Objective

To provide a command line utility and a clean API for filtering SAM/BAM files using simple SQL-like commands. The samql command showcases the envisioned functionality.

Usage

Filters a SAM/BAM file using the SQL clause provided
samql 1.7
Usage: samql [--where WHERE] [--count] [--sam] [--parr PARR] [--obam] INPUT [INPUT ...]

Positional arguments:
  INPUT                  file (- for STDIN)

Options:
  --where WHERE          SQL clause to match records
  --count, -c            print only the count of matching records
  --sam, -S              interpret input as SAM, otherwise BAM
  --parr PARR, -p PARR   Number of cores for parallelization. Uses all available, if not provided.
  --obam, -b             Output BAM
  --help, -h             display this help and exit
  --version              display version and exit

Examples

# Simple
samql --where "RNAME = chr1" test.bam    # Reference name is "chr1"
samql --where "QNAME = read1" test.bam   # Query name is "read1"
samql --where "POS > 100" test.bam       # Position (0-based) greater than 100
samql --where "REVERSE" test.bam         # Negative strand
samql --where "FLAG & 16 = 16" test.bam  # Ditto using flag arithmetics

# More than one files
samql --where "REVERSE" test1.bam test2.bam # Reads are returned in the order of the files

# Regex
samql --where "CIGAR =~ /^15M/" test.bam # Alignment starts with 15 matches

# More complex
samql --where "RNAME = chr1 OR QNAME = read1 AND POS > 100" test.bam

# Just counting
samql -c --where "RNAME = chr1" test.bam

# Very complex
# Uniquely mapped reads, with first pair on chr1 after
# position 1000000 and second pair on chr1 or chrX that
# start with 15 matches/mismatches, are shorter than 75 nts
# or begin with an ATG and are located on the reverse strand.

samql --where "(RNAME = 'chr1' AND POS > 1000000) AND \
               (RNEXT = 'chr1' OR RNEXT = 'chrX') AND \
               NH:i = 1 AND \
               CIGAR =~ /^15M/ AND \
               (LENGTH < 75 OR SEQ =~ /^ATG/) AND \
               PAIRED AND REVERSE"

Keywords

QNAME         // QNAME corresponds to the SAM record query name.
FLAG          // FLAG corresponds to the SAM record alignment flag.
RNAME         // RNAME corresponds to the SAM record reference name
POS           // POS corresponds to the SAM record position (0-based).
MAPQ          // MAPQ corresponds to the SAM record mapping quality.
CIGAR         // CIGAR corresponds to the SAM record CIGAR string.
RNEXT         // RNEXT corresponds to the reference name of the mate read.
PNEXT         // PNEXT corresponds to the position of the mate read.
TLEN          // TLEN corresponds to SAM record template length.
SEQ           // SEQ corresponds to SAM record segment sequence.
QUAL          // QUAL corresponds to SAM record quality.
LENGTH        // LENGTH corresponds to the alignment length.
PAIRED        // PAIRED corresponds to SAM flag 0x1.
PROPERPAIR    // PROPERPAIR corresponds to SAM flag 0x2.
UNMAPPED      // UNMAPPED corresponds to SAM flag 0x4.
MATEUNMAPPED  // MATEUNMAPPED corresponds to SAM flag 0x8.
REVERSE       // REVERSE corresponds to SAM flag 0x10.
MATEREVERSE   // MATEREVERSE corresponds to SAM flag 0x20.
READ1         // READ1 corresponds to SAM flag 0x40.
READ2         // READ2 corresponds to SAM flag 0x80.
SECONDARY     // SECONDARY corresponds to SAM flag 0x100.
QCFAIL        // QCFAIL corresponds to SAM flag 0x200.
DUPLICATE     // DUPLICATE corresponds to SAM flag 0x400.
SUPPLEMENTARY // SUPPLEMENTARY corresponds to SAM flag 0x800.
END           // END corresponds to the alignment end.

API example

// Open github.com/biogo/hts/sam reader
f := os.Open("test.sam")
sr, _ := sam.NewReader(f)

// Do the filtering
r := samql.NewReader(sr)
filter, _ := samql.Where("POS = 1")
r.Filters = append(r.Filters, filter)
for {
	rec, err := r.Read()
	if err != nil {
		if err == io.EOF {
			break
		}
		panic(err)
	}

	// Do sth with rec
}