ADAM Changelog

Version 0.31.0

Closed issues:

Add deprecated annotations for code to be removed to support Spark 3 #2254
Update bdg-utils dependency version to 0.2.16 #2252
Bump Apache Spark dependency version to 2.4.5 #2248
FastqRecordConvert incompatible with single tube long fragment read headers #2246
Bam files with no unmapped reads fails to sort #2242
Unit test failure when building from release tarball #2241
Adam without HDFS #2238
Jenkins build status icon link is broken #2228
Write block-gzipped (bgzf) feature formats #2191
adam-submit is not exiting until I hit ctrl+C #2040
WARN VariantContextConverter:924 - Ran into Array Out of Bounds when accessing indices 0,1,2 of genotype . #2024
Add doc for running on HPC with PBS #2002
loadFastq with paired gzipped FASTQ files fails via s3a URLs #1855
Where to put lift over function #1811
Add transform to fix chromosome prefixes to genomic RDDs and CLIs #1757
Support using Spark-BAM to load BAM files #1683
Handling Validation Stringency without repeated code #1572
New model PartitionMap for Array[Option[(ReferenceRegion, ReferenceRegion)]] #1558
Revisit double-negative command line options (e.g. -disable_fast_concat) #1503
Improve test coverage for SAMRecord<->AlignmentRecord #1284
Allow alphabets to canonicalize strings #797
Update MdTag.getReference for CIGAR N #742
Replace contig length maps with sequence dictionary #572
Use tool like Scala Refactoring to enforce import guidelines #445

Merged and closed pull requests:

[ADAM-2254] Add deprecated annotations for code to be removed to support Spark 3 #2256 (heuermh)
[ADAM-2252] Update bdg-utils dependency version to 0.2.16 #2253 (heuermh)
[ADAM-2248] Bump Apache Spark dependency version to 2.4.5 #2249 (heuermh)
[ADAM-2241] Commit template substitution may not be available if building from tarball #2243 (heuermh)
[ADAM-2228] Remove Jenkins build status badge #2240 (heuermh)
remove 2.7 support checks #2222 (akmorrow13)
[ADAM-2023] Implemented Duplicate Marking algorithm in Spark SQL #2045 (jonpdeaton)
use readlink to properly source source dir #2036 (mtdeguzis)
Don't discard unmapped reads in indel realignment #2019 (pauldwolfe)
Refactor/mark buckets #2015 (jondeaton)
Adding a BamLoader class to have only 1 header parse for multiple ind… #1966 (ffinfo)
Added additional arguments to GenomicRDD.pipe() #1758 (gunjanbaid)
Migrate bdg-formats to new adam-formats module. #1689 (heuermh)
[ADAM-1683] Pull in Spark-BAM as a secondary loading path. #1686 (fnothaft)
Add SortedGenomicRDD trait, refactor shuffle joins and pipe #1590 (fnothaft)
[ADAM-1513] Strandedness for FeatureRDDs #1555 (devin-petersohn)

Version 0.30.0

Closed issues:

Github changes plugin used in release script does not use two-factor authentication #2235
Update bdg-formats dependency version to 0.15.0 #2233
7 tests failing on HEAD #2231
BUILD FAILURE - Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java #2227
GenomicDataset saveAsParquet incorrectly named parameter compressCodec #2224
Add printAttributes methods for Reads, Sequences, Slices #2219
Add default Set.empty to printAttributes key method parameter #2218
Add Avro-friendly ctrs in rdd.variant package #2215
Cannot resolve adam-shade-spark2_2.11 dependency #2211

Merged and closed pull requests:

[ADAM-2235] Update github-changes-maven-plugin dependency version to 1.1 #2236 (heuermh)
[ADAM-2233] Update bdg-formats dependency version to 0.15.0. #2234 (heuermh)
Update maven plugin dependency versions. #2230 (heuermh)
[ADAM-2224] Complete refactoring of compressionCodec for named parameter. #2229 (heuermh)
[ADAM-2224] Use compressionCodec for named parameter. #2226 (heuermh)
[ADAM-2219] Add printAttributes methods for Reads, Sequences, Slices #2223 (heuermh)
[ADAM-2218] Add default Set.empty to printAttributes key method parameter. #2220 (heuermh)
Rename AlignmentRecord to Alignment. #2217 (heuermh)
[ADAM-2215] Add Avro-friendly ctrs to rdd.variant package #2216 (heuermh)

Version 0.29.0

Closed issues:

Bump bdg-formats dependency version to 0.14.0 #2208
Bump Apache Spark dependency version to 2.4.4 #2202
Add missing loadVariantContexts(String, ValidationStringency) method #2197
Jenkins builds failing due to Coveralls API submission #2194
Confirm block-gzipped (bgzf) interleaved FASTQ is supported #2193
TransformGenotype/Variant do not support compressed VCF #2190
Add htsjdk conversion methods to VariantContextDataset #2189
TransformVariants is missing partition arguments #2188
StackOverflowError when saving to BAM in adam-shell #2186
loadFastaDna usage not obvious due to default method parameter #2183
loadFastaDna does not seem to work #2182
kryo buffer overflow when converting fastas from CLI to adam #1660

Merged and closed pull requests:

[ADAM-2208] Bump bdg-formats dependency version to 0.14.0 #2209 (heuermh)
Add FASTA in formatter for sequence datasets #2207 (heuermh)
Remove Avro 1.8.x download step from Jenkins Scala 2.12 installation. #2206 (heuermh)
Use qualityScores for base quality scores #2205 (heuermh)
[ADAM-2189] Add htsjdk conversion methods to VariantContextDataset #2204 (heuermh)
[ADAM-2202] Bump Apache Spark dependency version to 2.4.4. #2203 (heuermh)
[ADAM-2183] Drop default value for maximumLength #2201 (heuermh)
[ADAM-2197] Add missing loadVariantContexts(String, ValidationStringency) method #2200 (heuermh)
[ADAM-2194] Disable coveralls reporting from Jenkins test script #2196 (heuermh)
[ADAM-2188] Add partition cli args to TransformVariants,Features. #2192 (heuermh)
Bump htsjdk dependency version to 2.19.0 #2184 (heuermh)
Update required Maven version in docs #2181 (heuermh)

Version 0.28.0

Closed issues:

Bump bdg-formats dependency version to 0.13.0 #2177
Rename reads to alignments in methods where appropriate #2172
Add command line option re: creating references from FASTA sources #2168
Add command line support for loading references in TransformFeatures #2167
Add load methods for data frames #2159
Transform VCF to adam file not found exception. #2076
NoClassDefFoundError: javax/tools/ToolProvider on openjdk 10.0.2 #2030
NotSerializableException: com.netflix.servo.monitor.LongGauge #1952
Should NucleotideContigFragmentRDD create sequence dictionary on load? #1894
converting fasta to adam eats a huge amount of time and memory #1891
Support minPartitions parameter across load calls #1792
make reading fasta less memory hungry #1458
Improve unit test coverage for NucleotideContigFragmentRDD #1413
Support for INSDC Sequence records (i.e., Genbank/EMBL format)? #1219

Merged and closed pull requests:

[ADAM-2177] Bump bdg-formats dependency version to 0.13.0 #2178 (heuermh)
[ADAM-2172] Rename reads to alignments in methods where appropriate #2176 (heuermh)
[ADAM-1891] Reimplement FASTA sequence and slice converters for performance #2175 (heuermh)
[ADAM-2168] Add command line option re: creating references from FASTA sources #2170 (heuermh)
[ADAM-2167] Add command line support for loading references in TransformFeatures #2169 (heuermh)
bump adam-python version #2165 (akmorrow13)
Convert fragment dataset to alignment dataset directly #2162 (heuermh)
[ADAM-2159] Add load methods for data frames #2158 (heuermh)
Post 0.27.0 release cleanup and doc fixes. #2155 (heuermh)
Add direct conversion from DatasetBoundFragmentRDD to DatasetBoundAli… #2016 (henrydavidge)
Add ADAMContext APIs to create genomic RDDs from dataframes #2000 (henrydavidge)
Adding ReadRDD, SequenceRDD, and SliceRDD. #1895 (heuermh)

Version 0.27.0

Closed issues:

Add Scala 2.12 artifacts to release script #2153
Tried to access method org.apache.avro.specific.SpecificData.<init>()V from class ProcessingStep #2151
Update maven-jar-plugin dependency version to 3.1.2 #2147
Homebrew and Bioconda packages fail against Spark 2.4.2 #2146
Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2145
Can encounter empty reduce when BAM header fails validation #2143
Build failing in jenkins from Spark 2.2.3 #2139
Make SamRecordConverter public #2138
python API does not match API #2127
Error when run : mvn install #2123
Always use Spark SQL in GenomicDataset read path #2114
Update bdg-utils dependency version to 0.2.14 #2106
NoSuchMethodError: org.apache.parquet.column.ParquetProperties.getAllocator()Lorg/apache/parquet/bytes/ByteBufferAllocator #2098
ClassNotFoundException: org.apache.avro.message.BinaryMessageEncoder #2091
Release script needs to touch Version in R DESCRIPTION file #2089
org.apache.avro.SchemaParseException: Can't redefine: list #2058
Support Spark 2.4 and Scala 2.12 #2044
Fail early when output directory already exists #2034
NoClassDefFoundError o.a.parquet.hadoop.metadata.CompressionCodecName #1742
Log with parameterized messages consistently for performance #1712

Merged and closed pull requests:

[ADAM-2153] Add Scala 2.12 artifacts to release script #2154 (heuermh)
[ADAM-2089] Bump Version in R DESCRIPTION file #2152 (heuermh)
[ADAM-2145] Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2149 (heuermh)
[ADAM-2147] Update maven-jar-plugin dependency version to 3.1.2. #2148 (heuermh)
[ADAM-2143] Use fold instead of reduce when loading SAM/BAM/CRAM headers #2144 (fnothaft)
Remove parquet-scala dependency from dependencyManagement. #2142 (heuermh)
[ADAM-2139] Update Spark version to 2.3.3 for Jenkins test #2141 (heuermh)
[ADAM-1712] Replace utils.Logger with grizzled.slf4j.Logger #2136 (heuermh)
[ADAM-2034] Check output path is writeable before running transformations #2135 (heuermh)
jenkins scripts deletes conda envs #2133 (akmorrow13)
Update htsjdk dependency version to 2.18.2 #2132 (heuermh)
[ADAM-2127] Update python doc per GenomicRdd --> GenomicDataset change #2128 (heuermh)
Update python and R versions. #2126 (heuermh)
use parquet-scala_2.11 fork #2108 (ryan-williams)
[ADAM-2106] Update bdg-utils dependency version to 0.2.14 #2107 (heuermh)
[ADAM-2044] Update Spark version to 2.4.3, add move to Scala 2.12 script #2056 (heuermh)

Version 0.26.0

Closed issues:

Bump Spark dependency to version 2.3.3 #2120
Update Spark version on Jenkins to 2.2.3 #2115
Inverted duplicates are not found in mark duplicates #2102
Py4JError: org.bdgenomics.adam.algorithms.consensus.ConsensusGenerator.fromKnowns does not exist in the JVM #2099
Update Bioconda recipe for ADAM 0.25.0 #2088
Update Homebrew formula for ADAM 0.25.0 #2087
Error: Dependency package(s) 'SparkR' not available #2086
Java-friendly indel realignment method doesn't allow passing reference #2013
Use consistent (Scala-specific) (Java-specific) qualifiers in method scaladoc #1986
Clarify GenomicRDD vs. GenomicDataset name #1954
Support validation stringency in out formatters #1949
Compute coverage by sample #1498

Merged and closed pull requests:

Bump bdg-formats dependency to version 0.12.0. #2124 (heuermh)
[ADAM-2120] Bump Spark dependency to version 2.3.3. #2121 (heuermh)
Filter supplemental reads from scoring #2119 (pauldwolfe)
[ADAM-2115] Update Spark version on Jenkins to 2.2.3. #2118 (heuermh)
Refactor AlignmentRecord, RecordGroup, and ProcessingStep #2113 (heuermh)
removed anaconda requirement for venv during jenkins test #2109 (akmorrow13)
Propagate read negative flag to SAM records for unmapped reads #2105 (henrydavidge)
Add consensus targets to realignment targets #2104 (pauldwolfe)
[ADAM-2099] Add python realignIndelsFromKnownIndels method #2103 (heuermh)
[ADAM-2102] Inverted duplicates are not found in mark duplicates #2101 (pauldwolfe)
Rename contig to reference #2100 (heuermh)
[ADAM-1986] Add java-specific methods where missing. #2097 (heuermh)
[ADAM-2013] Add java-friendly indel realignment method that accepts reference. #2095 (heuermh)
Use build-helper-maven-plugin for build timestamp #2093 (heuermh)
bump adam-python version to 0.25.0a0 #2092 (akmorrow13)
[ADAM-2085] Update R installation docs re: libgit2 and SparkR. #2090 (heuermh)
[ADAM-1954] Complete refactoring GenomicRDD to GenomicDataset. #1981 (heuermh)
[ADAM-1949] Support validation stringency in out formatters. #1969 (heuermh)

Version 0.25.0

Closed issues:

Expand illumina metadata regex to include "N" character #2079
Remove support for Hadoop 2.6 #2073
NumberFormatException: For input string: "nan" in VCF #2068
Support Spark 2.3.2 #2062
Arrays should be passed to HTSJDK in the JVM primitive type #2059
toCoverage() function for alignments does not distinguish samples #2049
Building from adam-core module directory fails to generate Scala code for sql package #2047
Data Sets #2043
saveAsBed writes missing score values as '.' instead of '0' #2039
Fix GFF3 parser to handle trailing FASTA #2037
Add StorageLevel as an optional parameter to loadPairedFastq #2032
Error: File name too long when building on encrypted file system #2031
Fail to transform a VCF file containing multiple genome data (Multiple sample) #2029
Dataset and RDD constructors are missing from CoverageRDD #2027
How to create a single RDD[Genotype] object out of multiple VCF files? #2025
ReadTheDocs github banner is broken #2020
-realign_indels throws serialization error with instrumentation enabled #2007
Support 0 length FASTQ reads #2006
Speed of Reading into ADAM RDDs from S3 #2003
Support Python 3 #1999
Unordered list of region join types in doc is missing nested levels #1997
Add VariantContextRDD.saveAsPartitionedParquet, ADAMContext.loadPartitionedParquetVariantContexts #1996
VCF annotation question #1994
Fastq reader clips long reads at 10,000 bp #1992
adam-submit Error: Number of executors must be a positive number on EMR 5.13.0/Spark 2.3.0 #1991
Test against Spark 2.3.1, Parquet 1.8.3 #1989
END does not get set when writing a gVCF #1988
Support saving single files to filesystems that don't implement getScheme #1984
Add additional filter by convenience methods #1978
Limiting FragmentRDD pipe parallelism #1977
Consider javadoc.io for API documentation linking #1976
FASTQ Reader leaks connections #1974
Update bioconda recipe for version 0.24.0 #1971
Update homebrew formula at brewsci/homebrew-bio for version 0.24.0 #1970
loadPartitionedParquetAlignments fails with Reference.all #1967
Caused by: java.lang.VerifyError: class com.fasterxml.jackson.module.scala.ser.ScalaIteratorSerializer overrides final method withResolved #1953
FASTQ input format needs to support index sequences #1697
Changelog must be edited and committed manually during release process #936

Merged and closed pull requests:

added pyspark mock modules for API documentation #2084 (akmorrow13)
Added mock python modules for API python documentation #2082 (akmorrow13)
[ADAM-2079] Expand illumina metadata regex to include "N" character #2081 (pauldwolfe)
ADAM-2079 Added "N" to regexs for illumina metadata #2080 (pauldwolfe)
Update docs with new template and documentation #2078 (akmorrow13)
[ADAM-1992] Make maximum FASTQ read length configurable. #2077 (heuermh)
[ADAM-2059] Properly pass back primitive typed arrays to HTSJDK. #2075 (heuermh)
Update dependency versions, including htsjdk to 2.16.1 and guava to 27.0-jre #2072 (heuermh)
[ADAM-1999] Support Python 3 #2070 (akmorrow13)
[ADAM-2068] Prevent NumberFormatException for nan vs NaN in VCF files. #2069 (heuermh)
Update python MAKE file #2067 (Georgehe4)
Update python MAKE file #2066 (Georgehe4)
Update jenkins script to test python 3.6 #2060 (Georgehe4)
[ADAM-2062] Update Spark version to 2.3.2 #2055 (heuermh)
Clean up fields and doc in fragment. #2054 (heuermh)
[ADAM-2037] Support GFF3 files containing FASTA formatted sequences. #2053 (heuermh)
modified CoverageRDD and FeatureRDD to extend MultisampleGenomicDataset #2051 (akmorrow13)
Multi-sample coverage #2050 (akmorrow13)
[ADAM-2047] Use source directory relative to project.basedir for adam codegen. #2048 (heuermh)
[ADAM-2039] Adding support for writing BED format per UCSC definition #2042 (heuermh)
Update Jenkins Spark version to 2.2.2 #2035 (akmorrow13)
[ADAM-2032] Add StorageLevel as an optional parameter to loadPairedFastq #2033 (heuermh)
[ADAM-2027] Add RDD and Dataset constructors to CoverageRDD. #2028 (heuermh)
Allow for export of query name sorted SAM files #2026 (karenfeng)
[ADAM-2020] Fix ReadTheDocs Github banner. #2021 (fnothaft)
[ADAM-1988] Add copyVariantEndToAttribute method to support gVCF END attribute … #2017 (heuermh)
[ADAM-936] Use github-changes-maven-plugin to update CHANGES.md. #2014 (heuermh)
[ADAM-1992] Make maximum FASTQ read length configurable. #2011 (fnothaft)
[ADAM-1697] Expand Illumina metadata regex to cover interleaved index sequences. #2010 (heuermh)
[ADAM-2007] Make IndelRealignmentTarget implement Serializable. #2009 (fnothaft)
[ADAM-2006] Support loading 0-length reads as FASTQ. #2008 (fnothaft)
[ADAM-1697] Expand Illumina metadata regex to cover index sequences #2004 (pauldwolfe)
[ADAM-1996] Load and save VariantContexts as partitioned Parquet. #2001 (heuermh)
[ADAM-1997] Nest list of region join types in joins doc. #1998 (heuermh)
[ADAM-1877] Add filterToReferenceName(s) to SequenceDictionary. #1995 (heuermh)
[ADAM-1984] Support file systems that don't set the scheme. #1985 (fnothaft)
[ADAM-1978] Add additional filter by convenience methods. #1983 (heuermh)
Adding printAttribute methods for alignment records, features, and samples. #1982 (heuermh)
Fix partitioning code to use Long instead of Int #1980 (fnothaft)
[ADAM-1976] Adding core API documentation link and badge. #1979 (heuermh)
[ADAM-1974] Close unclosed stream in FastqInputFormat. #1975 (fnothaft)
Set defaults to schemas #1972 (ffinfo)
Add loadPairedFastqAsFragments method. #1866 (heuermh)
Adding loadPairedFastqAsFragments method #1828 (ffinfo)

Version 0.24.0

Closed issues:

Phred values from 156–254 do not round trip properly between log space #1964
Support VCF lines with positions at 0 #1959
Don't initialize non-ref values to Int.MinValue #1957
Support downsampling in recalibration #1955
Cannot waive validation stringency for INFO Number=.,Type=Flag fields #1939
Clip phred scores below Int.MaxValue #1934
ADAMContext.getFsAndFilesWithFilter should throw exception if paths null or empty #1932
Bump to Spark 2.3.0 #1931
util.FileExtensions should be public for use downstream in Cannoli #1927
Reduce logging level for ADAMKryoRegistrator #1925
Revisit performance implications of commit 1eed8e8 #1923
add akmorrow13 to PyPI for bdgenomics.adam #1919
Read the Docs build failing with TypeError: super() argument 1 must be type, not None #1917
Bump Hadoop-BAM dependency to 7.9.2. #1915
cannot run pyadam from adam distribution 0.23.0 #1914
adam2fasta/q are missing asSingleFile, disableFastConcat #1912
Pipe API doesn't properly handle multiple arguments and spaces #1909
Bump to HTSJDK 2.13.2 #1907
S3A error: HTTP request: Timeout waiting for connection from pool #1906
InputStream passed to VCFHeaderReader does not get closed #1900
Support INFO fields set to missing #1898
CLI to transfer between cloud storage and HDFS #1896
Jenkins does not run python or R tests #1889
pyadam throws application option error #1886
ReferenceRegion in python does not exist #1884
Caching GenomicRDD in pyspark #1883
adam-submit aborts if ADAM_HOME is set #1882
Allow piped commands to timeout #1875
loadVcf does not dedupe sample ID #1874
Add coverage command for reporting read coverage #1873
Only python 2? #1871
Support VariantContextRDD from SQL #1867
Cannot find find-adam-assembly.sh in bioconda build #1862
_jvm.java.lang.Class.forName does not work for certain configurations #1858
Formatting error in CHANGES.md #1857
Various improvements to readthedocs documentation #1853
add filterByOverlappingRegion(query: ReferenceRegion) to R and python APIs #1852
Support adding VCF header lines from Python #1840
Support loadIndexedBam from Python #1836
Add link to awesome list of applications that extend ADAM #1832
loadIndexed bam lazily throws Exception if index does not exist #1830
OAuth credentials for Github in Coveralls configuration are no longer valid #1829
base counts per position #1825
Issues loading BAM files in Google FS #1816
Error when writing a vcf file to Parquet #1810
transformAlignments cannot repartition files #1808
GenotypeRDD should support toVariants method #1806
Add support for python and R in Homebrew formula #1796
Add transformVariantContexts or similar to cli #1793
Issue while using Sorting option #1791
Issue with adam2vcf #1787
Remove explicit <compile> scopes from submodule POMs #1786
java.nio.file.ProviderNotFoundException (Provider "s3" not found) #1732
Accessing GenomicRDD join functions in python #1728
ArrayIndexOutOfBoundsException in PhredUtils$.phredToSuccessProbability #1714
Add ability to specify region bounds to pipe command #1707
Unable to run pyadam, SQLException: Failed to start database 'metastore_db' #1666
SAMFormatException: Unrecognized tag type: ^@ #1657
IndexOutOfBoundsException in BAMInputFormat.getSplits #1656
overlaps considers that Strand.FORWARD cannot overlap with Strand.INDEPENDENT #1650
migration converters #1629
RFC: Removing Spark 1.x, Scala 2.10 support in 0.24.0 release #1597
Eliminate unused ConcreteADAMRDDFunctions class #1580
Add set theory/statistics packages to ADAM #1533
Evaluate Apache Carbondata INDEXED column store file format for genomics #1527
Stranded vs unstranded in getReferenceRegions() for features #1513
Question: How to transform a line of sam to AlignmentRecord? #1425
Excessive compilation warnings about multiple scala libraries #695
Support Hive-style partitioning #651

Merged and closed pull requests:

[ADAM-1964] Lower point where phred conversions are done using log code. #1965 (fnothaft)
Add utility methods for adam-shell. #1958 (heuermh)
[ADAM-1955] Add support for downsampling during recalibration table generation #1963 (fnothaft)
[ADAM-1957] Don't initialize missing likelihoods to MinValue. #1961 (fnothaft)
[ADAM-1959] Support VCF rows at position 0. #1960 (fnothaft)
[ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1948 (fnothaft)
[ADAM-1914] Python profile needs to be specified for egg to be in distribution. #1946 (fnothaft)
[ADAM-1917] Delete dependency on fulltoc. #1944 (fnothaft)
[ADAM-1917] Try 3: fix Sphinx fulltoc. #1943 (fnothaft)
[ADAM-1917] Set Sphinx version in requirements.txt. #1942 (fnothaft)
[ADAM-1917] Set minimal Sphinx version for Readthedocs build. #1941 (fnothaft)
[ADAM-1939] Allow validation stringency to waive off FLAG arrays. #1940 (fnothaft)
[ADAM-1915] Bump to Hadoop-BAM 7.9.2. #1938 (fnothaft)
[ADAM-1934] Clip phred values to 3233, instead of Int.MaxValue. #1936 (fnothaft)
Ignore VCF INFO fields with number=G when stringency=LENIENT #1935 (jpdna)
[ADAM-1931] Bump to Spark 2.3.0. #1933 (fnothaft)
[ADAM-1840] Support adding VCF header lines from Python. #1930 (fnothaft)
[ADAM-1927] Increase visibility for util.FileExtensions for use downstream. #1929 (heuermh)
[ADAM-1925] Reduce logging level for ADAMKryoRegistrator. #1928 (heuermh)
[ADAM-1923] Revert 1eed8e8 #1926 (fnothaft)
Use SparkFiles.getRootDirectory in local mode. #1924 (heuermh)
[ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1922 (jpdna)
Make Spark SQL APIs supported across all types #1921 (fnothaft)
[ADAM-1909] Refactor pipe cmd parameter from String to Seq[String]. #1920 (heuermh)
Add Google Cloud documentation #1918 (Georgehe4)
[ADAM-1917] Load sphinxcontrib.fulltoc with imp.load_sources. #1916 (akmorrow13)
[ADAM-1912] Add asSingleFile, disableFastConcat to adam2fasta/q. #1913 (heuermh)
[ADAM-651] Hive-style partitioning of parquet files by genomic position #1911 (jpdna)
Minor unit test/style fixes. #1910 (heuermh)
[ADAM-1907] Bump to HTSJDK 2.13.2. #1908 (fnothaft)
[ADAM-1882] Don't abort adam-submit if ADAM_HOME is set. #1905 (fnothaft)
[ADAM-1806] Add toVariants conversion from GenotypeRDD. #1904 (fnothaft)
[ADAM-1882] Return true if ADAM_HOME is set, not exit 0. #1903 (heuermh)
[ADAM-1900] Close stream after reading VCF header. #1901 (fnothaft)
[ADAM-1898] Support converting INFO fields set to empty ('.'). #1899 (fnothaft)
Add Kryo registration for two classes required for Spark 2.3.0. #1897 (jpdna)
[ADAM-1853] Various improvements to readthedocs documentation. #1893 (heuermh)
[ADAM-1889][ADAM-1884] updated ReferenceRegion in python #1892 (akmorrow13)
[ADAM-1889] Run R/Python tests. #1890 (fnothaft)
[ADAM-1886] fix for pyadam to recognize >1 egg file #1887 (akmorrow13)
[ADAM-1883] Python and R caching #1885 (akmorrow13)
[ADAM-1875] Add ability to timeout a piped command. #1881 (fnothaft)
[ADAM-1871] Fix print call that broke python 3 support. #1880 (fnothaft)
[ADAM-1832] Use awesome list style and link to bigdatagenomics/awesome-adam. #1879 (heuermh)
[ADAM-651] Hive-style partitioning of parquet files by genomic position #1878 (jpdna)
[ADAM-1874] Dedupe samples when loading VCFs. #1876 (fnothaft)
Fixes Coverage python API and adds tests #1870 (akmorrow13)
added filterByOverlappingRegion for python #1869 (akmorrow13)
Add command line option for populating nested variant.annotation field in Genotype records. #1865 (heuermh)
Hive partitioned(v4) rebased #1864 (jpdna)
[ADAM-1597] Move to Scala 2.11 and Spark 2.x. #1861 (heuermh)
[ADAM-1857] Fix formatting error due to forward slashes. #1860 (heuermh)
[ADAM-1858] Use getattr instead of Class.forName from python API. #1859 (fnothaft)
[ADAM-1836] Adds loadIndexedBam API to Python and Java. #1837 (fnothaft)
Added check for bam index files in loadIndexedBam #1831 (akmorrow13)
[ADAM-1793] Adding vcf2adam and adam2vcf that handle separate variant and genotype data. #1794 (heuermh)
added adam notebook #1778 (akmorrow13)
[ADAM-1666] SQLContext creation fix for Spark 2.x #1777 (akmorrow13)
Add optional accumulator for VCF header lines to VCFOutFormatter. #1727 (heuermh)
add hive style partitioning for contigName #1620 (jpdna)
Add loadReadsFromSamString function into ADAMContext #1434 (xubo245)

Version 0.23.0

Closed issues:

Readthedocs build error #1854
Add pip release to release scripts #1847
Publish scaladoc script still attempts to build markdown docs #1845
Allow variant annotations to be loaded into genotypes #1838
Specify correct extensions for SAM/BAM output #1834
Fix link anchors and other issues in readthedocs #1822
Sphinx fulltoc is not included #1821
Readme link to bigdatagenomics/lime 404s #1819
Bump to Hadoop-BAM 7.9.1 #1817
LoadVariants Header Format #1815
Right and Left Outer Shuffle Region Join don't match #1813
Pipe command can fail with empty partitions #1807
adam files with outdated formats throw FileNotFoundException #1804
Move GenomicRDD.writeTextRDD outside of GenomicRDD #1803
find-adam-assembly fails to recognize more than 1 jar #1801
tests/testthat.R failed on git head #1799
Run python and R tests conditionally in build #1795
scala-lang should be a provided dependency #1789
loadIndexedBam does an unnecessary union #1784
Release bdgenomics.adam R package on CRAN #1783
Issue with transformVariant // Adam to vcf #1782
Add code of conduct #1779
Reinstantiation of SQLContext in pyadam ADAMContext #1774
Genotypes should only contain the core variant fields #1770
Add SingleFASTQInFormatter #1768
INDEL realigner can emit negative partition IDs #1763
Request for a new release #1762
INDEL realigner generates targets for reads with more than 1 INDEL #1753
Fragment Issue #1752
Variant Caller!!! #1751
Spark Version!! #1750
ReferenceRegion.subtract eliminating valid regions #1747
New Shuffle Join Implementation - Left Outer + Group By Left #1745
command failure after build success #1744
Recalibrate_base_Qualities #1743
Standardize regionFn for ShuffleJoin returned objects #1740
Shuffle, Broadcast Joins with threshold #1739
Adam on Spark 2.1 #1738
Opening up permission on GenericGenomicRDD constructor #1735
Consistency on ShuffleRegionJoin returns #1734
vcf2adam support #1731
Cloud-scale BWA MEM #1730
Aligned Human Genome couldn't convert to Adam #1729
Mark Duplicates #1726
Genomics Pipeline #1724
.fastq Alignment #1723
Is it correct Adam file #1720
.fastQ to .adam #1718
Unable to create .adam from .sam #1717
Add adam- prefix to distribution module name #1716
Python load methods don't have ability to specify validation stringency #1715
NPE when trying to map loadVariants over RDD #1713
Add left normalization of INDELs as an RDD level primitive #1709
Allow validation stringency to be set in AnySAMOutFormatter #1703
InterleavedFastqInFormatter should sort by readInFragment #1702
Allow silencing the # of reads in fragment warning in InterleavedFastqInFormatter #1701
GenomicRDD.toXxx method names should be consistent #1699
Exception thrown in VariantContextConverter.formatAllelicDepth despite SILENT validation stringency #1695
Make GenomicRDD.toString more adam-shell friendly #1694
Add adam-shell friendly VariantContextRDD.saveAsVcf method #1693
change bdgenomics.adam package name for adam-python to bdg-adam #1691
Conflict in bdg-formats dependency version due to org.hammerlab:genomic-loci #1688
Convert and store variant quality field. #1682
Region join shows non-determinism #1680
Shuffle region join throws multimapped exception for unmapped reads #1679
Push validation checks down to INFO/FORMAT fields #1676
IndexOutOfBounds thrown when saving gVCF with no likelihoods #1673
Generate docs from R API for distribution #1672
Support loading a subset of VCF fields #1670
Error with metadata: Multivalued flags are not supported for INFO lines #1669
Include bdg.adam-0.23.0.tar.gz in distribution tarballs #1668
Include bdgenomics.adam-0.23.0_SNAPSHOT-py2.7.egg in distribution tarball #1667
Add SUPPORT.md file to complement CONTRIBUTING.md #1664
Can't merge BAM files containing the same sample #1663
Incorrect README.md kmer.scala loadAlignments method parameter name #1662
Add performance benchmarks similar to Samtools CRAM benchmarking page #1661
Transient bad GZIP header bug when loading BGZF FASTQ #1658
bdgenomics.adam vs bdg.adam for R/Python APIs #1655
Need adamR script #1649
incorrect grep for assembly jars in bin/pyadam #1647
VariantRDD union creates multiple records for the same SNP ID #1644
S3 access documentation #1643
Algorithms docs formatting #1639
Building downstream apps docs reformatting #1638
FastqInputFormat.FILE_SPLITTABLE in conf not getting passed properly #1635
Add benchmarks to documentation #1634
Intro docs contain outdated/incompatible code #1633
Intro docs missing a number of active projects #1632
Installation instructions for Homebrew missing from documentation #1631
Architecture section is missing from docs #1630
Seq vs. Seq with javac #1625
ProcessingStep missing from adam-codegen #1623
Add ADAM recipe to bioconda #1618
adam-submit cannot find assembly jar if installed as symlink #1616
Expose transform/transmute in Java/Python/R #1615
Expose VariantContextRDD in R/Python #1614
Expose pipe API from Python/R #1611
Serialization issue with TwoBitFile #1610
Snapshot Distribution Does not include jar files #1607
ManualRegionPartitioner is broken for ParallelFileMerger codepath #1602
VariantRDD doesn't save partition map #1601
Scala copy method not supported in abstract classes such as AlignmentRecordRDD #1599
Interleaved FASTQ recognizes only /1 suffix pattern #1589
Use empty sequence dictionary when loading features #1588
New Illumina FASTQ spec adds metadata to read name line #1585
first run of ADAM #1582
Add unit test coverage for BED12 parser and writer #1579
Spark 1.x Scala 2.10 snapshot artifacts missing since 31 March 2017 #1578
Unable to save GenomicRDDs after a join. #1576
Add filterBySequenceDictionary to GenomicRDD #1575
Unaligned Trait does nothing #1573
Bump to bdg-formats 0.11.1 #1570
PhredUtils conversion to log probabilities has insufficient resolution for PLs #1569
Reference model import code is borked #1568
SequenceDictionary vs Feature[RDD] of reference length features #1567
giab-NA12878 truth_small_variants.vcf.gz header issues #1566
VCF header read from stream ignored in VCFOutFormatter #1564
VCF genotype Number=A attribute throws ArrayIndexOutOfBoundsException #1562
Save compressed single file VCF via HadoopBAM #1554
bucketing strategy #1553
Is parquet using delta encoding for positions? #1552
Export to VCF does not include symbolic non-ref if site has a called alt #1551
Refactor filterByOverlappingRegions not to require a List #1549
Move docs to Sphinx/pure Markdown #1548
java.lang.IncompatibleClassChangeError: Implementing class #1544
Support locus predicate in TransformAlignments #1539
Visibility from Java, jrdd has private access in AvroGenomicRDD #1538
Rename o.b.adam.apis.java package to o.b.adam.api.java #1537
VCF header genotype reserved key FT cardinality clobbered by htsjdk #1535
Compute a SequenceDictionary from a *.genome file #1534
Queryname sorted check should check for queryname grouped as well #1530
Bump to bdg-formats 0.11.0 #1520
Move to Spark 2.2, Parquet 1.8.2 #1517
Minor refactor for TreeRegionJoin for consistency #1514
Allow +Inf and -Inf Float values when reading VCF #1512
SparkFiles temp directory path should be accessible as a variable #1510
SparkFiles.get expects just the filename #1509
Split apart #1324 #1507
Where can I find "Phred-scaled quality score" (QUAL)? #1506
Alignment Record sort is not consistent with samtools #1504
Sequence dictionary records in TwoBitFile are not stable #1502
Move coverage counter over to Dataset API #1501
Allow users to set the minimum partition count across all load methods #1500
Enable reuse of broadcast object across broadcast region joins #1499
Take union across genomic RDDs #1497
Adam files created by vcf2adam is not recognizable #1496
Scalatest log output disappears with Maven 3.5.0 #1495
ArrayOutOfBoundsException in vcf2adam (spark2_2.11-0.22.0) on UK10K VCFs (VCFv4.1) #1494
ReferenceRegion overlaps and covers returns false if overlap is 1 #1492
Provide asSingleFile parameter for saveAsFastq and related #1490
Min Phred score gets bumped by 33 twice in BQSR #1488
Should throw error when BAM header load fails #1486
Default value for reads.toCoverage(collapse) should be false #1483
Refactor ADAMContext loadXxx methods for consistency #1481
loadGenotypes three time #1480
Fall back to sequential concat when HDFS concat fails #1478
VCF line with . ALT gets dropped #1476
ADAM works on Cloudera but does NOT work on MAPR #1475
Clean up ReferenceRegion.scala #1474
Allow joins on regions that are within a threshold (instead of requiring overlap) #1473
FeatureRDD.toCoverage throws NullPointerException when there is no coverage information #1471
Add quality score binner #1462
Splittable compression and FASTQ #1457
Don't convert .{different-type}.adam in loadAlignments and loadFragments #1456
New primitives for adam-core #1454
Port over code for populating SequenceDictionaries from .dict files #1449
Ignore failed push to Coveralls during CI builds #1444
No asSingleFile parameter for saveAsFasta in NucleotideContigFragmentRDD #1438
shufflejoin and ArrayIndexOutOfBoundsException #1436
Document using ADAM snapshot #1432
Improve metrics coverage across ADAMContext load methods #1428
loadReferenceFile missing from Java API #1421
loadCoverage missing from Java API #1420
Question: How to get paired-end alignmentRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? #1419
Clean up possibly unused methods in Projection #1417
Problem loading SNPeff annotated VCF #1390
RecordGroupDictionary should support isEmpty #1380
Get rid of mutable collection transformations in ShuffleRegionJoin #1379
Add tab5/6 as native output format for AlignmentRecordRDD #1377
ValidationStringency in MDTagging should apply to reads on unknown references #1365
Assembly final name doesn't include spark2 for Spark 2.x builds #1361
Merge reads2fragments and fragments2reads into a single CLI #1359
Investigate failures to load ExAC.0.3.GRCh38.vcf variants #1351
adam-shell does not allow additional jars via Spark jars argument #1349
Loading GZipped VCF returns an empty RDD #1333
Bump Spark 2 build to Spark 2.1.0 #1330
Rename Transform command TransformAlignments or similar #1328
Replace ADAM2Vcf and Vcf2ADAM commands with TransformGenotypes and TransformVariants #1327
FeatureRDD instantiation tries to cache the RDD #1321
Repository for Pipe API wrappers for bioinformatics tools #1314
Trying to get Spark pipeline working with slightly out of date code. #1313
Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) #1312
Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311
Don't include log4j.properties in published JAR #1300
Removing ProgramRecords info when saving data to sam/bam? #1257
ADAM on Slurm/LSF #1229
Maintaining sorted/partitioned knowledge #1216
Evaluate bdg-convert external conversion library proposal #1197
Port AMPCamp Tutorial over #1174
Top level WrappedRDD or similar abstraction #1173
GFF3 formatted features written as single file must include gff-version pragma #1169
Can probably eliminate sort in RealignIndels #1137
Load SV type info field - need for allele uniqueness #1134
BroadcastRegionJoin is not a broadcast join #1110
AlignmentRecordRDD does not extend GenomicRDD per javac #1092
Add generic ReferenceRegion pushdown for parquet files #1047
Use of dataset api in ADAM #1018
Difference running markdups with and without projection #1014
ADAM to BAM conversion fails using relative path #1012
Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
NoSuchMethodError due to kryo minor-version mismatch #955
Autogen field names in projection package #941
Future of schemas in bdg-formats #925
genotypeType for genotypes with multiple OtherAlt alleles? #897
How to filter genotype RDD with FeatureRDD #890
How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
R language package for Adam #882
How to count genotypes with a 10 node Spark/Adam cluster faster than with BCFTools on a single machine? #879
Ensure Java API is up-to-date with Scala API #855
BroadcastRegionJoin fails with unmapped reads #821
Resolve Fragment vs. SingleReadBucket #789
Updating/Publishing the docs/ directory #774
Next on empty iterator in BroadcastRegionJoin #661
Cleanup code smell in sort work balancing code #635
Provide low-impact alternative to transform -repartition for reducing partition size #594
Create an ADAM Python API #538
Migrate serialization libraries out of ADAM core #448
Create standardized, interpretable exceptions for error reporting #420
Build info/version info inside ADAM-generated files #188

Merged and closed pull requests:

[ADAM-1854] Add requirements.txt file for RTD. #1856 (fnothaft)
[ADAM-1783] Resolve check issues that block pushing to CRAN. #1849 (fnothaft)
[ADAM-1847] Update ADAM scripts to support self-contained pip install. #1848 (fnothaft)
[ADAM-1845] Only build and publish scaladocs in publish-scaladoc.sh. #1846 (heuermh)
[ADAM-1843] Install sources before calling scala:doc in publish scaladoc #1844 (fnothaft)
Remove python and R profiles from release script #1842 (heuermh)
[ADAM-1817] Bump to Hadoop-BAM 7.9.1. #1841 (fnothaft)
[ADAM-1838] Make populating variant.annotation field in Genotype configurable #1839 (fnothaft)
[ADAM-1834] Add proper extensions for SAM/BAM/CRAM output formats. #1835 (fnothaft)
[ADAM-1822] Misc docs cleanup #1827 (fnothaft)
Added missing __init__.py for fulltoc. #1824 (fnothaft)
[ADAM-1821] Add missing fulltoc for Sphinx documentation. #1823 (fnothaft)
Fix link to documentation #1820 (nzachow)
[ADAM-1634] Add algorithm benchmarks to documentation. #1818 (fnothaft)
[ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation. #1814 (fnothaft)
[ADAM-1807] Check for empty partition when running a piped command. #1812 (fnothaft)
[ADAM-1803] Refactor GenomicRDD.writeTextRdd to util.TextRddWriter. #1809 (heuermh)
Added Filter error when file loaded does not match schema #1805 (akmorrow13)
changed num_jars count #1802 (akmorrow13)
[ADAM-1795] Map -DskipTests=true to exec.skip for Python and R tests. #1800 (heuermh)
[ADAM-1672] Use working directory for R devtools::document(). #1798 (heuermh)
[ADAM-1789] Move scala-lang to provided scope. #1790 (fnothaft)
[ADAM-1784] loadIndexedBam should pass the raw globbed path to Hadoop-BAM #1785 (fnothaft)
[ADAM-1664] Add SUPPORT.md file to complement CONTRIBUTING.md. #1781 (heuermh)
[ADAM-1779] Adding code of conduct adapted from the Contributor Covenant, version 1.4. #1780 (heuermh)
[ADAM-1661] Add file storage benchmarks. #1772 (fnothaft)
[ADAM-1770] Genotype should only store core variant fields. #1771 (fnothaft)
[ADAM-1768] Add InFormatter for unpaired FASTQ. #1769 (fnothaft)
[ADAM-1643] Add S3 access documentation. #1767 (fnothaft)
[ADAM-1763] Apply absolute value to destination partition in ModPartitioner #1766 (fnothaft)
Add R and Python into distribution artifacts #1765 (fnothaft)
[ADAM-1655] Move R package to bdgenomics.adam. #1764 (fnothaft)
[ADAM-1753] Only emit realignment targets for reads containing a single INDEL #1756 (fnothaft)
[ADAM-1715] Support validation stringency in Python/R. #1755 (fnothaft)
[ADAM-1680] Eliminate non-determinism in the ShuffleRegionJoin. #1754 (fnothaft)
update to _replaceRdd with tests #1749 (akmorrow13)
[ADAM-1747] Fixed subtract bug and tests #1748 (devin-petersohn)
[ADAM-1745] Adding LeftOuterShuffleRegionJoinAndGroupByLeft and tests #1746 (devin-petersohn)
Enabled thresholding for joins and standardized regionFn #1741 (devin-petersohn)
Making join return types consistent #1737 (devin-petersohn)
Opening up permissions on GenericGenomicRDD #1736 (devin-petersohn)
[ADAM-1716] Add adam- prefix to distribution module name. #1733 (heuermh)
[ADAM-1695] Check for illegal genotype index after splitting multi-allelic variants. #1725 (heuermh)
[ADAM-1517] Bump Parquet version in a manner compatible with Spark 2.2.x #1722 (fnothaft)
[ADAM-1512] Support VCFs with +Inf/-Inf float values. #1721 (fnothaft)
[ADAM-1709] Add ability to left normalize reads containing INDELs. #1711 (fnothaft)
[ADAM-1691] Move bdgenomics.adam to use a namespace package. #1706 (fnothaft)
moved bdgenomics.adam package to bdgenomics-adam #1705 (akmorrow13)
Misc cleanup needed for bigdatagenomics/cannoli#65 #1704 (fnothaft)
[ADAM-1699] Make GenomicRDD.toXxx method names consistent. #1700 (heuermh)
[ADAM-1694] Add short readable descriptions for toString in subclasses of GenomicRDD. #1698 (heuermh)
[ADAM-1693] Add adam-shell friendly VariantContextRDD.saveAsVcf method. #1696 (heuermh)
[ADAM-1688] Add bdg-formats exclusion to org.hammerlab:genomic-loci dependency. #1690 (heuermh)
[ADAM-1679] Unmapped items should not get caught in requirement when sorting #1687 (fnothaft)
[ADAM-1566] Merge VCF header lines with VCFHeaderLineCount.INTEGER correctly. #1685 (heuermh)
[ADAM-1682] Add variant quality field. #1684 (fnothaft)
Remove adam- prefix from module directory names. #1681 (heuermh)
Update to hadoop-bam 7.9.0 and htsjdk 2.11.0. #1678 (heuermh)
[ADAM-1676] Add more finely grained validation for INFO/FORMAT fields. #1677 (fnothaft)
Python API fixes for AlignmentRecordRDD #1675 (akmorrow13)
[ADAM-1673] Don't set PL to empty when no PL is attached to a gVCF record #1674 (fnothaft)
[ADAM-1670] Add ability to selectively project VCF fields. #1671 (fnothaft)
[ADAM-1663] Enable read groups with repeated names when unioning. #1665 (fnothaft)
Maint 2.11 0.18.0 #1659 (Douglas-H)
[ADAM-1630] Overhauled docs introduction and added architecture section. #1653 (fnothaft)
Add adamR script #1651 (fnothaft)
[ADAM-1647] Fix bad JAR discovery grep in bin/pyadam. #1648 (fnothaft)
[ADAM-1548] Generate reStructuredText from pandoc markdown. #1646 (fnothaft)
Algorithms docs formatting #1645 (gunjanbaid)
Cleaned up docs. #1642 (gunjanbaid)
Making example code compatible with current ADAM build #1641 (devin-petersohn)
Cleaning up formatting and spacing of docs. #1640 (devin-petersohn)
added ExtractRegions #1637 (antonkulaga)
[ADAM-1635] Eliminate passing FASTQ splittable status via config. #1636 (fnothaft)
[ADAM-1614] Add VariantContextRDD to R and Python APIs. #1628 (fnothaft)
[ADAM-1615] Add transform and transmute APIs to Java, R, and Python #1627 (fnothaft)
[ADAM-1625] Use explicit types for header lines #1626 (heuermh)
[ADAM-1623] Add ProcessingStep to adam-codegen. #1624 (heuermh)
[ADAM-1607] Update distribution assembly task to attach assembly überjar #1622 (fnothaft)
[ADAM-1490] Add asSingleFile to saveAsFastq and related. #1621 (heuermh)
Update load method docs in Python and R. #1619 (heuermh)
[ADAM-1616] Resolve installation directory if scripts are symlinks. #1617 (heuermh)
[ADAM-1611] Extend pipe APIs to Java, Python, and R. #1613 (fnothaft)
[ADAM-1610] Mark non-serializable field in TwoBitFile as transient. #1612 (fnothaft)
[ADAM-1554] Support saving BGZF VCF output. #1608 (fnothaft)
Adding examples of how to use joins in the real world #1605 (devin-petersohn)
[ADAM-1599] Add explicit functions for updating GenomicRDD metadata. #1600 (fnothaft)
[ADAM-1576] Allow translation between two different GenomicRDD types. #1598 (fnothaft)
[ADAM-1444] Ignore failed push to Coveralls. #1595 (fnothaft)
Testing, testing, 1... 2... 3... #1592 (fnothaft)
[ADAM-1417] Removed unused Projection.apply method, add test for Filter. #1591 (fnothaft)
[ADAM-1579] Add unit test coverage for BED12 format. #1587 (fnothaft)
[ADAM-1585] Support additional Illumina FASTQ metadata. #1586 (fnothaft)
[ADAM-1438] Add ability to save FASTA back as a single file. #1581 (fnothaft)
Bump bdg-formats correctly to 0.11.1, not SNAPSHOT. #1577 (fnothaft)
[ADAM-1573] Remove unused Unaligned trait. #1574 (fnothaft)
Slurm deployment readme #1571 (jpdna)
[ADAM-1564] Read VCF header from stream in VCFOutFormatter. #1565 (heuermh)
[ADAM-1562] Index off by one for VCF genotype Number=A attributes. #1563 (heuermh)
[ADAM-1533] Set Theory #1561 (devin-petersohn)
Freebayes FORMAT=<ID=AO,Number=A attribute throws ArrayIndexOutOfBoundsException #1560 (heuermh)
[ADAM-1551] Emit non-reference model genotype at called sites. #1559 (fnothaft)
[ADAM-1449] Add loadSequenceDictionary to ADAM context. #1557 (heuermh)
[ADAM-1537] Rename o.b.adam.apis.java package to o.b.adam.api.java #1556 (heuermh)
[ADAM-1549] Make regions provided to filterByOverlappingRegions an Iterable. #1550 (fnothaft)
[ADAM-941] Automatically generate projection enums. #1547 (fnothaft)
[ADAM-1361] Fix misnamed ADAM überjar. #1546 (fnothaft)
[ADAM-1257] Add program record support for alignment/fragment files. #1545 (fnothaft)
[ADAM-1359] Merge reads2fragments and fragments2reads into transformFragments #1543 (fnothaft)
Fix minor format mistakes (and typo) in docs #1542 (kkaneda)
Add a simple unit test to SingleFastqInputFormat #1541 (kkaneda)
Support locus predicate in Transform #1540 (fnothaft)
[ADAM-1421] Add java API for loadReferenceFile. #1536 (fnothaft)
Refactor Vcf2ADAM and ADAM2Vcf into TransformGenotypes and TransformVariants #1532 (heuermh)
[ADAM-1530] Support loading GO:query (S/CR/B)AMs as fragments. #1531 (fnothaft)
[ADAM-1169] Write GFF header line pragma in single file mode. #1529 (fnothaft)
[ADAM-1501] Compute coverage using Dataset API. #1528 (fnothaft)
[ADAM-1497] Add union to GenomicRDD. #1526 (fnothaft)
[ADAM-1486] Respect validation stringency if BAM header load fails. #1525 (fnothaft)
[ADAM-1499] Enable reuse of broadcasted objects in region join. #1524 (fnothaft)
[ADAM-1520] Bump to bdg-formats 0.11.0. #1523 (fnothaft)
Adding fragment InFormatter for Bowtie tab5 format #1522 (heuermh)
[ADAM-1328] Rename Transform to TransformAlignments. #1521 (fnothaft)
[ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0 #1518 (fnothaft)
Fixed minor typos in README. #1516 (gunjanbaid)
Making TreeRegionJoin consistent with ShuffleRegionJoin #1515 (devin-petersohn)
Resolve #1508, #1509 for Pipe API #1511 (fnothaft)
[ADAM-1502] Preserve contig ordering in TwoBitFile sequence dictionary. #1508 (fnothaft)
[ADAM-1483] Remove collapse parameter from AlignmentRecordRDD.toCoverage #1493 (fnothaft)
[ADAM-1377] Adding fragment InFormatter for Bowtie tab6 format #1491 (heuermh)
[ADAM-1488] Only increment BQSR min quality by 33 once. #1489 (fnothaft)
[ADAM-1481] Refactor ADAMContext loadXxx methods for consistency #1487 (heuermh)
Add quality score binner #1485 (fnothaft)
Clean up ReferenceRegion.scala and add thresholded overlap and covers #1484 (devin-petersohn)
[ADAM-1456] Remove .{type}.adam file extension conversions in type-guessing methods. #1482 (heuermh)
[ADAM-1480] Add switch to disable the fast concat method. #1479 (fnothaft)
[ADAM-1476] Treat . ALT allele as symbolic non-ref. #1477 (fnothaft)
Adding require for Coverage Conversion and related tests #1472 (devin-petersohn)
Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
[ADAM-882] R API #1397 (fnothaft)
[ADAM-1018] Add support for Spark SQL Datasets. #1391 (fnothaft)
WIP Python API #1387 (fnothaft)
[ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
Update dependency and plugin versions #1360 (heuermh)
[ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
Efficient Joins and (re)Partitioning #1324 (devin-petersohn)

Version 0.22.0

Closed issues:

Realign all reads at target site, not just reads with no mismatches #1469
Parallel file merger fails if the output file is smaller than the HDFS block size #1467
Add new realigner arguments to docs #1465
Recalibrate method misspelled as recalibateBaseQualities #1463
FASTQ may try to split GZIPed files #1459
Update to Hadoop-BAM 7.8.0 #1455
Publish Markdown and Scaladoc to the interwebs #1453
Make VariantContextConverter public #1451
Apply method in FragmentRDD is package private #1445
Thread pool will block inside of pipe command for streams too large to buffer #1442
FeatureRDD.apply() does not allow addition of other parameters with defaults in the case class #1439
Question : Why the number of paired sequence in adam-0.21.0 less than adam-0.19.0? #1424
loadCoverage missing from Java API #1420
Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1410
loadIntervalList FeatureRDD has empty SequenceDictionary #1409
problem using transform command #1406
Add coveralls #1403
INDEL realigner binary search conditional is flipped #1402
Delete adam-scripts/R #1398
Data missing when transforming FASTQ to Adam #1393
java.io.FileNotFoundException when file exists #1385
Off-by-1 error in FASTQ InputFormat start positioning code #1383
Set the wrong value for end for symbolic alts #1381
RecordGroupDictionary should support isEmpty #1380
Add pipe API in and out formatters for Features #1374
Increase visibility for SupportedHeaderLines.allHeaderLines #1372
Bits of VariantContextConverter don't get ValidationStringencied #1371
Add Markdown docs for Pipe API #1368
Array[Consensus] not registered #1367
ValidationStringency in MDTagging should apply to reads on unknown references #1365
When doing a release, the SNAPSHOT should bump by 0.1.0, not 0.0.1 #1364
FromKnowns consensus generator fails if no reads overlap a consensus #1362
Performance tune-up in BQSR #1358
Increase visibility for ADAMContext.sc and/or getFs... methods #1356
Pipe API formatters need to be public #1354
Version 0.21.0: VariantContextConverter fails for 1000G VCF data #1353
ConsensusModel's can't really be instantiated #1352
Runtime conflicts in transitive versions of Guava dependency #1350
Transcript Effects ignored if more than 1 #1347
Remove "fork" tag from releases #1344
Refactor isSorted boolean parameters to sorted #1341
Loading GZipped VCF returns an empty RDD #1333
Follow up on error messages in build scripts #1331
Bump Spark 2 build to Spark 2.1.0 #1330
FeatureRDD instantiation tries to cache the RDD #1321
Load queryname sorted BAMs as Fragments #1303
Run Duplicate Marking on Fragments #1302
GenomicRDD.pipe may hang on failure error codes #1282
IllegalArgumentException Wrong FS for vcf_head files on HDFS #1272
java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord #1240
Investigate sorted join in dataset api #1223
Support looser validation stringency for loading some VCF Integer fields #1213
Add new feature-overlap command to demonstrate new region joins #1194
What should our API at the command line look like? #1178
Split apart partition and join in ShuffleRegionJoin #1175
Merging files should be multithreaded #1164
File _rgdict.avro does not exist #1150
how to collect the .adam files from Spark cluster multiple nodes and some questions about avocado #1140
JFYI: tiny forked adam-core "0.20.0" release #1139
Samtools (htslib) integration testing #1120
AlignmentRecordRDD does not extend GenomicRDD per javac #1092
Release ADAM version 0.21.0 #1088
Difference running markdups with and without projection #1014
ADAM to BAM conversion fails using relative path #1012
Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
Customize adam-main cli from configuration file #918
genotypeType for genotypes with multiple OtherAlt alleles? #897
How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
Ensure Java API is up-to-date with Scala API #855
Improve parallelism during FASTA output #842
Explicitly validate user args passed to transform enhancement #841
BroadcastRegionJoin fails with unmapped reads #821
Resolve Fragment vs. SingleReadBucket #789
Add profile for skipping test compilation/resolution #713
Next on empty iterator in BroadcastRegionJoin #661
Cleanup code smell in sort work balancing code #635
Remove reliance on MD tags #622
Provide low-impact alternative to transform -repartition for reducing partition size #594
Clean up Rich records #577
Create standardized, interpretable exceptions for error reporting #420
Create ADAM Benchmarking suite #120

Merged and closed pull requests:

[ADAM-1469] Don't filter on whether reads have mismatches during realignment #1470 (fnothaft)
[ADAM-1467] Skip concat call if there is only one shard. #1468 (fnothaft)
[ADAM-1465] Updating realigner CLI docs. #1466 (fnothaft)
[ADAM-1463] Rename recalibateBaseQualities method as recalibrateBaseQualities #1464 (heuermh)
[ADAM-1453] Add hooks to publish ADAM docs from CI flow. #1461 (fnothaft)
[ADAM-1459] Don't split FASTQ when compressed. #1459 (fnothaft)
[ADAM-1451] Make VariantContextConverter class and convert methods public #1452 (fnothaft)
Moving API overview from building apps doc to new source file. #1450 (heuermh)
[ADAM-1424] Adding test for reads dropped in 0.21.0. #1448 (heuermh)
[ADAM-1439] Add inferSequenceDictionary ctr to FeatureRDD. #1447 (heuermh)
[ADAM-1445] Make apply method for FragmentRDD public. #1446 (fnothaft)
[ADAM-1442] Fix thread pool deadlock in GenomicRDD.pipe #1443 (fnothaft)
[ADAM-1164] Add parallel file merger. #1441 (fnothaft)
Dependency version bump + BroadcastRegionJoin fix #1440 (fnothaft)
added JavaApi for loadCoverage #1437 (akmorrow13)
Update versions, etc. in build docs #1435 (heuermh)
Add test sample(verify number of reads in loadAlignments function) and ADAM SNAPSHOT document #1433 (xubo245)
Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
feat: speed up 2bit file extract #1426 (Blaok)
BQSR refactor for perf improvements #1423 (fnothaft)
Add ADAMContext/GenomicRDD/pipe docs #1422 (fnothaft)
INDEL realigner cleanup #1412 (fnothaft)
Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1411 (heuermh)
Add coveralls badge to README.md. #1408 (fnothaft)
[ADAM-1403] Push coverage reports to Coveralls. #1404 (fnothaft)
Added instrumentation timers around joins. #1401 (fnothaft)
Add Apache Spark version to --version text #1400 (heuermh)
[ADAM-1398] Delete adam-scripts/R. #1399 (fnothaft)
[ADAM-1383] Use gt instead of gteq in FASTQ input format line size checks #1396 (fnothaft)
Maint spark2 2.11 0.21.0 #1395 (A-Tsai)
[ADAM-1393] fix missing reads when transforming fastq to adam #1394 (A-Tsai)
[ADAM-1380] Adds isEmpty method to RecordGroupDictionary. #1392 (fnothaft)
[ADAM-1381] Fix Variant end position. #1389 (fnothaft)
Make javac see that AlignmentRecordRDD extends GenomicRDD #1386 (fnothaft)
Added ShuffleRegionJoin usage docs #1384 (devin-petersohn)
Misc. INDEL realigner bugfixes #1382 (fnothaft)
Add pipe API in and out formatters for Features #1378 (heuermh)
[ADAM-1356] Make ADAMContext.getFsAndFiles and related protected visibility #1376 (heuermh)
[ADAM-1372] Increase visibility for DefaultHeaderLines.allHeaderLines #1375 (heuermh)
[ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373 (fnothaft)
[ADAM-1367] Register Consensus array for serialization. #1369 (fnothaft)
[ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
[ADAM-1362] Fixing issue where FromKnowns consensus model fails if no reads hit a target. #1363 (fnothaft)
[ADAM-1352] Clean up consensus model usage. #1357 (fnothaft)
Increase visibility for InFormatter case classes from package private to public #1355 (heuermh)
Use htsjdk getAttributeAsList for VCF INFO ANN key #1348 (heuermh)
Fixes parsing variant annotations for multi-allelic rows #1346 (majkiw)
Sort pull requests by id #1345 (heuermh)
HBase genotypes backend -revised #1335 (jpdna)
[ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
Support deduping fragments #1309 (fnothaft)
[ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
Added test to try and repro #1282. #1292 (fnothaft)

Version 0.21.0

Closed issues:

Update Markdown docs with ValidationStringency in VCF<->ADAM CLI #1342
Variant VCFHeaderLine metadata does not handle wildcards properly #1339
Close called multiple times on VCF header stream #1337
BroadcastRegionJoin has serialization failures #1334
adam-cli uses git-commit-id-plugin which breaks release? #1322
move_to_xyz scripts should have interlocks... #1317
Lineage for partitionAndJoin in ShuffleRegionJoin causes StackOverflow Errors #1308
Add move_to_spark_1.sh script and update README to mention #1307
adam-submit transform fails with Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class #1306
private ADAMContext constructor? #1296
AlignmentRecord.mateAlignmentEnd never set #1290
how to submit my own driver class via adam-submit? #1289
ReferenceRegion on Genotype seems busted? #1286
Clarify strandedness in ReferenceRegion apply methods #1285
Parquet and CRAM debug logging during unit tests #1280
Add more ANN field parsing unit tests #1273
loadVariantAnnotations returns empty RDD #1271
Implement joinVariantAnnotations with region join #1259
Count how many chromosome in the range of the kmer #1249
ADAM minor release to support htsjdk 2.7.0? #1248
how to config kryo.registrator programmatically #1245
Does the nested record Flattener drop Maps/Arrays? #1244
Dead-ish code cleanup in org.bdgenomics.adam.utils #1242
java.io.FileNotFoundException for old adam file after upgrade to adam0.20 #1240
please add maven-source-plugin into the pom file #1239
Assembly jar doesn't get rebuilt on CLI changes #1238
how to compare with the last the column for the same chromosome name? #1237
Need a way for users to add VCF header lines #1233
Enhancements to VCF save #1232
Must we split multi-allelic sites in our Genotype model? #1231
Can't override default -collapse in reads2coverage #1228
Reads2coverage NPEs on unmapped reads #1227
Strand bias doesn't get exported #1226
Move ADAMFunSuite helper functions upstream to SparkFunSuite #1225
broadcast join using interval tree #1224
Instrumentation is lost in ShuffleRegionJoin #1222
Bump Spark, Scala, Hadoop dependency versions #1221
GenomicRDD shuffle region join passes partition count to partition size #1220
Scala compile errors downstream of Spark 2 Scala 2.11 artifacts #1218
Javac error: incompatible types: SparkContext cannot be converted to ADAMContext #1217
Release 0.20.0 artifacts failed Sonatype Nexus validation #1212
Release script failed for 0.20.0 release #1211
gVCF - can't load multi-allelic sites #1202
Allow open-ended intervals in loadIndexedBam #1196
Interval tree join in ADAM #1171
spark-submit throw exception in spark-standalone using .adam which transformed from .vcf #1121
BroadcastRegionJoin is not a broadcast join #1110
Improve test coverage of VariantContextConverter #1107
Variant dbsnp rs id tracking in vcf2adam and ADAM2Vcf #1103
Document core ADAM transform methods #1085
Document deploying ADAM on Toil #1084
Clean up packages #1083
VariantCallingAnnotations is getting populated with INFO fields #1063
How to load DatabaseVariantAnnotation information ? #1049
Release ADAM version 0.20.0 #1048
Support VCF annotation ANN field in vcf2adam and adam2vcf #1044
How to create a rich(er) VariantContext RDD? Reconstruct VCF INFO fields. #878
Add biologist targeted section to the README #497
Update usage docs running for EC2 and CDH #493
Add docs about building downstream apps on top of ADAM #291
Variant filter representation #194

Merged and closed pull requests:

[ADAM-1342] Update CLI docs after #1288 merged. #1343 (fnothaft)
[ADAM-1339] Use glob-safe method to load VCF header metadata for Parquet #1340 (fnothaft)
[ADAM-1337] Remove os.{flush,close} calls after writing VCF header. #1338 (fnothaft)
[ADAM-1334] Clean up serialization issues in Broadcast region join. #1336 (fnothaft)
[ADAM-1307] move_to_spark_2 fails after moving to scala 2.11. #1329 (fnothaft)
unroll/optimize some JavaConversions #1326 (ryan-williams)
clean up *Join type-params/scaladocs #1325 (ryan-williams)
[ADAM-1322] Skip git commit plugin if .git is missing. #1323 (fnothaft)
Supports access to indexed fa and fasta files #1320 (akmorrow13)
Add interlocks for move_to_xyz scripts. #1319 (fnothaft)
[ADAM-1307] Add script for moving to Spark 1. #1318 (fnothaft)
Update move_to_spark_2.sh #1316 (creggian)
[ADAM-1308] Fix stack overflow in join with custom iterator impl. #1315 (fnothaft)
Why Adam? section added to README.md #1310 (tverbeiren)
Add docs about using ADAM's Kryo registrator from another Kryo registrator. #1305 (fnothaft)
Add docs about building downstream applications #1304 (heuermh)
[ADAM-493] Add ADAM-on-Spark-on-YARN docs. #1301 (fnothaft)
Code style fixes #1299 (heuermh)
Make ADAMContext and JavaADAMContext constructors public #1298 (heuermh)
Remove back reference between VariantAnnotation and Variant #1297 (fnothaft)
[ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
HBase as a separate repo #1293 (jpdna)
Reference region cleanup #1291 (fnothaft)
Clean rewrite of VariantContextConverter #1288 (fnothaft)
add function:filterByOverlappingRegions #1287 (liamlee)
Populate fields on VariantAnnotation #1283 (heuermh)
Add VCF headers for fields in Variant and VariantAnnotation records #1281 (heuermh)
CGCloud deploy docs #1279 (jpdna)
some style nits #1278 (ryan-williams)
use ParsedLoci in loadIndexedBam #1277 (ryan-williams)
Increasing unit test coverage for VariantContextConverter #1276 (heuermh)
Expose FeatureRDD to public #1275 (Georgehe4)
Clean up CLI operation categories and names, and add documentation for CLI #1274 (fnothaft)
Rename org.bdgenomics.adam.rdd.variation package to o.b.a.rdd.variant #1270 (heuermh)
use testFile in some tests #1268 (ryan-williams)
[ADAM-1083] Cleaning up org.bdgenomics.adam.models. #1267 (fnothaft)
make py file py3-forward-compatible #1266 (ryan-williams)
rm accidentally-added file #1265 (fnothaft)
Finishing up the cleanup on org.bdgenomics.adam.rdd. #1264 (fnothaft)
Clean up org.bdgenomics.adam.rich package. #1263 (fnothaft)
Add docs for transform pipeline, ADAM-on-Toil #1262 (fnothaft)
updates for bdg utils 0.2.9-SNAPSHOT #1261 (akmorrow13)
[ADAM-1233] Expose header lines in Variant-related GenomicRDDs #1260 (fnothaft)
[ADAM-1221] Bump Spark/Hadoop versions. #1258 (fnothaft)
Rename org.bdgenomics.adam.rdd.features package to o.b.a.rdd.feature #1256 (heuermh)
Clean up documentation in org.bdgenomics.adam.projection. #1255 (fnothaft)
[ADAM-1221] Bump Spark/Hadoop versions. #1254 (fnothaft)
Misc shuffle join fixes. #1253 (fnothaft)
[ADAM-1196] Add support for open ReferenceRegions. #1252 (fnothaft)
[ADAM-1225] Move helper functions from ADAMFunSuite to SparkFunSuite. #1251 (fnothaft)
Merge VariantAnnotation and DatabaseVariantAnnotation records #1250 (heuermh)
Miscellaneous VCF fixes #1247 (fnothaft)
HBase backend for Genotypes #1246 (jpdna)
[ADAM-1242] Clean up dead code in org.bdgenomics.adam.util. #1243 (fnothaft)
Small cleanup of "replacing uses of deprecated class SAMFileReader" #1236 (fnothaft)
replacing uses of deprecated class SAMFileReader #1235 (lbergelson)
[ADAM-1224] Replace BroadcastRegionJoin with tree based algo. #1234 (fnothaft)
Fix reads2coverage issues #1230 (fnothaft)
[ADAM-1212] Add empty assembly object, allows Maven build to create sources and javadoc artifacts #1215 (heuermh)
[ADAM-1211] Fix call to move_to_scala_2.sh, reorder Spark 2.x Scala 2.10 and 2.10 sections #1214 (heuermh)
demonstrate multi-allelic gVCF failure - test added #1205 (jpdna)
Merge VariantAnnotation and DatabaseVariantAnnotation records #1144 (heuermh)
Upgrade to bdg-formats-0.10.0 #1135 (fnothaft)

Version 0.20.0

Closed issues:

Sorting by reference index seems doesn't work or sorted by DESC order? #1204
master won't compile #1200
VCF format tag SB field parse error in loading #1199
Publish sources JAR with snapshots #1195
Type SparkFunSuite in package org.bdgenomics.utils.misc is not available #1193
MDTagging fails on GRCh38 #1192
Fix stack overflow in IndelRealigner serialization #1190
Delete ./scripts/commit-pr.sh #1188
Hadoop globStatus returns null if no glob matches #1186
Swapping out IntervalRDD under GenomicRDDs #1184
How to get "SO coordinate" instead of "SO unsorted"? #1182
How to read glob of multiple parquet Genotype #1179
Update command line doc and examples in README.md #1176
FastqRecordConverter needs cleanup and tests #1172
TransformFormats write to .gff3 file path incorrectly writes as parquet #1168
Should be able to merge shards across two different file systems #1165
RG ID gets written as the index, not the record group name #1162
Users should be able to save files as -single without merging them #1161
Users should be able to set size of buffer used for merging files #1160
Bump Hadoop-BAM to 7.7.0 #1158
adam-shell prints command trace to stdout #1154
Map IntervalList format column four to feature name or attributes? #1152
Parquet storage of VariantContext #1151
vcf2adam unparsable vcf record #1149
Reorder kryo.register statements in ADAMKryoRegistrator #1146
Make region joins public again #1143
Support CRAM input/output #1141
Transform should run with spark.kryo.requireRegistration=true #1136
adam-shell not handling bash args correctly #1132
Remove Gene and related models and parsing code #1129
Generate Scoverage reports when running CI #1124
Remove PairingRDD #1122
SAMRecordConverter.convert takes unused arguments #1113
Add Pipe API #1112
Improve coverage in Feature unit tests #1106
K-mer.scala code #1105
add -single file output option to ADAM2Vcf #1102
adam2vcf Fails with Sample not serializable #1100
ReferenceRegion.apply(AlignmentRecord) should not NPE on unmapped reads #1099
Add outer region join implementations #1098
VariantContextConverter never returns DatabaseVariantAnnotation #1097
loadvcf: conflicting require statement #1094
ADAM version 0.19.0 will not run on Spark version 2.0.0 #1093
Be more rigorous with FileSystem.get #1087
Remove network-connected and default test-related Maven profiles #1073
Releases should get pushed to Spark Packages #1067
Invalid POM for cli on 0.19.0 #1066
scala.MatchError RegExp does not catch colons in value part properly #1061
Support writing IntervalList header for features #1059
Add -single support when writing features in native formats #1058
Remove workaround for gzip/BGZF compressed VCF headers #1057
Clean up if clauses in Transform #1053
Adam-0.18.2 can not load Adam-0.14.0 adamSave function data (sam) #1050
filterByOverlappingRegion Incorrect for Genotypes #1042
Move Interval trait to utils, added in #75 #1041
Remove implicit GenomicRDD to RDD conversion #1040
VCF sample metadata - proposal for a GenotypedSampleMetadata object #1039
[build system] ADAM test builds pollute /tmp, leaving lots of cruft... #1038
adamMarkDuplicates function in AlignmentRecordRDDFunctions class can not mark the same read? #1037
test MarkDuplicatesSuite with two similar read in ref and start position and different avgPhredScore, error! #1035
Explore protocol buffers vs Avro #1031
Increase Avro dependency version to 1.8.0 #1029
ADAM specific logging #1024
Reenable Travis CI for pull request builds #1023
Bump Apache Spark version to 1.6.1 in Jenkins #1022
ADAM compatibility with Spark 2.0 #1021
ADAM to BAM conversion failing on 1000G file #1013
Factor out *RDDFunctions classes #1011
Port single file BAM and header code to VCF #1009
Roll Jenkins JDK 8 changes into ./scripts/jenkins-test #1008
Support GFF3 format #1007
Separate fat jar build from adam-cli to new maven module #1006
adam-cli POM invalid: maven.build.timestamp #1004
Sub-partitioning of Parquet file for ADAM #1003
Flattening the Genotype schema #1002
install adam 0.19 error! #1001
How to solve it please? #1000
Has the project realized alignment reads to reference genome algorithm? #996
All file-based input methods should support running on directories, compressed files, and wildcards #993
Contig to ContigName Change not reflected in AlignmentRecordField #991
Add homebrew guidelines to release checklist or automate PR generation #987
fix deprecation warnings #985
rename fragments package #984
Explore if SeqDict data can be factored out more aggressively #983
Make "Adam" all caps in filename Adam2Fastq.scala #981
Adam2Fastq should output reverse complement when 0x10 flag is set for read #980
Allow lowercase letters in jar/version names #974
Add stringency parameter to flagstat #973
Arg-array parsing problem in adam-submit #971
Pass recordGroup parameter to loadPairedFastq #969
Send a number of partitions to sc.textFile calls #968
adamGetReferenceString doesn't reduce pairs correctly #967
Update ADAM formula in homebrew-science to version 0.19.0 #963
BAM output in ADAM appears to be corrupt #962
Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #959
Issue with version 18.0.2 #957
Expose sorting by reference index #952
.rgdict and .seqdict files are not placed in the adam directory #945
Why does count_kmers not return k-mers that are split between two records? #930
Load legacy file formats to Spark SQL Dataframes #912
Clean up RDD method names #910
Load/store sequence dictionaries alongside Genotype RDDs #909
vcf2adam -print_metrics throws IllegalStateException on Spark 1.5.2 or later #902
error: no reads in first split: bad BAM file or tiny split size? #896
FastaConverter.FastaDescriptionLine not kryo-registered #893
Work With ADAM fasta2adam in a distributed mode #881
vcf2adam -> Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; #871
Code coverage profile is broken #849
Building Adam on OS X 10.10.5 with Java 1.8 #835
Normalize AlignmentRecord.recordGroup* fields onto a separate record type #828
Gracefully handle missing Spark- and Hadoop-versions in jenkins-test; document how to set them. #827
Use Adam File with Hive #820
How do we handle reads that don't have original quality scores when converting to FASTQ with original qualities? #818
SAMFileHeader "sort order" attribute being un-set during file-save job #800
Use same sort order as Samtools #796
RNAME and RNEXT fields jumbled on transform BAM->ADAM->BAM #795
Support loading multiple indexed read files #787
Duplicate OUTPUT command line argument metaVar in adam2fastq #776
Allow Variant to ReferenceRegion conversion #768
Spark Errors References Deprecated SPARK_CLASSPATH #767
Spark Errors References Deprecated SPARK_CLASSPATH #766
adam2vcf fails with -coalesce #735
Writing to a BAM file with adamSAMSave consistently fails #721
BQSR on C835.HCC1143_BL.4 uses excessive amount of driver memory #714
Support writing RDD[Feature] to various file formats #710
adamParquetSave has a menacing false error message about *.adam extension #681
BAMHeader not set when running on a cluster #676
spark 1.3.1 upgrade to hortonworks HDP 2.2.4.2-2? #675
Symbol case class is nucleotide-centric #672
xAssembler cannot be build using mvn #658
adam-submit VerifyError #642
vcf2adam : Unsupported type ENUM #638
Update CDH documentation #615
Remove and generalize plugin code #602
Fix record oriented shuffle #599
Migrate preprocessing stages out of ADAM #598
Publish/socialize a roadmap #591
Eliminate format detection and extension checks for loading data #587
Improve error message when we can't find a ReferenceRegion for a contig #582
Do reference partitioners restrict a partition to contain keys from a single contig? #573
Connection refused errors when transforming BAM file with BQSR #516
ReferenceRegion shouldn't extend Ordered #511
Documentation for common usecases #491
Improve handling of "*" sequences during BQSR #484
Original qualities are parsed out, but left in attribute fields #483
Need a FileLocator that mirrors the use of Path in HDFS #477
FileLocator should support finding "child" locators. #476
Add S3 based Parquet directory loader #463
Should FASTQ output use reads' "original qualities"? #436
VcfStringUtils unused? #428
We should be able to filter genotypes that overlap a region #422
Create a simplified vocabulary for naming projections. #419
Update documentation #406
Bake off different region join implementations #395
Handle no-ops more intelligently when creating MD tags #392
Remove all the commands in the "CONVERSION OPERATIONS" CommandGroup #373
Fail to Write RDD into HDFS with Parquet Format #344
Refactor ReferencePositionWithOrientation #317
Add docs about SPARK_LOCAL_IP #305
PartitionAndJoin should throw an exception if it sees an unmapped read #297
Add insert size calculation #296
Newbie questions - learning resources? Reading a range of records from Adam? #281
Add variant effect ontology #261
Don't flatten optional SAM tags into a string #240
Characterize impact of partition size on pileup creation #163
Need to support BCF output format #153
Allow list of commands to be injected into adam-cli AdamMain #132
Parse out common annotations stored in VCF format #118
Update normalization code to enable normalization of sequences with more than two indels #64
Add clipping heuristic to indel realigner #63
BQSR should support recalibration across multiple ADAM files #58

Merged and closed pull requests:

fix SB tag parsing #1209 (fnothaft)
Fastq record converter #1208 (fnothaft)
Doc suggested partitionSize in ShuffleRegionJoin #1207 (jpdna)
Test demonstrating region join failure #1206 (jpdna)
fix SB tag parsing #1203 (jpdna)
fix build #1201 (ryan-williams)
[ADAM-1192] Correctly handle other whitespace in FASTA description. #1198 (fnothaft)
[ADAM-1190] Manually (un)pack IndelRealignmentTarget set. #1191 (fnothaft)
[ADAM-1188] Delete scripts/commit-pr.sh #1189 (fnothaft)
[ADAM-1186] Mask null from fs.globStatus. #1187 (fnothaft)
Fastq record converter #1185 (zyxue)
[ADAM-1182] isSorted=true should write SO:coordinate in SAM/BAM/CRAM header. #1183 (fnothaft)
Add scoverage aggregator and fail on low coverage. #1181 (fnothaft)
[ADAM-1179] Improve error message when globbing a parquet file fails. #1180 (fnothaft)
[ADAM-1176] Update command line doc and examples in README.md #1177 (heuermh)
Refactor CLIs for merging sharded files #1167 (fnothaft)
Update Hadoop-BAM to version 7.7.0 #1166 (heuermh)
[ADAM-1162] Write record group string name. #1163 (fnothaft)
Map IntervalList format column four to feature name #1159 (heuermh)
Make AlignmentRecordConverter public so that it can be used from other projects #1157 (tomwhite)
added predicate option to loadCoverage #1156 (akmorrow13)
[ADAM-1154] Change set -x to set -e in ./bin/adam-shell. #1155 (fnothaft)
Remove Gene and related models and parsing code #1153 (heuermh)
Reorder kryo.register statements in ADAMKryoRegistrator #1148 (heuermh)
Updated GenomicPartitioners to accept additional key. #1147 (akmorrow13)
[ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145 (fnothaft)
misc pom/test/resource improvements #1142 (ryan-williams)
[ADAM-1136] Transform runs successfully with kryo registration required #1138 (fnothaft)
[ADAM-1132] Fix improper quoting of bash args in adam-shell. #1133 (fnothaft)
Remove StructuralVariant and StructuralVariantType, add names field to Variant #1131 (heuermh)
Remove StructuralVariant and StructuralVariantType, add names field to Variant #1130 (heuermh)
PR #1108 with issue #1122 #1128 (fnothaft)
[ADAM-1038] Eliminate writing to /tmp during CI builds. #1127 (fnothaft)
Update for bdg-formats code style changes #1126 (heuermh)
[ADAM-1124] Add Scoverage and generate coverage reports in Jenkins. #1125 (fnothaft)
[ADAM-1093] Move to support Spark 2.0.0. #1123 (fnothaft)
remove duplicated dependency #1119 (ryan-williams)
Clean up ADAMContext #1118 (fnothaft)
[ADAM-993] Support loading files using globs and from directory paths. #1117 (fnothaft)
[ADAM-1087] Migrate away from FileSystem.get #1116 (fnothaft)
[ADAM-1099] Make reference region not throw NPE. #1115 (fnothaft)
Add pipes API #1114 (fnothaft)
[ADAM-1105] Use assembly jar in adam-shell. #1111 (fnothaft)
Add outer joins #1109 (fnothaft)
Modified CalculateDepth to calculate coverage from alignment files #1108 (akmorrow13)
Resolves various single file save/header issues #1104 (fnothaft)
[ADAM-1100] Resolve Sample Not Serializable exception #1101 (fnothaft)
added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions #1096 (akmorrow13)
Added support for Indexed VCF files #1095 (akmorrow13)
[ADAM-582] Eliminate .get on option in FragmentConverter. #1091 (fnothaft)
[ADAM-776] Rename duplicate OUTPUT metaVar in ADAM2Fastq. #1090 (fnothaft)
refactored ReferenceFile to require SequenceDictionary #1086 (akmorrow13)
[ADAM-1073] Remove network-connected and default test-related Maven profiles #1082 (heuermh)
[ADAM-1053] Clean up Transform #1081 (fnothaft)
[ADAM-1061] Clean up attributes regex and denormalized fields #1080 (fnothaft)
Extended TwoBitFile and NucleotideContigFragmentRDDFunctions to behave more similar #1079 (akmorrow13)
Refactor variant and genotype annotations #1078 (heuermh)
[ADAM-1039] Add basic support for Sample record. #1077 (fnothaft)
Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #1076 (heuermh)
[ADAM-194] Use separate filtersFailed and filtersPassed arrays for variant quality filters #1075 (heuermh)
Whitespace code style fixes #1074 (heuermh)
[ADAM-1006] Split überjar out to adam-assembly submodule. #1072 (fnothaft)
Remove code coverage profile #1071 (heuermh)
[ADAM-768] ReferenceRegion from variant/genotypes #1070 (fnothaft)
[ADAM-1044] Support VCF annotation ANN field #1069 (heuermh)
[ADAM-1067] Add release documentation and scripting for Spark Packages. #1068 (fnothaft)
[ADAM-602] Remove plugin code. #1065 (fnothaft)
Refactoring org.bdgenomics.adam.io package. #1064 (fnothaft)
Cleanup in org.bdgenomics.adam.converters package. #1062 (fnothaft)
[ADAM-1057] Remove workaround for gzip/BGZF compressed VCF headers #1057 (heuermh)
Cleanup on org.bdgenomics.adam.algorithms.smithwaterman package. #1056 (fnothaft)
Documentation cleanup and minor refactor on the consensus package. #1055 (fnothaft)
Add KEYS with public code signing keys #1054 (heuermh)
Adding GA4GH 0.5.1 converter for reads. #1052 (fnothaft)
[ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051 (fnothaft)
removed interval trait and redirected to interval in utils-intervalrdd #1046 (akmorrow13)
[ADAM-952] Expose sorting by reference index. #1045 (fnothaft)
overlap query reflects new formats #1043 (erictu)
Changed loadIndexedBam to use hadoop-bam InputFormat #1036 (fnothaft)
Increase Avro dependency version to 1.8.0 #1034 (heuermh)
Improved README fix using feedback from other approach review. #1034 (InvisibleTech)
Error in the README.md for kmer.scala example, need to get rdd first. #1032 (InvisibleTech)
Add fragmentEndPosition to NucleotideContigFragment #1030 (heuermh)
Logging to be done by ADAM utils code rather than Spark #1028 (jpdna)
add maxScore #1027 (xubo245)
[ADAM-1008] Modify jenkins-test script to support Java 8 build. #1026 (fnothaft)
whitespace change, do not merge #1025 (shaneknapp)
require kryo registration in tests #1020 (ryan-williams)
print full stack traces on test failures #1019 (ryan-williams)
bump commons-io version #1017 (ryan-williams)
exclude javadoc jar in adam-shell #1016 (ryan-williams)
[ADAM-909] Refactoring variation RDDs. #1015 (fnothaft)
Modified CalculateDepth to get coverage on whole alignment adam files #1010 (akmorrow13)
[ADAM-1004] Remove recursive maven.build.timestamp declaration #1005 (heuermh)
Maint 2.11 0.19.0 #999 (tushu1232)
[ADAM-710] Add saveAs methods for feature formats GTF, BED, IntervalList, and NarrowPeak #998 (heuermh)
Moving Adam2Fastq to ADAM2Fastq #995 (heuermh)
Update release doc for CHANGES.md and homebrew #994 (heuermh)
Update to AlignmentRecordField and its usages as contig changed to co… #992 (jpdna)
[ADAM-974] Short term fix for multiple ADAM cli assembly jars check #990 (heuermh)
Update hadoop-bam dependency version to 7.5.0 #989 (heuermh)
Replaced Contig with ContigName in AlignmentRecord and related changes #988 (jpdna)
fix some deprecation/style things and rename a pkg #986 (ryan-williams)
Fix Adam2fastq in case of read with both reverse and unmapped flags #982 (jpdna)
[ADAM-510] Refactoring RDD function names #979 (heuermh)
Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978 (heuermh)
Remove unused file VcfHeaderUtils.scala #977 (heuermh)
add validation stringency to bam parsing, flagstat #976 (ryan-williams)
more permissible jar regex in adam-submit #975 (ryan-williams)
fix bash arg array processing in adam-submit #972 (ryan-williams)
adamGetReferenceString reduces pairs correctly, fixes #967 #970 (erictu)
A few improvements #966 (ryan-williams)
improve SW performance by replacing functional reductions with imperative ones #965 (noamBarkai)
[ADAM-962] Fix corrupt single-file BAM output. #964 (fnothaft)
[ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
[ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
Use hadoop-bam BAMInputFormat to do loadIndexedBam #953 (andrewmchen)
Add -print_metrics option to Jenkins build #947 (heuermh)
adam2vcf doesn't have info fields #939 (andrewmchen)
[ADAM-893] Register missing serializers. #933 (fnothaft)

Version 0.19.0

Closed issues:

Update bdg-utils dependency version to 0.2.4 #960
Drop support for Spark version 1.2.1, Hadoop version 1.0.x #958
Exception occurs when running tests on master #956
Flagstat results still don't match samtools flagstat #946
readInFragment value is not properly read from parquet file into RDD[AlignmentRecord] #942
adam2vcf -sort_on_save flag broken #940
Transform -limit_projection requires .sam.seqdict file #937
MarkDuplicates fails if library name is not set #934
fastqtobam or sam #928
Vcf2Adam uses SB field instead of FS field for fisher exact test for strand bias #923
Add back limit_projection on Transform #920
BAM header is not getting set on partition 0 with headerless BAM output format #916
Add numParts apply method to GenomicRegionPartitioner #914
Add Spark version 1.6.x to Jenkins build matrix #913
Target Spark 1.5.2 as default Spark version #911
Move to bdg-formats 0.7.0 #905
secondOfPair and firstOfPair flag is missing in the newest 0.18 adam transformed results from BAM #903
Future pull request #900
error in vcf2adam #899
Importing directory of VCFs seems to fail #898
How to filter genotypeRDD on sample names? org.apache.spark.SparkException: Task not serializable? #891
Add Spark version 1.5.x to Jenkins build matrix #889
Transform DAG causes stages to recompute #883
adam-submit buildinfo is confused #880
move_to_scala_2.11 and maven-javadoc-plugin #863
NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable #837
Fix record oriented shuffle #599
Avro.GenericData error with ADAM 0.12.0 on reading from ADAM file #290

Merged and closed pull requests:

[ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
[ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
Fix for travis build, replace reads2ref with reads2fragments #950 (heuermh)
[ADAM-940] Fix adam2vcf -sort_on_save flag #949 (massie)
Remove BuildInformation and extraneous git-commit-id-plugin configuration #948 (heuermh)
Update readme for spark 1.5.2 and hadoop 2.6.0 #944 (heuermh)
[ADAM-942] Replace first/secondInRead with readInFragment #943 (heuermh)
[ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938 (heuermh)
[ADAM-934] Properly handle unset library name during duplicate marking #935 (fnothaft)
[ADAM-911] Move to Spark 1.5.2 and Hadoop 2.6.0 as default versions. #932 (fnothaft)
added start and end values to Interval Trait. Used for IntervalRDD #931 (akmorrow13)
Removing buildinfo command #929 (heuermh)
Removing symbolic test resource links, read from test classpath instead #927 (heuermh)
Changed fisher strand bias field for VCF2Adam from SB to FS #924 (andrewmchen)
[ADAM-920] Limit tag/orig qual flags in Transform. #921 (fnothaft)
Change the README to use adam-shell -i instead of pasting #919 (andrewmchen)
[ADAM-916] New strategy for writing header. #917 (fnothaft)
[ADAM-914] Create a GenomicRegionPartitioner given a partition count. #915 (fnothaft)
Squashed #907 and ran format-sources #908 (fnothaft)
Various small fixes #907 (huitseeker)
ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906 (fnothaft)
Rewrote the getType method to handle all ploidy levels #904 (NeillGibson)
Single file save from #733, rebased #901 (fnothaft)
Added is* genotype methods from HTS-JDK Genotype to RichGenotype #895 (NeillGibson)
[ADAM-891] Mark SparkContext as @transient. #894 (fnothaft)
Update README URLs based on HTTP redirects #892 (ReadmeCritic)
adding --version command line option #888 (heuermh)
Add exception in move_to_scala_2.11.sh for maven-javadoc-plugin #887 (heuermh)
Fix tightlist bug in Pandoc #885 (massie)
[ADAM-883] Add caching to Transform pipeline. #884 (fnothaft)

Version 0.18.2

ISSUE 877: Minor fix to commit script to support https.
ISSUE 876: Separate command line argument words by underscores
ISSUE 875: P Operator parsing for MDTag
ISSUE 873: [ADAM-872] Modify regex to capture release and SNAPSHOT jars but not javadoc or sources jars
ISSUE 866: [ADAM-864] Don't force shuffle if reducing partition count.
ISSUE 856: export valid fastq
ISSUE 847: Updating build dependency versions to latest minor versions

Version 0.18.1

ISSUE 870: [ADAM-867] add pull requests missing from 0.18.0 release to CHANGES.md
ISSUE 869: [ADAM-868] make release branch and tag names consistent
ISSUE 862: [ADAM-861] use -d to check for repo assembly dir

Version 0.18.0

ISSUE 860: New release and pr-commit scripts
ISSUE 859: [ADAM-857] Corrected handling of env vars in bin scripts
ISSUE 854: [ADAM-853] allow main class in adam-submit to be specified
ISSUE 852: [ADAM-851] Silenced Parquet logging.
ISSUE 850: [ADAM-848] TwoBitFile now supports nBlocks and maskBlocks
ISSUE 846: Updating maven build plugin dependency versions
ISSUE 845: [ADAM-780] Make DecadentRead package private.
ISSUE 844: [ADAM-843] Aggressively project out metadata fields.
ISSUE 840: fix flagstat output file encoding
ISSUE 839: let flagstat write to file
ISSUE 831: Support loading paired fastqs
ISSUE 830: better validation when saving paired fastqs
ISSUE 829: fix Long != null warnings
ISSUE 819: Implement custom ReferenceRegion hashcode
ISSUE 816: [ADAM-793] adding command to convert ADAM nucleotide contig fragments to FASTA files
ISSUE 815: Upgrade to bdg-formats:0.6.0, add Fragment datatype converters
ISSUE 814: [ADAM-812] fix for javadoc errors on JDK8
ISSUE 813: [ADAM-808] build an assembly cli jar with maven shade plugin
ISSUE 810: [ADAM-807] workaround for git-commit-id/git-commit-id-maven-plugin#61
ISSUE 809: [ADAM-785] Add support for all numeric array (TYPE=B) tags
ISSUE 806: [ADAM-755] updating utils dependency version to 0.2.3
ISSUE 805: Better transform error when file doesn't exist
ISSUE 803: fix unmapped-read sorting
ISSUE 802: stop writing contig names as md5 sums
ISSUE 798: fix SAM-attr conversion bug; int[]'s not byte[]'s
ISSUE 790: optionally add MDTags to reads with transform
ISSUE 782: Fix SAM Attribute parser for numeric array tags
ISSUE 773: [ADAM-772] fix some bash var quoting
ISSUE 765: [ADAM-752] Build for many combos of Spark/Hadoop versions.
ISSUE 764: More involved README restructuring
ISSUE 762: [ADAM-132] allowing list of commands to be injected into adam-cli ADAMMain

Version 0.17.1

ISSUE 784: [ADAM-783] Write @SQ header lines in sorted order.
ISSUE 792: [ADAM-791] Add repartition parameter to Fasta2ADAM.
ISSUE 781: [ADAM-777] Add validation stringency flag for BQSR.
ISSUE 757: We should print a warning message if the user has ADAM_OPTS set.
ISSUE 770: [ADAM-769] Fix serialization issue in known indel consensus model.
ISSUE 763: Clean up README links, other nits
ISSUE 749: Remove adam-cli jar from classpath during adam-submit
ISSUE 754: Bump ADAM to Spark 1.4
ISSUE 753: Bump Spark to 1.4
ISSUE 748: Fix for mdtag issues with insertions
ISSUE 746: Upgrade to Parquet 1.8.1.
ISSUE 744: [ADAM-743] exclude conflicting jackson dependencies
ISSUE 737: Reverse complement negative strand reads in fastq output
ISSUE 731: Fixed bug preventing use of TLEN attribute
ISSUE 730: [ADAM-729] Stuff TLEN into attributes.
ISSUE 728: [ADAM-709] Remove FeatureHierarchy and FeatureHierarchySuite
ISSUE 719: [ADAM-718] Use filesystem path to get underlying file system.
ISSUE 712: unify header-setting between BAM/SAM and VCF
ISSUE 696: include SequenceRecords from second-in-pair reads
ISSUE 698: class-ify ShuffleRegionJoin, force setting seqdict
ISSUE 706: restore clause guarding pruneCache check
ISSUE 705: GeneFeatureRDDFunctions → FeatureRDDFunctions

Version 0.17.0

ISSUE 691: fix BAM/SAM header setting when writing on cluster
ISSUE 688: make adamLoad public
ISSUE 694: Fix parent reference in distribution module
ISSUE 684: a few region-join nits
ISSUE 682: [ADAM-681] Remove menacing error message about reqd .adam extension
ISSUE 680: [ADAM-674] Delete Bam2ADAM.
ISSUE 678: upgrade to bdg utils 0.2.1
ISSUE 668: [ADAM-597] Move correction out of ADAM and into a downstream project.
ISSUE 671: Bug fix in ReferenceUtils.unionReferenceSet
ISSUE 667: [ADAM-666] Clean up key not found error in partitioner code.
ISSUE 656: Update Vcf2ADAM.scala
ISSUE 652: added filterByOverlappingRegion in GeneFeatureRDDFunctions
ISSUE 650: [ADAM-649] Support transform of all BAM/SAM files in a directory.
ISSUE 647: [ADAM-646] Special case reads with '*' quality during BQSR.
ISSUE 645: [ADAM-634] Create a local ParquetLister for testing purposes.
ISSUE 633: [Adam] Tests for SAMRecordConverter.scala
ISSUE 641: [ADAM-640] Fix incorrect exclusion for org.seqdoop.htsjdk.
ISSUE 632: [ADAM-631] Allow VCF conversion to sort on output after coalescing.
ISSUE 628: [ADAM-627] Makes ReferenceFile trait extend Serializable.
ISSUE 637: check for mac brew alternate spark install structure
ISSUE 624: Conceptual fix for duplicate marking and sorting stragglers
ISSUE 629: [ADAM-604] Remove normalization code.
ISSUE 630: Add flatten command.
ISSUE 619: [ADAM-540] Move to new HTSJDK release; should support Java 8.
ISSUE 626: [ADAM-625] Enable globbing for BAM.
ISSUE 621: Removes the predicates package.
ISSUE 620: [ADAM-600] Adding RegionJoin trait.
ISSUE 616: [ADAM-565] Upgrade to Parquet filter2 API.
ISSUE 613: [ADAM-612] Point to proper k-mer counters.
ISSUE 588: [ADAM-587] Clean up loading checks.
ISSUE 592: [ADAM-513] Remove ReferenceMappable trait.
ISSUE 606: [ADAM-605] Remove visualization code.
ISSUE 596: [ADAM-595] Delete the 'comparisons' code.
ISSUE 590: [ADAM-589] Removed pileup code.
ISSUE 586: [ADAM-452] Fixes SM attribute on ADAM to BAM conversion.
ISSUE 584: [ADAM-583] Add k-mer counting functionality for nucleotide contig fragments

Version 0.16.0

ISSUE 570: A few small conversion fixes
ISSUE 579: [ADAM-578] Update end of read when trimming.
ISSUE 564: [ADAM-563] Add warning message when saving Parquet files with incorrect extension
ISSUE 576: Changed hashCode implementations to improve performance of BQSR
ISSUE 569: Typo in the narrowPeak parser
ISSUE 568: Moved the Timers object from bdg-utils back to ADAM
ISSUE 478: Move non-genomics code
ISSUE 550: [ADAM-549] Added documentation for testing and CI for ADAM.
ISSUE 555: Makes maybeLoadVCF private.
ISSUE 558: Makes Features2ADAMSuite use SparkFunSuite
ISSUE 557: Randomize ports and turn off Spark UI to reduce bind exceptions in tests
ISSUE 552: Create test suite for FlagStat
ISSUE 554: privatize ADAMContext.maybeLoad{Bam,Fastq}
ISSUE 551: [ADAM-386] Multiline FASTQ input
ISSUE 542: Variants Visualization
ISSUE 545: [ADAM-543][ADAM-544] Fix issues with ADAM scripts and classpath
ISSUE 535: [ADAM-441] put a check in for Nothing. Throws an IAE if no return type is provided
ISSUE 546: [ADAM-532] Fix wigFix intermittent test failure
ISSUE 534: [ADAM-528][ADAM-533] Adds new RegionJoin impl that is shuffle-based
ISSUE 531: [ADAM-529] Attaching scaladoc to released distribution.
ISSUE 413: [ADAM-409][ADAM-520] Added local wigfix2bed tool
ISSUE 527: [ADAM-526] VcfAnnotation2ADAM only counts once
ISSUE 523: don't open non-.adam-extension files as ADAM files
ISSUE 521: quieting wget output
ISSUE 482: [ADAM-462] Coverage region calculation
ISSUE 515: [ADAM-510] fix for bash syntax error; add ADDL_JARS check to adam-submit

Version 0.15.0

ISSUE 509: Add a 'distribution' module to create assemblies
ISSUE 508: Upgrade from Parquet 1.4.3 to 1.6.0rc4
ISSUE 498: [ADAM-496] Changes VCF to flat ADAM command name and usage
ISSUE 500: [ADAM-495] Require SPARK_HOME for adam-submit
ISSUE 501: [ADAM-499] Add -onlyvariants option to vcf2adam
ISSUE 507: [ADAM-505] Removed adam-local from docs
ISSUE 504: [ADAM-502] Add missing Long implicit to ColumnReaderInput
ISSUE 503: [ADAM-473] Make RecordCondition and FieldCondition public
ISSUE 494: Fix foreach block for vcf ingest
ISSUE 492: Documentation cleanup and style improvements
ISSUE 481: [ADAM-480] Switch assembly to single goal.
ISSUE 487: [ADAM-486] Add port option to viz command.
ISSUE 469: [ADAM-461] Fix ReferenceRegion and ReferencePosition impl
ISSUE 440: [ADAM-439] Fix ADAM to account for BDG-FORMATS-35: Avro uses Strings
ISSUE 470: added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
ISSUE 468: refactor RDD loading; explicitly load alignments
ISSUE 474: Consolidate documentation into a single location in source.
ISSUE 471: Fixed typo on MAVEN_OPTS quotation mark
ISSUE 467: [ADAM-436] Optionally output original qualities to fastq
ISSUE 451: add adam view command, analogous to samtools view
ISSUE 466: working examples on .sam included in repo
ISSUE 458: Remove unused val from Reads2Ref
ISSUE 438: Add ability to save paired-FASTQ files
ISSUE 457: A few random Predicate-related cleanups
ISSUE 459: a few tweaks to scripts/jenkins-test
ISSUE 460: Project only the sequence when kmer/qmer counting
ISSUE 450: Refactor some file writing and reading logic
ISSUE 455: [ADAM-454] Add serializers for Avro objects which don't have serializers
ISSUE 447: Update the contribution guidelines
ISSUE 453: Better null handling for isSameContig utility
ISSUE 417: Stores original position and original cigar during realignment.
ISSUE 449: read “OQ” attr from structured SAMRecord field
ISSUE 446: Revert "[ADAM-237] Migrate to Chill serialization libraries."
ISSUE 437: random nits
ISSUE 434: Few transform tweaks
ISSUE 435: [ADAM-403] Remove seqDict from RegionJoin
ISSUE 431: A few tweaks, typo corrections, and random cleanups
ISSUE 430: [ADAM-429] adam-submit now handles args correctly.
ISSUE 427: Fixes for indel realigner issues
ISSUE 418: [ADAM-416] Removing 'ADAM' prefix
ISSUE 404: [ADAM-327] Adding gene, transcript, and exon models.
ISSUE 414: Fix error in adam-local alias
ISSUE 415: Update README.md to reflect Spark 1.1
ISSUE 412: [ADAM-411] Updated usage aliases in README. Fixes #411.
ISSUE 408: [ADAM-405] Add FASTQ output.
ISSUE 385: [ADAM-384] Adds import from FASTQ.
ISSUE 400: [ADAM-399] Fix link to schemas.
ISSUE 396: [ADAM-388] Sets Kryo serialization with --conf args
ISSUE 394: [ADAM-393] Adds knobs to SparkContext creation in SparkFunSuite
ISSUE 391: [ADAM-237] Migrate to Chill serialization libraries.
ISSUE 380: Rewrite of MarkDuplicates which seems to improve performance
ISSUE 387: fix some deprecation warnings

Version 0.14.0

ISSUE 376: [ADAM-375] Upgrade to Hadoop-BAM 7.0.0.
ISSUE 378: [ADAM-360] Upgrade to Spark 1.1.0.
ISSUE 379: Fix the position of the jar path in the submit.
ISSUE 383: Make Mdtags handle '=' and 'X' cigar operators
ISSUE 369: [ADAM-369] Improve debug output for indel realigner
ISSUE 377: [ADAM-377] Update to Jenkins scripts and README.
ISSUE 374: [ADAM-372][ADAM-371][ADAM-365] Refactoring CLI to simplify and integrate with Spark model better
ISSUE 370: [ADAM-367] Updated alias in README.md
ISSUE 368: erasure, nonexhaustive-match, deprecation warnings
ISSUE 354: [ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed
ISSUE 357: [ADAM-357] Added Java Plugin hook for ADAM.
ISSUE 352: Fix failing MD tag
ISSUE 363: Adding maven assembly plugin configuration to create tarballs
ISSUE 364: [ADAM-364] Fixing remaining cs.berkeley.edu URLs.
ISSUE 362: Remove mention of uberjar from README

Version 0.13.0

ISSUE 343: Allow retrying on failure for HTTPRangedByteAccess
ISSUE 349: Fix for a NullPointerException when hostname is null in Task Metrics
ISSUE 347: Bug fix for genome browser
ISSUE 346: Genome visualization
ISSUE 342: [ADAM-309] Update to bdg-formats 0.2.0
ISSUE 333: [ADAM-332] Upgrades ADAM to Spark 1.0.1.
ISSUE 341: [ADAM-340] Adding the TrackedLayout trait and implementation.
ISSUE 337: [ADAM-335] Updated README.md to reflect migration to appassembler.
ISSUE 311: Adding several simple normalizations.
ISSUE 330: Make mismatch and deletes positions accessible
ISSUE 334: Moving code coverage into a profile
ISSUE 329: Add count of mismatches to mdtag
ISSUE 328: [ADAM-326] Adding a 5-second retry on the HttpRangedByteAccess test.
ISSUE 325: Adding documentation for commit/issue nomenclature and rebasing

Version 0.12.1

ISSUE 308: Fixing the 'index 0' bug in features2adam
ISSUE 306: Adding code for lifting over between sequences and the reference genome.
ISSUE 320: Remove extraneous implicit methods in ReferenceMappingContext
ISSUE 314: Updates to indel realigner to improve performance and accuracy.
ISSUE 319: Adding scripts for publishing scaladoc.
ISSUE 315: Added table of (wall-clock) stage durations when print_metrics is used
ISSUE 312: Fixing sources jar
ISSUE 313: Making the CredentialsProperties file optional
ISSUE 267: Parquet and indexed Parquet RDD implementations, and indices.
ISSUE 301: Add Beacon's AlleleCount
ISSUE 293: Add aggregation and display of metrics obtained from Spark
ISSUE 295: Fix broken link to ADAM specification for storing reads.
ISSUE 292: Cleaning up scaladoc generation warnings.
ISSUE 289: Modifying interleaved fastq format to be hadoop version independent.
ISSUE 288: Add ADAMFeature to Kryo registrator
ISSUE 286: Removing some debug printout that was left in.
ISSUE 287: Cleaning hadoop dependencies
ISSUE 285: Refactoring read groups to increase the amount of data stored.
ISSUE 284: Cleaning up build warnings.
ISSUE 280: Move to bdg-formats
ISSUE 283: Fix reference name comment
ISSUE 282: Minor cleanup on interleaved FASTQ input format.
ISSUE 277: Implemented HTTPRangedByteAccess.
ISSUE 274: Added clarifying note to ADAMVariantContext
ISSUE 279: Simplify format-source
ISSUE 278: Use maven license plugin to ensure source has correct license
ISSUE 268: Adding fixed depth prefix trie implementation
ISSUE 273: Fixes issue in reference models where strings are not sanitized on collection from avro.
ISSUE 272: Created command categories
ISSUE 269: Adding k-mer and q-mer counting.
ISSUE 271: Consolidate Parquet logging configuration

Version 0.12.0

ISSUE 264: Parquet-related Utility Classes
ISSUE 259: ADAMFlatGenotype is a smaller, flat version of a genotype schema
ISSUE 266: Removed extra command 'BuildInformation'
ISSUE 263: Added AdamContext.referenceLengthFromCigar
ISSUE 260: Modifying conversion code to resolve #112.
ISSUE 258: Adding an 'args' parameter to the plugin framework.
ISSUE 262: Adding reference assembly name to ADAMContig.
ISSUE 256: Upgrading to Spark 1.0
ISSUE 257: Adds toString method for sequence dictionary.
ISSUE 255: Add equals, canEqual, and hashCode methods to MdTag class

Version 0.11.0

ISSUE 254: Cleanup import statements
ISSUE 250: Adding ADAM to SAM conversion.
ISSUE 248: Adding utilities for read trimming.
ISSUE 252: Added a note about rebasing-off-master to CONTRIBUTING.md
ISSUE 249: Cosmetic changes to FastaConverter and FastaConverterSuite.
ISSUE 251: CHANGES.md is updated at release instead of per pull request
ISSUE 247: For #244, Fragments were incorrect order and incomplete
ISSUE 246: Making sample ID field in genotype nullable.
ISSUE 245: Adding ADAMContig back to ADAMVariant.
ISSUE 243: Rebase PR#238 onto master

Version 0.10.0

ISSUE 242: Upgrade to Parquet 1.4.3
ISSUE 241: Fixes to FASTA code to properly handle indices.
ISSUE 239: Make ADAMVCFOutputFormat public
ISSUE 233: Build up reference information during cigar processing
ISSUE 234: Predicate to filter conversion
ISSUE 235: Remove unused contiglength field
ISSUE 232: Add -pretty and -o to the print command
ISSUE 230: Remove duplicate mdtag field
ISSUE 231: Helper scripts to run an ADAM Console.
ISSUE 226: Fix ReferenceRegion from ADAMRecord
ISSUE 225: Change Some to Option to check for unmapped reads
ISSUE 223: Use SparkConf object to configure SparkContext
ISSUE 217: Stop using reference IDs and use reference names instead
ISSUE 220: Update SAM to ADAM conversion
ISSUE 213: BQSR updates

Version 0.9.0

ISSUE 214: Upgrade to Spark 0.9.1
ISSUE 211: FastaConverter Refactor
ISSUE 212: Cleanup build warnings
ISSUE 210: Remove Scalariform from process-sources phase
ISSUE 209: Fix Scalariform issues and Maven warnings
ISSUE 207: Change from deprecated manifest erasure to runtimeClass
ISSUE 206: Add Scalariform settings to pom
ISSUE 204: Update Avro code gen to not mark fields as deprecated.

Version 0.8.0

ISSUE 203: Move package from edu.berkeley.cs.amplab to org.bdgenomics
ISSUE 199: Updating pileup conversion code to convert sequences that use the X and = (EQ) CIGAR operators
ISSUE 191: Add repartition parameter
ISSUE 183: Fixing Job.getInstance call that breaks hadoop 1 compatibility.
ISSUE 192: Add docs and scripts for creating a release
ISSUE 193: Issue #137, clarify role of CHANGES.{md,txt}

Version 0.7.2

ISSUE 187: Add summarize_genotypes command
ISSUE 178: Upgraded to Hadoop-BAM 0.6.2/Picard 1.107.
ISSUE 173: Parse annotations out of vcf files
ISSUE 162: Refactored SequenceDictionary
ISSUE 180: BQSR using vcf loader
ISSUE 179: Update maven-surefire-plugin dependency version to 2.17, also create an ...
ISSUE 175: VariantContext converter refactor
ISSUE 169: Cleaning up mpileup command
ISSUE 170: Adding variant field enumerations

Version 0.7.1

Version 0.7.3

Version 0.7.2

ISSUE 166: Pair-wise genotype concordance of genotype RDDs, with CLI tool

Version 0.7.0

ISSUE 171: Add back in allele dosage for genotypes.

Version 0.7.0

ISSUE 167: Fix for Hadoop 1.0.x support
ISSUE 165: call PluginExecutor in apply method, fixes issue 164
ISSUE 160: Refactoring FASTA work to break contig sizes.
ISSUE 78: Upgrade to Spark 0.9 and Scala 2.10
ISSUE 138: Display Git commit info on command line
ISSUE 161: Added switches to spark context creation code
ISSUE 117: Add a "range join" method.
ISSUE 151: Vcf work concordance and genotype
ISSUE 150: Remaining variant changes for adam2vcf, unit tests, and CLI modifications
ISSUE 147: Resurrect VCF conversion code
ISSUE 148: Moving createSparkContext into core
ISSUE 142: Enforce Maven and Java versions
ISSUE 144: Merge of last few days of work on master into this branch
ISSUE 124: Vcf work rdd master merge
ISSUE 143: Changing package declaration to match test file location and removing un...
ISSUE 140: Update README.md
ISSUE 139: Update README.md
ISSUE 129: Modified pileup transforms to improve performance + to add options
ISSUE 116: add fastq interleaver script
ISSUE 125: Add design doc to CONTRIBUTING document
ISSUE 114: Changes to RDD utility files for new variant schema
ISSUE 122: Add IRC Channel to readme
ISSUE 100: CLI component changes for new variant schema
ISSUE 108: Adding new PluginExecutor command
ISSUE 98: Vcf work remove old variant
ISSUE 104: Added the port erasure to SparkFunSuite's cleanup.
ISSUE 107: Cleaning up change documentation.
ISSUE 99: Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
ISSUE 105: Add initial documentation on contributing
ISSUE 97: New schema, variant context converter changes, and removal of old genoty...
ISSUE 79: Adding ability to convert reference FASTA files for nucleotide sequences
ISSUE 91: Minor change, increase adam-cli usage width to 150 characters
ISSUE 86: Fixes to pileup code
ISSUE 88: Added function for building variant context from genotypes.
ISSUE 81: Update README and cleanup top-level cli help text
ISSUE 76: Changing hadoop fs call to be compatible with Hadoop 1.
ISSUE 74: Updated CHANGES.txt to include note about the recursive-load branch.
ISSUE 73: Support for loading/combining multiple ADAM files into a single RDD.
ISSUE 72: Added ability to create regions from reads, and to merge adjacent regions
ISSUE 71: Change RecalTable to use optimized phred calculations
ISSUE 68: sonatype-nexus-snapshots repository is already in parent oss-parent-7 pom
ISSUE 67: fix for wildcard exclusion maven warnings
ISSUE 65: Create a cache for phred -> double values instead of recalculating
ISSUE 60: Bugfix for BQSR: Offset into qualityScore list was wrong
ISSUE 66: add pluginDependency section and remove versions in plugin sections
ISSUE 61: Filter utility for inverse of Projection
ISSUE 48: Fix read groups mapping and add Y as base type
ISSUE 36: Adding reads to rods transformation.
ISSUE 56: Adding Yy as base in MdTag

Version 0.6.0

ISSUE 53: Fix Hadoop 2.2.0 support, upgrade to Spark 0.8.1
ISSUE 52: Attributes: Use 't' instead of ',', as , is a valid character
ISSUE 47: Adding containsRefName to SequenceDictionary
ISSUE 46: Reduce logging for the actual adamSave job
ISSUE 45: Make MdTag immutable
ISSUE 38: Small bugfixes and cleanups to BQSR
ISSUE 40: Fixing reference position from offset implementation
ISSUE 31: Fixing a few issues in the ADAM2VCF2ADAM pipeline.
ISSUE 30: Suppress parquet logging in FieldEnumerationSuite
ISSUE 28: Fix build warnings
ISSUE 24: Add unit tests for marking duplicates
ISSUE 26: Fix unmapped reads in sequence dictionary
ISSUE 23: Generalizing the Projection class
ISSUE 25: Adding support for before, after clauses to SparkFunSuite.
ISSUE 22: Add a unit test for sorting reads
ISSUE 21: Adding rod functionality: a specialized grouping of pileup data.
ISSUE 13: Cleaning up VCF<->ADAM pipeline
ISSUE 20: Added Apache License 2.0 boilerplate to tops of all the GB-(c) files
ISSUE 19: Allow the Hadoop version to be specified
ISSUE 17: Fix transform -sort_reads partitioning. Add -coalesce option to transform.
ISSUE 16: Fixing an issue in pileup generation and in the MdTag util.
ISSUE 15: Tweaks 1
ISSUE 12: Subclass testing bug in AdamContext.adamLoad
ISSUE 11: Missing brackets in VcfConverter.getType
ISSUE 10: Moved record field name enum over to the projections package.
ISSUE 8: Fixes to sorting in ReferencePosition
ISSUE 4: New SparkFunSuite test support class, logging util and new BQSR test.
ISSUE 1: Fix scalatest configuration and fix unit tests
ISSUE 14: Converting some of the Option() calls to Some()
ISSUE 9: Adding support for a Sequence Dictionary from BAM files
ISSUE 7: ADAM variant and genotype formats; and a VCF->ADAM converter
ISSUE 3: Adding in implicit conversion functions for going between Java and Scala...
ISSUE 2: Update from Spark 0.7.3 to 0.8.0-incubating
