Closed issues:
- Add deprecated annotations for code to be removed to support Spark 3 #2254
- Update bdg-utils dependency version to 0.2.16 #2252
- Bump Apache Spark dependency version to 2.4.5 #2248
- FastqRecordConvert incompatible with single tube long fragment read headers #2246
- Bam files with no unmapped reads fails to sort #2242
- Unit test failure when building from release tarball #2241
- Adam without HDFS #2238
- Jenkins build status icon link is broken #2228
- Write block-gzipped (bgzf) feature formats #2191
- adam-submit is not exiting until I hit ctrl+C #2040
- WARN VariantContextConverter:924 - Ran into Array Out of Bounds when accessing indices 0,1,2 of genotype . #2024
- Add doc for running on HPC with PBS #2002
- loadFastq with paired gzipped FASTQ files fails via s3a URLs #1855
- Where to put lift over function #1811
- Add transform to fix chromosome prefixes to genomic RDDs and CLIs #1757
- Support using Spark-BAM to load BAM files #1683
- Handling Validation Stringency without repeated code #1572
- New model PartitionMap for Array[Option[(ReferenceRegion, ReferenceRegion)]] #1558
- Revisit double-negative command line options (e.g. -disable_fast_concat) #1503
- Improve test coverage for SAMRecord<->AlignmentRecord #1284
- Allow alphabets to canonicalize strings #797
- Update MdTag.getReference for CIGAR N #742
- Replace contig length maps with sequence dictionary #572
- Use tool like Scala Refactoring to enforce import guidelines #445
Merged and closed pull requests:
- [ADAM-2254] Add deprecated annotations for code to be removed to support Spark 3 #2256 (heuermh)
- [ADAM-2252] Update bdg-utils dependency version to 0.2.16 #2253 (heuermh)
- [ADAM-2248] Bump Apache Spark dependency version to 2.4.5 #2249 (heuermh)
- [ADAM-2241] Commit template substitution may not be available if building from tarball #2243 (heuermh)
- [ADAM-2228] Remove Jenkins build status badge #2240 (heuermh)
- remove 2.7 support checks #2222 (akmorrow13)
- [ADAM-2023] Implemented Duplicate Marking algorithm in Spark SQL #2045 (jonpdeaton)
- use readlink to properly source source dir #2036 (mtdeguzis)
- Don't discard unmapped reads in indel realignment #2019 (pauldwolfe)
- Refactor/mark buckets #2015 (jondeaton)
- Adding a BamLoader class to have only 1 header parse for multiple ind… #1966 (ffinfo)
- Added additional arguments to GenomicRDD.pipe() #1758 (gunjanbaid)
- Migrate bdg-formats to new adam-formats module. #1689 (heuermh)
- [ADAM-1683] Pull in Spark-BAM as a secondary loading path. #1686 (fnothaft)
- Add SortedGenomicRDD trait, refactor shuffle joins and pipe #1590 (fnothaft)
- [ADAM-1513] Strandedness for FeatureRDDs #1555 (devin-petersohn)
Closed issues:
- Github changes plugin used in release script does not use two-factor authentication #2235
- Update bdg-formats dependency version to 0.15.0 #2233
- 7 tests failing on HEAD #2231
- BUILD FAILURE - Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java #2227
- GenomicDataset saveAsParquet incorrectly named parameter compressCodec #2224
- Add printAttributes methods for Reads, Sequences, Slices #2219
- Add default Set.empty to printAttributes key method parameter #2218
- Add Avro-friendly ctrs in rdd.variant package #2215
- Cannot resolve adam-shade-spark2_2.11 dependency #2211
Merged and closed pull requests:
- [ADAM-2235] Update github-changes-maven-plugin dependency version to 1.1 #2236 (heuermh)
- [ADAM-2233] Update bdg-formats dependency version to 0.15.0. #2234 (heuermh)
- Update maven plugin dependency versions. #2230 (heuermh)
- [ADAM-2224] Complete refactoring of compressionCodec for named parameter. #2229 (heuermh)
- [ADAM-2224] Use compressionCodec for named parameter. #2226 (heuermh)
- [ADAM-2219] Add printAttributes methods for Reads, Sequences, Slices #2223 (heuermh)
- [ADAM-2218] Add default Set.empty to printAttributes key method parameter. #2220 (heuermh)
- Rename AlignmentRecord to Alignment. #2217 (heuermh)
- [ADAM-2215] Add Avro-friendly ctrs to rdd.variant package #2216 (heuermh)
Closed issues:
- Bump bdg-formats dependency version to 0.14.0 #2208
- Bump Apache Spark dependency version to 2.4.4 #2202
- Add missing loadVariantContexts(String, ValidationStringency) method #2197
- Jenkins builds failing due to Coveralls API submission #2194
- Confirm block-gzipped (bgzf) interleaved FASTQ is supported #2193
- TransformGenotype/Variant do not support compressed VCF #2190
- Add htsjdk conversion methods to VariantContextDataset #2189
- TransformVariants is missing partition arguments #2188
- StackOverflowError when saving to BAM in adam-shell #2186
- loadFastaDna usage not obvious due to default method parameter #2183
- loadFastaDna does not seem to work #2182
- kryo buffer overflow when converting fastas from CLI to adam #1660
Merged and closed pull requests:
- [ADAM-2208] Bump bdg-formats dependency version to 0.14.0 #2209 (heuermh)
- Add FASTA in formatter for sequence datasets #2207 (heuermh)
- Remove Avro 1.8.x download step from Jenkins Scala 2.12 installation. #2206 (heuermh)
- Use qualityScores for base quality scores #2205 (heuermh)
- [ADAM-2189] Add htsjdk conversion methods to VariantContextDataset #2204 (heuermh)
- [ADAM-2202] Bump Apache Spark dependency version to 2.4.4. #2203 (heuermh)
- [ADAM-2183] Drop default value for maximumLength #2201 (heuermh)
- [ADAM-2197] Add missing loadVariantContexts(String, ValidationStringency) method #2200 (heuermh)
- [ADAM-2194] Disable coveralls reporting from Jenkins test script #2196 (heuermh)
- [ADAM-2188] Add partition cli args to TransformVariants,Features. #2192 (heuermh)
- Bump htsjdk dependency version to 2.19.0 #2184 (heuermh)
- Update required Maven version in docs #2181 (heuermh)
Closed issues:
- Bump bdg-formats dependency version to 0.13.0 #2177
- Rename reads to alignments in methods where appropriate #2172
- Add command line option re: creating references from FASTA sources #2168
- Add command line support for loading references in TransformFeatures #2167
- Add load methods for data frames #2159
- Transform VCF to adam file not found exception. #2076
- NoClassDefFoundError: javax/tools/ToolProvider on openjdk 10.0.2 #2030
- NotSerializableException: com.netflix.servo.monitor.LongGauge #1952
- Should NucleotideContigFragmentRDD create sequence dictionary on load? #1894
- converting fasta to adam eats a huge ammount of time and memory #1891
- Support minPartitions parameter across load calls #1792
- make reading fasta less memory hungry #1458
- Improve unit test coverage for NucleotideContigFragmentRDD #1413
- Support for INSDC Sequence records (i.e., Genbank/EMBL format)? #1219
Merged and closed pull requests:
- [ADAM-2177] Bump bdg-formats dependency version to 0.13.0 #2178 (heuermh)
- [ADAM-2172] Rename reads to alignments in methods where appropriate #2176 (heuermh)
- [ADAM-1891] Reimplement FASTA sequence and slice converters for performance #2175 (heuermh)
- [ADAM-2168] Add command line option re: creating references from FASTA sources #2170 (heuermh)
- [ADAM-2167] Add command line support for loading references in TransformFeatures #2169 (heuermh)
- bump adam-python version #2165 (akmorrow13)
- Convert fragment dataset to alignment dataset directly #2162 (heuermh)
- [ADAM-2159] Add load methods for data frames #2158 (heuermh)
- Post 0.27.0 release cleanup and doc fixes. #2155 (heuermh)
- Add direct conversion from DatasetBoundFragmentRDD to DatasetBoundAli… #2016 (henrydavidge)
- Add ADAMContext APIs to create genomic RDDs from dataframes #2000 (henrydavidge)
- Adding ReadRDD, SequenceRDD, and SliceRDD. #1895 (heuermh)
Closed issues:
- Add Scala 2.12 artifacts to release script #2153
- Tried to access method org.apache.avro.specific.SpecificData.<init>()V from class ProcessingStep #2151
- Update maven-jar-plugin dependency version to 3.1.2 #2147
- Homebrew and Bioconda packages fail against Spark 2.4.2 #2146
- Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2145
- Can encounter empty reduce when BAM header fails validation #2143
- Build failing in jenkins from Spark 2.2.3 #2139
- Make SamRecordConverter public #2138
- python API does not match API #2127
- Error when run : mvn install #2123
- Always use Spark SQL in GenomicDataset read path #2114
- Update bdg-utils dependency version to 0.2.14 #2106
- NoSuchMethodError: org.apache.parquet.column.ParquetProperties.getAllocator()Lorg/apache/parquet/bytes/ByteBufferAllocator #2098
- ClassNotFoundException: org.apache.avro.message.BinaryMessageEncoder #2091
- Release script needs to touch Version in R DESCRIPTION file #2089
- org.apache.avro.SchemaParseException: Can't redefine: list #2058
- Support Spark 2.4 and Scala 2.12 #2044
- Fail early when output directory already exists #2034
- NoClassDefFoundError o.a.parquet.hadoop.metadata.CompressionCodecName #1742
- Log with parameterized messages consistently for performance #1712
Merged and closed pull requests:
- [ADAM-2153] Add Scala 2.12 artifacts to release script #2154 (heuermh)
- [ADAM-2089] Bump Version in R DESCRIPTION file #2152 (heuermh)
- [ADAM-2145] Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2149 (heuermh)
- [ADAM-2147] Update maven-jar-plugin dependency version to 3.1.2. #2148 (heuermh)
- [ADAM-2143] Use fold instead of reduce when loading SAM/BAM/CRAM headers #2144 (fnothaft)
- Remove parquet-scala dependency from dependencyManagement. #2142 (heuermh)
- [ADAM-2139] Update Spark version to 2.3.3 for Jenkins test #2141 (heuermh)
- [ADAM-1712] Replace utils.Logger with grizzled.slf4j.Logger #2136 (heuermh)
- [ADAM-2034] Check output path is writeable before running transformations #2135 (heuermh)
- jenkins scripts deletes conda envs #2133 (akmorrow13)
- Update htsjdk dependency version to 2.18.2 #2132 (heuermh)
- [ADAM-2127] Update python doc per GenomicRdd --> GenomicDataset change #2128 (heuermh)
- Update python and R versions. #2126 (heuermh)
- use parquet-scala_2.11 fork #2108 (ryan-williams)
- [ADAM-2106] Update bdg-utils dependency version to 0.2.14 #2107 (heuermh)
- [ADAM-2044] Update Spark version to 2.4.3, add move to Scala 2.12 script #2056 (heuermh)
Closed issues:
- Bump Spark dependency to version 2.3.3 #2120
- Update Spark version on Jenkins to 2.2.3 #2115
- Inverted duplicates are not found in mark duplicates #2102
- Py4JError: org.bdgenomics.adam.algorithms.consensus.ConsensusGenerator.fromKnowns does not exist in the JVM #2099
- Update Bioconda recipe for ADAM 0.25.0 #2088
- Update Homebrew formula for ADAM 0.25.0 #2087
- Error: Dependency package(s) 'SparkR' not available #2086
- Java-friendly indel realignment method doesn't allow passing reference #2013
- Use consistent (Scala-specific) (Java-specific) qualifiers in method scaladoc #1986
- Clarify GenomicRDD vs. GenomicDataset name #1954
- Support validation stringency in out formatters #1949
- Compute coverage by sample #1498
Merged and closed pull requests:
- Bump bdg-formats dependency to version 0.12.0. #2124 (heuermh)
- [ADAM-2120] Bump Spark dependency to version 2.3.3. #2121 (heuermh)
- Filter supplemental reads from scoring #2119 (pauldwolfe)
- [ADAM-2115] Update Spark version on Jenkins to 2.2.3. #2118 (heuermh)
- Refactor AlignmentRecord, RecordGroup, and ProcessingStep #2113 (heuermh)
- removed anaconda requirement for venv during jenkins test #2109 (akmorrow13)
- Propagate read negative flag to SAM records for unmapped reads #2105 (henrydavidge)
- Add consensus targets to realignment targets #2104 (pauldwolfe)
- [ADAM-2099] Add python realignIndelsFromKnownIndels method #2103 (heuermh)
- [ADAM-2102] Inverted duplicates are not found in mark duplicates #2101 (pauldwolfe)
- Rename contig to reference #2100 (heuermh)
- [ADAM-1986] Add java-specific methods where missing. #2097 (heuermh)
- [ADAM-2013] Add java-friendly indel realignment method that accepts reference. #2095 (heuermh)
- Use build-helper-maven-plugin for build timestamp #2093 (heuermh)
- bump adam-python version to 0.25.0a0 #2092 (akmorrow13)
- [ADAM-2085] Update R installation docs re: libgit2 and SparkR. #2090 (heuermh)
- [ADAM-1954] Complete refactoring GenomicRDD to GenomicDataset. #1981 (heuermh)
- [ADAM-1949] Support validation stringency in out formatters. #1969 (heuermh)
Closed issues:
- Expand illumina metadata regex to include "N" character #2079
- Remove support for Hadoop 2.6 #2073
- NumberFormatException: For input string: "nan" in VCF #2068
- Support Spark 2.3.2 #2062
- Arrays should be passed to HTSJDK in the JVM primitive type #2059
- toCoverage() function for alignments does not distinguish samples #2049
- Building from adam-core module directory fails to generate Scala code for sql package #2047
- Data Sets #2043
- saveAsBed writes missing score values as '.' instead of '0' #2039
- Fix GFF3 parser to handle trailing FASTA #2037
- Add StorageLevel as an optional parameter to loadPairedFastq #2032
- Error: File name too long when building on encrypted file system #2031
- Fail to transform a VCF file containing multiple genome data (Muliple sample) #2029
- Dataset and RDD constructors are missing from CoverageRDD #2027
- How to create a single RDD[Genotype] object out of multiple VCF files? #2025
- ReadTheDocs github banner is broken #2020
- -realign_indels throws serialization error with instrumentation enabled #2007
- Support 0 length FASTQ reads #2006
- Speed of Reading into ADAM RDDs from S3 #2003
- Support Python 3 #1999
- Unordered list of region join types in doc is missing nested levels #1997
- Add VariantContextRDD.saveAsPartitionedParquet, ADAMContext.loadPartitionedParquetVariantContexts #1996
- VCF annotation question #1994
- Fastq reader clips long reads at 10,000 bp #1992
- adam-submit Error: Number of executors must be a positive number on EMR 5.13.0/Spark 2.3.0 #1991
- Test against Spark 2.3.1, Parquet 1.8.3 #1989
- END does not get set when writing a gVCF #1988
- Support saving single files to filesystems that don't implement getScheme #1984
- Add additional filter by convenience methods #1978
- Limiting FragmentRDD pipe paralellism #1977
- Consider javadoc.io for API documentation linking #1976
- FASTQ Reader leaks connections #1974
- Update bioconda recipe for version 0.24.0 #1971
- Update homebrew formula at brewsci/homebrew-bio for version 0.24.0 #1970
- loadPartitionedParquetAlignments fails with Reference.all #1967
- Caused by: java.lang.VerifyError: class com.fasterxml.jackson.module.scala.ser.ScalaIteratorSerializer overrides final method withResolved #1953
- FASTQ input format needs to support index sequences #1697
- Changelog must be edited and committed manually during release process #936
Merged and closed pull requests:
- added pyspark mock modules for API documentation #2084 (akmorrow13)
- Added mock python modules for API python documentation #2082 (akmorrow13)
- [ADAM-2079] Expand illumina metadata regex to include "N" character #2081 (pauldwolfe)
- ADAM-2079 Added "N" to regexs for illumina metadata #2080 (pauldwolfe)
- Update docs with new template and documentation #2078 (akmorrow13)
- [ADAM-1992] Make maximum FASTQ read length configurable. #2077 (heuermh)
- [ADAM-2059] Properly pass back primitive typed arrays to HTSJDK. #2075 (heuermh)
- Update dependency versions, including htsjdk to 2.16.1 and guava to 27.0-jre #2072 (heuermh)
- [ADAM-1999] Support Python 3 #2070 (akmorrow13)
- [ADAM-2068] Prevent NumberFormatException for nan vs NaN in VCF files. #2069 (heuermh)
- Update python MAKE file #2067 (Georgehe4)
- Update python MAKE file #2066 (Georgehe4)
- Update jenkins script to test python 3.6 #2060 (Georgehe4)
- [ADAM-2062] Update Spark version to 2.3.2 #2055 (heuermh)
- Clean up fields and doc in fragment. #2054 (heuermh)
- [ADAM-2037] Support GFF3 files containing FASTA formatted sequences. #2053 (heuermh)
- modified CoverageRDD and FeatureRDD to extend MultisampleGenomicDataset #2051 (akmorrow13)
- Multi-sample coverage #2050 (akmorrow13)
- [ADAM-2047] Use source directory relative to project.basedir for adam codegen. #2048 (heuermh)
- [ADAM-2039] Adding support for writing BED format per UCSC definition #2042 (heuermh)
- Update Jenkins Spark version to 2.2.2 #2035 (akmorrow13)
- [ADAM-2032] Add StorageLevel as an optional parameter to loadPairedFastq #2033 (heuermh)
- [ADAM-2027] Add RDD and Dataset constructors to CoverageRDD. #2028 (heuermh)
- Allow for export of query name sorted SAM files #2026 (karenfeng)
- [ADAM-2020] Fix ReadTheDocs Github banner. #2021 (fnothaft)
- [ADAM-1988] Add copyVariantEndToAttribute method to support gVCF END attribute … #2017 (heuermh)
- [ADAM-936] Use github-changes-maven-plugin to update CHANGES.md. #2014 (heuermh)
- [ADAM-1992] Make maximum FASTQ read length configurable. #2011 (fnothaft)
- [ADAM-1697] Expand Illumina metadata regex to cover interleaved index sequences. #2010 (heuermh)
- [ADAM-2007] Make IndelRealignmentTarget implement Serializable. #2009 (fnothaft)
- [ADAM-2006] Support loading 0-length reads as FASTQ. #2008 (fnothaft)
- [ADAM-1697] Expand Illumina metadata regex to cover index sequences #2004 (pauldwolfe)
- [ADAM-1996] Load and save VariantContexts as partitioned Parquet. #2001 (heuermh)
- [ADAM-1997] Nest list of region join types in joins doc. #1998 (heuermh)
- [ADAM-1877] Add filterToReferenceName(s) to SequenceDictionary. #1995 (heuermh)
- [ADAM-1984] Support file systems that don't set the scheme. #1985 (fnothaft)
- [ADAM-1978] Add additional filter by convenience methods. #1983 (heuermh)
- Adding printAttribute methods for alignment records, features, and samples. #1982 (heuermh)
- Fix partitioning code to use Long instead of Int #1980 (fnothaft)
- [ADAM-1976] Adding core API documentation link and badge. #1979 (heuermh)
- [ADAM-1974] Close unclosed stream in FastqInputFormat. #1975 (fnothaft)
- Set defaults to schemas #1972 (ffinfo)
- Add loadPairedFastqAsFragments method. #1866 (heuermh)
- Adding loadPairedFastqAsFragments method #1828 (ffinfo)
Closed issues:
- Phred values from 156–254 do not round trip properly between log space #1964
- Support VCF lines with positions at 0 #1959
- Don't initialize non-ref values to Int.MinValue #1957
- Support downsampling in recalibration #1955
- Cannot waive validation stringency for INFO Number=.,Type=Flag fields #1939
- Clip phred scores below Int.MaxValue #1934
- ADAMContext.getFsAndFilesWithFilter should throw exception if paths null or empty #1932
- Bump to Spark 2.3.0 #1931
- util.FileExtensions should be public for use downstream in Cannoli #1927
- Reduce logging level for ADAMKryoRegistrator #1925
- Revisit performance implications of commit 1eed8e8 #1923
- add akmorrow13 to PyPl for bdgenomics.adam #1919
- Read the Docs build failing with TypeError: super() argument 1 must be type, not None #1917
- Bump Hadoop-BAM dependency to 7.9.2. #1915
- cannot run pyadam from adam distribution 0.23.0 #1914
- adam2fasta/q are missing asSingleFile, disableFastConcat #1912
- Pipe API doesn't properly handle multiple arguments and spaces #1909
- Bump to HTSJDK 2.13.2 #1907
- S3A error: HTTP request: Timeout waiting for connection from pool #1906
- InputStream passed to VCFHeaderReader does not get closed #1900
- Support INFO fields set to missing #1898
- CLI to transfer between cloud storage and HDFS #1896
- Jenkins does not run python or R tests #1889
- pyadam throws application option error #1886
- ReferenceRegion in python does not exist #1884
- Caching GenomicRDD in pyspark #1883
- adam-submit aborts if ADAM_HOME is set #1882
- Allow piped commands to timeout #1875
- loadVcf does not dedupe sample ID #1874
- Add coverage command for reporting read coverage #1873
- Only python 2? #1871
- Support VariantContextRDD from SQL #1867
- Cannot find find-adam-assembly.sh in bioconda build #1862
- _jvm.java.lang.Class.forName does not work for certain configurations #1858
- Formatting error in CHANGES.md #1857
- Various improvements to readthedocs documentation #1853
- add filterByOverlappingRegion(query: ReferenceRegion) to R and python APIs #1852
- Support adding VCF header lines from Python #1840
- Support loadIndexedBam from Python #1836
- Add link to awesome list of applications that extend ADAM #1832
- loadIndexed bam lazily throws Exception if index does not exist #1830
- OAuth credentials for Github in Coveralls configuration are no longer valid #1829
- base counts per position #1825
- Issues loading BAM files in Google FS #1816
- Error when writing a vcf file to Parquet #1810
- transformAlignments cannot repartition files #1808
- GenotypeRDD should support toVariants method #1806
- Add support for python and R in Homebrew formula #1796
- Add transformVariantContexts or similar to cli #1793
- Issue while using Sorting option #1791
- Issue with adam2vcf #1787
- Remove explicit <compile> scopes from submodule POMs #1786
- java.nio.file.ProviderNotFoundException (Provider "s3" not found) #1732
- Accessing GenomicRDD join functions in python #1728
- ArrayIndexOutOfBoundsException in PhredUtils$.phredToSuccessProbability #1714
- Add ability to specify region bounds to pipe command #1707
- Unable to run pyadam, SQLException: Failed to start database 'metastore_db' #1666
- SAMFormatException: Unrecognized tag type: ^@ #1657
- IndexOutOfBoundsException in BAMInputFormat.getSplits #1656
- overlaps considers that Strand.FORWARD cannot overlap with Strand.INDEPENDENT #1650
- migration converters #1629
- RFC: Removing Spark 1.x, Scala 2.10 support in 0.24.0 release #1597
- Eliminate unused ConcreteADAMRDDFunctions class #1580
- Add set theory/statistics packages to ADAM #1533
- Evaluate Apache Carbondata INDEXED column store file format for genomics #1527
- Stranded vs unstranded in getReferenceRegions() for features #1513
- Question:How to tranform a line of sam to AlignmentRecord? #1425
- Excessive compilation warnings about multiple scala libraries #695
- Support Hive-style partitioning #651
Merged and closed pull requests:
- [ADAM-1964] Lower point where phred conversions are done using log code. #1965 (fnothaft)
- Add utility methods for adam-shell. #1958 (heuermh)
- [ADAM-1955] Add support for downsampling during recalibration table generation #1963 (fnothaft)
- [ADAM-1957] Don't initialize missing likelihoods to MinValue. #1961 (fnothaft)
- [ADAM-1959] Support VCF rows at position 0. #1960 (fnothaft)
- [ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1948 (fnothaft)
- [ADAM-1914] Python profile needs to be specified for egg to be in distribution. #1946 (fnothaft)
- [ADAM-1917] Delete dependency on fulltoc. #1944 (fnothaft)
- [ADAM-1917] Try 3: fix Sphinx fulltoc. #1943 (fnothaft)
- [ADAM-1917] Set Sphinx version in requirements.txt. #1942 (fnothaft)
- [ADAM-1917] Set minimal Sphinx version for Readthedocs build. #1941 (fnothaft)
- [ADAM-1939] Allow validation stringency to waive off FLAG arrays. #1940 (fnothaft)
- [ADAM-1915] Bump to Hadoop-BAM 7.9.2. #1938 (fnothaft)
- [ADAM-1934] Clip phred values to 3233, instead of Int.MaxValue. #1936 (fnothaft)
- Ignore VCF INFO fields with number=G when stringency=LENIENT #1935 (jpdna)
- [ADAM-1931] Bump to Spark 2.3.0. #1933 (fnothaft)
- [ADAM-1840] Support adding VCF header lines from Python. #1930 (fnothaft)
- [ADAM-1927] Increase visibility for util.FileExtensions for use downstream. #1929 (heuermh)
- [ADAM-1925] Reduce logging level for ADAMKryoRegistrator. #1928 (heuermh)
- [ADAM-1923] Revert 1eed8e8 #1926 (fnothaft)
- Use SparkFiles.getRootDirectory in local mode. #1924 (heuermh)
- [ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1922 (jpdna)
- Make Spark SQL APIs supported across all types #1921 (fnothaft)
- [ADAM-1909] Refactor pipe cmd parameter from String to Seq[String]. #1920 (heuermh)
- Add Google Cloud documentation #1918 (Georgehe4)
- [ADAM-1917] Load sphinxcontrib.fulltoc with imp.load_sources. #1916 (akmorrow13)
- [ADAM-1912] Add asSingleFile, disableFastConcat to adam2fasta/q. #1913 (heuermh)
- [ADAM-651] Hive-style partitioning of parquet files by genomic position #1911 (jpdna)
- Minor unit test/style fixes. #1910 (heuermh)
- [ADAM-1907] Bump to HTSJDK 2.13.2. #1908 (fnothaft)
- [ADAM-1882] Don't abort adam-submit if ADAM_HOME is set. #1905 (fnothaft)
- [ADAM-1806] Add toVariants conversion from GenotypeRDD. #1904 (fnothaft)
- [ADAM-1882] Return true if ADAM_HOME is set, not exit 0. #1903 (heuermh)
- [ADAM-1900] Close stream after reading VCF header. #1901 (fnothaft)
- [ADAM-1898] Support converting INFO fields set to empty ('.'). #1899 (fnothaft)
- Add Kryo registration for two classes required for Spark 2.3.0. #1897 (jpdna)
- [ADAM-1853] Various improvements to readthedocs documentation. #1893 (heuermh)
- [ADAM-1889][ADAM-1884] updated ReferenceRegion in python #1892 (akmorrow13)
- [ADAM-1889] Run R/Python tests. #1890 (fnothaft)
- [ADAM-1886] fix for pyadam to recognize >1 egg file #1887 (akmorrow13)
- [ADAM-1883] Python and R caching #1885 (akmorrow13)
- [ADAM-1875] Add ability to timeout a piped command. #1881 (fnothaft)
- [ADAM-1871] Fix print call that broke python 3 support. #1880 (fnothaft)
- [ADAM-1832] Use awesome list style and link to bigdatagenomics/awesome-adam. #1879 (heuermh)
- [ADAM-651] Hive-style partitioning of parquet files by genomic position #1878 (jpdna)
- [ADAM-1874] Dedupe samples when loading VCFs. #1876 (fnothaft)
- Fixes Coverage python API and adds tests #1870 (akmorrow13)
- added filterByOverlappingRegion for python #1869 (akmorrow13)
- Add command line option for populating nested variant.annotation field in Genotype records. #1865 (heuermh)
- Hive partitioned(v4) rebased #1864 (jpdna)
- [ADAM-1597] Move to Scala 2.11 and Spark 2.x. #1861 (heuermh)
- [ADAM-1857] Fix formatting error due to forward slashes. #1860 (heuermh)
- [ADAM-1858] Use getattr instead of Class.forName from python API. #1859 (fnothaft)
- [ADAM-1836] Adds loadIndexedBam API to Python and Java. #1837 (fnothaft)
- Added check for bam index files in loadIndexedBam #1831 (akmorrow13)
- [ADAM-1793] Adding vcf2adam and adam2vcf that handle separate variant and genotype data. #1794 (heuermh)
- added adam notebook #1778 (akmorrow13)
- [ADAM-1666] SQLContext creation fix for Spark 2.x #1777 (akmorrow13)
- Add optional accumulator for VCF header lines to VCFOutFormatter. #1727 (heuermh)
- add hive style partitioning for contigName #1620 (jpdna)
- Add loadReadsFromSamString function into ADAMContext #1434 (xubo245)
Closed issues:
- Readthedocs build error #1854
- Add pip release to release scripts #1847
- Publish scaladoc script still attempts to build markdown docs #1845
- Allow variant annotations to be loaded into genotypes #1838
- Specify correct extensions for SAM/BAM output #1834
- Fix link anchors and other issues in readthedocs #1822
- Sphinx fulltoc is not included #1821
- Readme link to bigdatagenomics/lime 404s #1819
- Bump to Hadoop-BAM 7.9.1 #1817
- LoadVariants Header Format #1815
- Right and Left Outer Shuffle Region Join don't match #1813
- Pipe command can fail with empty partitions #1807
- adam files with outdated formats throw FileNotFoundException #1804
- Move GenomicRDD.writeTextRDD outside of GenomicRDD #1803
- find-adam-assembly fails to recognize more than 1 jar #1801
- tests/testthat.R failed on git head #1799
- Run python and R tests conditionally in build #1795
- scala-lang should be a provided dependency #1789
- loadIndexedBam does an unnecessary union #1784
- Release bdgenomics.adam R package on CRAN #1783
- Issue with transformVariant // Adam to vcf #1782
- Add code of conduct #1779
- Reinstantiation of SQLContext in pyadam ADAMContext #1774
- Genotypes should only contain the core variant fields #1770
- Add SingleFASTQInFormatter #1768
- INDEL realigner can emit negative partition IDs #1763
- Request for a new release #1762
- INDEL realigner generates targets for reads with more than 1 INDEL #1753
- Fragment Issue #1752
- Variant Caller!!! #1751
- Spark Version!! #1750
- ReferenceRegion.subtract eliminating valid regions #1747
- New Shuffle Join Implementation - Left Outer + Group By Left #1745
- command failure after build success #1744
- Recalibrate_base_Qualities #1743
- Standardize regionFn for ShuffleJoin returned objects #1740
- Shuffle, Broadcast Joins with threshold #1739
- Adam on Spark 2.1 #1738
- Opening up permission on GenericGenomicRDD constructor #1735
- Consistency on ShuffleRegionJoin returns #1734
- vcf2adam support #1731
- Cloud-scale BWA MEM #1730
- Aligned Human Genome couldn't convert to Adam #1729
- Mark Duplicates #1726
- Genomics Pipeline #1724
- .fastq Alignment #1723
- Is it correct Adam file #1720
- .fastQ to .adam #1718
- Unable to create .adam from .sam #1717
- Add adam- prefix to distribution module name #1716
- Python load methods don't have ability to specify validation stringency #1715
- NPE when trying to map loadVariants over RDD #1713
- Add left normalization of INDELs as an RDD level primitive #1709
- Allow validation stringency to be set in AnySAMOutFormatter #1703
- InterleavedFastqInFormatter should sort by readInFragment #1702
- Allow silencing the # of reads in fragment warning in InterleavedFastqInFormatter #1701
- GenomicRDD.toXxx method names should be consistent #1699
- Exception thrown in VariantContextConverter.formatAllelicDepth despite SILENT validation stringency #1695
- Make GenomicRDD.toString more adam-shell friendly #1694
- Add adam-shell friendly VariantContextRDD.saveAsVcf method #1693
- change bdgenomics.adam package name for adam-python to bdg-adam #1691
- Conflict in bdg-formats dependency version due to org.hammerlab:genomic-loci #1688
- Convert and store variant quality field. #1682
- Region join shows non-determinism #1680
- Shuffle region join throws multimapped exception for unmapped reads #1679
- Push validation checks down to INFO/FORMAT fields #1676
- IndexOutOfBounds thrown when saving gVCF with no likelihoods #1673
- Generate docs from R API for distribution #1672
- Support loading a subset of VCF fields #1670
- Error with metadata: Multivalued flags are not supported for INFO lines #1669
- Include bdg.adam-0.23.0.tar.gz in distribution tarballs #1668
- Include bdgenomics.adam-0.23.0_SNAPSHOT-py2.7.egg in distribution tarball #1667
- Add SUPPORT.md file to complement CONTRIBUTING.md #1664
- Can't merge BAM files containing the same sample #1663
- Incorrect README.md kmer.scala loadAliments method parameter name #1662
- Add performance benchmarks similar to Samtools CRAM benchmarking page #1661
- Transient bad GZIP header bug when loading BGZF FASTQ #1658
- bdgenomics.adam vs bdg.adam for R/Python APIs #1655
- Need adamR script #1649
- incorrect grep for assembly jars in bin/pyadam #1647
- VariantRDD union creates multiple records for the same SNP ID #1644
- S3 access documentation #1643
- Algorithms docs formatting #1639
- Building downstream apps docs reformatting #1638
- FastqInputFormat.FILE_SPLITTABLE in conf not getting passed properly #1635
- Add benchmarks to documentation #1634
- Intro docs contain outdated/incompatible code #1633
- Intro docs missing a number of active projects #1632
- Installation instructions for Homebrew missing from documentation #1631
- Architecture section is missing from docs #1630
- Seq vs. Seq with javac #1625
- ProcessingStep missing from adam-codegen #1623
- Add ADAM recipe to bioconda #1618
- adam-submit cannot find assembly jar if installed as symlink #1616
- Expose transform/transmute in Java/Python/R #1615
- Expose VariantContextRDD in R/Python #1614
- Expose pipe API from Python/R #1611
- Serialization issue with TwoBitFile #1610
- Snapshot Distribution Does not include jar files #1607
- ManualRegionPartitioner is broken for ParallelFileMerger codepath #1602
- VariantRDD doesn't save partition map #1601
- Scala copy method not supported in abstract classes such as AlignmentRecordRDD #1599
- Interleaved FASTQ recognizes only /1 suffix pattern #1589
- Use empty sequence dictionary when loading features #1588
- New Illumina FASTQ spec adds metadata to read name line #1585
- first run of ADAM #1582
- Add unit test coverage for BED12 parser and writer #1579
- Spark 1.x Scala 2.10 snapshot artifacts missing since 31 March 2017 #1578
- Unable to save GenomicRDDs after a join. #1576
- Add filterBySequenceDictionary to GenomicRDD #1575
- Unaligned Trait does nothing #1573
- Bump to bdg-formats 0.11.1 #1570
- PhredUtils conversion to log probabilities has insufficient resolution for PLs #1569
- Reference model import code is borked #1568
- SequenceDictionary vs Feature[RDD] of reference length features #1567
- giab-NA12878 truth_small_variants.vcf.gz header issues #1566
- VCF header read from stream ignored in VCFOutFormatter #1564
- VCF genotype Number=A attribute throws ArrayIndexOutOfBoundsException #1562
- Save compressed single file VCF via HadoopBAM #1554
- bucketing strategy #1553
- Is parquet using delta encoding for positions? #1552
- Export to VCF does not include symbolic non-ref if site has a called alt #1551
- Refactor filterByOverlappingRegions not to require a List #1549
- Move docs to Sphinx/pure Markdown #1548
- java.lang.IncompatibleClassChangeError: Implementing class #1544
- Support locus predicate in TransformAlignments #1539
- Visibility from Java, jrdd has private access in AvroGenomicRDD #1538
- Rename o.b.adam.apis.java package to o.b.adam.api.java #1537
- VCF header genotype reserved key FT cardinality clobbered by htsjdk #1535
- Compute a SequenceDictionary from a *.genome file #1534
- Queryname sorted check should check for queryname grouped as well #1530
- Bump to bdg-formats 0.11.0 #1520
- Move to Spark 2.2, Parquet 1.8.2 #1517
- Minor refactor for TreeRegionJoin for consistency #1514
- Allow +Inf and -Inf Float values when reading VCF #1512
- SparkFiles temp directory path should be accessible as a variable #1510
- SparkFiles.get expects just the filename #1509
- Split apart #1324 #1507
- Where can I find "Phred-scaled quality score" (QUAL)? #1506
- Alignment Record sort is not consistent with samtools #1504
- Sequence dictionary records in TwoBitFile are not stable #1502
- Move coverage counter over to Dataset API #1501
- Allow users to set the minimum partition count across all load methods #1500
- Enable reuse of broadcast object across broadcast region joins #1499
- Take union across genomic RDDs #1497
- Adam files created by vcf2adam is not recognizable #1496
- Scalatest log output disappears with Maven 3.5.0 #1495
- ArrayOutOfBoundsException in vcf2adam (spark2_2.11-0.22.0) on UK10K VCFs (VCFv4.1) #1494
- ReferenceRegion overlaps and covers returns false if overlap is 1 #1492
- Provide asSingleFile parameter for saveAsFastq and related #1490
- Min Phred score gets bumped by 33 twice in BQSR #1488
- Should throw error when BAM header load fails #1486
- Default value for reads.toCoverage(collapse) should be false #1483
- Refactor ADAMContext loadXxx methods for consistency #1481
- loadGenotypes three time #1480
- Fall back to sequential concat when HDFS concat fails #1478
- VCF line with . ALT gets dropped #1476
- ADAM works on Cloudera but does NOT work on MAPR #1475
- Clean up ReferenceRegion.scala #1474
- Allow joins on regions that are within a threshold (instead of requiring overlap) #1473
- FeatureRDD.toCoverage throws NullPointerException when there is no coverage information #1471
- Add quality score binner #1462
- Splittable compression and FASTQ #1457
- Don't convert .{different-type}.adam in loadAlignments and loadFragments #1456
- New primitives for adam-core #1454
- Port over code for populating SequenceDictionaries from .dict files #1449
- Ignore failed push to Coveralls during CI builds #1444
- No asSingleFile parameter for saveAsFasta in NucleotideContigFragmentRDD #1438
- shufflejoin and ArrayIndexOutOfBoundsException #1436
- Document using ADAM snapshot #1432
- Improve metrics coverage across ADAMContext load methods #1428
- loadReferenceFile missing from Java API #1421
- loadCoverage missing from Java API #1420
- Question: How to get paired-end alignemntRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? #1419
- Clean up possibly unused methods in Projection #1417
- Problem loading SNPeff annotated VCF #1390
- RecordGroupDictionary should support isEmpty #1380
- Get rid of mutable collection transformations in ShuffleRegionJoin #1379
- Add tab5/6 as native output format for AlignmentRecordRDD #1377
- ValidationStringency in MDTagging should apply to reads on unknown references #1365
- Assembly final name doesn't include spark2 for Spark 2.x builds #1361
- Merge reads2fragments and fragments2reads into a single CLI #1359
- Investigate failures to load ExAC.0.3.GRCh38.vcf variants #1351
- adam-shell does not allow additional jars via Spark jars argument #1349
- Loading GZipped VCF returns an empty RDD #1333
- Bump Spark 2 build to Spark 2.1.0 #1330
- Rename Transform command TransformAlignments or similar #1328
- Replace ADAM2Vcf and Vcf2ADAM commands with TransformGenotypes and TransformVariants #1327
- FeatureRDD instantiation tries to cache the RDD #1321
- Repository for Pipe API wrappers for bioinformatics tools #1314
- Trying to get Spark pipeline working with slightly out of date code. #1313
- Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) #1312
- Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311
- Don't include log4j.properties in published JAR #1300
- Removing ProgramRecords info when saving data to sam/bam? #1257
- ADAM on Slurm/LSF #1229
- Maintaining sorted/partitioned knowledge #1216
- Evaluate bdg-convert external conversion library proposal #1197
- Port AMPCamp Tutorial over #1174
- Top level WrappedRDD or similar abstraction #1173
- GFF3 formatted features written as single file must include gff-version pragma #1169
- Can probably eliminate sort in RealignIndels #1137
- Load SV type info field - need for allele uniquness #1134
- BroadcastRegionJoin is not a broadcast join #1110
- AlignmentRecordRDD does not extend GenomicRDD per javac #1092
- Add generic ReferenceRegion pushdown for parquet files #1047
- Use of dataset api in ADAM #1018
- Difference running markdups with and without projection #1014
- ADAM to BAM conversion fails using relative path #1012
- Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
- NoSuchMethodError due to kryo minor-version mismatch #955
- Autogen field names in projection package #941
- Future of schemas in bdg-formats #925
- genotypeType for genotypes with multiple OtherAlt alleles? #897
- How to filter genotype RDD with FeatureRDD #890
- How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
- R language package for Adam #882
- How to count genotypes with a 10 node Spark/Adam cluster faster than with BCFTools on a single machine? #879
- Ensure Java API is up-to-date with Scala API #855
- BroadcastRegionJoin fails with unmapped reads #821
- Resolve Fragment vs. SingleReadBucket #789
- Updating/Publishing the docs/ directory #774
- Next on empty iterator in BroadcastRegionJoin #661
- Cleanup code smell in sort work balancing code #635
- Provide low-impact alternative to transform -repartition for reducing partition size #594
- Create an ADAM Python API #538
- Migrate serialization libraries out of ADAM core #448
- Create standardized, interpretable exceptions for error reporting #420
- Build info/version info inside ADAM-generated files #188
Merged and closed pull requests:
- [ADAM-1854] Add requirements.txt file for RTD. #1856 (fnothaft)
- [ADAM-1783] Resolve check issues that block pushing to CRAN. #1849 (fnothaft)
- [ADAM-1847] Update ADAM scripts to support self-contained pip install. #1848 (fnothaft)
- [ADAM-1845] Only build and publish scaladocs in publish-scaladoc.sh. #1846 (heuermh)
- [ADAM-1843] Install sources before calling scala:doc in publish scaladoc #1844 (fnothaft)
- Remove python and R profiles from release script #1842 (heuermh)
- [ADAM-1817] Bump to Hadoop-BAM 7.9.1. #1841 (fnothaft)
- [ADAM-1838] Make populating variant.annotation field in Genotype configurable #1839 (fnothaft)
- [ADAM-1834] Add proper extensions for SAM/BAM/CRAM output formats. #1835 (fnothaft)
- [ADAM-1822] Misc docs cleanup #1827 (fnothaft)
- Added missing init.py for fulltoc. #1824 (fnothaft)
- [ADAM-1821] Add missing fulltoc for Sphinx documentation. #1823 (fnothaft)
- Fix link to documentation #1820 (nzachow)
- [ADAM-1634] Add algorithm benchmarks to documentation. #1818 (fnothaft)
- [ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation. #1814 (fnothaft)
- [ADAM-1807] Check for empty partition when running a piped command. #1812 (fnothaft)
- [ADAM-1803] Refactor GenomicRDD.writeTextRdd to util.TextRddWriter. #1809 (heuermh)
- Added Filter error when file loaded does not match schema #1805 (akmorrow13)
- changed num_jars count #1802 (akmorrow13)
- [ADAM-1795] Map -DskipTests=true to exec.skip for Python and R tests. #1800 (heuermh)
- [ADAM-1672] Use working directory for R devtools::document(). #1798 (heuermh)
- [ADAM-1789] Move scala-lang to provided scope. #1790 (fnothaft)
- [ADAM-1784] loadIndexedBam should pass the raw globbed path to Hadoop-BAM #1785 (fnothaft)
- [ADAM-1664] Add SUPPORT.md file to complement CONTRIBUTING.md. #1781 (heuermh)
- [ADAM-1779] Adding code of contact adapted from the Contributor Convenant, version 1.4. #1780 (heuermh)
- [ADAM-1661] Add file storage benchmarks. #1772 (fnothaft)
- [ADAM-1770] Genotype should only store core variant fields. #1771 (fnothaft)
- [ADAM-1768] Add InFormatter for unpaired FASTQ. #1769 (fnothaft)
- [ADAM-1643] Add S3 access documentation. #1767 (fnothaft)
- [ADAM-1763] Apply absolute value to destination partition in ModPartitioner #1766 (fnothaft)
- Add R and Python into distribution artifacts #1765 (fnothaft)
- [ADAM-1655] Move R package to bdgenomics.adam. #1764 (fnothaft)
- [ADAM-1753] Only emit realignment targets for reads containing a single INDEL #1756 (fnothaft)
- [ADAM-1715] Support validation stringency in Python/R. #1755 (fnothaft)
- [ADAM-1680] Eliminate non-determinism in the ShuffleRegionJoin. #1754 (fnothaft)
- update to _replaceRdd with tests #1749 (akmorrow13)
- [ADAM-1747] Fixed subtract bug and tests #1748 (devin-petersohn)
- [ADAM-1745] Adding LeftOuterShuffleRegionJoinAndGroupByLeft and tests #1746 (devin-petersohn)
- Enabled thresholding for joins and standardized regionFn #1741 (devin-petersohn)
- Making join return types consistent #1737 (devin-petersohn)
- Opening up permissions on GenericGenomicRDD #1736 (devin-petersohn)
- [ADAM-1716] Add adam- prefix to distribution module name. #1733 (heuermh)
- [ADAM-1695] Check for illegal genotype index after splitting multi-allelic variants. #1725 (heuermh)
- [ADAM-1517] Bump Parquet version in a manner compatible with Spark 2.2.x #1722 (fnothaft)
- [ADAM-1512] Support VCFs with +Inf/-Inf float values. #1721 (fnothaft)
- [ADAM-1709] Add ability to left normalize reads containing INDELs. #1711 (fnothaft)
- [ADAM-1691] Move bdgenomics.adam to use a namespace package. #1706 (fnothaft)
- moved bdgenomics.adam package to bdgenomics-adam #1705 (akmorrow13)
- Misc cleanup needed for bigdatagenomics/cannoli#65 #1704 (fnothaft)
- [ADAM-1699] Make GenomicRDD.toXxx method names consistent. #1700 (heuermh)
- [ADAM-1694] Add short readable descriptions for toString in subclasses of GenomicRDD. #1698 (heuermh)
- [ADAM-1693] Add adam-shell friendly VariantContextRDD.saveAsVcf method. #1696 (heuermh)
- [ADAM-1688] Add bdg-formats exclusion to org.hammerlab:genomic-loci dependency. #1690 (heuermh)
- [ADAM-1679] Unmapped items should not get caught in requirement when sorting #1687 (fnothaft)
- [ADAM-1566] Merge VCF header lines with VCFHeaderLineCount.INTEGER correctly. #1685 (heuermh)
- [ADAM-1682] Add variant quality field. #1684 (fnothaft)
- Remove adam- prefix from module directory names. #1681 (heuermh)
- Update to hadoop-bam 7.9.0 and htsjdk 2.11.0. #1678 (heuermh)
- [ADAM-1676] Add more finely grained validation for INFO/FORMAT fields. #1677 (fnothaft)
- Python API fixes for AlignmentRecordRDD #1675 (akmorrow13)
- [ADAM-1673] Don't set PL to empty when no PL is attached to a gVCF record #1674 (fnothaft)
- [ADAM-1670] Add ability to selectively project VCF fields. #1671 (fnothaft)
- [ADAM-1663] Enable read groups with repeated names when unioning. #1665 (fnothaft)
- Maint 2.11 0.18.0 #1659 (Douglas-H)
- [ADAM-1630] Overhauled docs introduction and added architecture section. #1653 (fnothaft)
- Add adamR script #1651 (fnothaft)
- [ADAM-1647] Fix bad JAR discovery grep in bin/pyadam. #1648 (fnothaft)
- [ADAM-1548] Generate reStructuredText from pandoc markdown. #1646 (fnothaft)
- Algorithms docs formatting #1645 (gunjanbaid)
- Cleaned up docs. #1642 (gunjanbaid)
- Making example code compatible with current ADAM build #1641 (devin-petersohn)
- Cleaning up formatting and spacing of docs. #1640 (devin-petersohn)
- added ExtractRegions #1637 (antonkulaga)
- [ADAM-1635] Eliminate passing FASTQ splittable status via config. #1636 (fnothaft)
- [ADAM-1614] Add VariantContextRDD to R and Python APIs. #1628 (fnothaft)
- [ADAM-1615] Add transform and transmute APIs to Java, R, and Python #1627 (fnothaft)
- [ADAM-1625] Use explicit types for header lines #1626 (heuermh)
- [ADAM-1623] Add ProcessingStep to adam-codegen. #1624 (heuermh)
- [ADAM-1607] Update distribution assembly task to attach assembly überjar #1622 (fnothaft)
- [ADAM-1490] Add asSingleFile to saveAsFastq and related. #1621 (heuermh)
- Update load method docs in Python and R. #1619 (heuermh)
- [ADAM-1616] Resolve installation directory if scripts are symlinks. #1617 (heuermh)
- [ADAM-1611] Extend pipe APIs to Java, Python, and R. #1613 (fnothaft)
- [ADAM-1610] Mark non-serializable field in TwoBitFile as transient. #1612 (fnothaft)
- [ADAM-1554] Support saving BGZF VCF output. #1608 (fnothaft)
- Adding examples of how to use joins in the real world #1605 (devin-petersohn)
- [ADAM-1599] Add explicit functions for updating GenomicRDD metadata. #1600 (fnothaft)
- [ADAM-1576] Allow translation between two different GenomicRDD types. #1598 (fnothaft)
- [ADAM-1444] Ignore failed push to Coveralls. #1595 (fnothaft)
- Testing, testing, 1... 2... 3... #1592 (fnothaft)
- [ADAM-1417] Removed unused Projection.apply method, add test for Filter. #1591 (fnothaft)
- [ADAM-1579] Add unit test coverage for BED12 format. #1587 (fnothaft)
- [ADAM-1585] Support additional Illumina FASTQ metadata. #1586 (fnothaft)
- [ADAM-1438] Add ability to save FASTA back as a single file. #1581 (fnothaft)
- Bump bdg-formats correctly to 0.11.1, not SNAPSHOT. #1577 (fnothaft)
- [ADAM-1573] Remove unused Unaligned trait. #1574 (fnothaft)
- Slurm deployment readme #1571 (jpdna)
- [ADAM-1564] Read VCF header from stream in VCFOutFormatter. #1565 (heuermh)
- [ADAM-1562] Index off by one for VCF genotype Number=A attributes. #1563 (heuermh)
- [ADAM-1533] Set Theory #1561 (devin-petersohn)
- Freebayes FORMAT=<ID=AO,Number=A attribute throws ArrayIndexOutOfBoundsException #1560 (heuermh)
- [ADAM-1551] Emit non-reference model genotype at called sites. #1559 (fnothaft)
- [ADAM-1449] Add loadSequenceDictionary to ADAM context. #1557 (heuermh)
- [ADAM-1537] Rename o.b.adam.apis.java package to o.b.adam.api.java #1556 (heuermh)
- [ADAM-1549] Make regions provided to filterByOverlappingRegions an Iterable. #1550 (fnothaft)
- [ADAM-941] Automatically generate projection enums. #1547 (fnothaft)
- [ADAM-1361] Fix misnamed ADAM überjar. #1546 (fnothaft)
- [ADAM-1257] Add program record support for alignment/fragment files. #1545 (fnothaft)
- [ADAM-1359] Merge reads2fragments and fragments2reads into transformFragments #1543 (fnothaft)
- Fix minor format mistakes (and typo) in docs #1542 (kkaneda)
- Add a simple unit test to SingleFastqInputFormat #1541 (kkaneda)
- Support locus predicate in Transform #1540 (fnothaft)
- [ADAM-1421] Add java API for loadReferenceFile. #1536 (fnothaft)
- Refactor Vcf2ADAM and ADAM2Vcf into TransformGenotypes and TransformVariants #1532 (heuermh)
- [ADAM-1530] Support loading GO:query (S/CR/B)AMs as fragments. #1531 (fnothaft)
- [ADAM-1169] Write GFF header line pragma in single file mode. #1529 (fnothaft)
- [ADAM-1501] Compute coverage using Dataset API. #1528 (fnothaft)
- [ADAM-1497] Add union to GenomicRDD. #1526 (fnothaft)
- [ADAM-1486] Respect validation stringency if BAM header load fails. #1525 (fnothaft)
- [ADAM-1499] Enable reuse of broadcasted objects in region join. #1524 (fnothaft)
- [ADAM-1520] Bump to bdg-formats 0.11.0. #1523 (fnothaft)
- Adding fragment InFormatter for Bowtie tab5 format #1522 (heuermh)
- [ADAM-1328] Rename Transform to TransformAlignments. #1521 (fnothaft)
- [ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0 #1518 (fnothaft)
- Fixed minor typos in README. #1516 (gunjanbaid)
- Making TreeRegionJoin consistent with ShuffleRegionJoin #1515 (devin-petersohn)
- Resolve #1508, #1509 for Pipe API #1511 (fnothaft)
- [ADAM-1502] Preserve contig ordering in TwoBitFile sequence dictionary. #1508 (fnothaft)
- [ADAM-1483] Remove collapse parameter from AlignmentRecordRDD.toCoverage #1493 (fnothaft)
- [ADAM-1377] Adding fragment InFormatter for Bowtie tab6 format #1491 (heuermh)
- [ADAM-1488] Only increment BQSR min quality by 33 once. #1489 (fnothaft)
- [ADAM-1481] Refactor ADAMContext loadXxx methods for consistency #1487 (heuermh)
- Add quality score binner #1485 (fnothaft)
- Clean up ReferenceRegion.scala and add thresholded overlap and covers #1484 (devin-petersohn)
- [ADAM-1456] Remove .{type}.adam file extension conversions in type-guessing methods. #1482 (heuermh)
- [ADAM-1480] Add switch to disable the fast concat method. #1479 (fnothaft)
- [ADAM-1476] Treat . ALT allele as symbolic non-ref. #1477 (fnothaft)
- Adding require for Coverage Conversion and related tests #1472 (devin-petersohn)
- Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
- [ADAM-882] R API #1397 (fnothaft)
- [ADAM-1018] Add support for Spark SQL Datasets. #1391 (fnothaft)
- WIP Python API #1387 (fnothaft)
- [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
- Update dependency and plugin versions #1360 (heuermh)
- [ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
- Efficient Joins and (re)Partitioning #1324 (devin-petersohn)
Closed issues:
- Realign all reads at target site, not just reads with no mismatches #1469
- Parallel file merger fails if the output file is smaller than the HDFS block size #1467
- Add new realigner arguments to docs #1465
- Recalibrate method misspelled as recalibateBaseQualities #1463
- FASTQ may try to split GZIPed files #1459
- Update to Hadoop-BAM 7.8.0 #1455
- Publish Markdown and Scaladoc to the interwebs #1453
- Make VariantContextConverter public #1451
- Apply method in FragmentRDD is package private #1445
- Thread pool will block inside of pipe command for streams too large to buffer #1442
- FeatureRDD.apply() does not allow addition of other parameters with defaults in the case class #1439
- Question : Why the number of paired sequence in adam-0.21.0 less than adam-0.19.0? #1424
- loadCoverage missing from Java API #1420
- Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1410
- loadIntervalList FeatureRDD has empty SequenceDictionary #1409
- problem using transform command #1406
- Add coveralls #1403
- INDEL realigner binary search conditional is flipped #1402
- Delete adam-scripts/R #1398
- Data missing when transfroming FASTQ to Adam #1393
- java.io.FileNotFoundException when file exists #1385
- Off-by-1 error in FASTQ InputFormat start positioning code #1383
- Set the wrong value for end for symbolic alts #1381
- RecordGroupDictionary should support isEmpty #1380
- Add pipe API in and out formatters for Features #1374
- Increase visibility for SupportedHeaderLines.allHeaderLines #1372
- Bits of VariantContextConverter don't get ValidationStringencied #1371
- Add Markdown docs for Pipe API #1368
- Array[Consensus] not registered #1367
- ValidationStringency in MDTagging should apply to reads on unknown references #1365
- When doing a release, the SNAPSHOT should bump by 0.1.0, not 0.0.1 #1364
- FromKnowns consensus generator fails if no reads overlap a consensus #1362
- Performance tune-up in BQSR #1358
- Increase visibility for ADAMContext.sc and/or getFs... methods #1356
- Pipe API formatters need to be public #1354
- Version 0.21.0: VariantContextConverter fails for 1000G VCF data #1353
- ConsensusModel's can't really be instantiated #1352
- Runtime conflicts in transitive versions of Guava dependency #1350
- Transcript Effects ignored if more than 1 #1347
- Remove "fork" tag from releases #1344
- Refactor isSorted boolean parameters to sorted #1341
- Loading GZipped VCF returns an empty RDD #1333
- Follow up on error messages in build scripts #1331
- Bump Spark 2 build to Spark 2.1.0 #1330
- FeatureRDD instantiation tries to cache the RDD #1321
- Load queryname sorted BAMs as Fragments #1303
- Run Duplicate Marking on Fragments #1302
- GenomicRDD.pipe may hang on failure error codes #1282
- IllegalArgumentException Wrong FS for vcf_head files on HDFS #1272
- java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord #1240
- Investigate sorted join in dataset api #1223
- Support looser validation stringency for loading some VCF Integer fields #1213
- Add new feature-overlap command to demonstrate new region joins #1194
- What should our API at the command line look like? #1178
- Split apart partition and join in ShuffleRegionJoin #1175
- Merging files should be multithreaded #1164
- File _rgdict.avro does not exist #1150
- how to collect the .adam files from Spark cluster multiple nodes and some questions about avocado #1140
- JFYI: tiny forked adam-core "0.20.0" release #1139
- Samtools (htslib) integration testing #1120
- AlignmentRecordRDD does not extend GenomicRDD per javac #1092
- Release ADAM version 0.21.0 #1088
- Difference running markdups with and without projection #1014
- ADAM to BAM conversion fails using relative path #1012
- Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
- Customize adam-main cli from configuration file #918
- genotypeType for genotypes with multiple OtherAlt alleles? #897
- How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
- Ensure Java API is up-to-date with Scala API #855
- Improve parallelism during FASTA output #842
- Explicitly validate user args passed to transform enhancement #841
- BroadcastRegionJoin fails with unmapped reads #821
- Resolve Fragment vs. SingleReadBucket #789
- Add profile for skipping test compilation/resolution #713
- Next on empty iterator in BroadcastRegionJoin #661
- Cleanup code smell in sort work balancing code #635
- Remove reliance on MD tags #622
- Provide low-impact alternative to transform -repartition for reducing partition size #594
- Clean up Rich records #577
- Create standardized, interpretable exceptions for error reporting #420
- Create ADAM Benchmarking suite #120
Merged and closed pull requests:
- [ADAM-1469] Don't filter on whether reads have mismatches during realignment #1470 (fnothaft)
- [ADAM-1467] Skip concat call if there is only one shard. #1468 (fnothaft)
- [ADAM-1465] Updating realigner CLI docs. #1466 (fnothaft)
- [ADAM-1463] Rename recalibateBaseQualities method as recalibrateBaseQualities #1464 (heuermh)
- [ADAM-1453] Add hooks to publish ADAM docs from CI flow. #1461 (fnothaft)
- [ADAM-1459] Don't split FASTQ when compressed. #1459 (fnothaft)
- [ADAM-1451] Make VariantContextConverter class and convert methods public #1452 (fnothaft)
- Moving API overview from building apps doc to new source file. #1450 (heuermh)
- [ADAM-1424] Adding test for reads dropped in 0.21.0. #1448 (heuermh)
- [ADAM-1439] Add inferSequenceDictionary ctr to FeatureRDD. #1447 (heuermh)
- [ADAM-1445] Make apply method for FragmentRDD public. #1446 (fnothaft)
- [ADAM-1442] Fix thread pool deadlock in GenomicRDD.pipe #1443 (fnothaft)
- [ADAM-1164] Add parallel file merger. #1441 (fnothaft)
- Dependency version bump + BroadcastRegionJoin fix #1440 (fnothaft)
- added JavaApi for loadCoverage #1437 (akmorrow13)
- Update versions, etc. in build docs #1435 (heuermh)
- Add test sample(verify number of reads in loadAlignments function) and ADAM SNAPSHOT document #1433 (xubo245)
- Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
- feat: speed up 2bit file extract #1426 (Blaok)
- BQSR refactor for perf improvements #1423 (fnothaft)
- Add ADAMContext/GenomicRDD/pipe docs #1422 (fnothaft)
- INDEL realigner cleanup #1412 (fnothaft)
- Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1411 (heuermh)
- Add coveralls badge to README.md. #1408 (fnothaft)
- [ADAM-1403] Push coverage reports to Coveralls. #1404 (fnothaft)
- Added instrumentation timers around joins. #1401 (fnothaft)
- Add Apache Spark version to --version text #1400 (heuermh)
- [ADAM-1398] Delete adam-scripts/R. #1399 (fnothaft)
- [ADAM-1383] Use gt instead of gteq in FASTQ input format line size checks #1396 (fnothaft)
- Maint spark2 2.11 0.21.0 #1395 (A-Tsai)
- [ADAM-1393] fix missing reads when transforming fastq to adam #1394 (A-Tsai)
- [ADAM-1380] Adds isEmpty method to RecordGroupDictionary. #1392 (fnothaft)
- [ADAM-1381] Fix Variant end position. #1389 (fnothaft)
- Make javac see that AlignmentRecordRDD extends GenomicRDD #1386 (fnothaft)
- Added ShuffleRegionJoin usage docs #1384 (devin-petersohn)
- Misc. INDEL realigner bugfixes #1382 (fnothaft)
- Add pipe API in and out formatters for Features #1378 (heuermh)
- [ADAM-1356] Make ADAMContext.getFsAndFiles and related protected visibility #1376 (heuermh)
- [ADAM-1372] Increase visibility for DefaultHeaderLines.allHeaderLines #1375 (heuermh)
- [ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373 (fnothaft)
- [ADAM-1367] Register Consensus array for serialization. #1369 (fnothaft)
- [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
- [ADAM-1362] Fixing issue where FromKnowns consensus model fails if no reads hit a target. #1363 (fnothaft)
- [ADAM-1352] Clean up consensus model usage. #1357 (fnothaft)
- Increase visibility for InFormatter case classes from package private to public #1355 (heuermh)
- Use htsjdk getAttributeAsList for VCF INFO ANN key #1348 (heuermh)
- Fixes parsing variant annotations for multi-allelic rows #1346 (majkiw)
- Sort pull requests by id #1345 (heuermh)
- HBase genotypes backend -revised #1335 (jpdna)
- [ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
- Support deduping fragments #1309 (fnothaft)
- [ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
- Added test to try and repro #1282. #1292 (fnothaft)
Closed issues:
- Update Markdown docs with ValidationStringency in VCF<->ADAM CLI #1342
- Variant VCFHeaderLine metadata does not handle wildcards properly #1339
- Close called multiple times on VCF header stream #1337
- BroadcastRegionJoin has serialization failures #1334
- adam-cli uses git-commit-id-plugin which breaks release? #1322
- move_to_xyz scripts should have interlocks... #1317
- Lineage for partitionAndJoin in ShuffleRegionJoin causes StackOverflow Errors #1308
- Add move_to_spark_1.sh script and update README to mention #1307
- adam-submit transform fails with Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class #1306
- private ADAMContext constructor? #1296
- AlignmentRecord.mateAlignmentEnd never set #1290
- how to submit my own driver class via adam-submit? #1289
- ReferenceRegion on Genotype seems busted? #1286
- Clarify strandedness in ReferenceRegion apply methods #1285
- Parquet and CRAM debug logging during unit tests #1280
- Add more ANN field parsing unit tests #1273
- loadVariantAnnotations returns empty RDD #1271
- Implement joinVariantAnnotations with region join #1259
- Count how many chromosome in the range of the kmer #1249
- ADAM minor release to support htsjdk 2.7.0? #1248
- how to config kryo.registrator programmatically #1245
- Does the nested record Flattener drop Maps/Arrays? #1244
- Dead-ish code cleanup in org.bdgenomics.adam.utils #1242
- java.io.FileNotFoundException for old adam file after upgrade to adam0.20 #1240
- please add maven-source-plugin into the pom file #1239
- Assembly jar doesn't get rebuilt on CLI changes #1238
- how to compare with the last the column for the same chromosome name? #1237
- Need a way for users to add VCF header lines #1233
- Enhancements to VCF save #1232
- Must we split multi-allelic sites in our Genotype model? #1231
- Can't override default -collapse in reads2coverage #1228
- Reads2coverage NPEs on unmapped reads #1227
- Strand bias doesn't get exported #1226
- Move ADAMFunSuite helper functions upstream to SparkFunSuite #1225
- broadcast join using interval tree #1224
- Instrumentation is lost in ShuffleRegionJoin #1222
- Bump Spark, Scala, Hadoop dependency versions #1221
- GenomicRDD shuffle region join passes partition count to partition size #1220
- Scala compile errors downstream of Spark 2 Scala 2.11 artifacts #1218
- Javac error: incompatible types: SparkContext cannot be converted to ADAMContext #1217
- Release 0.20.0 artifacts failed Sonatype Nexus validation #1212
- Release script failed for 0.20.0 release #1211
- gVCF - can't load multi-allelic sites #1202
- Allow open-ended intervals in loadIndexedBam #1196
- Interval tree join in ADAM #1171
- spark-submit throw exception in spark-standalone using .adam which transformed from .vcf #1121
- BroadcastRegionJoin is not a broadcast join #1110
- Improve test coverage of VariantContextConverter #1107
- Variant dbsnp rs id tracking in vcf2adam and ADAM2Vcf #1103
- Document core ADAM transform methods #1085
- Document deploying ADAM on Toil #1084
- Clean up packages #1083
- VariantCallingAnnotations is getting populated with INFO fields #1063
- How to load DatabaseVariantAnnotation information ? #1049
- Release ADAM version 0.20.0 #1048
- Support VCF annotation ANN field in vcf2adam and adam2vcf #1044
- How to create a rich(er) VariantContext RDD? Reconstruct VCF INFO fields. #878
- Add biologist targeted section to the README #497
- Update usage docs running for EC2 and CDH #493
- Add docs about building downstream apps on top of ADAM #291
- Variant filter representation #194
Merged and closed pull requests:
- [ADAM-1342] Update CLI docs after #1288 merged. #1343 (fnothaft)
- [ADAM-1339] Use glob-safe method to load VCF header metadata for Parquet #1340 (fnothaft)
- [ADAM-1337] Remove os.{flush,close} calls after writing VCF header. #1338 (fnothaft)
- [ADAM-1334] Clean up serialization issues in Broadcast region join. #1336 (fnothaft)
- [ADAM-1307] move_to_spark_2 fails after moving to scala 2.11. #1329 (fnothaft)
- unroll/optimize some JavaConversions #1326 (ryan-williams)
- clean up *Join type-params/scaldocs #1325 (ryan-williams)
- [ADAM-1322] Skip git commit plugin if .git is missing. #1323 (fnothaft)
- Supports access to indexed fa and fasta files #1320 (akmorrow13)
- Add interlocks for move_to_xyz scripts. #1319 (fnothaft)
- [ADAM-1307] Add script for moving to Spark 1. #1318 (fnothaft)
- Update move_to_spark_2.sh #1316 (creggian)
- [ADAM-1308] Fix stack overflow in join with custom iterator impl. #1315 (fnothaft)
- Why Adam? section added to README.md #1310 (tverbeiren)
- Add docs about using ADAM's Kryo registrator from another Kryo registrator. #1305 (fnothaft)
- Add docs about building downstream applications #1304 (heuermh)
- [ADAM-493] Add ADAM-on-Spark-on-YARN docs. #1301 (fnothaft)
- Code style fixes #1299 (heuermh)
- Make ADAMContext and JavaADAMContext constructors public #1298 (heuermh)
- Remove back reference between VariantAnnotation and Variant #1297 (fnothaft)
- [ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
- HBase as a separate repo #1293 (jpdna)
- Reference region cleanup #1291 (fnothaft)
- Clean rewrite of VariantContextConverter #1288 (fnothaft)
- add function:filterByOverlappingRegions #1287 (liamlee)
- Populate fields on VariantAnnotation #1283 (heuermh)
- Add VCF headers for fields in Variant and VariantAnnotation records #1281 (heuermh)
- CGCloud deploy docs #1279 (jpdna)
- some style nits #1278 (ryan-williams)
- use ParsedLoci in loadIndexedBam #1277 (ryan-williams)
- Increasing unit test coverage for VariantContextConverter #1276 (heuermh)
- Expose FeatureRDD to public #1275 (Georgehe4)
- Clean up CLI operation categories and names, and add documentation for CLI #1274 (fnothaft)
- Rename org.bdgenomics.adam.rdd.variation package to o.b.a.rdd.variant #1270 (heuermh)
- use testFile in some tests #1268 (ryan-williams)
- [ADAM-1083] Cleaning up org.bdgenomics.adam.models. #1267 (fnothaft)
- make py file py3-forward-compatible #1266 (ryan-williams)
- rm accidentally-added file #1265 (fnothaft)
- Finishing up the cleanup on org.bdgenomics.adam.rdd. #1264 (fnothaft)
- Clean up org.bdgenomics.adam.rich package. #1263 (fnothaft)
- Add docs for transform pipeline, ADAM-on-Toil #1262 (fnothaft)
- updates for bdg utils 0.2.9-SNAPSHOT #1261 (akmorrow13)
- [ADAM-1233] Expose header lines in Variant-related GenomicRDDs #1260 (fnothaft)
- [ADAM-1221] Bump Spark/Hadoop versions. #1258 (fnothaft)
- Rename org.bdgenomics.adam.rdd.features package to o.b.a.rdd.feature #1256 (heuermh)
- Clean up documentation in org.bdgenomics.adam.projection. #1255 (fnothaft)
- [ADAM-1221] Bump Spark/Hadoop versions. #1254 (fnothaft)
- Misc shuffle join fixes. #1253 (fnothaft)
- [ADAM-1196] Add support for open ReferenceRegions. #1252 (fnothaft)
- [ADAM-1225] Move helper functions from ADAMFunSuite to SparkFunSuite. #1251 (fnothaft)
- Merge VariantAnnotation and DatabaseVariantAnnotation records #1250 (heuermh)
- Miscellaneous VCF fixes #1247 (fnothaft)
- HBase backend for Genotypes #1246 (jpdna)
- [ADAM-1242] Clean up dead code in org.bdgenomics.adam.util. #1243 (fnothaft)
- Small cleanup of "replacing uses of deprecated class SAMFileReader" #1236 (fnothaft)
- replacing uses of deprecated class SAMFileReader #1235 (lbergelson)
- [ADAM-1224] Replace BroadcastRegionJoin with tree based algo. #1234 (fnothaft)
- Fix reads2coverage issues #1230 (fnothaft)
- [ADAM-1212] Add empty assembly object, allows Maven build to create sources and javadoc artifacts #1215 (heuermh)
- [ADAM-1211] Fix call to move_to_scala_2.sh, reorder Spark 2.x Scala 2.10 and 2.10 sections #1214 (heuermh)
- demonstrate multi-allelic gVCF failure - test added #1205 (jpdna)
- Merge VariantAnnotation and DatabaseVariantAnnotation records #1144 (heuermh)
- Upgrade to bdg-formats-0.10.0 #1135 (fnothaft)
Closed issues:
- Sorting by reference index seems doesn't work or sorted by DESC order? #1204
- master won't compile #1200
- VCF format tag SB field parse error in loading #1199
- Publish sources JAR with snapshots #1195
- Type SparkFunSuite in package org.bdgenomics.utils.misc is not available #1193
- MDTagging fails on GRCh38 #1192
- Fix stack overflow in IndelRealigner serialization #1190
- Delete ./scripts/commit-pr.sh #1188
- Hadoop globStatus returns null if no glob matches #1186
- Swapping out IntervalRDD under GenomicRDDs #1184
- How to get "SO coordinate" instead of "SO unsorted"? #1182
- How to read glob of multiple parquet Genotype #1179
- Update command line doc and examples in README.md #1176
- FastqRecordConverter needs cleanup and tests #1172
- TransformFormats write to .gff3 file path incorrectly writes as parquet #1168
- Should be able to merge shards across two different file systems #1165
- RG ID gets written as the index, not the record group name #1162
- Users should be able to save files as -single without merging them #1161
- Users should be able to set size of buffer used for merging files #1160
- Bump Hadoop-BAM to 7.7.0 #1158
- adam-shell prints command trace to stdout #1154
- Map IntervalList format column four to feature name or attributes? #1152
- Parquet storage of VariantContext #1151
- vcf2adam unparsable vcf record #1149
- Reorder kryo.register statements in ADAMKryoRegistrator #1146
- Make region joins public again #1143
- Support CRAM input/output #1141
- Transform should run with spark.kryo.requireRegistration=true #1136
- adam-shell not handling bash args correctly #1132
- Remove Gene and related models and parsing code #1129
- Generate Scoverage reports when running CI #1124
- Remove PairingRDD #1122
- SAMRecordConverter.convert takes unused arguments #1113
- Add Pipe API #1112
- Improve coverage in Feature unit tests #1106
- K-mer.scala code #1105
- add -single file output option to ADAM2Vcf #1102
- adam2vcf Fails with Sample not serializable #1100
- ReferenceRegion.apply(AlignmentRecord) should not NPE on unmapped reads #1099
- Add outer region join implementations #1098
- VariantContextConverter never returns DatabaseVariantAnnotation #1097
- loadvcf: conflicting require statement #1094
- ADAM version 0.19.0 will not run on Spark version 2.0.0 #1093
- Be more rigorous with FileSystem.get #1087
- Remove network-connected and default test-related Maven profiles #1073
- Releases should get pushed to Spark Packages #1067
- Invalid POM for cli on 0.19.0 #1066
- scala.MatchError RegExp does not catch colons in value part properly #1061
- Support writing IntervalList header for features #1059
- Add -single support when writing features in native formats #1058
- Remove workaround for gzip/BGZF compressed VCF headers #1057
- Clean up if clauses in Transform #1053
- Adam-0.18.2 can not load Adam-0.14.0 adamSave function data (sam) #1050
- filterByOverlappingRegion Incorrect for Genotypes #1042
- Move Interval trait to utils, added in #75 #1041
- Remove implicit GenomicRDD to RDD conversion #1040
- VCF sample metadata - proposal for a GenotypedSampleMetadata object #1039
- [build system] ADAM test builds pollute /tmp, leaving lots of cruft... #1038
- adamMarkDuplicates function in AlignmentRecordRDDFunctions class can not mark the same read? #1037
- test MarkDuplicatesSuite with two similar read in ref and start position and different avgPhredScore, error! #1035
- Explore protocol buffers vs Avro #1031
- Increase Avro dependency version to 1.8.0 #1029
- ADAM specific logging #1024
- Reenable Travis CI for pull request builds #1023
- Bump Apache Spark version to 1.6.1 in Jenkins #1022
- ADAM compatibility with Spark 2.0 #1021
- ADAM to BAM conversion failing on 1000G file #1013
- Factor out *RDDFunctions classes #1011
- Port single file BAM and header code to VCF #1009
- Roll Jenkins JDK 8 changes into ./scripts/jenkins-test #1008
- Support GFF3 format #1007
- Separate fat jar build from adam-cli to new maven module #1006
- adam-cli POM invalid: maven.build.timestamp #1004
- Sub-partitioning of Parquet file for ADAM #1003
- Flattening the Genotype schema #1002
- install adam 0.19 error! #1001
- How to solve it please? #1000
- Has the project realized alignment reads to reference genome algorithm? #996
- All file-based input methods should support running on directories, compressed files, and wildcards #993
- Contig to ContigName Change not reflected in AlignmentRecordField #991
- Add homebrew guidelines to release checklist or automate PR generation #987
- fix deprecation warnings #985
- rename fragments package #984
- Explore if SeqDict data can be factored out more aggressively #983
- Make "Adam" all caps in filename Adam2Fastq.scala #981
- Adam2Fastq should output reverse complement when 0x10 flag is set for read #980
- Allow lowercase letters in jar/version names #974
- Add stringency parameter to flagstat #973
- Arg-array parsing problem in adam-submit #971
- Pass recordGroup parameter to loadPairedFastq #969
- Send a number of partitions to sc.textFile calls #968
- adamGetReferenceString doesn't reduce pairs correctly #967
- Update ADAM formula in homebrew-science to version 0.19.0 #963
- BAM output in ADAM appears to be corrupt #962
- Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #959
- Issue with version 18.0.2 #957
- Expose sorting by reference index #952
- .rgdict and .seqdict files are not placed in the adam directory #945
- Why does count_kmers not return k-mers that are split between two records? #930
- Load legacy file formats to Spark SQL Dataframes #912
- Clean up RDD method names #910
- Load/store sequence dictionaries alongside Genotype RDDs #909
- vcf2adam -print_metrics throws IllegalStateException on Spark 1.5.2 or later #902
- error: no reads in first split: bad BAM file or tiny split size? #896
- FastaConverter.FastaDescriptionLine not kryo-registered #893
- Work With ADAM fasta2adam in a distributed mode #881
- vcf2adam -> Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; #871
- Code coverage profile is broken #849
- Building Adam on OS X 10.10.5 with Java 1.8 #835
- Normalize AlignmentRecord.recordGroup* fields onto a separate record type #828
- Gracefully handle missing Spark- and Hadoop-versions in jenkins-test; document how to set them. #827
- Use Adam File with Hive #820
- How do we handle reads that don't have original quality scores when converting to FASTQ with original qualities? #818
- SAMFileHeader "sort order" attribute being un-set during file-save job #800
- Use same sort order as Samtools #796
- RNAME and RNEXT fields jumbled on transform BAM->ADAM->BAM #795
- Support loading multiple indexed read files #787
- Duplicate OUTPUT command line argument metaVar in adam2fastq #776
- Allow Variant to ReferenceRegion conversion #768
- Spark Errors References Deprecated SPARK_CLASSPATH #767
- Spark Errors References Deprecated SPARK_CLASSPATH #766
- adam2vcf fails with -coalesce #735
- Writing to a BAM file with adamSAMSave consistently fails #721
- BQSR on C835.HCC1143_BL.4 uses excessive amount of driver memory #714
- Support writing RDD[Feature] to various file formats #710
- adamParquetSave has a menacing false error message about *.adam extension #681
- BAMHeader not set when running on a cluster #676
- spark 1.3.1 upgarde to hortonworks HDP 2.2.4.2-2? #675
- Symbol case class is nucleotide-centric #672
- xAssembler cannot be build using mvn #658
- adam-submit VerifyError #642
- vcf2adam : Unsupported type ENUM #638
- Update CDH documentation #615
- Remove and generalize plugin code #602
- Fix record oriented shuffle #599
- Migrate preprocessing stages out of ADAM #598
- Publish/socialize a roadmap #591
- Eliminate format detection and extension checks for loading data #587
- Improve error message when we can't find a ReferenceRegion for a contig #582
- Do reference partitioners restrict a partition to contain keys from a single contig? #573
- Connection refused errors when transforming BAM file with BQSR #516
- ReferenceRegion shouldn't extend Ordered #511
- Documentation for common usecases #491
- Improve handling of "*" sequences during BQSR #484
- Original qualities are parsed out, but left in attribute fields #483
- Need a FileLocator that mirrors the use of Path in HDFS #477
- FileLocator should support finding "child" locators. #476
- Add S3 based Parquet directory loader #463
- Should FASTQ output use reads' "original qualities"? #436
- VcfStringUtils unused? #428
- We should be able to filter genotypes that overlap a region #422
- Create a simplified vocabulary for naming projections. #419
- Update documentation #406
- Bake off different region join implementations #395
- Handle no-ops more intelligently when creating MD tags #392
- Remove all the commands in the "CONVERSION OPERATIONS" CommandGroup #373
- Fail to Write RDD into HDFS with Parquet Format #344
- Refactor ReferencePositionWithOrientation #317
- Add docs about SPARK_LOCAL_IP #305
- PartitionAndJoin should throw an exception if it sees an unmapped read #297
- Add insert size calculation #296
- Newbie questions - learning resources? Reading a range of records from Adam? #281
- Add variant effect ontology #261
- Don't flatten optional SAM tags into a string #240
- Characterize impact of partition size on pileup creation #163
- Need to support BCF output format #153
- Allow list of commands to be injected into adam-cli AdamMain #132
- Parse out common annotations stored in VCF format #118
- Update normalization code to enable normalization of sequences with more than two indels #64
- Add clipping heuristic to indel realigner #63
- BQSR should support recalibration across multiple ADAM files #58
Merged and closed pull requests:
- fix SB tag parsing #1209 (fnothaft)
- Fastq record converter #1208 (fnothaft)
- Doc suggested partitionSize in ShuffleRegionJoin #1207 (jpdna)
- Test demonstrating region join failure #1206 (jpdna)
- fix SB tag parsing #1203 (jpdna)
- fix build #1201 (ryan-williams)
- [ADAM-1192] Correctly handle other whitespace in FASTA description. #1198 (fnothaft)
- [ADAM-1190] Manually (un)pack IndelRealignmentTarget set. #1191 (fnothaft)
- [ADAM-1188] Delete scripts/commit-pr.sh #1189 (fnothaft)
- [ADAM-1186] Mask null from fs.globStatus. #1187 (fnothaft)
- Fastq record converter #1185 (zyxue)
- [ADAM-1182] isSorted=true should write SO:coordinate in SAM/BAM/CRAM header. #1183 (fnothaft)
- Add scoverage aggregator and fail on low coverage. #1181 (fnothaft)
- [ADAM-1179] Improve error message when globbing a parquet file fails. #1180 (fnothaft)
- [ADAM-1176] Update command line doc and examples in README.md #1177 (heuermh)
- Refactor CLIs for merging sharded files #1167 (fnothaft)
- Update Hadoop-BAM to version 7.7.0 #1166 (heuermh)
- [ADAM-1162] Write record group string name. #1163 (fnothaft)
- Map IntervalList format column four to feature name #1159 (heuermh)
- Make AlignmentRecordConverter public so that it can be used from other projects #1157 (tomwhite)
- added predicate option to loadCoverage #1156 (akmorrow13)
- [ADAM-1154] Change set -x to set -e in ./bin/adam-shell. #1155 (fnothaft)
- Remove Gene and related models and parsing code #1153 (heuermh)
- Reorder kryo.register statements in ADAMKryoRegistrator #1148 (heuermh)
- Updated GenomicPartitioners to accept additional key. #1147 (akmorrow13)
- [ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145 (fnothaft)
- misc pom/test/resource improvements #1142 (ryan-williams)
- [ADAM-1136] Transform runs successfully with kryo registration required #1138 (fnothaft)
- [ADAM-1132] Fix improper quoting of bash args in adam-shell. #1133 (fnothaft)
- Remove StructuralVariant and StructuralVariantType, add names field to Variant #1131 (heuermh)
- Remove StructuralVariant and StructuralVariantType, add names field to Variant #1130 (heuermh)
- PR #1108 with issue #1122 #1128 (fnothaft)
- [ADAM-1038] Eliminate writing to /tmp during CI builds. #1127 (fnothaft)
- Update for bdg-formats code style changes #1126 (heuermh)
- [ADAM-1124] Add Scoverage and generate coverage reports in Jenkins. #1125 (fnothaft)
- [ADAM-1093] Move to support Spark 2.0.0. #1123 (fnothaft)
- remove duplicated dependency #1119 (ryan-williams)
- Clean up ADAMContext #1118 (fnothaft)
- [ADAM-993] Support loading files using globs and from directory paths. #1117 (fnothaft)
- [ADAM-1087] Migrate away from FileSystem.get #1116 (fnothaft)
- [ADAM-1099] Make reference region not throw NPE. #1115 (fnothaft)
- Add pipes API #1114 (fnothaft)
- [ADAM-1105] Use assembly jar in adam-shell. #1111 (fnothaft)
- Add outer joins #1109 (fnothaft)
- Modified CalculateDepth to calculate coverage from alignment files #1108 (akmorrow13)
- Resolves various single file save/header issues #1104 (fnothaft)
- [ADAM-1100] Resolve Sample Not Serializable exception #1101 (fnothaft)
- added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions #1096 (akmorrow13)
- Added support for Indexed VCF files #1095 (akmorrow13)
- [ADAM-582] Eliminate .get on option in FragmentCoverter. #1091 (fnothaft)
- [ADAM-776] Rename duplicate OUTPUT metaVar in ADAM2Fastq. #1090 (fnothaft)
- refactored ReferenceFile to require SequenceDictionary #1086 (akmorrow13)
- [ADAM-1073] Remove network-connected and default test-related Maven profiles #1082 (heuermh)
- [ADAM-1053] Clean up Transform #1081 (fnothaft)
- [ADAM-1061] Clean up attributes regex and denormalized fields #1080 (fnothaft)
- Extended TwoBitFile and NucleotideContigFragmentRDDFunctions to behave more similar #1079 (akmorrow13)
- Refactor variant and genotype annotations #1078 (heuermh)
- [ADAM-1039] Add basic support for Sample record. #1077 (fnothaft)
- Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #1076 (heuermh)
- [ADAM-194] Use separate filtersFailed and filtersPassed arrays for variant quality filters #1075 (heuermh)
- Whitespace code style fixes #1074 (heuermh)
- [ADAM-1006] Split überjar out to adam-assembly submodule. #1072 (fnothaft)
- Remove code coverage profile #1071 (heuermh)
- [ADAM-768] ReferenceRegion from variant/genotypes #1070 (fnothaft)
- [ADAM-1044] Support VCF annotation ANN field #1069 (heuermh)
- [ADAM-1067] Add release documentation and scripting for Spark Packages. #1068 (fnothaft)
- [ADAM-602] Remove plugin code. #1065 (fnothaft)
- Refactoring org.bdgenomics.adam.io package. #1064 (fnothaft)
- Cleanup in org.bdgenomics.adam.converters package. #1062 (fnothaft)
- [ADAM-1057] Remove workaround for gzip/BGZF compressed VCF headers #1057 (heuermh)
- Cleanup on org.bdgenomics.adam.algorithms.smithwaterman package. #1056 (fnothaft)
- Documentation cleanup and minor refactor on the consensus package. #1055 (fnothaft)
- Add KEYS with public code signing keys #1054 (heuermh)
- Adding GA4GH 0.5.1 converter for reads. #1052 (fnothaft)
- [ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051 (fnothaft)
- removed interval trait and redirected to interval in utils-intervalrdd #1046 (akmorrow13)
- [ADAM-952] Expose sorting by reference index. #1045 (fnothaft)
- overlap query reflects new formats #1043 (erictu)
- Changed loadIndexedBam to use hadoop-bam InputFormat #1036 (fnothaft)
- Increase Avro dependency version to 1.8.0 #1034 (heuermh)
- Improved README fix using feedback from other approach review. #1034 (InvisibleTech)
- Error in the README.md for kmer.scala example, need to get rdd first. #1032 (InvisibleTech)
- Add fragmentEndPosition to NucleotideContigFragment #1030 (heuermh)
- Logging to be done by ADAM utils code rather than Spark #1028 (jpdna)
- add maxScore #1027 (xubo245)
- [ADAM-1008] Modify jenkins-test script to support Java 8 build. #1026 (fnothaft)
- whitespace change, do not merge #1025 (shaneknapp)
- require kryo registration in tests #1020 (ryan-williams)
- print full stack traces on test failures #1019 (ryan-williams)
- bump commons-io version #1017 (ryan-williams)
- exclude javadoc jar in adam-shell #1016 (ryan-williams)
- [ADAM-909] Refactoring variation RDDs. #1015 (fnothaft)
- Modified CalculateDepth to get coverage on whole alignment adam files #1010 (akmorrow13)
- [ADAM-1004] Remove recursive maven.build.timestamp declaration #1005 (heuermh)
- Maint 2.11 0.19.0 #999 (tushu1232)
- [ADAM-710] Add saveAs methods for feature formats GTF, BED, IntervalList, and NarrowPeak #998 (heuermh)
- Moving Adam2Fastq to ADAM2Fastq #995 (heuermh)
- Update release doc for CHANGES.md and homebrew #994 (heuermh)
- Update to AlignmentRecordField and its usages as contig changed to co… #992 (jpdna)
- [ADAM-974] Short term fix for multiple ADAM cli assembly jars check #990 (heuermh)
- Update hadoop-bam dependency version to 7.5.0 #989 (heuermh)
- Replaced Contig with ContigName in AlignmentRecord and related changes #988 (jpdna)
- fix some deprecation/style things and rename a pkg #986 (ryan-williams)
- Fix Adam2fastq in case of read with both reverse and unmapped flags #982 (jpdna)
- [ADAM-510] Refactoring RDD function names #979 (heuermh)
- Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978 (heuermh)
- Remove unused file VcfHeaderUtils.scala #977 (heuermh)
- add validation stringency to bam parsing, flagstat #976 (ryan-williams)
- more permissible jar regex in adam-submit #975 (ryan-williams)
- fix bash arg array processing in adam-submit #972 (ryan-williams)
- adamGetReferenceString reduces pairs correctly, fixes #967 #970 (erictu)
- A few improvements #966 (ryan-williams)
- improve SW performance by replacing functional reductions with imperative ones #965 (noamBarkai)
- [ADAM-962] Fix corrupt single-file BAM output. #964 (fnothaft)
- [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
- [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
- Use hadoop-bam BAMInputFormat to do loadIndexedBam #953 (andrewmchen)
- Add -print_metrics option to Jenkins build #947 (heuermh)
- adam2vcf doesn't have info fields #939 (andrewmchen)
- [ADAM-893] Register missing serializers. #933 (fnothaft)
Closed issues:
- Update bdg-utils dependency version to 0.2.4 #960
- Drop support for Spark version 1.2.1, Hadoop version 1.0.x #958
- Exception occurs when running tests on master #956
- Flagstat results still don't match samtools flagstat #946
- readInFragment value is not properly read from parquet file into RDD[AlignmentRecord] #942
- adam2vcf -sort_on_save flag broken #940
- Transform -limit_projection requires .sam.seqdict file #937
- MarkDuplicates fails if library name is not set #934
- fastqtobam or sam #928
- Vcf2Adam uses SB field instead of FS field for fisher exact test for strand bias #923
- Add back limit_projection on Transform #920
- BAM header is not getting set on partition 0 with headerless BAM output format #916
- Add numParts apply method to GenomicRegionPartitioner #914
- Add Spark version 1.6.x to Jenkins build matrix #913
- Target Spark 1.5.2 as default Spark version #911
- Move to bdg-formats 0.7.0 #905
- secondOfPair and firstOfPair flag is missing in the newest 0.18 adam transformed results from BAM #903
- Future pull request #900
- error in vcf2adam #899
- Importing directory of VCFs seems to fail #898
- How to filter genotypeRDD on sample names? org.apache.spark.SparkException: Task not serializable? #891
- Add Spark version 1.5.x to Jenkins build matrix #889
- Transform DAG causes stages to recompute #883
- adam-submit buildinfo is confused #880
- move_to_scala_2.11 and maven-javadoc-plugin #863
- NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable #837
- Fix record oriented shuffle #599
- Avro.GenericData error with ADAM 0.12.0 on reading from ADAM file #290
Merged and closed pull requests:
- [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
- [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
- Fix for travis build, replace reads2ref with reads2fragments #950 (heuermh)
- [ADAM-940] Fix adam2vcf -sort_on_save flag #949 (massie)
- Remove BuildInformation and extraneous git-commit-id-plugin configuration #948 (heuermh)
- Update readme for spark 1.5.2 and hadoop 2.6.0 #944 (heuermh)
- [ADAM-942] Replace first/secondInRead with readInFragment #943 (heuermh)
- [ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938 (heuermh)
- [ADAM-934] Properly handle unset library name during duplicate marking #935 (fnothaft)
- [ADAM-911] Move to Spark 1.5.2 and Hadoop 2.6.0 as default versions. #932 (fnothaft)
- added start and end values to Interval Trait. Used for IntervalRDD #931 (akmorrow13)
- Removing buildinfo command #929 (heuermh)
- Removing symbolic test resource links, read from test classpath instead #927 (heuermh)
- Changed fisher strand bias field for VCF2Adam from SB to FS #924 (andrewmchen)
- [ADAM-920] Limit tag/orig qual flags in Transform. #921 (fnothaft)
- Change the README to use adam-shell -i instead of pasting #919 (andrewmchen)
- [ADAM-916] New strategy for writing header. #917 (fnothaft)
- [ADAM-914] Create a GenomicRegionPartitioner given a partition count. #915 (fnothaft)
- Squashed #907 and ran format-sources #908 (fnothaft)
- Various small fixes #907 (huitseeker)
- ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906 (fnothaft)
- Rewrote the getType method to handle all ploidy levels #904 (NeillGibson)
- Single file save from #733, rebased #901 (fnothaft)
- Added is* genotype methods from HTS-JDK Genotype to RichGenotype #895 (NeillGibson)
- [ADAM-891] Mark SparkContext as @transient. #894 (fnothaft)
- Update README URLs based on HTTP redirects #892 (ReadmeCritic)
- adding --version command line option #888 (heuermh)
- Add exception in move_to_scala_2.11.sh for maven-javadoc-plugin #887 (heuermh)
- Fix tightlist bug in Pandoc #885 (massie)
- [ADAM-883] Add caching to Transform pipeline. #884 (fnothaft)
- ISSUE 877: Minor fix to commit script to support https.
- ISSUE 876: Separate command line argument words by underscores
- ISSUE 875: P Operator parsing for MDTag
- ISSUE 873: [ADAM-872] Modify regex to capture release and SNAPSHOT jars but not javadoc or sources jars
- ISSUE 866: [ADAM-864] Don't force shuffle if reducing partition count.
- ISSUE 856: export valid fastq
- ISSUE 847: Updating build dependency versions to latest minor versions
- ISSUE 870: [ADAM-867] add pull requests missing from 0.18.0 release to CHANGES.md
- ISSUE 869: [ADAM-868] make release branch and tag names consistent
- ISSUE 862: [ADAM-861] use -d to check for repo assembly dir
- ISSUE 860: New release and pr-commit scripts
- ISSUE 859: [ADAM-857] Corrected handling of env vars in bin scripts
- ISSUE 854: [ADAM-853] allow main class in adam-submit to be specified
- ISSUE 852: [ADAM-851] Silenced Parquet logging.
- ISSUE 850: [ADAM-848] TwoBitFile now support nBlocks and maskBlocks
- ISSUE 846: Updating maven build plugin dependency versions
- ISSUE 845: [ADAM-780] Make DecadentRead package private.
- ISSUE 844: [ADAM-843] Aggressively project out metadata fields.
- ISSUE 840: fix flagstat output file encoding
- ISSUE 839: let flagstat write to file
- ISSUE 831: Support loading paired fastqs
- ISSUE 830: better validation when saving paired fastqs
- ISSUE 829: fix Long != null warnings
- ISSUE 819: Implement custom ReferenceRegion hashcode
- ISSUE 816: [ADAM-793] adding command to convert ADAM nucleotide contig fragments to FASTA files
- ISSUE 815: Upgrade to bdg-formats:0.6.0, add Fragment datatype converters
- ISSUE 814: [ADAM-812] fix for javadoc errors on JDK8
- ISSUE 813: [ADAM-808] build an assembly cli jar with maven shade plugin
- ISSUE 810: [ADAM-807] workaround for git-commit-id/git-commit-id-maven-plugin#61
- ISSUE 809: [ADAM-785] Add support for all numeric array (TYPE=B) tags
- ISSUE 806: [ADAM-755] updating utils dependency version to 0.2.3
- ISSUE 805: Better transform error when file doesn't exist
- ISSUE 803: fix unmapped-read sorting
- ISSUE 802: stop writing contig names as md5 sums
- ISSUE 798: fix SAM-attr conversion bug; int[]'s not byte[]'s
- ISSUE 790: optionally add MDTags to reads with transform
- ISSUE 782: Fix SAM Attribute parser for numeric array tags
- ISSUE 773: [ADAM-772] fix some bash var quoting
- ISSUE 765: [ADAM-752] Build for many combos of Spark/Hadoop versions.
- ISSUE 764: More involved README restructuring
- ISSUE 762: [ADAM-132] allowing list of commands to be injected into adam-cli ADAMMain
- ISSUE 784: [ADAM-783] Write @SQ header lines in sorted order.
- ISSUE 792: [ADAM-791] Add repartition parameter to Fasta2ADAM.
- ISSUE 781: [ADAM-777] Add validation stringency flag for BQSR.
- ISSUE 757: We should print a warning message if the user has ADAM_OPTS set.
- ISSUE 770: [ADAM-769] Fix serialization issue in known indel consensus model.
- ISSUE 763: Clean up README links, other nits
- ISSUE 749: Remove adam-cli jar from classpath during adam-submit
- ISSUE 754: Bump ADAM to Spark 1.4
- ISSUE 753: Bump Spark to 1.4
- ISSUE 748: Fix for mdtag issues with insertions
- ISSUE 746: Upgrade to Parquet 1.8.1.
- ISSUE 744: [ADAM-743] exclude conflicting jackson dependencies
- ISSUE 737: Reverse complement negative strand reads in fastq output
- ISSUE 731: Fixed bug preventing use of TLEN attribute
- ISSUE 730: [ADAM-729] Stuff TLEN into attributes.
- ISSUE 728: [ADAM-709] Remove FeatureHierarchy and FeatureHierarchySuite
- ISSUE 719: [ADAM-718] Use filesystem path to get underlying file system.
- ISSUE 712: unify header-setting between BAM/SAM and VCF
- ISSUE 696: include SequenceRecords from second-in-pair reads
- ISSUE 698: class-ify ShuffleRegionJoin, force setting seqdict
- ISSUE 706: restore clause guarding pruneCache check
- ISSUE 705: GeneFeatureRDDFunctions → FeatureRDDFunctions
- ISSUE 691: fix BAM/SAM header setting when writing on cluster
- ISSUE 688: make adamLoad public
- ISSUE 694: Fix parent reference in distribution module
- ISSUE 684: a few region-join nits
- ISSUE 682: [ADAM-681] Remove menacing error message about reqd .adam extension
- ISSUE 680: [ADAM-674] Delete Bam2ADAM.
- ISSUE 678: upgrade to bdg utils 0.2.1
- ISSUE 668: [ADAM-597] Move correction out of ADAM and into a downstream project.
- ISSUE 671: Bug fix in ReferenceUtils.unionReferenceSet
- ISSUE 667: [ADAM-666] Clean up key not found error in partitioner code.
- ISSUE 656: Update Vcf2ADAM.scala
- ISSUE 652: added filterByOverlappingRegion in GeneFeatureRDDFunctions
- ISSUE 650: [ADAM-649] Support transform of all BAM/SAM files in a directory.
- ISSUE 647: [ADAM-646] Special case reads with '*' quality during BQSR.
- ISSUE 645: [ADAM-634] Create a local ParquetLister for testing purposes.
- ISSUE 633: [Adam] Tests for SAMRecordConverter.scala
- ISSUE 641: [ADAM-640] Fix incorrect exclusion for org.seqdoop.htsjdk.
- ISSUE 632: [ADAM-631] Allow VCF conversion to sort on output after coalescing.
- ISSUE 628: [ADAM-627] Makes ReferenceFile trait extend Serializable.
- ISSUE 637: check for mac brew alternate spark install structure
- ISSUE 624: Conceptual fix for duplicate marking and sorting stragglers
- ISSUE 629: [ADAM-604] Remove normalization code.
- ISSUE 630: Add flatten command.
- ISSUE 619: [ADAM-540] Move to new HTSJDK release; should support Java 8.
- ISSUE 626: [ADAM-625] Enable globbing for BAM.
- ISSUE 621: Removes the predicates package.
- ISSUE 620: [ADAM-600] Adding RegionJoin trait.
- ISSUE 616: [ADAM-565] Upgrade to Parquet filter2 API.
- ISSUE 613: [ADAM-612] Point to proper k-mer counters.
- ISSUE 588: [ADAM-587] Clean up loading checks.
- ISSUE 592: [ADAM-513] Remove ReferenceMappable trait.
- ISSUE 606: [ADAM-605] Remove visualization code.
- ISSUE 596: [ADAM-595] Delete the 'comparisons' code.
- ISSUE 590: [ADAM-589] Removed pileup code.
- ISSUE 586: [ADAM-452] Fixes SM attribute on ADAM to BAM conversion.
- ISSUE 584: [ADAM-583] Add k-mer counting functionality for nucleotide contig fragments
- ISSUE 570: A few small conversion fixes
- ISSUE 579: [ADAM-578] Update end of read when trimming.
- ISSUE 564: [ADAM-563] Add warning message when saving Parquet files with incorrect extension
- ISSUE 576: Changed hashCode implementations to improve performance of BQSR
- ISSUE 569: Typo in the narrowPeak parser
- ISSUE 568: Moved the Timers object from bdg-utils back to ADAM
- ISSUE 478: Move non-genomics code
- ISSUE 550: [ADAM-549] Added documentation for testing and CI for ADAM.
- ISSUE 555: Makes maybeLoadVCF private.
- ISSUE 558: Makes Features2ADAMSuite use SparkFunSuite
- ISSUE 557: Randomize ports and turn off Spark UI to reduce bind exceptions in tests
- ISSUE 552: Create test suite for FlagStat
- ISSUE 554: privatize ADAMContext.maybeLoad{Bam,Fastq}
- ISSUE 551: [ADAM-386] Multiline FASTQ input
- ISSUE 542: Variants Visualization
- ISSUE 545: [ADAM-543][ADAM-544] Fix issues with ADAM scripts and classpath
- ISSUE 535: [ADAM-441] put a check in for Nothing. Throws an IAE if no return type is provided
- ISSUE 546: [ADAM-532] Fix wigFix intermittent test failure
- ISSUE 534: [ADAM-528][ADAM-533] Adds new RegionJoin impl that is shuffle-based
- ISSUE 531: [ADAM-529] Attaching scaladoc to released distribution.
- ISSUE 413: [ADAM-409][ADAM-520] Added local wigfix2bed tool
- ISSUE 527: [ADAM-526] VcfAnnotation2ADAM only counts once
- ISSUE 523: don't open non-.adam-extension files as ADAM files
- ISSUE 521: quieting wget output
- ISSUE 482: [ADAM-462] Coverage region calculation
- ISSUE 515: [ADAM-510] fix for bash syntax error; add ADDL_JARS check to adam-submit
- ISSUE 509: Add a 'distribution' module to create assemblies
- ISSUE 508: Upgrade from Parquet 1.4.3 to 1.6.0rc4
- ISSUE 498: [ADAM-496] Changes VCF to flat ADAM command name and usage
- ISSUE 500: [ADAM-495] Require SPARK_HOME for adam-submit
- ISSUE 501: [ADAM-499] Add -onlyvariants option to vcf2adam
- ISSUE 507: [ADAM-505] Removed adam-local from docs
- ISSUE 504: [ADAM-502] Add missing Long implicit to ColumnReaderInput
- ISSUE 503: [ADAM-473] Make RecordCondition and FieldCondition public
- ISSUE 494: Fix foreach block for vcf ingest
- ISSUE 492: Documentation cleanup and style improvements
- ISSUE 481: [ADAM-480] Switch assembly to single goal.
- ISSUE 487: [ADAM-486] Add port option to viz command.
- ISSUE 469: [ADAM-461] Fix ReferenceRegion and ReferencePosition impl
- ISSUE 440: [ADAM-439] Fix ADAM to account for BDG-FORMATS-35: Avro uses Strings
- ISSUE 470: added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
- ISSUE 468: refactor RDD loading; explicitly load alignments
- ISSUE 474: Consolidate documentation into a single location in source.
- ISSUE 471: Fixed typo on MAVEN_OPTS quotation mark
- ISSUE 467: [ADAM-436] Optionally output original qualities to fastq
- ISSUE 451: add adam view command, analogous to samtools view
- ISSUE 466: working examples on .sam included in repo
- ISSUE 458: Remove unused val from Reads2Ref
- ISSUE 438: Add ability to save paired-FASTQ files
- ISSUE 457: A few random Predicate-related cleanups
- ISSUE 459: a few tweaks to scripts/jenkins-test
- ISSUE 460: Project only the sequence when kmer/qmer counting
- ISSUE 450: Refactor some file writing and reading logic
- ISSUE 455: [ADAM-454] Add serializers for Avro objects which don't have serializers
- ISSUE 447: Update the contribution guidelines
- ISSUE 453: Better null handling for isSameContig utility
- ISSUE 417: Stores original position and original cigar during realignment.
- ISSUE 449: read “OQ” attr from structured SAMRecord field
- ISSUE 446: Revert "[ADAM-237] Migrate to Chill serialization libraries."
- ISSUE 437: random nits
- ISSUE 434: Few transform tweaks
- ISSUE 435: [ADAM-403] Remove seqDict from RegionJoin
- ISSUE 431: A few tweaks, typo corrections, and random cleanups
- ISSUE 430: [ADAM-429] adam-submit now handles args correctly.
- ISSUE 427: Fixes for indel realigner issues
- ISSUE 418: [ADAM-416] Removing 'ADAM' prefix
- ISSUE 404: [ADAM-327] Adding gene, transcript, and exon models.
- ISSUE 414: Fix error in adam-local alias
- ISSUE 415: Update README.md to reflect Spark 1.1
- ISSUE 412: [ADAM-411] Updated usage aliases in README. Fixes #411.
- ISSUE 408: [ADAM-405] Add FASTQ output.
- ISSUE 385: [ADAM-384] Adds import from FASTQ.
- ISSUE 400: [ADAM-399] Fix link to schemas.
- ISSUE 396: [ADAM-388] Sets Kryo serialization with --conf args
- ISSUE 394: [ADAM-393] Adds knobs to SparkContext creation in SparkFunSuite
- ISSUE 391: [ADAM-237] Migrate to Chill serialization libraries.
- ISSUE 380: Rewrite of MarkDuplicates which seems to improve performance
- ISSUE 387: fix some deprecation warnings
- ISSUE 376: [ADAM-375] Upgrade to Hadoop-BAM 7.0.0.
- ISSUE 378: [ADAM-360] Upgrade to Spark 1.1.0.
- ISSUE 379: Fix the position of the jar path in the submit.
- ISSUE 383: Make Mdtags handle '=' and 'X' cigar operators
- ISSUE 369: [ADAM-369] Improve debug output for indel realigner
- ISSUE 377: [ADAM-377] Update to Jenkins scripts and README.
- ISSUE 374: [ADAM-372][ADAM-371][ADAM-365] Refactoring CLI to simplify and integrate with Spark model better
- ISSUE 370: [ADAM-367] Updated alias in README.md
- ISSUE 368: erasure, nonexhaustive-match, deprecation warnings
- ISSUE 354: [ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed
- ISSUE 357: [ADAM-357] Added Java Plugin hook for ADAM.
- ISSUE 352: Fix failing MD tag
- ISSUE 363: Adding maven assembly plugin configuration to create tarballs
- ISSUE 364: [ADAM-364] Fixing remaining cs.berkeley.edu URLs.
- ISSUE 362: Remove mention of uberjar from README
- ISSUE 343: Allow retrying on failure for HTTPRangedByteAccess
- ISSUE 349: Fix for a NullPointerException when hostname is null in Task Metrics
- ISSUE 347: Bug fix for genome browser
- ISSUE 346: Genome visualization
- ISSUE 342: [ADAM-309] Update to bdg-formats 0.2.0
- ISSUE 333: [ADAM-332] Upgrades ADAM to Spark 1.0.1.
- ISSUE 341: [ADAM-340] Adding the TrackedLayout trait and implementation.
- ISSUE 337: [ADAM-335] Updated README.md to reflect migration to appassembler.
- ISSUE 311: Adding several simple normalizations.
- ISSUE 330: Make mismatch and deletes positions accessible
- ISSUE 334: Moving code coverage into a profile
- ISSUE 329: Add count of mismatches to mdtag
- ISSUE 328: [ADAM-326] Adding a 5-second retry on the HttpRangedByteAccess test.
- ISSUE 325: Adding documentation for commit/issue nomenclature and rebasing
- ISSUE 308: Fixing the 'index 0' bug in features2adam
- ISSUE 306: Adding code for lifting over between sequences and the reference genome.
- ISSUE 320: Remove extraneous implicit methods in ReferenceMappingContext
- ISSUE 314: Updates to indel realigner to improve performance and accuracy.
- ISSUE 319: Adding scripts for publishing scaladoc.
- ISSUE 315: Added table of (wall-clock) stage durations when print_metrics is used
- ISSUE 312: Fixing sources jar
- ISSUE 313: Making the CredentialsProperties file optional
- ISSUE 267: Parquet and indexed Parquet RDD implementations, and indices.
- ISSUE 301: Add Beacon's AlleleCount
- ISSUE 293: Add aggregation and display of metrics obtained from Spark
- ISSUE 295: Fix broken link to ADAM specification for storing reads.
- ISSUE 292: Cleaning up scaladoc generation warnings.
- ISSUE 289: Modifying interleaved fastq format to be hadoop version independent.
- ISSUE 288: Add ADAMFeature to Kryo registrator
- ISSUE 286: Removing some debug printout that was left in.
- ISSUE 287: Cleaning hadoop dependencies
- ISSUE 285: Refactoring read groups to increase the amount of data stored.
- ISSUE 284: Cleaning up build warnings.
- ISSUE 280: Move to bdg-formats
- ISSUE 283: Fix reference name comment
- ISSUE 282: Minor cleanup on interleaved FASTQ input format.
- ISSUE 277: Implemented HTTPRangedByteAccess.
- ISSUE 274: Added clarifying note to ADAMVariantContext
- ISSUE 279: Simplify format-source
- ISSUE 278: Use maven license plugin to ensure source has correct license
- ISSUE 268: Adding fixed depth prefix trie implementation
- ISSUE 273: Fixes issue in reference models where strings are not sanitized on collection from avro.
- ISSUE 272: Created command categories
- ISSUE 269: Adding k-mer and q-mer counting.
- ISSUE 271: Consolidate Parquet logging configuration
- ISSUE 264: Parquet-related Utility Classes
- ISSUE 259: ADAMFlatGenotype is a smaller, flat version of a genotype schema
- ISSUE 266: Removed extra command 'BuildInformation'
- ISSUE 263: Added AdamContext.referenceLengthFromCigar
- ISSUE 260: Modifying conversion code to resolve #112.
- ISSUE 258: Adding an 'args' parameter to the plugin framework.
- ISSUE 262: Adding reference assembly name to ADAMContig.
- ISSUE 256: Upgrading to Spark 1.0
- ISSUE 257: Adds toString method for sequence dictionary.
- ISSUE 255: Add equals, canEqual, and hashCode methods to MdTag class
- ISSUE 254: Cleanup import statements
- ISSUE 250: Adding ADAM to SAM conversion.
- ISSUE 248: Adding utilities for read trimming.
- ISSUE 252: Added a note about rebasing-off-master to CONTRIBUTING.md
- ISSUE 249: Cosmetic changes to FastaConverter and FastaConverterSuite.
- ISSUE 251: CHANGES.md is updated at release instead of per pull request
- ISSUE 247: For #244, Fragments were incorrect order and incomplete
- ISSUE 246: Making sample ID field in genotype nullable.
- ISSUE 245: Adding ADAMContig back to ADAMVariant.
- ISSUE 243: Rebase PR#238 onto master
- ISSUE 242: Upgrade to Parquet 1.4.3
- ISSUE 241: Fixes to FASTA code to properly handle indices.
- ISSUE 239: Make ADAMVCFOutputFormat public
- ISSUE 233: Build up reference information during cigar processing
- ISSUE 234: Predicate to filter conversion
- ISSUE 235: Remove unused contiglength field
- ISSUE 232: Add -pretty and -o to the print command
- ISSUE 230: Remove duplicate mdtag field
- ISSUE 231: Helper scripts to run an ADAM Console.
- ISSUE 226: Fix ReferenceRegion from ADAMRecord
- ISSUE 225: Change Some to Option to check for unmapped reads
- ISSUE 223: Use SparkConf object to configure SparkContext
- ISSUE 217: Stop using reference IDs and use reference names instead
- ISSUE 220: Update SAM to ADAM conversion
- ISSUE 213: BQSR updates
- ISSUE 214: Upgrade to Spark 0.9.1
- ISSUE 211: FastaConverter Refactor
- ISSUE 212: Cleanup build warnings
- ISSUE 210: Remove Scalariform from process-sources phase
- ISSUE 209: Fix Scalariform issues and Maven warnings
- ISSUE 207: Change from deprecated manifest erasure to runtimeClass
- ISSUE 206: Add Scalariform settings to pom
- ISSUE 204: Update Avro code gen to not mark fields as deprecated.
- ISSUE 203: Move package from edu.berkeley.cs.amplab to org.bdgenomics
- ISSUE 199: Updating pileup conversion code to convert sequences that use the X and = (EQ) CIGAR operators
- ISSUE 191: Add repartition parameter
- ISSUE 183: Fixing Job.getInstance call that breaks hadoop 1 compatibility.
- ISSUE 192: Add docs and scripts for creating a release
- ISSUE 193: Issue #137, clarify role of CHANGES.{md,txt}
- ISSUE 187: Add summarize_genotypes command
- ISSUE 178: Upgraded to Hadoop-BAM 0.6.2/Picard 1.107.
- ISSUE 173: Parse annotations out of vcf files
- ISSUE 162: Refactored SequenceDictionary
- ISSUE 180: BQSR using vcf loader
- ISSUE 179: Update maven-surefire-plugin dependency version to 2.17, also create an ...
- ISSUE 175: VariantContext converter refactor
- ISSUE 169: Cleaning up mpileup command
- ISSUE 170: Adding variant field enumerations
- ISSUE 166: Pair-wise genotype concordance of genotype RDDs, with CLI tool
- ISSUE 171: Add back in allele dosage for genotypes.
- ISSUE 167: Fix for Hadoop 1.0.x support
- ISSUE 165: call PluginExecutor in apply method, fixes issue 164
- ISSUE 160: Refactoring FASTA work to break contig sizes.
- ISSUE 78: Upgrade to Spark 0.9 and Scala 2.10
- ISSUE 138: Display Git commit info on command line
- ISSUE 161: Added switches to spark context creation code
- ISSUE 117: Add a "range join" method.
- ISSUE 151: Vcf work concordance and genotype
- ISSUE 150: Remaining variant changes for adam2vcf, unit tests, and CLI modifications
- ISSUE 147: Resurrect VCF conversion code
- ISSUE 148: Moving createSparkContext into core
- ISSUE 142: Enforce Maven and Java versions
- ISSUE 144: Merge of last few days of work on master into this branch
- ISSUE 124: Vcf work rdd master merge
- ISSUE 143: Changing package declaration to match test file location and removing un...
- ISSUE 140: Update README.md
- ISSUE 139: Update README.md
- ISSUE 129: Modified pileup transforms to improve performance + to add options
- ISSUE 116: add fastq interleaver script
- ISSUE 125: Add design doc to CONTRIBUTING document
- ISSUE 114: Changes to RDD utility files for new variant schema
- ISSUE 122: Add IRC Channel to readme
- ISSUE 100: CLI component changes for new variant schema
- ISSUE 108: Adding new PluginExecutor command
- ISSUE 98: Vcf work remove old variant
- ISSUE 104: Added the port erasure to SparkFunSuite's cleanup.
- ISSUE 107: Cleaning up change documentation.
- ISSUE 99: Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
- ISSUE 105: Add initial documentation on contributing
- ISSUE 97: New schema, variant context converter changes, and removal of old genoty...
- ISSUE 79: Adding ability to convert reference FASTA files for nucleotide sequences
- ISSUE 91: Minor change, increase adam-cli usage width to 150 characters
- ISSUE 86: Fixes to pileup code
- ISSUE 88: Added function for building variant context from genotypes.
- ISSUE 81: Update README and cleanup top-level cli help text
- ISSUE 76: Changing hadoop fs call to be compatible with Hadoop 1.
- ISSUE 74: Updated CHANGES.txt to include note about the recursive-load branch.
- ISSUE 73: Support for loading/combining multiple ADAM files into a single RDD.
- ISSUE 72: Added ability to create regions from reads, and to merge adjacent regions
- ISSUE 71: Change RecalTable to use optimized phred calculations
- ISSUE 68: sonatype-nexus-snapshots repository is already in parent oss-parent-7 pom
- ISSUE 67: fix for wildcard exclusion maven warnings
- ISSUE 65: Create a cache for phred -> double values instead of recalculating
- ISSUE 60: Bugfix for BQSR: Offset into qualityScore list was wrong
- ISSUE 66: add pluginDependency section and remove versions in plugin sections
- ISSUE 61: Filter utility for inverse of Projection
- ISSUE 48: Fix read groups mapping and add Y as base type
- ISSUE 36: Adding reads to rods transformation.
- ISSUE 56: Adding Yy as base in MdTag
- ISSUE 53: Fix Hadoop 2.2.0 support, upgrade to Spark 0.8.1
- ISSUE 52: Attributes: Use 't' instead of ',', as , is a valid character
- ISSUE 47: Adding containsRefName to SequenceDictionary
- ISSUE 46: Reduce logging for the actual adamSave job
- ISSUE 45: Make MdTag immutable
- ISSUE 38: Small bugfixes and cleanups to BQSR
- ISSUE 40: Fixing reference position from offset implementation
- ISSUE 31: Fixing a few issues in the ADAM2VCF2ADAM pipeline.
- ISSUE 30: Suppress parquet logging in FieldEnumerationSuite
- ISSUE 28: Fix build warnings
- ISSUE 24: Add unit tests for marking duplicates
- ISSUE 26: Fix unmapped reads in sequence dictionary
- ISSUE 23: Generalizing the Projection class
- ISSUE 25: Adding support for before, after clauses to SparkFunSuite.
- ISSUE 22: Add a unit test for sorting reads
- ISSUE 21: Adding rod functionality: a specialized grouping of pileup data.
- ISSUE 13: Cleaning up VCF<->ADAM pipeline
- ISSUE 20: Added Apache License 2.0 boilerplate to tops of all the GB-(c) files
- ISSUE 19: Allow the Hadoop version to be specified
- ISSUE 17: Fix transform -sort_reads partitioning. Add -coalesce option to transform.
- ISSUE 16: Fixing an issue in pileup generation and in the MdTag util.
- ISSUE 15: Tweaks 1
- ISSUE 12: Subclass testing bug in AdamContext.adamLoad
- ISSUE 11: Missing brackets in VcfConverter.getType
- ISSUE 10: Moved record field name enum over to the projections package.
- ISSUE 8: Fixes to sorting in ReferencePosition
- ISSUE 4: New SparkFunSuite test support class, logging util and new BQSR test.
- ISSUE 1: Fix scalatest configuration and fix unit tests
- ISSUE 14: Converting some of the Option() calls to Some()
- ISSUE 13: Cleaning up VCF<->ADAM pipeline
- ISSUE 9: Adding support for a Sequence Dictionary from BAM files
- ISSUE 8: Fixes to sorting in ReferencePosition
- ISSUE 7: ADAM variant and genotype formats; and a VCF->ADAM converter
- ISSUE 4: New SparkFunSuite test support class, logging util and new BQSR test.
- ISSUE 3: Adding in implicit conversion functions for going between Java and Scala...
- ISSUE 2: Update from Spark 0.7.3 to 0.8.0-incubating
- ISSUE 1: Fix scalatest configuration and fix unit tests