Releases: dimajix/flowman

0.21.1

24 Feb 12:33
  • flowexec now returns different exit codes depending on the processing result

0.21.0

26 Jan 14:48

This is a minor release with only a few noticeable changes, but with some internal refactorings.

  • Fix wrong dependencies in Swagger plugin
  • Implement basic schema inference for local CSV files
  • Implement new stack mapping
  • Improve error messages of local CSV parser

0.20.1

07 Jan 06:16
  • Implement detection of dependencies introduced by schema

0.20.0

05 Jan 16:32
  • Fix detection of the Derby metastore, which is used for truncating comment lengths.
  • Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
  • Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
  • Improve handling of _SUCCESS files for detecting (non-)dirty directories
  • Implement new merge target
  • Implement merge operation for Delta relations
  • Implement merge operation for JDBC relations (only for some databases, e.g. MS SQL)
  • Add new config variable flowman.execution.target.useHistory (default is false)
  • Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
  • Add new -d / --dirty option for explicitly marking individual targets as dirty

0.19.0

14 Dec 11:18
  • Add build profile for Hadoop 3.3
  • Add build profile for Spark 3.2
  • Allow SQL expressions as dimensions in aggregate mapping
  • Update Hive views when the resulting schema would change
  • Add new mapping cache command to FlowShell
  • Support embedded connection definitions
  • Much improved Flowman History Server
  • Fix wrong metric names with TemplateTarget
  • Implement more template types for connection, schema, dataset, assertion and measure
  • Implement new measure target for creating custom metrics for measuring data quality
  • Add new config option flowman.execution.mapping.parallelism

0.18.0

13 Oct 17:37
  • Improve automatic schema migration for Hive and JDBC relations
  • Improve support for CHAR(n) and VARCHAR(n) types. These types will now be propagated to Hive with newer Spark versions
  • Support writing to dynamic partitions for file relations, Hive tables, JDBC relations and Delta tables
  • Fix the name of some config variables (floman.* => flowman.*)
  • Add new config variables flowman.default.relation.migrationPolicy and flowman.default.relation.migrationStrategy
  • Add plugin for supporting DeltaLake (https://delta.io), which provides deltaTable and deltaFile relation types
  • Fix non-deterministic column order in schema mapping, values mapping and values relation
  • Mark Hive dependencies as 'provided', which reduces the size of dist packages
  • Significantly reduce size of AWS dependencies in AWS plugin
  • Add new build profile for Cloudera CDP-7.1
  • Improve Spark configuration of LocalSparkSession and TestRunner
  • Update Spark 3.0 build profile to Spark 3.0.3
  • Upgrade Impala JDBC driver from 2.6.17.1020 to 2.6.23.1028
  • Upgrade MySQL JDBC driver from 8.0.20 to 8.0.25
  • Upgrade MariaDB JDBC driver from 2.2.4 to 2.7.3
  • Upgrade several Maven plugins to latest versions
  • Add new config option flowman.workaround.analyze_partition to workaround CDP 7.1 issues
  • Fix migrating Hive views to tables and vice-versa
  • Add new option "-j " to allow running multiple job instances in parallel
  • Add new option "-j " to allow running multiple tests in parallel
  • Add new uniqueKey assertion
  • Add new schema assertion
  • Update Swagger libraries for swagger schema
  • Implement new openapi plugin to support OpenAPI 3.0 schemas
  • Add new readHive mapping
  • Add new simpleReport and report hooks
  • Implement new templates

0.17.1

18 Jun 09:20
  • Bump CDH version to 6.3.4
  • Fix scope of some dependencies
  • Update Spark to 3.1.2
  • Add new values relation

0.17.0

04 Jun 08:02
  • New Flowman Kernel and Flowman Studio application prototypes
  • New ParallelExecutor
  • Fix before/after dependencies in count target
  • Default build is now Spark 3.1 + Hadoop 3.2
  • Remove build profiles for Spark 2.3 and CDH 5.15
  • Add MS SQL Server plugin containing JDBC driver
  • Speed up file listing for file relations
  • Use Spark JobGroups
  • Better support running Flowman on Windows with appropriate batch scripts

0.16.0

26 Apr 05:41
  • Add logo to Flowman Shell
  • Fix name of config option flowman.execution.executor.class
  • Add new groupedAggregate mapping
  • Reimplement target ordering, configurable via flowman.execution.scheduler.class
  • Implement new columns and expression assertions

0.15.0

23 Mar 13:56
  • New configuration variable floman.default.target.rebalance
  • New configuration variable floman.default.target.parallelism
  • Changed behaviour: The mergeFiles target no longer assumes that the target is local. If you already
    use mergeFiles with a local file, you need to prefix the target file name with file://.
  • Add new -t argument for selectively building a subset of targets
  • Remove example-plugin
  • Add quickstart guide
  • Add new "flowman-parent" BOM for projects using Flowman
  • Move com.dimajix.flowman.annotations package to com.dimajix.flowman.spec.annotations
  • Add new log redaction
  • Integrate Scala code coverage analysis
  • The assemble mapping will now fail when trying to use non-existing columns
  • Move swagger and json schema support into separate plugins
  • Change default build to Spark 3.0 and Hadoop 3.2
  • Update Spark to 3.0.2
  • Rename class Executor to Execution - watch your plugins!
  • Implement new configurable Executor class for executing build targets.
  • Add build profile for Spark 3.1.x
  • Update ScalaTest to 3.2.5 - watch your unittests for changed ScalaTest API!
  • Add new case mapping
  • Add new --dry-run command line option
  • Add new mock and null mapping types
  • Add new mock relation
  • Add new values mapping
  • Add new values dataset
  • Implement new testing capabilities
  • Rename update mapping to upsert mapping, which better describes its functionality
  • Introduce new VALIDATE phase, which is executed even before the CREATE phase
  • Implement new validate and verify targets
  • Implement new deptree command in Flowman shell