Releases: dimajix/flowman

0.21.2

24 Feb 16:53

Fix importing projects

0.21.1

24 Feb 12:33
  • flowexec now returns different exit codes depending on the processing result
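Because flowexec now reports the processing result through its exit code, a wrapper script can branch on that value. The following is a minimal sketch in Python, assuming only the usual convention that zero means success and any non-zero value means some kind of failure; the release note does not list the individual codes, so the sketch does not interpret them. The helper names `run_and_report` and `describe` are hypothetical, and the command used in the demo is a stand-in so the sketch runs without Flowman installed.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(command: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    completed = subprocess.run(command, check=False)
    return completed.returncode


def describe(exit_code: int) -> str:
    # Assumption: 0 means success, anything else means failure.
    # The meaning of specific non-zero codes is not stated in the release note.
    if exit_code == 0:
        return "success"
    return f"failed with exit code {exit_code}"


if __name__ == "__main__":
    # Stand-in command that exits with code 3; replace it with the real
    # flowexec invocation used in your environment.
    code = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(describe(code))
```

In a scheduler or CI job, the returned code could be passed on via `sys.exit(code)` so that the surrounding system sees the same result as flowexec reported.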

0.21.0

26 Jan 14:48

This is a minor release with only a few noticeable changes, but some internal refactoring.

  • Fix wrong dependencies in Swagger plugin
  • Implement basic schema inference for local CSV files
  • Implement new stack mapping
  • Improve error messages of local CSV parser

0.20.1

07 Jan 06:16
  • Implement detection of dependencies introduced by schema

0.20.0

05 Jan 16:32
  • Fix detection of Derby metastore to truncate comment lengths.
  • Add new config variable flowman.default.relation.input.columnMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.input.typeMismatchPolicy (default is IGNORE)
  • Add new config variable flowman.default.relation.output.columnMismatchPolicy (default is ADD_REMOVE_COLUMNS)
  • Add new config variable flowman.default.relation.output.typeMismatchPolicy (default is CAST_ALWAYS)
  • Improve handling of _SUCCESS files for detecting (non-)dirty directories
  • Implement new merge target
  • Implement merge operation for Delta relations
  • Implement merge operation for JDBC relations (only for some databases, i.e. MS SQL)
  • Add new config variable flowman.execution.target.useHistory (default is false)
  • Change the semantics of config variable flowman.execution.target.forceDirty (default is false)
  • Add new -d / --dirty option for explicitly marking individual targets as dirty
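The config variables and defaults listed above can be collected in one place for review before overriding any of them. The following is a small Python sketch, not part of Flowman: the names `RELEASE_DEFAULTS` and `render_overrides` are hypothetical, and the `key=value` rendering is only an illustrative format. How such settings are actually handed to Flowman (config file or command line) is not described in these notes, so check the Flowman documentation for that step.

```python
# Defaults exactly as stated in the 0.20.0 release notes.
RELEASE_DEFAULTS = {
    "flowman.default.relation.input.columnMismatchPolicy": "IGNORE",
    "flowman.default.relation.input.typeMismatchPolicy": "IGNORE",
    "flowman.default.relation.output.columnMismatchPolicy": "ADD_REMOVE_COLUMNS",
    "flowman.default.relation.output.typeMismatchPolicy": "CAST_ALWAYS",
    "flowman.execution.target.useHistory": "false",
    "flowman.execution.target.forceDirty": "false",
}


def render_overrides(overrides):
    """Merge overrides into the defaults and render sorted key=value lines."""
    unknown = set(overrides) - set(RELEASE_DEFAULTS)
    if unknown:
        # Guard against typos such as "floman.*" instead of "flowman.*".
        raise KeyError(f"not a 0.20.0 setting: {sorted(unknown)}")
    merged = {**RELEASE_DEFAULTS, **overrides}
    return [f"{key}={value}" for key, value in sorted(merged.items())]


if __name__ == "__main__":
    for line in render_overrides({"flowman.execution.target.useHistory": "true"}):
        print(line)
```

Rejecting unknown keys is a deliberate choice here: a misspelled variable name would otherwise be silently ignored.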

0.19.0

14 Dec 11:18
  • Add build profile for Hadoop 3.3
  • Add build profile for Spark 3.2
  • Allow SQL expressions as dimensions in aggregate mapping
  • Update Hive views when the resulting schema would change
  • Add new mapping cache command to FlowShell
  • Support embedded connection definitions
  • Much improved Flowman History Server
  • Fix wrong metric names with TemplateTarget
  • Implement more template types for connection, schema, dataset, assertion and measure
  • Implement new measure target for creating custom metrics for measuring data quality
  • Add new config option flowman.execution.mapping.parallelism

0.18.0

13 Oct 17:37
  • Improve automatic schema migration for Hive and JDBC relations
  • Improve support of CHAR(n) and VARCHAR(n) types. Those types will now be propagated to Hive with newer Spark versions
  • Support writing to dynamic partitions for file relations, Hive tables, JDBC relations and Delta tables
  • Fix the name of some config variables (floman.* => flowman.*)
  • Add new config variables flowman.default.relation.migrationPolicy and flowman.default.relation.migrationStrategy
  • Add plugin for supporting DeltaLake (https://delta.io), which provides deltaTable and deltaFile relation types
  • Fix non-deterministic column order in schema mapping, values mapping and values relation
  • Mark Hive dependencies as 'provided', which reduces the size of dist packages
  • Significantly reduce size of AWS dependencies in AWS plugin
  • Add new build profile for Cloudera CDP-7.1
  • Improve Spark configuration of LocalSparkSession and TestRunner
  • Update Spark 3.0 build profile to Spark 3.0.3
  • Upgrade Impala JDBC driver from 2.6.17.1020 to 2.6.23.1028
  • Upgrade MySQL JDBC driver from 8.0.20 to 8.0.25
  • Upgrade MariaDB JDBC driver from 2.2.4 to 2.7.3
  • Upgrade several Maven plugins to latest versions
  • Add new config option flowman.workaround.analyze_partition to work around CDP 7.1 issues
  • Fix migrating Hive views to tables and vice-versa
  • Add new option "-j " to allow running multiple job instances in parallel
  • Add new option "-j " to allow running multiple tests in parallel
  • Add new uniqueKey assertion
  • Add new schema assertion
  • Update Swagger libraries for swagger schema
  • Implement new openapi plugin to support OpenAPI 3.0 schemas
  • Add new readHive mapping
  • Add new simpleReport and report hook
  • Implement new templates

0.17.1

18 Jun 09:20
  • Bump CDH version to 6.3.4
  • Fix scope of some dependencies
  • Update Spark to 3.1.2
  • Add new values relation

0.17.0

04 Jun 08:02
  • New Flowman Kernel and Flowman Studio application prototypes
  • New ParallelExecutor
  • Fix before/after dependencies in count target
  • Default build is now Spark 3.1 + Hadoop 3.2
  • Remove build profiles for Spark 2.3 and CDH 5.15
  • Add MS SQL Server plugin containing JDBC driver
  • Speed up file listing for file relations
  • Use Spark JobGroups
  • Better support running Flowman on Windows with appropriate batch scripts

0.16.0

26 Apr 05:41
  • Add logo to Flowman Shell
  • Fix name of config option flowman.execution.executor.class
  • Add new groupedAggregate mapping
  • Reimplement target ordering, configurable via flowman.execution.scheduler.class
  • Implement new assertions columns and expression