Releases: dimajix/flowman

Flowman 0.27.0

09 Sep 07:46

The highlights of this release are:

  • New 'jdbcCommand' target for executing arbitrary SQL statements for JDBC sinks
  • Support direct SQL statements in JDBC relations for creating tables
  • Upgrade Delta Lake to 2.0.0/2.1.0 (for Spark 3.2 and 3.3 respectively)
  • Better error messages
  • Bug fixes and other improvements

In detail, this release contains the following changes:

  • github-232: [BUG] Column descriptions should be propagated in UNIONs
  • github-233: [BUG] Missing Hadoop dependencies for S3, Delta, etc
  • github-235: Implement new rest hook with fine control
  • github-229: A build target should not fail if Impala "COMPUTE STATS" fails
  • github-236: 'copy' target should not apply output schema
  • github-237: jdbcQuery relation should use fields "sql" and "file" instead of "query"
  • github-239: Allow optional SQL statement for creating jdbcTable
  • github-238: Implement new 'jdbcCommand' target
  • github-240: [BUG] Data quality checks in documentation should not fail on NULL values
  • github-241: Throw an error on duplicate entity definitions
  • github-220: Upgrade Delta-Lake to 2.0 / 2.1
  • github-242: Switch to Spark 3.3 as default
  • github-243: Use alternative Spark MS SQL Connector for Spark 3.3
  • github-244: Generate project HTML documentation with optional external CSS file

0.26.1 (minor release with reduced prebuilt dists)

03 Aug 07:17

This is a minor release addressing some specific issues:

Detailed Changes

  • github-226: Upgrade to Spark 3.2.2
  • github-227: [BUG] Flowman should not fail with field names containing "-", "/" etc
  • github-228: Padding and truncation of CHAR(n)/VARCHAR(n) should be configurable

Since this is only a minor release with limited impact, please find more prebuilt variants in the 0.26.0 release.

0.26.0

27 Jul 18:52

Version 0.26.0 of Flowman is another high-quality release with a strong focus on improving work with JDBC targets such as Postgres, MariaDB, MS SQL Server, Oracle and more. Support for Spark 3.3 is also new, although it is not yet as battle-proven as Spark 3.2. Moreover, many smaller bugs have been fixed and improvements made.

Detailed Changes

  • github-202: Add support for Spark 3.3
  • github-203: [BUG] Resource dependencies for Hive should be case-insensitive
  • github-204: [BUG] Detect indirect dependencies in a chain of Hive views
  • github-207: [BUG] Build should not directly fail if inferring dirty status fails
  • github-209: [BUG] HiveViews should not trigger cascaded refresh during CREATE phase even when nothing is changed
  • github-211: Implement new hiveQuery relation
  • github-210: [BUG] HiveTables should be migrated if partition columns change
  • github-208: Implement JDBC hook for database based semaphores
  • github-212: [BUG] Hive views should not be migrated in RELAXED mode if only comments have changed
  • github-214: Update ImpalaJDBC driver to 2.6.26.1031
  • github-144: Support changing primary key for JDBC relations
  • github-216: [BUG] Floats should be represented as FLOAT and not REAL in MySQL/MariaDB
  • github-217: Support collations for creating/migrating JDBC tables
  • github-218: [BUG] Postgres dialect should be used for Postgres JDBC URLs
  • github-219: [BUG] SchemaMapping should retain incoming comments
  • github-215: Support COLUMN STORE INDEX for MS SQL Server
  • github-182: Support column descriptions in JDBC relations (SQL Server / Azure SQL)
  • github-224: Support column descriptions for MariaDB / MySQL databases
  • github-223: Support column descriptions for Postgres database
  • github-205: Initial support for Oracle DB via JDBC
  • github-225: [BUG] Staging schema should not have comments

Breaking changes

We take backward compatibility very seriously. But sometimes a breaking change is needed to clean up code and to
enable new features. This release contains some breaking changes, which are annoying but simple to fix.
Because null is a keyword with special semantics in YAML, some entities needed to be renamed, as
described in the following table:

category   old kind   new kind
mapping    null       empty
relation   null       empty
target     null       empty
store      null       none
history    null       none
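
The renames in the table above can be expressed as a small lookup, which may help when reviewing existing project files. This is only a sketch: the names `RENAMES_0_26` and `migrate_kind` are hypothetical and not part of Flowman. It also assumes that the project YAML was loaded by a standard YAML parser, which reads an unquoted `null` as a null value rather than as the string "null".

```python
from typing import Optional

# Kind renames from the 0.26.0 table, keyed by (category, old kind).
# RENAMES_0_26 and migrate_kind are illustrative names, not Flowman APIs.
RENAMES_0_26 = {
    ("mapping", "null"): "empty",
    ("relation", "null"): "empty",
    ("target", "null"): "empty",
    ("store", "null"): "none",
    ("history", "null"): "none",
}


def migrate_kind(category: str, kind: Optional[str]) -> str:
    """Return the kind name to use from 0.26.0 on.

    An unquoted `null` in YAML is parsed as None, so None is treated
    as the old kind "null". Kinds not listed in the table pass through.
    """
    old = "null" if kind is None else kind
    return RENAMES_0_26.get((category, old), old)
```

For example, `migrate_kind("store", "null")` yields the new kind `none`, while a kind that is not in the table, such as `hiveTable`, is returned unchanged.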

0.25.1 (Source only release)

15 Jun 16:23

This is a minor bugfix release.

Detailed Changes

  • github-195: [BUG] Metric "target_records" is not reset correctly after an execution phase is finished
  • github-197: [BUG] Impala REFRESH METADATA should not fail when dropping views

0.25.0

31 May 14:22
  • github-184: Only read in *.yml / *.yaml files in module loader
  • github-183: Support storing SQL in external file in hiveView
  • github-185: Missing _SUCCESS file when writing to dynamic partitions
  • github-186: Support output mode OVERWRITE_DYNAMIC for Delta relation
  • github-149: Support creating views in JDBC with new jdbcView relation
  • github-190: Replace logo in documentation
  • github-188: Log detailed timing information when writing to JDBC relation
  • github-191: Add user provided description to quality checks
  • github-192: Provide example queries for JDBC metric sink

0.24.1

29 Apr 14:05
  • github-175: '--jobs' parameter starts way too many parallel jobs
  • github-176: start-/end-date in report should not be the same
  • github-177: Implement generic SQL schema check
  • github-179: Update DeltaLake dependency to 1.2.1

0.24.0

05 Apr 15:48
  • github-168: Support optional filters in data quality checks
  • github-169: Support sub-queries in filter conditions
  • github-171: Parallelize loading of project files
  • github-172: Update CDP7 profile to the latest patch level
  • github-153: Use non-privileged user in Docker image
  • github-174: Provide application for generating YAML schema

Breaking changes

We take backward compatibility very seriously. But sometimes a breaking change is needed to clean up code and to
enable new features. This release contains some breaking changes, which are annoying but simple to fix.
In order to avoid YAML schema inconsistencies, some entities needed to be renamed, as described in the following
table:

category   old kind       new kind
mapping    const          values
mapping    empty          null
mapping    read           relation
mapping    readRelation   relation
mapping    readStream     stream
relation   const          values
relation   empty          null
relation   jdbc           jdbcTable, jdbcQuery
relation   table          hiveTable
relation   view           hiveView
schema     embedded       inline
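
Most rows in the table above are one-to-one renames, but the `jdbc` relation maps to two possible new kinds, so it cannot be renamed mechanically. The following sketch encodes the table and flags such cases for manual review. The names `RENAMES_0_24`, `new_kinds` and `needs_manual_choice` are hypothetical and not part of Flowman.

```python
from typing import Dict, List, Tuple

# Kind renames from the 0.24.0 table, keyed by (category, old kind).
# Each value lists the candidate new kinds; all names here are illustrative.
RENAMES_0_24: Dict[Tuple[str, str], List[str]] = {
    ("mapping", "const"): ["values"],
    ("mapping", "empty"): ["null"],
    ("mapping", "read"): ["relation"],
    ("mapping", "readRelation"): ["relation"],
    ("mapping", "readStream"): ["stream"],
    ("relation", "const"): ["values"],
    ("relation", "empty"): ["null"],
    ("relation", "jdbc"): ["jdbcTable", "jdbcQuery"],
    ("relation", "table"): ["hiveTable"],
    ("relation", "view"): ["hiveView"],
    ("schema", "embedded"): ["inline"],
}


def new_kinds(category: str, kind: str) -> List[str]:
    """Return the candidate new kinds; unlisted kinds map to themselves."""
    return RENAMES_0_24.get((category, kind), [kind])


def needs_manual_choice(category: str, kind: str) -> bool:
    """True when the table offers more than one new kind for an old kind."""
    return len(new_kinds(category, kind)) > 1
```

With this, `needs_manual_choice("relation", "jdbc")` is true, which signals that someone has to decide between `jdbcTable` and `jdbcQuery` for each affected relation.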

0.23.1

29 Mar 04:52
  • github-154: Fix failing migration when PK requires change due to data type
  • github-156: Recreate indexes when data type of column changes
  • github-155: Project level configs are used outside job
  • github-157: Fix UPSERT operations for SQL Server
  • github-158: Improve non-nullability of primary key column
  • github-160: Use sensible defaults for default documenter
  • github-161: Improve schema caching during execution
  • github-162: ExpressionColumnCheck does not work when results contain NULL values
  • github-163: Implement new column length quality check

0.23.0

18 Mar 17:12

The main feature of this version is a significant improvement of the new documentation system, which now also includes column level lineage. The automatically generated documentation is a valuable artifact for both developers and business experts to improve the understanding of the data models and transformations. Flowman projects can also specify quality checks (like NOT NULL conditions, foreign key relationships or arbitrary SQL expressions), which are not only included in the documentation but also executed on the real data.

Moreover, support for SQL databases has been improved again with the introduction of temporary staging tables to perform updates within a transactional commit.

Detailed Changes

  • github-148: Support staging table for all JDBC relations
  • github-120: Use staging tables for UPSERT and MERGE operations in JDBC relations
  • github-147: Add support for PostgreSQL
  • github-151: Implement column level lineage in documentation
  • github-121: Correctly apply documentation, before/after and other common attributes to templates
  • github-152: Implement new 'cast' mapping

0.22.0

01 Mar 15:01
  • Add new sqlserver relation
  • Implement new documentation subsystem
  • Change default build to Spark 3.2.1 and Hadoop 3.3.1
  • Add new drop target for removing tables
  • Speed up project loading by reusing Jackson mapper
  • Implement new jdbc metric sink
  • Implement schema cache in Executor to speed up documentation and similar tasks
  • Add new config variables flowman.execution.mapping.schemaCache and flowman.execution.relation.schemaCache
  • Add new config variable flowman.default.target.verifyPolicy to ignore empty tables during VERIFY phase
  • Implement initial support for indexes in JDBC relations