Releases: dimajix/flowman
Flowman 0.27.0
The highlights of this release are:
- New 'jdbcCommand' target for executing arbitrary SQL statements on JDBC sinks
- Support direct SQL statements in JDBC relations for creating tables
- Upgrade Delta Lake to 2.0.0/2.1.0 (for Spark 3.2 and 3.3 respectively)
- Better error messages
- Bug fixes and other improvements
In detail, this release contains the following changes:
- github-232: [BUG] Column descriptions should be propagated in UNIONs
- github-233: [BUG] Missing Hadoop dependencies for S3, Delta, etc.
- github-235: Implement new 'rest' hook with fine-grained control
- github-229: A build target should not fail if Impala "COMPUTE STATS" fails
- github-236: 'copy' target should not apply output schema
- github-237: jdbcQuery relation should use fields "sql" and "file" instead of "query"
- github-239: Allow optional SQL statement for creating jdbcTable
- github-238: Implement new 'jdbcCommand' target
- github-240: [BUG] Data quality checks in documentation should not fail on NULL values
- github-241: Throw an error on duplicate entity definitions
- github-220: Upgrade Delta-Lake to 2.0 / 2.1
- github-242: Switch to Spark 3.3 as default
- github-243: Use alternative Spark MS SQL Connector for Spark 3.3
- github-244: Generate project HTML documentation with optional external CSS file
0.26.1 (minor release with reduced prebuilt dists)
This is a minor release addressing some specific issues:
Detailed Changes
- github-226: Upgrade to Spark 3.2.2
- github-227: [BUG] Flowman should not fail with field names containing "-", "/" etc
- github-228: Padding and truncation of CHAR(n)/VARCHAR(n) should be configurable
Since this is only a minor release with limited impact, more prebuilt variants are available in the 0.26.0 release.
0.26.0
Version 0.26.0 of Flowman is another high-quality release with a strong focus on improving support for JDBC targets such as Postgres, MariaDB, MS SQL Server, Oracle and more. Also new is support for Spark 3.3, although it is not yet as battle-proven as Spark 3.2. Moreover, many smaller bugs have been fixed and many smaller improvements have been made.
Detailed Changes
- github-202: Add support for Spark 3.3
- github-203: [BUG] Resource dependencies for Hive should be case-insensitive
- github-204: [BUG] Detect indirect dependencies in a chain of Hive views
- github-207: [BUG] Build should not directly fail if inferring dirty status fails
- github-209: [BUG] HiveViews should not trigger cascaded refresh during CREATE phase even when nothing is changed
- github-211: Implement new hiveQuery relation
- github-210: [BUG] HiveTables should be migrated if partition columns change
- github-208: Implement JDBC hook for database based semaphores
- github-212: [BUG] Hive views should not be migrated in RELAXED mode if only comments have changed
- github-214: Update ImpalaJDBC driver to 2.6.26.1031
- github-144: Support changing primary key for JDBC relations
- github-216: [BUG] Floats should be represented as FLOAT and not REAL in MySQL/MariaDB
- github-217: Support collations for creating/migrating JDBC tables
- github-218: [BUG] Postgres dialect should be used for Postgres JDBC URLs
- github-219: [BUG] SchemaMapping should retain incoming comments
- github-215: Support COLUMN STORE INDEX for MS SQL Server
- github-182: Support column descriptions in JDBC relations (SQL Server / Azure SQL)
- github-224: Support column descriptions for MariaDB / MySQL databases
- github-223: Support column descriptions for Postgres database
- github-205: Initial support for Oracle DB via JDBC
- github-225: [BUG] Staging schema should not have comments
Breaking changes
We take backward compatibility very seriously. But sometimes a breaking change is needed to clean up code and to
enable new features. This release contains some breaking changes, which are annoying but simple to fix.
In order to respect 'null' as a keyword with special semantics in YAML, some entities needed to be renamed, as
described in the following table:
category | old kind | new kind |
---|---|---|
mapping | null | empty |
relation | null | empty |
target | null | empty |
store | null | none |
history | null | none |
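These renames are a pure lookup from (category, old kind) to new kind, so existing modules can be migrated mechanically. The following Python sketch is hypothetical and not part of Flowman. It assumes that a module has already been parsed into nested dicts, with top-level sections named `mappings`, `relations` and `targets`. Each section is assumed to map entity names to a dict holding a `kind` key.

A standard YAML loader parses an unquoted `kind: null` as a null value (Python `None`) rather than the string "null", which is exactly the problem this rename addresses. The sketch therefore accepts both forms. The helper name `migrate_kinds_0_26` and the section names are assumptions for illustration.

```python
# Hypothetical migration helper for the kinds renamed in Flowman 0.26.0.
# Assumes a module parsed into nested dicts: {section: {name: {"kind": ...}}}.
from typing import Any, Dict, Tuple

# (category, old kind) -> new kind, taken from the 0.26.0 rename table
RENAMES_0_26: Dict[Tuple[str, str], str] = {
    ("mapping", "null"): "empty",
    ("relation", "null"): "empty",
    ("target", "null"): "empty",
    ("store", "null"): "none",
    ("history", "null"): "none",
}

# Assumed mapping from module section names to entity categories
SECTIONS: Dict[str, str] = {"mappings": "mapping", "relations": "relation", "targets": "target"}


def migrate_kinds_0_26(module: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `module` with renamed kinds replaced by their new names."""
    result: Dict[str, Any] = {}
    for section, entities in module.items():
        category = SECTIONS.get(section)
        if category is None or not isinstance(entities, dict):
            result[section] = entities  # leave unrelated sections untouched
            continue
        migrated = {}
        for name, spec in entities.items():
            spec = dict(spec)  # shallow copy so the input is not mutated
            if "kind" in spec:
                # An unquoted YAML `null` arrives as None, so normalize it first
                old = "null" if spec["kind"] is None else spec["kind"]
                spec["kind"] = RENAMES_0_26.get((category, old), spec["kind"])
            migrated[name] = spec
        result[section] = migrated
    return result


if __name__ == "__main__":
    module = {"mappings": {"nothing": {"kind": None}}, "relations": {"t": {"kind": "hiveTable"}}}
    print(migrate_kinds_0_26(module))
    # {'mappings': {'nothing': {'kind': 'empty'}}, 'relations': {'t': {'kind': 'hiveTable'}}}
```

Store and history settings are only covered by the lookup table here, because it is not assumed where they are configured.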
0.25.1 (source-only release)
This is a minor bugfix release.
Detailed Changes
- github-195: [BUG] Metric "target_records" is not reset correctly after an execution phase is finished
- github-197: [BUG] Impala REFRESH METADATA should not fail when dropping views
0.25.0
- github-184: Only read in *.yml / *.yaml files in module loader
- github-183: Support storing SQL in external file in 'hiveView'
- github-185: Missing _SUCCESS file when writing to dynamic partitions
- github-186: Support output mode OVERWRITE_DYNAMIC for Delta relation
- github-149: Support creating views in JDBC with new 'jdbcView' relation
- github-190: Replace logo in documentation
- github-188: Log detailed timing information when writing to JDBC relation
- github-191: Add user provided description to quality checks
- github-192: Provide example queries for JDBC metric sink
0.24.1
0.24.0
- github-168: Support optional filters in data quality checks
- github-169: Support sub-queries in filter conditions
- github-171: Parallelize loading of project files
- github-172: Update CDP7 profile to the latest patch level
- github-153: Use non-privileged user in Docker image
- github-174: Provide application for generating YAML schema
Breaking changes
We take backward compatibility very seriously. But sometimes a breaking change is needed to clean up code and to
enable new features. This release contains some breaking changes, which are annoying but simple to fix.
In order to avoid YAML schema inconsistencies, some entities needed to be renamed, as described in the following
table:
category | old kind | new kind |
---|---|---|
mapping | const | values |
mapping | empty | null |
mapping | read | relation |
mapping | readRelation | relation |
mapping | readStream | stream |
relation | const | values |
relation | empty | null |
relation | jdbc | jdbcTable, jdbcQuery |
relation | table | hiveTable |
relation | view | hiveView |
schema | embedded | inline |
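The 0.24.0 renames can also be expressed as a lookup. One entry is not one-to-one, however: an old `jdbc` relation must become either `jdbcTable` or `jdbcQuery`, depending on how it is used. The following hypothetical Python sketch therefore refuses to guess in that case and raises an error instead. The helper name `new_kind_0_24` is an assumption for illustration and not part of Flowman.

Note that the kinds renamed to `null` here were renamed again to `empty` in 0.26.0.

```python
# Hypothetical lookup for the kinds renamed in Flowman 0.24.0.
from typing import Dict, List, Tuple

# (category, old kind) -> candidate new kinds, taken from the 0.24.0 rename table.
# More than one candidate means the replacement has to be chosen by hand.
RENAMES_0_24: Dict[Tuple[str, str], List[str]] = {
    ("mapping", "const"): ["values"],
    ("mapping", "empty"): ["null"],
    ("mapping", "read"): ["relation"],
    ("mapping", "readRelation"): ["relation"],
    ("mapping", "readStream"): ["stream"],
    ("relation", "const"): ["values"],
    ("relation", "empty"): ["null"],
    ("relation", "jdbc"): ["jdbcTable", "jdbcQuery"],
    ("relation", "table"): ["hiveTable"],
    ("relation", "view"): ["hiveView"],
    ("schema", "embedded"): ["inline"],
}


def new_kind_0_24(category: str, old_kind: str) -> str:
    """Return the 0.24.0 kind for `old_kind`, or `old_kind` itself if it was not renamed.

    Raises ValueError when the rename is ambiguous (relation 'jdbc').
    """
    candidates = RENAMES_0_24.get((category, old_kind))
    if candidates is None:
        return old_kind  # not affected by the rename
    if len(candidates) > 1:
        raise ValueError(
            f"{category} kind '{old_kind}' must be replaced manually by one of: "
            + ", ".join(candidates)
        )
    return candidates[0]


if __name__ == "__main__":
    print(new_kind_0_24("relation", "table"))  # hiveTable
    print(new_kind_0_24("mapping", "sql"))     # sql (unchanged)
```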
0.23.1
- github-154: Fix failing migration when PK requires change due to data type
- github-156: Recreate indexes when data type of column changes
- github-155: Project level configs are used outside job
- github-157: Fix UPSERT operations for SQL Server
- github-158: Improve non-nullability of primary key column
- github-160: Use sensible defaults for default documenter
- github-161: Improve schema caching during execution
- github-162: ExpressionColumnCheck does not work when results contain NULL values
- github-163: Implement new column length quality check
0.23.0
The main feature of this version is a significant improvement of the new documentation system, which now also includes column-level lineage. The automatically generated documentation is a valuable artifact for both developers and business experts, helping them to better understand the data models and transformations. Flowman projects can also specify quality checks (such as NOT NULL conditions, foreign key relationships or arbitrary SQL expressions), which are not only included in the documentation but also executed on the real data.
Moreover, support for SQL databases has been improved again with the introduction of temporary staging tables, which allow updates to be performed within a transactional commit.
Detailed Changes
- github-148: Support staging table for all JDBC relations
- github-120: Use staging tables for UPSERT and MERGE operations in JDBC relations
- github-147: Add support for PostgreSQL
- github-151: Implement column level lineage in documentation
- github-121: Correctly apply documentation, before/after and other common attributes to templates
- github-152: Implement new 'cast' mapping
0.22.0
- Add new 'sqlserver' relation
- Implement new documentation subsystem
- Change default build to Spark 3.2.1 and Hadoop 3.3.1
- Add new 'drop' target for removing tables
- Speed up project loading by reusing Jackson mapper
- Implement new 'jdbc' metric sink
- Implement schema cache in Executor to speed up documentation and similar tasks
- Add new config variables 'flowman.execution.mapping.schemaCache' and 'flowman.execution.relation.schemaCache'
- Add new config variable 'flowman.default.target.verifyPolicy' to ignore empty tables during VERIFY phase
- Implement initial support for indexes in JDBC relations