Releases: dimajix/flowman

Flowman 0.30.3

06 Aug 07:01
  • github-465: (backport) Fix parsing of Swagger schema files

Flowman 0.30.2

29 Jul 05:49
  • github-496: updated build profile for CDP 7.1.9 platform version

Flowman 1.2.0

03 Apr 17:54

Version 1.2.0 - 2024-04-03

  • github-464: Upgrade Cloudera CDP 7.1 to Hotfix 16
  • github-465: Fix parsing of Swagger schema files
  • github-447: Support Spark 3.5.0
  • github-444: Remove Flowman Hub
  • github-443: Remove Flowman DSL
  • github-466: Use frontend-maven-plugin for all npm packages
  • github-467: Implement "target_records" metric for "copy" target
  • github-474: Remove Flowman Studio
  • github-480: Upgrade Spark to 3.5.1
  • github-481: Fix Sphinx documentation

Flowman 1.1.0

16 Oct 15:39

Version 1.1.0 - 2023-10-10

  • github-413: Support Azure Key Vault for retrieving secrets
  • github-415: Improve documentation for Velocity templating
  • github-361: Remove broken support for Databricks
  • github-402: Support Spark 3.4
  • github-417: Fix URL to flowman.io in all Maven modules
  • github-416: Support specifying multiple targets separated by commas on CLI
  • github-414: Support AWS Secrets Manager for retrieving secrets
  • github-419: Add a command line option to kernel server to listen on specific address
  • github-326: Truncate target graphs at sinks
  • github-410: Support relative paths in project imports
  • github-421: Provide new 'session' environment variable
  • github-422: Upgrade Spark to 3.4.1
  • github-423: Migrating a MariaDB/MySQL table from a text type to a numeric type fails
  • github-425: Support building and running Flowman with Java 17
  • github-388: Replace Akka http with Jersey/Jetty in Flowman History Server
  • github-385: Update Flowman Tutorial
  • github-412: Create tutorial for using Flowman Maven Plugin
  • github-371: Automate EMR integration test via AWS CLI
  • github-431: Update EMR build profile to 6.12
  • github-428: Move project selector into top bar in Flowman History Server
  • github-433: Add Trino JDBC driver as plugin
  • github-434: Use sshj instead of ganymed for sftp
  • github-435: Flowman should also load a project.yaml (with an 'a' in the extension)
  • github-436: Document minimum Maven version of Flowman-maven-plugin
  • github-438: Empty or non-existing module directories should not lead to an error
  • github-452: [BUG] SQL assertions do not support empty strings as expected values
  • github-450: Update Spark to 3.3.3
  • github-454: Fix scope of netty-codec
  • github-446: Fix deadlock when running targets in parallel

Breaking changes

This version introduces some (minor) breaking changes:

  • When providing sample records (for example, via the values mapping, or in the expected outcome of sql assertions),
    empty strings will be interpreted as empty strings. Older versions interpreted empty strings as SQL NULL values. In
    case you still need a NULL value, you can simply use the YAML null value.
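
As a rough illustration of this change, the sketch below shows a values mapping containing one empty string and one explicit YAML null. The mapping name sample_input and its columns are made up for this example, and the layout assumes the usual columns/records form of the values mapping, so it may need adjusting for your project.

```yaml
mappings:
  # Hypothetical mapping name and columns, for illustration only
  sample_input:
    kind: values
    columns:
      id: int
      name: string
    records:
      # From 1.1.0 on, the empty string stays an empty string
      - [1, ""]
      # Use the YAML null value where a SQL NULL is required
      - [2, null]
```

Records written with empty strings for older versions, where NULL was intended, would need to be switched to null to keep their previous meaning.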

Flowman 1.0.0

28 Apr 05:43

Flowman Version 1.0.0!

Flowman has proven to be robust and has been used in production at multiple companies for several years. Time to officially celebrate its success with a version 1.0.0!

This is a huge and exciting release with many improvements. The main features are:

  • The Flowman 1.0.0 release itself. Making such a release is more work than one might expect.
  • New client/server Flowman shell to support accessing real data during development. This feature is still experimental.
  • Official support for Azure Synapse

Full list of changes

  • github-314: Move avro related functionality into separate plugin
  • github-307: Describe Flowman's security policy in SECURITY.md
  • github-315: Create build profile for CDP 7.1 with Spark 3.3
  • github-317: Perform retry on failing JDBC commands
  • github-318: Support mappings from different projects and with non-standard outputs in SQL
  • github-140: Strictly check imports
  • github-316: Beautify README.md
  • github-310: Explain versioning policy in CHANGELOG.md
  • github-313: Improve example for "observe" mapping
  • github-319: Support Oracle for History Server
  • github-320: Do not fall back to "inline" schema when no kind is specified
  • github-321: [BUG] Properly support lower case / upper case table names in Oracle
  • github-309: Automate integration tests
  • github-322: Remove flowman-client
  • github-324: Log environment variables for imported projects
  • github-329: Create Kernel API
  • github-330: Implement Kernel Server
  • github-331: Implement Kernel Client
  • github-332: Build Flowman Shell on top of kernel Client/Server
  • github-334: Create standalone Flowman Kernel application
  • github-338: Update Spark to 3.3.2
  • github-333: Forward Logs from Kernel to Client
  • github-339: Set Copyright to "The Flowman Authors"
  • github-345: [BUG] Loading an embedded schema inside a jar file should not throw an exception
  • github-346: Create build profile for Databricks
  • github-343: Log all client requests in kernel
  • github-342: Automatically close session when the client disconnects from kernel
  • github-351: [BUG] Failing execution listener instantiation should not fail a build
  • github-347: Exclude AWS SDK for Databricks and EMR build profiles
  • github-352: [BUG] Spark sessions should not contain duplicate jars from different plugins
  • github-353: Successful runs should not use System.exit(0)
  • github-354: Optionally load custom log4j config from jar
  • github-358: Provide different log4j config for Flowman server and kernel
  • github-359: Update jline dependency
  • github-357: Spark session should not be shut down in Databricks environment
  • github-360: Logging should exclude more Databricks specific stuff
  • github-361: Work around low-level API differences in Databricks
  • github-363: HiveDatabaseTarget should accept an optional location
  • github-311: Create integration test for EMR
  • github-362: Upgrade EMR to 6.10
  • github-369: [BUG] Prevent endless loop in Kernel client, when getContext fails
  • github-370: The Kernel client should use temporary workspaces with automatic cleanup
  • github-337: Add documentation for flowman-rshell
  • github-336: Add documentation for flowman-kernel
  • github-366: Feature parity between Flowman shell and Flowman remote shell
  • github-365: Implement saving mappings in Flowman Kernel/client
  • github-367: Create integration test for "quickstart" archetype
  • github-375: [BUG] "project reload" does not work correctly in remote shell with nested directories
  • github-376: Document options to parallelize work
  • github-378: Remove travis-ci integration
  • github-308: Revise branching model
  • github-381: Remove json-smart dependency
  • github-382: [BUG] Parallel execution of multiple dq checks runs too many checks on Java 17
  • github-384: Improve documentation for using docker-compose
  • github-377: Load override config/env from .flowman-env.yml
  • github-344: Support .flowman-ignore file for Flowman Kernel client
  • github-385: Update Flowman tutorial
  • github-386: Create Integration Test for Azure Synapse
  • github-387: Remove scala-arm dependency
  • github-390: Rename "master" branch to "main"
  • github-392: [BUG] 'relation' mapping should support numeric partition values
  • github-393: Move Maven archetype to flowman-maven project
  • github-394: [BUG] The Spark job group and description are not set for sql assertions
  • github-395: Support optional file locations for project imports
  • github-397: Automate build using GitHub actions
  • github-403: Upgrade Spark 3.2 to 3.2.4
  • github-404: [BUG] Partition columns do not support Timestamp data type
  • github-409: [BUG] Fix build for AWS EMR 6.10 and Azure Synapse 3.3
  • github-407: Update Delta to 2.3.0 for Spark 3.3
  • github-406: Improve integration tests to automatically pick up the current Flowman version
  • github-408: Make use of DeltaLake in Synapse integration test
  • github-405: Document deployment to EMR and Azure Synapse

Breaking changes

This version introduces some (minor) breaking changes:

  • All Avro related functionality is now moved into the new "flowman-avro" plugin. If you rely on such functionality,
    you explicitly need to include the plugin in the default-namespace.yml file.
  • Imports are now strictly checked. This means when you cross-reference some entity in your project which is provided
    by a different Flowman project, you now need to explicitly import the project in the project.yml.
  • The kind for schema definitions is now a mandatory attribute. Flowman will not fall back to an inline schema anymore.
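
The three changes above might translate into configuration along the following lines. This is only a sketch under assumptions: the project names my_project and commons and the field id are hypothetical, and the exact keys should be checked against the documentation of your Flowman version.

```yaml
# default-namespace.yml: explicitly enable the Avro plugin
plugins:
  - flowman-avro
---
# project.yml: import the (hypothetical) project "commons"
# before referencing any of its entities
name: my_project
imports:
  - project: commons
---
# Any schema definition: "kind" must now be stated explicitly
schema:
  kind: inline
  fields:
    - name: id
      type: int
```

The three YAML documents belong to different files and are shown together only for brevity.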

Flowman 0.30.1

12 Apr 13:30

This is a maintenance release of the 0.30 version with two bug fixes:

  • github-379: [BUG] Parallel execution of multiple targets runs too many targets on Java 17
  • github-383: Flowman should preserve target ordering

Flowman 0.30.0

04 Jan 06:10

This release has a strong focus on improving the deployment of Flowman projects as a self-contained fat jar. In addition, a new "executions" section in the job entity gives you better control over which target is executed in which phase, especially when executing parameter ranges (for example, to process multiple days in a row).

  • github-278: Parallelize execution of data quality checks. This also introduces a new configuration property
    flowman.execution.check.parallelism (default 1)
  • github-282: Improve implementation for counting records
  • github-288: Support reading local CSV files from fatjar
  • github-290: Simplify specifying project name in fatjar
  • github-291: Simplify create/destroy Relation interface
  • github-292: Upgrade AWS EMR to 6.9
  • github-289: Color log output via log4j configuration (requires log4j 2.x)
  • Bump postgresql from 42.4.1 to 42.4.3 in /flowman-plugins/postgresql
  • Bump loader-utils from 1.4.0 to 1.4.2
  • Bump json5 from 2.2.1 to 2.2.3
  • github-293: [BUG] Fatal exceptions in parallel mapping instantiation cause deadlock
  • github-273: Support projects contained in (fat) jar files
  • github-294: [BUG] Parallel execution should not execute more targets after errors
  • github-295: Create build profile for CDP 7.1 with Spark 3.2
  • github-296: Update npm dependencies (vuetify & co)
  • github-297: Parametrize when to execute a specific phase
  • github-299: Move migrationPolicy and migrationStrategy from target into relation
  • github-115: Implement additional build policy in relation target for forcing dirty. This also introduces a new
    configuration property flowman.default.target.buildPolicy (default COMPAT).
  • github-298: Support fine-grained control when to execute each target of a job
  • github-300: Implement new 'observe' mapping
  • github-301: Upgrade Spark to 3.2.3
  • github-302: Upgrade DeltaLake to 2.2.0
  • github-303: Use multi-stage build for Docker image
  • github-304: Upgrade Cloudera profile to CDP 7.1.8
  • github-312: Fix build with Spark 2.4 and Maven 3.8
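
The two configuration properties introduced in this release could be set as sketched below. The placement in the config list of a project or namespace file is an assumption, and the parallelism value 4 is arbitrary; only the property names and their defaults (1 and COMPAT) come from the list above.

```yaml
# Assumed placement: the "config" list of a project or namespace file
config:
  # Run up to 4 data quality checks in parallel (default is 1)
  - flowman.execution.check.parallelism=4
  # Default build policy for relation targets (COMPAT is the default)
  - flowman.default.target.buildPolicy=COMPAT
```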

This version is fully backwards compatible with all versions back to and including 0.27.0.

Flowman 0.29.0

09 Nov 15:49
  • github-260: Remove hive-storage-api from several plugins and lib
  • github-261: Add descriptions to all pom.xml
  • github-262: Verification of "relation" targets should only check existence
  • github-263: Add filter condition to data quality checks in documentation
  • github-265: Make JDBC dialects pluggable
  • github-264: Provide "jars" for all plugins
  • github-267: Add new flowman-spark-dependencies module to simplify dependency management
  • github-269: Implement new 'iterativeSql' mapping
  • github-270: Upgrade Spark to 3.3.1
  • github-271: Upgrade Delta to 2.1.1
  • github-272: Create build profile for AWS EMR 6.8.0
  • github-273: Refactor file abstraction
  • github-274: Print Flowman configuration to console

Flowman 0.28.0

07 Oct 16:51

Again, this release contains many smaller improvements for working with JDBC databases. Moreover, a new Maven archetype has been added to simplify the creation of new Flowman projects.

  • Improve support for MariaDB / MySQL as data sinks
  • github-245: Bump ejs, @vue/cli-plugin-babel, @vue/cli-plugin-eslint and @vue/cli-service in /flowman-studio-ui
  • github-246: Bump ejs, @vue/cli-plugin-babel, @vue/cli-plugin-eslint and @vue/cli-service in /flowman-server-ui
  • github-247: Automatically generate YAML schemas as part of build process
  • github-248: Bump scss-tokenizer and node-sass in /flowman-server-u
  • github-249: Add new options -X and -XX to increase logging
  • github-251: Support for log4j2 Configuration
  • github-252: Move sftp target into separate plugin
  • github-253: SQL Server relation should support explicit staging table
  • github-254: Use DATETIME2 for timestamps in MS SQL Server
  • github-256: Provide Maven archetype for simple Flowman projects
  • github-258: Support clustered indexes in MS SQL Server

Flowman 0.27.0

09 Sep 07:46

The highlights of this release are

  • New 'jdbcCommand' target for executing arbitrary SQL statements for JDBC sinks
  • Support direct SQL statements in JDBC relations for creating tables
  • Upgrade Delta Lake to 2.0.0/2.1.0 (for Spark 3.2 and 3.3 respectively)
  • Better error messages
  • Bug fixes and other improvements

In detail, this release contains the following changes:

  • github-232: [BUG] Column descriptions should be propagated in UNIONs
  • github-233: [BUG] Missing Hadoop dependencies for S3, Delta, etc
  • github-235: Implement new rest hook with fine control
  • github-229: A build target should not fail if Impala "COMPUTE STATS" fails
  • github-236: 'copy' target should not apply output schema
  • github-237: jdbcQuery relation should use fields "sql" and "file" instead of "query"
  • github-239: Allow optional SQL statement for creating jdbcTable
  • github-238: Implement new 'jdbcCommand' target
  • github-240: [BUG] Data quality checks in documentation should not fail on NULL values
  • github-241: Throw an error on duplicate entity definitions
  • github-220: Upgrade Delta-Lake to 2.0 / 2.1
  • github-242: Switch to Spark 3.3 as default
  • github-243: Use alternative Spark MS SQL Connector for Spark 3.3
  • github-244: Generate project HTML documentation with optional external CSS file