Flowman 1.0.0

@kupferk kupferk released this 28 Apr 05:43

Flowman Version 1.0.0!

Flowman has proven to be robust and has been used in production at multiple companies for several years. It is time to officially celebrate its success with version 1.0.0!

This is a huge and exciting release with many improvements. The main features are:

  • The Flowman 1.0.0 release itself. Making such a release is more work than one might expect.
  • New client/server Flowman shell to support accessing real data during development. This feature is still experimental.
  • Official support for Azure Synapse

Full list of changes

  • github-314: Move avro related functionality into separate plugin
  • github-307: Describe Flowman's security policy in SECURITY.md
  • github-315: Create build profile for CDP 7.1 with Spark 3.3
  • github-317: Perform retry on failing JDBC commands
  • github-318: Support mappings from different projects and with non-standard outputs in SQL
  • github-140: Strictly check imports
  • github-316: Beautify README.md
  • github-310: Explain versioning policy in CHANGELOG.md
  • github-313: Improve example for "observe" mapping
  • github-319: Support Oracle for History Server
  • github-320: Do not fall back to "inline" schema when no kind is specified
  • github-321: [BUG] Properly support lower case / upper case table names in Oracle
  • github-309: Automate integration tests
  • github-322: Remove flowman-client
  • github-324: Log environment variables for imported projects
  • github-329: Create Kernel API
  • github-330: Implement Kernel Server
  • github-331: Implement Kernel Client
  • github-332: Build Flowman Shell on top of kernel Client/Server
  • github-334: Create standalone Flowman Kernel application
  • github-338: Update Spark to 3.3.2
  • github-333: Forward Logs from Kernel to Client
  • github-339: Set Copyright to "The Flowman Authors"
  • github-345: [BUG] Loading an embedded schema inside a jar file should not throw an exception
  • github-346: Create build profile for Databricks
  • github-343: Log all client requests in kernel
  • github-342: Automatically close session when the client disconnects from kernel
  • github-351: [BUG] Failing execution listener instantiation should not fail a build
  • github-347: Exclude AWS SDK for Databricks and EMR build profiles
  • github-352: [BUG] Spark sessions should not contain duplicate jars from different plugins
  • github-353: Successful runs should not use System.exit(0)
  • github-354: Optionally load custom log4j config from jar
  • github-358: Provide different log4j config for Flowman server and kernel
  • github-359: Update jline dependency
  • github-357: Spark session should not be shut down in Databricks environment
  • github-360: Logging should exclude more Databricks specific stuff
  • github-361: Work around low-level API differences in Databricks
  • github-363: HiveDatabaseTarget should accept an optional location
  • github-311: Create integration test for EMR
  • github-362: Upgrade EMR to 6.10
  • github-369: [BUG] Prevent endless loop in Kernel client, when getContext fails
  • github-370: The Kernel client should use temporary workspaces with automatic cleanup
  • github-337: Add documentation for flowman-rshell
  • github-336: Add documentation for flowman-kernel
  • github-366: Feature parity between Flowman shell and Flowman remote shell
  • github-365: Implement saving mappings in Flowman Kernel/client
  • github-367: Create integration test for "quickstart" archetype
  • github-375: [BUG] "project reload" does not work correctly in remote shell with nested directories
  • github-376: Document options to parallelize work
  • github-378: Remove travis-ci integration
  • github-308: Revise branching model
  • github-381: Remove json-smart dependency
  • github-382: [BUG] Parallel execution of multiple dq checks runs too many checks on Java 17
  • github-384: Improve documentation for using docker-compose
  • github-377: Load override config/env from .flowman-env.yml
  • github-344: Support .flowman-ignore file for Flowman Kernel client
  • github-385: Update Flowman tutorial
  • github-386: Create Integration Test for Azure Synapse
  • github-387: Remove scala-arm dependency
  • github-390: Rename "master" branch to "main"
  • github-392: [BUG] 'relation' mapping should support numeric partition values
  • github-393: Move Maven archetype to flowman-maven project
  • github-394: [BUG] The Spark job group and description are not set for sql assertions
  • github-395: Support optional file locations for project imports
  • github-397: Automate build using GitHub actions
  • github-403: Upgrade Spark 3.2 to 3.2.4
  • github-404: [BUG] Partition columns do not support Timestamp data type
  • github-409: [BUG] Fix build for AWS EMR 6.10 and Azure Synapse 3.3
  • github-407: Update Delta to 2.3.0 for Spark 3.3
  • github-406: Improve integration tests to automatically pick up the current Flowman version
  • github-408: Make use of DeltaLake in Synapse integration test
  • github-405: Document deployment to EMR and Azure Synapse

Breaking changes

This version introduces some (minor) breaking changes:

  • All Avro-related functionality has been moved into the new "flowman-avro" plugin. If you rely on such functionality,
    you now need to explicitly include the plugin in the default-namespace.yml file.
  • Imports are now strictly checked. This means that when you cross-reference an entity in your project which is provided
    by a different Flowman project, you now need to explicitly import that project in the project.yml file.
  • The kind for schema definitions is now a mandatory attribute. Flowman will no longer fall back to an inline schema.
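
As a rough illustration of what these three changes may mean for an existing project, the fragments below sketch the corresponding configuration. They are not taken from the release itself: the project names, relation name, field name, and location are hypothetical, and the exact keys should be checked against the Flowman documentation for your version.

Enabling the Avro plugin in the namespace configuration (default-namespace.yml), assuming plugins are listed under a `plugins` key:

```yaml
# default-namespace.yml (sketch)
plugins:
  - flowman-avro   # now required if the project uses any Avro functionality
```

Explicitly importing another project in project.yml, assuming a hypothetical project called `other_project` provides the referenced entities:

```yaml
# project.yml (sketch; all names are hypothetical)
name: my_project
version: "1.0"
imports:
  - project: other_project   # needed before referencing entities from other_project
modules:
  - model
```

Stating the schema kind explicitly instead of relying on the former fallback to an inline schema:

```yaml
# Part of a relation definition (sketch; relation, location and field are hypothetical)
relations:
  measurements:
    kind: file
    format: parquet
    location: "/tmp/measurements"
    schema:
      kind: inline           # previously optional, now mandatory
      fields:
        - name: station_id
          type: string
```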