From a4a45b7cceaaf07a635a4363d8de06dba88aeeef Mon Sep 17 00:00:00 2001 From: Willy Lulciuc Date: Tue, 19 Sep 2023 16:14:50 -0400 Subject: [PATCH 1/5] Update mqz slack link (#2616) * Update mqz slack link Signed-off-by: wslulciuc * continued: Update mqz slack link Signed-off-by: wslulciuc --------- Signed-off-by: wslulciuc --- CODE_QUALITY_AND_SECURITY.md | 2 +- CONTRIBUTING.md | 6 +++--- GOVERNANCE.md | 2 +- README.md | 4 ++-- docs/index.md | 2 +- docs/quickstart.md | 2 +- examples/airflow/airflow.md | 2 +- proposals/README.md | 2 +- 8 files changed, 11 insertions(+), 11 deletions(-) diff --git a/CODE_QUALITY_AND_SECURITY.md b/CODE_QUALITY_AND_SECURITY.md index aa0a98f4b1..e28331ed99 100644 --- a/CODE_QUALITY_AND_SECURITY.md +++ b/CODE_QUALITY_AND_SECURITY.md @@ -26,7 +26,7 @@ The specific security and analysis methodologies that we employ include but are For more information about our approach to quality and security, feel free to reach out to the Marquez development team: -- Slack: [Marquezproject.slack.com](http://bit.ly/MarquezSlack) +- Slack: [Marquezproject.slack.com](http://bit.ly/Marquez_invite) - Twitter: [@MarquezProject](https://twitter.com/MarquezProject) ---- diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d2bd188543..353e333abd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -3,14 +3,14 @@ We're excited you're interested in contributing to Marquez! 
We'd love your help, and there are plenty of ways to contribute: * Give the repo a star -* Join our [slack](http://bit.ly/MqzSlack) channel and leave us feedback or help with answering questions from the community +* Join our [slack](http://bit.ly/Marquez_invite) channel and leave us feedback or help with answering questions from the community * Fix or [report](https://github.com/MarquezProject/marquez/issues/new) a bug * Fix or improve documentation * For newcomers, pick up a ["good first issue"](https://github.com/MarquezProject/marquez/labels/good%20first%20issue), then send a pull request our way (see the [resources](#resources) section below for helpful links to get started) -We feel that a welcoming community is important and we ask that you follow the [Contributor Covenant Code of Conduct](https://github.com/MarquezProject/marquez/blob/main/CODE_OF_CONDUCT.md) in all interactions with the community. +We feel that a welcoming community is important and we ask that you follow the [Contributor Covenant Code of Conduct](https://github.com/MarquezProject/marquez/blob/main/CODE_OF_CONDUCT.md) in all interactions with the community. -If you’re interested in using or learning more about Marquez, reach out to us on our [slack](http://bit.ly/MqzSlack) channel and follow [@MarquezProject](https://twitter.com/MarquezProject) for updates. We also encourage new comers to [join](https://lists.lfaidata.foundation/g/marquez-technical-discuss/ics/invite.ics?repeatid=32038) our monthly community meeting! +If you’re interested in using or learning more about Marquez, reach out to us on our [slack](http://bit.ly/Marquez_invite) channel and follow [@MarquezProject](https://twitter.com/MarquezProject) for updates. We also encourage new comers to [join](https://lists.lfaidata.foundation/g/marquez-technical-discuss/ics/invite.ics?repeatid=32038) our monthly community meeting! 
# Getting Your Changes Approved diff --git a/GOVERNANCE.md b/GOVERNANCE.md index 6385713000..84919f07b5 100644 --- a/GOVERNANCE.md +++ b/GOVERNANCE.md @@ -100,7 +100,7 @@ Or a meeting may be at an organization's offices that are required to maintain a ## Marquez on Slack -Marquez uses [a Slack community](http://bit.ly/MarquezSlack) to provide an ongoing dialogue between members. +Marquez uses [a Slack community](https://bit.ly/Marquez_invite) to provide an ongoing dialogue between members. This creates a recorded discussion of design decisions and discussions that complement the project meetings. Follow the link above and register with the Slack service using your email address. diff --git a/README.md b/README.md index 4e68a53165..9fa773c1b1 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ Marquez is an open source **metadata service** for the **collection**, **aggrega [![CircleCI](https://circleci.com/gh/MarquezProject/marquez/tree/main.svg?style=shield)](https://circleci.com/gh/MarquezProject/marquez/tree/main) [![codecov](https://codecov.io/gh/MarquezProject/marquez/branch/main/graph/badge.svg)](https://codecov.io/gh/MarquezProject/marquez/branch/main) [![status](https://img.shields.io/badge/status-active-brightgreen.svg)](#status) -[![Slack](https://img.shields.io/badge/slack-chat-blue.svg)](http://bit.ly/MqzSlack) +[![Slack](https://img.shields.io/badge/slack-chat-blue.svg)](https://bit.ly/Marquez_invite) [![license](https://img.shields.io/badge/license-Apache_2.0-blue.svg)](https://raw.githubusercontent.com/MarquezProject/marquez/main/LICENSE) [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](CODE_OF_CONDUCT.md) [![maven](https://img.shields.io/maven-central/v/io.github.marquezproject/marquez-api.svg)](https://search.maven.org/search?q=g:io.github.marquezproject) @@ -160,7 +160,7 @@ Marquez listens on port `8080` for all API calls and port `8081` for the admin i * Website: 
https://marquezproject.ai * Source: https://github.com/MarquezProject/marquez -* Chat: [MarquezProject Slack](https://bit.ly/MqzSlackInvite) +* Chat: [MarquezProject Slack](https://bit.ly/Marquez_invite) * Twitter: [@MarquezProject](https://twitter.com/MarquezProject) ## Contributing diff --git a/docs/index.md b/docs/index.md index 6841722fc3..948ac52557 100644 --- a/docs/index.md +++ b/docs/index.md @@ -95,7 +95,7 @@ We're excited you're interested in contributing to Marquez! We'd love your help, We feel that a welcoming community is important and we ask that you follow the [Contributor Covenant Code of Conduct](https://github.com/MarquezProject/marquez/blob/main/CODE_OF_CONDUCT.md) in all interactions with the community. -If you’re interested in using or learning more about Marquez, reach out to us on our [slack](http://bit.ly/MarquezSlack) channel and follow [@MarquezProject](https://twitter.com/MarquezProject) for updates. We also encourage new comers to [join](https://lists.lfaidata.foundation/g/marquez-technical-discuss/ics/invite.ics?repeatid=32038) our monthly community meeting! +If you’re interested in using or learning more about Marquez, reach out to us on our [slack](http://bit.ly/Marquez_invite) channel and follow [@MarquezProject](https://twitter.com/MarquezProject) for updates. We also encourage new comers to [join](https://lists.lfaidata.foundation/g/marquez-technical-discuss/ics/invite.ics?repeatid=32038) our monthly community meeting! ## Marquez Talks diff --git a/docs/quickstart.md b/docs/quickstart.md index 510945623b..4f2a4f89aa 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -111,7 +111,7 @@ In this simple example, we showed you how to write sample lineage metadata to a ## Feedback -What did you think of this guide? 
You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions! +What did you think of this guide? You can reach out to us on [slack](http://bit.ly/Marquez_invite) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions! ---- SPDX-License-Identifier: Apache-2.0 diff --git a/examples/airflow/airflow.md b/examples/airflow/airflow.md index 7506077134..f6c6675313 100644 --- a/examples/airflow/airflow.md +++ b/examples/airflow/airflow.md @@ -309,4 +309,4 @@ _Congrats_! You successfully step through a troubleshooting scenario of a failin # Feedback -What did you think of this example? You can reach out to us on [slack](http://bit.ly/MarquezSlack) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions! +What did you think of this example? You can reach out to us on [slack](http://bit.ly/Marquez_invite) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions! diff --git a/proposals/README.md b/proposals/README.md index 5e104bdac0..52dd5aced9 100644 --- a/proposals/README.md +++ b/proposals/README.md @@ -20,7 +20,7 @@ Once your proposal has been _`accepted`_, and has been associated with a milesto ## Questions? -If you need help with the proposal process, please reach out to us on our [slack](http://bit.ly/MarquezSlack) channel. +If you need help with the proposal process, please reach out to us on our [slack](http://bit.ly/Marquez_invite) channel. 
---- SPDX-License-Identifier: Apache-2.0 From eab9f178e21767d69ca6736bd3c8815f6f29fb4e Mon Sep 17 00:00:00 2001 From: Willy Lulciuc Date: Wed, 20 Sep 2023 16:36:28 -0400 Subject: [PATCH 2/5] Redirect website to `marquezproject.ai` (#2618) Signed-off-by: wslulciuc --- docs/index.md | 108 +++----------------------------------------------- 1 file changed, 5 insertions(+), 103 deletions(-) diff --git a/docs/index.md b/docs/index.md index 948ac52557..9eab575c64 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,109 +2,11 @@ layout: index --- -## Overview - -Marquez is an open source **metadata service** for the **collection**, **aggregation**, and **visualization** of a data ecosystem's metadata. It maintains the [provenance](https://en.wikipedia.org/wiki/Provenance#Data_provenance) of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more. Marquez was released and open sourced by [WeWork](https://www.wework.com). 
- -#### FEATURES - -* A reference implementation of the [OpenLineage](https://openlineage.io) standard -* Centralized [metadata management](https://en.wikipedia.org/wiki/Metadata_management) powering: - * Data lineage - * [Data governance](https://en.wikipedia.org/wiki/Data_governance) - * Data health - * Data discovery **+** exploration -* Precise and highly dimensional [data model](#data-model) - * Datasets - * Jobs - * Runs -* Easily collect metadata as [OpenLineage](https://openlineage.io) events via the [LineageAPI](https://marquezproject.github.io/marquez/openapi.html#tag/Lineage/paths/~1lineage/post) -* **Datasets** as first-class values -* **Enforcement** of _job_ and _dataset_ ownership -* Simple operation and design with minimal dependencies -* [RESTful API](./openapi.html) enabling sophisticated integrations with other systems: - * [Airflow](https://airflow.apache.org) - * [Amundsen](https://www.amundsen.io) - * [dbt](https://www.getdbt.com) - * [Spark](https://spark.apache.org/docs/latest/index.html) -* Designed to promote a **healthy** data ecosystem where teams within an organization can seamlessly _share_ and _safely_ depend on one another's datasets with confidence - -## Why Marquez? - -Marquez enables highly flexible [data lineage](https://en.wikipedia.org/wiki/Data_lineage) queries across _all datasets_, while reliably and efficiently associating (_upstream_, _downstream_) dependencies between jobs and the datasets they produce and consume. - -
- -
- -## Why manage and utilize metadata? - -
- -
- -## Design - -Marquez is a modular system and has been designed as a highly scalable, highly extensible platform-agnostic solution for metadata management. It consists of the following system components: - -* **Metadata Repository**: Stores all job and dataset metadata, including a complete history of job runs and job-level statistics (i.e. total runs, average runtimes, success/failures, etc). -* **Metadata API**: RESTful API enabling a diverse set of clients to begin interacting with metadata around dataset production and consumption. -* **Metadata UI**: Used for dataset discovery, connecting multiple datasets and exploring their dependency graph. - -
- -
- -
- -To ease adoption and enable a diverse set of data processing applications to build metadata collection as a core requirement into their design, Marquez implements the OpenLineage [specification](https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.yml). OpenLineage provides support for [Java](https://github.com/OpenLineage/OpenLineage/tree/main/client/java) and [Python](https://github.com/OpenLineage/OpenLineage/tree/main/client/python) as well as many [integrations](https://openlineage.io/integration). - -The Metadata API is an abstraction for recording information around the production and consumption of datasets. It's a low-latency, highly-available stateless layer responsible for encapsulating both metadata persistence and aggregation of lineage information. The API allows clients to collect and/or obtain dataset information to/from the [Metadata Repository](https://www.lucidchart.com/documents/view/f918ce01-9eb4-4900-b266-49935da271b8/0). - -Metadata needs to be collected, organized, and stored in a way to allow for rich exploratory queries via the [Metadata UI](https://github.com/MarquezProject/marquez/tree/main/web). The Metadata Repository serves as a catalog of dataset information encapsulated and cleanly abstracted away by the Metadata API. - -## Data Model - -Marquez's data model emphasizes immutability and timely processing of datasets. Datasets are first-class values produced by job runs. A job run is linked to _versioned_ code, and produces one or more immutable _versioned_ outputs. Dataset changes are recorded at different points in job execution via lightweight API calls, including the success or failure of the run itself. - -The diagram below shows the metadata collected and cataloged for a given job over multiple runs, and the time-ordered sequence of changes applied to its input dataset. - -
- -
- -**Job**: A job has an `owner`, unique `name`, `version`, and optional `description`. A job will define one or more _versioned_ inputs as dependencies, and one or more _versioned_ outputs as artifacts. Note that it's possible for a job to have only input, or only output datasets defined. - -**Job Version:** A read-only _immutable_ `version` of a job, with a unique referenceable `link` to code preserving the reproducibility of builds from source. A job version associates one or more input and output datasets to a job definition (important for lineage information as data moves through various jobs over time). Such associations catalog provenance links and provide powerful visualizations of the flow of data. - -**Dataset:** A dataset has an `owner`, unique `name`, `schema`, `version`, and optional `description`. A dataset is contained within a datasource. A `datasource` enables the grouping of physical datasets to their physical source. A version `pointer` into the historical set of changes is present for each dataset and maintained by Marquez. When a dataset change is committed back to Marquez, a distinct version ID is generated, stored, then set to `current` with the pointer updated internally. - -**Dataset Version:** A read-only _immutable_ `version` of a dataset. Each version can be read independently and has a unique ID mapped to a dataset change preserving its state at some given point in time. The _latest_ version ID is updated only when a change to the dataset has been recorded. To compute a distinct version ID, Marquez applies a versioning function to a set of properties corresponding to the datasets underlying datasource. - -## Deployment - -To deploy and manage Marquez in a cloud environment, please follow our [deployment](deployment-overview.html) guide. - -## Contributing - -We're excited you're interested in contributing to Marquez! 
We'd love your help, and there are plenty of ways to contribute: - -* Fix or [report](https://github.com/MarquezProject/marquez/issues/new) a bug -* Fix or improve documentation -* Pick up a ["good first issue"](https://github.com/MarquezProject/marquez/labels/good%20first%20issue), then send a pull request our way - -We feel that a welcoming community is important and we ask that you follow the [Contributor Covenant Code of Conduct](https://github.com/MarquezProject/marquez/blob/main/CODE_OF_CONDUCT.md) in all interactions with the community. - -If you’re interested in using or learning more about Marquez, reach out to us on our [slack](http://bit.ly/Marquez_invite) channel and follow [@MarquezProject](https://twitter.com/MarquezProject) for updates. We also encourage new comers to [join](https://lists.lfaidata.foundation/g/marquez-technical-discuss/ics/invite.ics?repeatid=32038) our monthly community meeting! - -## Marquez Talks - -* [Data Lineage with Apache Airflow using OpenLineage](https://www.youtube.com/watch?v=qQAdpbNhxl8) by Julien Le Dem, Willy Lulciuc at Airflow Summit '21 -* [Data Lineage with Apache Airflow](https://www.datacouncil.ai/talks/data-lineage-with-apache-airflow) by Willy Lulciuc at Data Council SF '20 -* [Solving Data Lineage Tracking And Data Discovery At WeWork](https://www.dataengineeringpodcast.com/marquez-data-lineage-episode-111) on [The Data Engineering Podcast](https://www.dataengineeringpodcast.com/) -* [Data Lineage with Apache Airflow using Marquez](https://www.youtube.com/watch?v=BIVUXruv5io) by Willy Lulciuc at CRUNCH '19 -* [Marquez: An Open Source Metadata Service for ML Platforms](https://www.slideshare.net/WillyLulciuc/marquez-an-open-source-metadata-service-for-ml-platforms) by Willy Lulciuc, Shawn Shah at AI NEXTCon SF '19 -* [Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based 
Triggers](https://www.datacouncil.ai/speaker/marquez-a-metadata-service-for-data-abstraction-data-lineage-and-event-based-triggers) by Willy Lulciuc at DataEngConf NYC '18 + + +Redirecting you to our new website! + + ---- SPDX-License-Identifier: Apache-2.0 From 797bbfc467fc2ca02e7e729cc1d65a69c3c86d4d Mon Sep 17 00:00:00 2001 From: Michael Robinson <68482867+merobi-hub@users.noreply.github.com> Date: Wed, 20 Sep 2023 23:16:10 -0400 Subject: [PATCH 3/5] update changelog for 0.41.0 (#2619) * update changelog for 0.41.0 Signed-off-by: Michael Robinson * continued Signed-off-by: Michael Robinson * continued Signed-off-by: Michael Robinson * continued Signed-off-by: Michael Robinson * continued Signed-off-by: Michael Robinson --------- Signed-off-by: Michael Robinson --- CHANGELOG.md | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a9736d4962..5c16a7be4b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,37 @@ # Changelog -## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.40.0...HEAD) +## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.41.0...HEAD) + +## [0.41.0](https://github.com/MarquezProject/marquez/compare/0.40.0...0.41.0) - 2023-09-20 +### Added +* API: add support for the following parameters in the `SearchDao` [`#2556`](https://github.com/MarquezProject/marquez/pull/2556) [@tati](https://github.com/tati) [@wslulciuc](https://github.com/wslulciuc) + *This PR updates the search endpoint to enforce `YYYY-MM-DD` for query params, use `YYYY-MM-DD` as `LocalDate`, and support the following query params:* + - *`namespace` - matches jobs or datasets within the given namespace.* + - *`before` - matches jobs or datasets before `YYYY-MM-DD`.* + - *`after` - matches jobs or datasets after `YYYY-MM-DD`.* +* Web: add paging on jobs and datasets [`#2614`](https://github.com/MarquezProject/marquez/pull/2614) [@phixme](https://github.com/phixMe) + *Adds 
paging to jobs and datasets just like we already have on the lineage events page.* +* Web: add tag descriptions to tooltips [`#2612`](https://github.com/MarquezProject/marquez/pull/2612) [@davidsharp7](https://github.com/davidsharp7) + *Get the tag descriptions from the tags endpoint and when a column has a tag display the corresponding description on hover over. Context can be found [here](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue).* +* Web: add available column-level tags [`#2606`](https://github.com/MarquezProject/marquez/pull/2606) [@davidsharp7](https://github.com/davidsharp7) + *Adds a new column called "tags" to the dataset column view along with the tags associated with the dataset column.* +* Web: add HTML Tool Tip [`#2601`](https://github.com/MarquezProject/marquez/pull/2601) [@davidsharp7](https://github.com/davidsharp7) + *Adds a Tool Tip to display basic node details.* + +### Fixed +* Web: fix dataset saga for paging [`#2615`](https://github.com/MarquezProject/marquez/pull/2615) [@phixme](https://github.com/phixMe) + *Updates the saga, changes the default page size.* +* API: perf/improve `jobdao` query [`#2609`](https://github.com/MarquezProject/marquez/pull/2609) [@algorithmy1](https://github.com/algorithmy1) + *Optimizes the query to make use of Common Table Expressions to fetch the required data more efficiently and before the join, fixing a significant bottleneck.* + +### Changed +* Docker: Postgres `14` [`#2607`](https://github.com/MarquezProject/marquez/pull/2607) [@wslulciuc](https://github.com/wslulciuc) + *Bumps the recommended version of Postgres to 14.* + *When deploying locally, you might need to run `./docker/down.sh` to clean existing volumes.* + +### Removed +* Client: tolerate null transformation attrs in field model [`#2600`](https://github.com/MarquezProject/marquez/pull/2600) [@davidjgoss](https://github.com/davidjgoss) + *Removes the @NonNull annotation from the client class 
and the @NotNull from the model class.* ## [0.40.0](https://github.com/MarquezProject/marquez/compare/0.39.0...0.40.0) - 2023-08-15 ### Added From 44cb56b392166ced8ee89e9d92f28bb21d253e62 Mon Sep 17 00:00:00 2001 From: Michael Robinson Date: Wed, 20 Sep 2023 23:25:14 -0400 Subject: [PATCH 4/5] Prepare for release 0.41.0 Signed-off-by: Michael Robinson --- .circleci/db-migration.sh | 2 +- .env.example | 2 +- chart/Chart.yaml | 2 +- chart/values.yaml | 4 ++-- clients/java/README.md | 4 ++-- docker/up.sh | 4 ++-- docs/openapi.html | 44 ++++++++++++++++++++++----------------- gradle.properties | 2 +- spec/openapi.yml | 2 +- 9 files changed, 36 insertions(+), 30 deletions(-) diff --git a/.circleci/db-migration.sh b/.circleci/db-migration.sh index ab4c109c57..782cf25bcf 100755 --- a/.circleci/db-migration.sh +++ b/.circleci/db-migration.sh @@ -13,7 +13,7 @@ # Version of PostgreSQL readonly POSTGRES_VERSION="14" # Version of Marquez -readonly MARQUEZ_VERSION=0.40.0 +readonly MARQUEZ_VERSION=0.41.0 # Build version of Marquez readonly MARQUEZ_BUILD_VERSION="$(git log --pretty=format:'%h' -n 1)" # SHA1 diff --git a/.env.example b/.env.example index f9a022cfba..7e7935ddd6 100644 --- a/.env.example +++ b/.env.example @@ -1,4 +1,4 @@ API_PORT=5000 API_ADMIN_PORT=5001 WEB_PORT=3000 -TAG=0.40.0 +TAG=0.41.0 diff --git a/chart/Chart.yaml b/chart/Chart.yaml index 170afea3e0..1fe9c43c08 100644 --- a/chart/Chart.yaml +++ b/chart/Chart.yaml @@ -29,4 +29,4 @@ name: marquez sources: - https://github.com/MarquezProject/marquez - https://marquezproject.github.io/marquez/ -version: 0.40.0 +version: 0.41.0 diff --git a/chart/values.yaml b/chart/values.yaml index 4bd4547aff..e6a547c229 100644 --- a/chart/values.yaml +++ b/chart/values.yaml @@ -17,7 +17,7 @@ marquez: image: registry: docker.io repository: marquezproject/marquez - tag: 0.40.0 + tag: 0.41.0 pullPolicy: IfNotPresent ## Name of the existing secret containing credentials for the Marquez installation. 
## When this is specified, it will take precedence over the values configured in the 'db' section. @@ -75,7 +75,7 @@ web: image: registry: docker.io repository: marquezproject/marquez-web - tag: 0.40.0 + tag: 0.41.0 pullPolicy: IfNotPresent ## Marquez website will run on this port ## diff --git a/clients/java/README.md b/clients/java/README.md index 0527351b85..cd418c45d6 100644 --- a/clients/java/README.md +++ b/clients/java/README.md @@ -10,14 +10,14 @@ Maven: io.github.marquezproject marquez-java - 0.40.0 + 0.41.0 ``` or Gradle: ```groovy -implementation 'io.github.marquezproject:marquez-java:0.40.0 +implementation 'io.github.marquezproject:marquez-java:0.41.0 ``` ## Usage diff --git a/docker/up.sh b/docker/up.sh index 6fd155edbc..ff6b2527ac 100755 --- a/docker/up.sh +++ b/docker/up.sh @@ -8,9 +8,9 @@ set -e # Version of Marquez -readonly VERSION=0.40.0 +readonly VERSION=0.41.0 # Build version of Marquez -readonly BUILD_VERSION=0.40.0 +readonly BUILD_VERSION=0.41.0 title() { echo -e "\033[1m${1}\033[0m" diff --git a/docs/openapi.html b/docs/openapi.html index f8621c9873..ad9c2a29e0 100644 --- a/docs/openapi.html +++ b/docs/openapi.html @@ -2033,6 +2033,9 @@ data-styled.g73[id="sc-TtZnY"]{content:"hUSnpT,"}/*!sc*/ .bsGeIE{color:#d41f1c;font-size:0.9em;font-weight:normal;margin-left:20px;line-height:1;}/*!sc*/ data-styled.g74[id="sc-jHNicF"]{content:"bsGeIE,"}/*!sc*/ +.ffLgqz{color:#0e7c86;}/*!sc*/ +.ffLgqz::before,.ffLgqz::after{font-weight:bold;}/*!sc*/ +data-styled.g76[id="sc-jOFryr"]{content:"ffLgqz,"}/*!sc*/ .cfctgs{border-radius:2px;background-color:rgba(51,51,51,0.05);color:rgba(51,51,51,0.9);padding:0 5px;border:1px solid rgba(51,51,51,0.1);font-family:Courier,monospace;}/*!sc*/ .sc-hmbstg + .sc-hmbstg{margin-left:0;}/*!sc*/ data-styled.g77[id="sc-hmbstg"]{content:"cfctgs,"}/*!sc*/ @@ -2174,7 +2177,7 @@ 55.627 l 55.6165,55.627 -231.245496,231.24803 c -127.185,127.1864 -231.5279,231.248 -231.873,231.248 -0.3451,0 -104.688, -104.0616 -231.873,-231.248 z - " 
fill="currentColor">

Marquez (0.40.0)

Download OpenAPI specification:Download

License: Apache 2.0

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem's metadata.

+ " fill="currentColor">

Marquez (0.41.0)

Download OpenAPI specification:Download

License: Apache 2.0

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem's metadata.

Namespaces

Create a namespace

Creates a new namespace object. A namespace enables the contextual grouping of related jobs and datasets. Namespaces must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), dashes (-), colons (:), slashes (/), or dots (.). A namespace is case-insensitive with a maximum length of 1024 characters. Note jobs and datasets will be unique within a namespace, but not across namespaces.

path Parameters
namespace
required
string <= 1024 characters
Example: my-namespace

The name of the namespace.

Request Body schema: application/json
ownerName
required
string

The owner of the namespace.
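The naming rules in the description above can be checked client-side before a request is sent. The following is a minimal sketch, not part of Marquez itself: the helper name `is_valid_namespace` and the constant names are hypothetical, and only the character set and the 1024-character limit come from the text above.

```python
import re

# Character set and maximum length as stated in the namespace description.
# The names below are illustrative and do not come from the Marquez codebase.
_NAMESPACE_PATTERN = re.compile(r"[A-Za-z0-9_\-:/.]+")
MAX_NAMESPACE_LENGTH = 1024


def is_valid_namespace(name: str) -> bool:
    """Return True if `name` satisfies the documented namespace rules."""
    if not name or len(name) > MAX_NAMESPACE_LENGTH:
        return False
    # fullmatch ensures every character belongs to the allowed set.
    return _NAMESPACE_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-namespace", "s3://bucket/path", "has space"):
        print(candidate, is_valid_namespace(candidate))
```

A check like this only mirrors the documented constraints; the server remains the authority on what it accepts.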

@@ -2190,16 +2193,16 @@

Responses

Response samples

Content type
application/json
{
  • "name": "my-namespace",
  • "createdAt": "2019-05-09T19:49:24.201361Z",
  • "updatedAt": "2019-05-09T19:49:24.201361Z",
  • "ownerName": "me",
  • "description": "My first namespace!"
}

List all namespaces

Returns a list of namespaces.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "namespaces": [
    ]
}
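The `limit` and `offset` query parameters described above recur on every list endpoint, so a small helper can build page URLs. This is a rough sketch under stated assumptions: the base URL reuses the `http://localhost:5000/api/v1` prefix that appears elsewhere in these docs, the `namespaces` resource path is assumed, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed local deployment; adjust to wherever the API is actually served.
BASE_URL = "http://localhost:5000/api/v1"


def page_url(resource: str, limit: int = 100, offset: int = 0) -> str:
    """Build a list URL using the documented `limit` and `offset` parameters."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    query = urlencode({"limit": limit, "offset": offset})
    return f"{BASE_URL}/{resource}?{query}"


def page_offsets(total: int, limit: int) -> list[int]:
    """Return the offsets needed to walk `total` results in pages of `limit`."""
    return list(range(0, total, limit))


if __name__ == "__main__":
    # Walk 60 hypothetical results, 25 at a time.
    for offset in page_offsets(total=60, limit=25):
        print(page_url("namespaces", limit=25, offset=offset))
```

The defaults mirror the documented ones (`limit=100`, `offset=0`), so calling `page_url("namespaces")` requests the first page.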

Events

List all received OpenLineage events.

Returns a list of OpenLineage events, sorted in the direction given by the sortDirection parameter (desc by default).

query Parameters
sortDirection
string
Example: sortDirection=desc

Sorts the results of your query by indicated direction asc or desc.

before
string <date-time>
Example: before=2022-09-15T07:47:19Z

Returns events before the given date.

after
string <date-time>
Example: after=2022-09-15T07:47:19Z

Returns events after the given date.

-
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{}
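The query parameters for the events listing (`sortDirection`, `before`, `after`, `limit`, `offset`) can be assembled as shown in this sketch. It builds only the query string, since the endpoint path is not shown in this excerpt; the function names are hypothetical, and the timestamp format follows the `2022-09-15T07:47:19Z` example above.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def to_utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like the documented example values."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def events_query(
    before: Optional[datetime] = None,
    after: Optional[datetime] = None,
    sort_direction: str = "desc",
    limit: int = 100,
    offset: int = 0,
) -> str:
    """Build the query string for listing events (illustrative helper)."""
    if sort_direction not in ("asc", "desc"):
        raise ValueError("sortDirection must be 'asc' or 'desc'")
    params = {"sortDirection": sort_direction, "limit": limit, "offset": offset}
    if before is not None:
        params["before"] = to_utc_timestamp(before)
    if after is not None:
        params["after"] = to_utc_timestamp(after)
    # urlencode percent-encodes the colons inside the timestamps.
    return urlencode(params)


if __name__ == "__main__":
    cutoff = datetime(2022, 9, 15, 7, 47, 19, tzinfo=timezone.utc)
    print(events_query(before=cutoff, limit=25))
```

Rejecting naive datetimes is a design choice in this sketch: it avoids silently interpreting a local time as UTC.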

Sources

Create a source Deprecated

Creates a new source object. A source is the physical location of a dataset such as a table in PostgreSQL, or topic in Kafka. A source enables the grouping of physical datasets to their physical source.

@@ -2214,8 +2217,8 @@

Responses

Response samples

Content type
application/json
{
  • "type": "POSTGRESQL",
  • "name": "my-source",
  • "createdAt": "2019-05-09T19:49:24.201361Z",
  • "updatedAt": "2019-05-09T19:49:24.201361Z",
  • "connectionUrl": "jdbc:postgresql://db.example.com/mydb",
  • "description": "My first source!"
}

List all sources

Returns a list of sources.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "sources": [
    ]
}

Datasets

Create a dataset Deprecated

Creates a new dataset.

@@ -2249,15 +2252,15 @@
http://localhost:5000/api/v1/namespaces/{namespace}/datasets/{dataset}/versions/{version}

Response samples

Content type
application/json
{
  • "id": {
    },
  • "type": "DB_TABLE",
  • "name": "my-dataset",
  • "physicalName": "public.mytable",
  • "createdAt": "2019-05-09T19:49:24.201361Z",
  • "version": "d224dac0-35d7-4d9b-bbbe-6fff1a8485ad",
  • "namespace": "my-namespace",
  • "sourceName": "my-source",
  • "fields": [
    ],
  • "tags": [ ],
  • "description": "My first dataset!",
  • "createdByRun": {
    }
}

List all versions for a dataset

Returns a list of versions for a dataset.

path Parameters
namespace
required
string <= 1024 characters
Example: my-namespace

The name of the namespace.

dataset
required
string <= 1024 characters
Example: my-dataset

The name of the dataset.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "versions": [
    ]
}

List all datasets

Returns a list of datasets.

path Parameters
namespace
required
string <= 1024 characters
Example: my-namespace

The name of the namespace.

dataset
required
string <= 1024 characters
Example: my-dataset

The name of the dataset.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "datasets": [
    ],
  • "totalCount": 0
}

Tag a dataset

Tag an existing dataset.

@@ -2298,8 +2301,8 @@

Response samples

Content type
application/json
{
  • "id": {
    },
  • "type": "BATCH",
  • "name": "my-job",
  • "createdAt": "2019-05-09T19:49:24.201361Z",
  • "updatedAt": "2019-05-09T19:49:24.201361Z",
  • "namespace": "my-namespace",
  • "inputs": [
    ],
  • "outputs": [ ],
  • "context": {
    },
  • "description": "My first job!",
  • "latestRun": null,
  • "facets": { },
  • "currentVersion": "b1d626a2-6d3a-475e-9ecf-943176d4a8c6"
}

List all jobs

Returns a list of jobs.

path Parameters
namespace
required
string <= 1024 characters
Example: my-namespace

The name of the namespace.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "jobs": [
    ],
  • "totalCount": 0
}

Retrieve a version for a job

Returns a version for a job.

@@ -2325,8 +2328,8 @@
http://localhost:5000/api/v1/namespaces/{namespace}/jobs/{job}/runs

Request samples

Content type
application/json
{
  • "args": {
    }
}

Response samples

Content type
application/json
Example
{
  • "id": "870492da-ecfb-4be0-91b9-9a89ddd3db90",
  • "createdAt": "2019-05-09T19:49:24.201361Z",
  • "updatedAt": "2019-05-09T19:49:24.201361Z",
  • "nominalStartTime": null,
  • "nominalEndTime": null,
  • "state": "RUNNING",
  • "startedAt": "2019-05-09T15:17:32.690346",
  • "endedAt": null,
  • "durationMs": null,
  • "args": {
    },
  • "facets": { }
}
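
The request sample above sends an `args` object to the runs URL. The sketch below shows one way to prepare such a request with the Python standard library, without sending it. The `POST` method, the contents of `args`, and the helper name `build_run_request` are assumptions for illustration. The collapsed fields in the samples above are not reproduced here.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local Marquez instance, as in the URL shown above.
BASE_URL = "http://localhost:5000/api/v1"


def build_run_request(namespace, job, args):
    """Prepare (but do not send) a request that creates a run for a job."""
    url = (
        f"{BASE_URL}/namespaces/{quote(namespace, safe='')}"
        f"/jobs/{quote(job, safe='')}/runs"
    )
    body = json.dumps({"args": args}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumed HTTP method for creating a run
    )


# Hypothetical run arguments, for illustration only.
req = build_run_request("my-namespace", "my-job", {"email": "me@example.com"})
```

Passing `req` to `urllib.request.urlopen` would send it. That step is left out so the example runs without a server.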

List all runs

Returns a list of runs for a job.

path Parameters
namespace
required
string <= 1024 characters
Example: my-namespace

The name of the namespace.

job
required
string <= 1024 characters
Example: my-job

The name of the job.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "runs": [
    ]
}

Retrieve a run

Retrieve a run.

@@ -2376,20 +2379,23 @@

Responses

Request samples

Content type
application/json
{
  • "description": "My first tag!"
}

Response samples

Content type
application/json
{
  • "tags": [
    ]
}

List all tags

Returns a list of tags.

-
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

-
offset
integer
Default: 0

The initial position from which to return results

+
query Parameters
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
offset
integer
Default: 0

The initial position from which to return results.

Responses

Response samples

Content type
application/json
{
  • "tags": [
    ]
}

Search

Query all datasets and jobs

Returns one or more datasets and jobs matching your query.

query Parameters
q
required
string
Example: q=my-dataset

Query containing the pattern to match; pattern matching for datasets and jobs is string-based and case-insensitive. Use a percent sign (%) to match any string of zero or more characters (my-job%), or an underscore (_) to match a single character (_job_).

filter
string
Example: filter=dataset

Filters the results of your query by dataset or job.

sort
string
Example: sort=name

Sorts the results of your query by name or updated_at.

-
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset

+
limit
integer
Default: 100
Example: limit=25

The number of results to return from offset.

+
namespace
string <= 1024 characters
Example: namespace=my-namespace

Match jobs or datasets within the given namespace.

+
before
string <YYYY-MM-DD>
Example: before=2022-09-15

Match jobs or datasets before YYYY-MM-DD.

+
after
string <YYYY-MM-DD>
Example: after=2022-09-15

Match jobs or datasets after YYYY-MM-DD.

Responses

Response samples

Content type
application/json
{
  • "totalCount": 1,
  • "results": [
    ]
}
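
The wildcard rules for `q` described above can be emulated locally to check a pattern before querying. This is a rough sketch, not the server's implementation. It assumes the pattern must match the whole name, as with SQL `LIKE`, and that the search endpoint lives at `/search` under `http://localhost:5000/api/v1`. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumption: a local Marquez instance with the search endpoint at /search.
SEARCH_URL = "http://localhost:5000/api/v1/search"


def like_to_regex(pattern):
    """Translate % (any run of characters) and _ (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Case-insensitive, per the parameter description.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def matches(pattern, name):
    """Return True if `name` matches the wildcard `pattern` in full."""
    return like_to_regex(pattern).fullmatch(name) is not None


def search_url(q, **params):
    """Build a search URL; urlencode escapes the % wildcard as %25."""
    return f"{SEARCH_URL}?{urlencode({'q': q, **params})}"


url = search_url("my-job%", filter="job", limit=25)
```

The `%` wildcard must itself be percent-encoded in a URL, which `urlencode` handles. Sending a raw `%` is a common source of malformed requests.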