Merge branch 'main' into proposal/separate-facets-tables-for-ol-events
pawel-big-lebowski authored Dec 21, 2022
2 parents 3c5ebed + 1d28adf commit 28a07e9
Showing 48 changed files with 568 additions and 331 deletions.
1 change: 1 addition & 0 deletions .circleci/config.yml
@@ -83,6 +83,7 @@ jobs:
  - v1-web-{{ .Branch }}
  - run: npm install
  - run: npm run test
+ - run: npm run eslint-fix
  - run: npm run build
  - save_cache:
  paths:
2 changes: 1 addition & 1 deletion .env.example
@@ -1,4 +1,4 @@
API_PORT=5000
API_ADMIN_PORT=5001
WEB_PORT=3000
- TAG=0.28.0
+ TAG=0.29.0
51 changes: 43 additions & 8 deletions CHANGELOG.md
@@ -1,27 +1,62 @@
# Changelog

- ## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.28.0...HEAD)
+ ## [Unreleased](https://github.com/MarquezProject/marquez/compare/0.29.0...HEAD)

+ ## [0.29.0](https://github.com/MarquezProject/marquez/compare/0.28.0...0.29.0) - 2022-12-19
+
### Added

- * Column-lineage endpoints supports point-in-time requests [`#2265`](https://github.com/MarquezProject/marquez/pull/2265) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
- *Enable requesting `column-lineage` endpoint by a dataset version, job version or dataset field of a specific dataset version.*
+ * Add point-in-time requests support to column-lineage endpoints [`#2265`](https://github.com/MarquezProject/marquez/pull/2265) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *Enables requesting `column-lineage` endpoint by a dataset version, job version or dataset field of a specific dataset version.*
+ * Add column lineage point-in-time Java client methods [`#2269`](https://github.com/MarquezProject/marquez/pull/2269) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *Java client methods to retrieve point-in-time `column-lineage`. Please note that the existing methods `getColumnLineageByDataset`, `getColumnLineageByDatasetField` and `getColumnLineageByJob` are replaced by a single `getColumnLineage` method taking `NodeId` as a parameter.*
+ * Add raw event viewer to UI [`#2249`](https://github.com/MarquezProject/marquez/pull/2249) [@tito12](https://github.com/tito12)
+ *A new events page enables filtering events by date and expanding the payload by clicking on each event.*
+ * Update events page with styling synchronization [`#2324`](https://github.com/MarquezProject/marquez/pull/2324) [@phixMe](https://github.com/phixMe)
+ *Makes some updates to the new page to make it conform better to the overall design system.*
+ * Update helm Ingress template to be cross-compatible with recent k8s versions [`#2275`](https://github.com/MarquezProject/marquez/pull/2275) [@jlukenoff](https://github.com/jlukenoff)
+ *Certain components of the Ingress schema have changed in recent versions of Kubernetes. This change updates the Ingress helm template to render based on the semantic Kubernetes version.*
+ * Add delete namespace endpoint doc to OpenAPI docs [`#2295`](https://github.com/MarquezProject/marquez/pull/2295) [@mobuchowski](https://github.com/mobuchowski)
+ *Adds a doc about the delete namespace endpoint.*
+ * Add i18next and language switcher for i18n of UI [`#2254`](https://github.com/MarquezProject/marquez/pull/2254) [@merobi-hub](https://github.com/merobi-hub) [@phixMe](https://github.com/phixMe)
+ *Adds i18next framework, language switcher, and translations for i18n of UI.*
+ * Add indexed `created_at` column to lineage events table [`#2299`](https://github.com/MarquezProject/marquez/pull/2299) [@prachim-collab](https://github.com/prachim-collab)
+ *A new timestamp column in the database supports analytics use cases by allowing for identification of incrementally created events (backwards-compatible).*

### Fixed

- * Allow null column type in column-lineage [`#2272`](https://github.com/MarquezProject/marquez/pull/2272) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ * Allow null column type in column lineage [`#2272`](https://github.com/MarquezProject/marquez/pull/2272) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
*The column-lineage endpoint was throwing an exception when no data type of the field was provided. Includes a test.*
* Include error message for JSON processing exception [`#2271`](https://github.com/MarquezProject/marquez/pull/2271) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
- *In case of JSON processing exceptions Marquez API should return exception message to a client.*
- * Fix column lineage when multiple jobs write to same dataset [`#2289`](https://github.com/MarquezProject/marquez/pull/2289) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
- *The fix deprecates the way fields `transformationDescription` and `transformationType` are returned. The depracated way of returning those fields will be removed in 0.30.0.*
+ *In case of JSON processing exceptions, the Marquez API now returns an exception message to a client.*
+ * Fix column lineage when multiple jobs write to same dataset [`#2289`](https://github.com/MarquezProject/marquez/pull/2289) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *The fix deprecates the way the fields `transformationDescription` and `transformationType` are returned. The deprecated way of returning those fields will be removed in 0.30.0.*
+ * Use raw link for `iconSearchArrow.svg` [`#2280`](https://github.com/MarquezProject/marquez/pull/2280) [@wslulciuc](https://github.com/wslulciuc)
+ *Using a direct link to the events viewer icon fixes a loading issue.*
+ * Fill run state of parent run when created by child run [`#2296`](https://github.com/MarquezProject/marquez/pull/2296) [@fm100](https://github.com/fm100)
+ *Adds a run state to the parent at creation time to address a missing run state issue in Airflow integration.*
+ * Update migration query to make it work with existing view [`#2308`](https://github.com/MarquezProject/marquez/pull/2308) [@fm100](https://github.com/fm100)
+ *Changes the V52 migration query to drop the view before `ALTER`. Because repeatable migration runs only when its checksum changes, it was necessary to get the view definition first then drop and recreate it.*
+ * Fix lineage for orphaned datasets [`#2314`](https://github.com/MarquezProject/marquez/pull/2314) [@collado-mike](https://github.com/collado-mike)
+ *Fixes lineage for datasets generated by jobs whose current versions no longer write to the databases in question.*
+ * Ensure job data in lineage query is not null or empty [`#2253`](https://github.com/MarquezProject/marquez/pull/2253) [@wslulciuc](https://github.com/wslulciuc)
+ *Changes the API to return an empty graph in the edge case of a job UUID that has no lineage when calling `LineageDao.getLineage()` yet is associated with a dataset. This case formerly resulted in an empty set and backend exception. Also includes logging and an API check for a `nodeID`.*
+ * Make `name` and `type` required for datasets [`#2305`](https://github.com/MarquezProject/marquez/pull/2305) [@wslulciuc](https://github.com/wslulciuc)
+ *When generating Typescript from the OpenAPI spec, `name` and `type` were not required but should have been.*
+ * Remove unused filter on `RunDao.updateStartState()` [`#2319`](https://github.com/MarquezProject/marquez/pull/2319) [@wslulciuc](https://github.com/wslulciuc)
+ *Removes the condition `updated_at < transitionedAt or start_run_state_uuid is null` to allow for updating the run state.*
+ * Update linter [`#2322`](https://github.com/MarquezProject/marquez/pull/2322) [@phixMe](https://github.com/phixMe)
+ *Adds `npm run eslint-fix` to the CI config to fail if it does not return with a RC 0.*
+ * Fix asset loading for web [`#2323`](https://github.com/MarquezProject/marquez/pull/2323) [@phixMe](https://github.com/phixMe)
+ *Fixes the webpack config and allows files to be imported in a modern capacity that enforces the assets exist.*

## [0.28.0](https://github.com/MarquezProject/marquez/compare/0.27.0...0.28.0) - 2022-11-21

### Added

* Optimize current runs query for lineage API [`#2211`](https://github.com/MarquezProject/marquez/pull/2211) [@prachim-collab](https://github.com/prachim-collab)
*Add a simpler, alternate `getCurrentRuns` query that gets only simple runs from the database without the additional data from tables such as `run_args`, `job_context`, `facets`, etc., which required extra table joins.*
- * Add Code Quality, DCO and Governance docs to project [`#2237`](https://github.com/MarquezProject/marquez/pull/2237) [`#2241`](https://github.com/MarquezProject/marquez/pull/2241) [@merobi-hub](https://github.com/MarquezProject/marquez/commits?author=merobi-hub)
+ * Add Code Quality, DCO and Governance docs to project [`#2237`](https://github.com/MarquezProject/marquez/pull/2237) [`#2241`](https://github.com/MarquezProject/marquez/pull/2241) [@merobi-hub](https://github.com/merobi-hub)
*Adds a number of standard governance and procedure docs to the project.*
* Add possibility to soft-delete namespaces [`#2244`](https://github.com/MarquezProject/marquez/pull/2244) [@mobuchowski](https://github.com/mobuchowski)
*Adds the ability to "hide" inactive namespaces. The namespaces are undeleted when a relevant OL event is received.*
2 changes: 1 addition & 1 deletion api/src/main/java/marquez/db/RunDao.java
@@ -62,7 +62,7 @@ public interface RunDao extends BaseDao {
  + "SET updated_at = :transitionedAt, "
  + " start_run_state_uuid = :startRunStateUuid,"
  + " started_at = :transitionedAt "
- + "WHERE uuid = :rowUuid AND (updated_at < :transitionedAt or start_run_state_uuid is null)")
+ + "WHERE uuid = :rowUuid")
void updateStartState(UUID rowUuid, Instant transitionedAt, UUID startRunStateUuid);

@SqlUpdate(
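The effect of dropping the filter from `RunDao.updateStartState()` can be shown with a small in-memory model. This is a hypothetical sketch, not Marquez code. `RunRow`, `oldFilter`, `newFilter` and the Java `updateStartState` helper are illustrative names that reproduce the old and new `WHERE` clauses in plain Java. The example covers one case the removed condition skipped: a row already updated at a later time that already references a start state.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical in-memory model of the WHERE clause in RunDao.updateStartState();
// an illustration of the filter removed in this commit, not Marquez code.
public class StartStateFilterSketch {

  // Stand-in for the runs-row columns that the UPDATE statement reads and writes.
  static final class RunRow {
    final UUID uuid;
    Instant updatedAt;
    UUID startRunStateUuid;
    Instant startedAt;

    RunRow(UUID uuid, Instant updatedAt, UUID startRunStateUuid) {
      this.uuid = uuid;
      this.updatedAt = updatedAt;
      this.startRunStateUuid = startRunStateUuid;
    }
  }

  // Removed: uuid = :rowUuid AND (updated_at < :transitionedAt OR start_run_state_uuid IS NULL)
  static boolean oldFilter(RunRow row, UUID rowUuid, Instant transitionedAt) {
    return row.uuid.equals(rowUuid)
        && (row.updatedAt.isBefore(transitionedAt) || row.startRunStateUuid == null);
  }

  // Current: uuid = :rowUuid
  static boolean newFilter(RunRow row, UUID rowUuid) {
    return row.uuid.equals(rowUuid);
  }

  // Applies the SET clause only when the chosen filter matches; returns true if the row changed.
  static boolean updateStartState(
      RunRow row, UUID rowUuid, Instant transitionedAt, UUID startRunStateUuid, boolean useOldFilter) {
    boolean matches =
        useOldFilter ? oldFilter(row, rowUuid, transitionedAt) : newFilter(row, rowUuid);
    if (matches) {
      row.updatedAt = transitionedAt;
      row.startRunStateUuid = startRunStateUuid;
      row.startedAt = transitionedAt;
    }
    return matches;
  }

  public static void main(String[] args) {
    UUID runUuid = UUID.randomUUID();
    Instant startTime = Instant.parse("2022-12-19T10:00:00Z");
    Instant laterTime = Instant.parse("2022-12-19T10:05:00Z");

    // The row was already updated at a later time and already references a start state.
    RunRow row = new RunRow(runUuid, laterTime, UUID.randomUUID());

    System.out.println(updateStartState(row, runUuid, startTime, UUID.randomUUID(), true)); // false
    System.out.println(updateStartState(row, runUuid, startTime, UUID.randomUUID(), false)); // true
  }
}
```

With the old clause, the first call prints `false` and leaves the row untouched. With the new clause, the second call prints `true` and sets `startedAt` to the transition time.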
17 changes: 12 additions & 5 deletions api/src/test/java/marquez/ColumnLineageIntegrationTest.java
@@ -14,7 +14,11 @@
import java.util.Optional;
import marquez.api.JdbiUtils;
import marquez.client.MarquezClient;
+ import marquez.client.models.DatasetFieldId;
+ import marquez.client.models.DatasetId;
+ import marquez.client.models.JobId;
import marquez.client.models.Node;
+ import marquez.client.models.NodeId;
import marquez.db.LineageTestUtils;
import marquez.db.OpenLineageDao;
import marquez.jdbi.MarquezJdbiExternalPostgresExtension;
@@ -64,7 +68,8 @@ public void tearDown(Jdbi jdbi) {

@Test
public void testColumnLineageEndpointByDataset() {
- MarquezClient.Lineage lineage = client.getColumnLineageByDataset("namespace", "dataset_b");
+ MarquezClient.Lineage lineage =
+ client.getColumnLineage(NodeId.of(new DatasetId("namespace", "dataset_b")));

assertThat(lineage.getGraph()).hasSize(3);
assertThat(getNodeByFieldName(lineage, "col_a")).isPresent();
@@ -75,7 +80,7 @@ public void testColumnLineageEndpointByDataset() {
@Test
public void testColumnLineageEndpointByDatasetField() {
MarquezClient.Lineage lineage =
- client.getColumnLineageByDataset("namespace", "dataset_b", "col_c");
+ client.getColumnLineage(NodeId.of(new DatasetFieldId("namespace", "dataset_b", "col_c")));

assertThat(lineage.getGraph()).hasSize(3);
assertThat(getNodeByFieldName(lineage, "col_a")).isPresent();
@@ -86,7 +91,8 @@ public void testColumnLineageEndpointByDatasetField() {
@Test
public void testColumnLineageEndpointWithDepthLimit() {
MarquezClient.Lineage lineage =
- client.getColumnLineageByDatasetField("namespace", "dataset_c", "col_d", 1, false);
+ client.getColumnLineage(
+ NodeId.of(new DatasetFieldId("namespace", "dataset_c", "col_d")), 1, false);

assertThat(lineage.getGraph()).hasSize(2);
assertThat(getNodeByFieldName(lineage, "col_c")).isPresent();
@@ -96,15 +102,16 @@ public void testColumnLineageEndpointWithDepthLimit() {
@Test
public void testColumnLineageEndpointWithDownstream() {
MarquezClient.Lineage lineage =
- client.getColumnLineageByDatasetField("namespace", "dataset_b", "col_c", 10, true);
+ client.getColumnLineage(NodeId.of(new JobId("namespace", "job1")), 10, true);

assertThat(lineage.getGraph()).hasSize(4);
assertThat(getNodeByFieldName(lineage, "col_d")).isPresent();
}

@Test
public void testColumnLineageEndpointByJob() {
- MarquezClient.Lineage lineage = client.getColumnLineageByJob("namespace", "job1");
+ MarquezClient.Lineage lineage =
+ client.getColumnLineage(NodeId.of(new JobId("namespace", "job1")), 1, false);

assertThat(lineage.getGraph()).hasSize(3);
assertThat(getNodeByFieldName(lineage, "col_a")).isPresent();
22 changes: 14 additions & 8 deletions api/src/test/java/marquez/service/models/NodeIdTest.java
@@ -10,7 +10,9 @@
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

+ import java.util.UUID;
import marquez.common.models.DatasetFieldId;
+ import marquez.common.models.DatasetFieldVersionId;
import marquez.common.models.DatasetId;
import marquez.common.models.DatasetName;
import marquez.common.models.FieldName;
@@ -150,25 +152,29 @@ public void testDatasetField(String namespace, String dataset, String field) {
"gs://bucket$/path/to/data$col_A#aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
},
delimiter = '$')
- public void testDatasetFieldVersion(String namespace, String dataset, String field) {
+ public void testDatasetFieldVersion(String namespace, String dataset, String fieldWithVersion) {
+ String version = fieldWithVersion.split(VERSION_DELIM)[1];
+ String field = fieldWithVersion.split(VERSION_DELIM)[0];
+
NamespaceName namespaceName = NamespaceName.of(namespace);
- FieldName fieldName = FieldName.of(field);
+ FieldName fieldName = FieldName.of(field.split(VERSION_DELIM)[0]);
DatasetName datasetName = DatasetName.of(dataset);
DatasetId dsId = new DatasetId(namespaceName, datasetName);
- DatasetFieldId dsfId = new DatasetFieldId(dsId, fieldName);
+ DatasetFieldVersionId dsfId =
+ new DatasetFieldVersionId(dsId, fieldName, UUID.fromString(version));
NodeId nodeId = NodeId.of(dsfId);
assertFalse(nodeId.isRunType());
assertFalse(nodeId.isJobType());
assertFalse(nodeId.isDatasetType());
assertTrue(nodeId.hasVersion());
assertTrue(nodeId.isDatasetFieldVersionType());

- assertEquals(dsfId, nodeId.asDatasetFieldId());
+ assertEquals(dsfId, nodeId.asDatasetFieldVersionId());
assertEquals(nodeId, NodeId.of(nodeId.getValue()));
- assertEquals(namespace, nodeId.asDatasetFieldId().getDatasetId().getNamespace().getValue());
- assertEquals(dataset, nodeId.asDatasetFieldId().getDatasetId().getName().getValue());
- assertEquals(field, nodeId.asDatasetFieldId().getFieldName().getValue());
assertEquals(
- field.split(VERSION_DELIM)[1], nodeId.asDatasetFieldVersionId().getVersion().toString());
+ namespace, nodeId.asDatasetFieldVersionId().getDatasetId().getNamespace().getValue());
+ assertEquals(dataset, nodeId.asDatasetFieldVersionId().getDatasetId().getName().getValue());
+ assertEquals(field, nodeId.asDatasetFieldVersionId().getFieldName().getValue());
+ assertEquals(version, nodeId.asDatasetFieldVersionId().getVersion().toString());
}
}
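The updated test receives a field and its version as a single CSV value, such as `col_A#aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee`. It splits that value into a field name and a version UUID. Below is a minimal sketch of that parsing step. It assumes `VERSION_DELIM` is `#`, as the test data suggests. `FieldVersionSplitSketch`, `FieldWithVersion` and `parse` are hypothetical names, not part of Marquez.

```java
import java.util.UUID;

// Hypothetical helper mirroring how the updated testDatasetFieldVersion() splits its input.
// VERSION_DELIM is assumed to be "#", matching test data of the form "col_A#<uuid>".
public class FieldVersionSplitSketch {
  static final String VERSION_DELIM = "#";

  // A field name plus the dataset version it belongs to.
  record FieldWithVersion(String field, UUID version) {}

  static FieldWithVersion parse(String fieldWithVersion) {
    String[] parts = fieldWithVersion.split(VERSION_DELIM);
    if (parts.length != 2) {
      throw new IllegalArgumentException("Expected <field>#<uuid>: " + fieldWithVersion);
    }
    // UUID.fromString rejects a malformed version instead of passing a bad string along.
    return new FieldWithVersion(parts[0], UUID.fromString(parts[1]));
  }

  public static void main(String[] args) {
    FieldWithVersion parsed = parse("col_A#aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee");
    System.out.println(parsed.field()); // col_A
    System.out.println(parsed.version()); // aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
  }
}
```

Parsing the version into a `UUID` up front matches the updated test, which passes `UUID.fromString(version)` to `DatasetFieldVersionId`.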
2 changes: 1 addition & 1 deletion build.gradle
@@ -106,7 +106,7 @@ subprojects {

pmd {
consoleOutput = true
- toolVersion = "6.46.0"
+ toolVersion = "6.52.0"
rulesMinimumPriority = 5
ruleSetFiles = rootProject.files("pmd-marquez.xml")
ruleSets = []
2 changes: 1 addition & 1 deletion chart/Chart.yaml
@@ -29,4 +29,4 @@ name: marquez
  sources:
  - https://github.com/MarquezProject/marquez
  - https://marquezproject.github.io/marquez/
- version: 0.28.0
+ version: 0.29.0
6 changes: 3 additions & 3 deletions chart/values.yaml
@@ -17,7 +17,7 @@ marquez:
image:
registry: docker.io
repository: marquezproject/marquez
- tag: 0.28.0
+ tag: 0.29.0
pullPolicy: IfNotPresent
## Name of the existing secret containing credentials for the Marquez installation.
## When this is specified, it will take precedence over the values configured in the 'db' section.
@@ -75,7 +75,7 @@ web:
image:
registry: docker.io
repository: marquezproject/marquez-web
- tag: 0.28.0
+ tag: 0.29.0
pullPolicy: IfNotPresent
## Marquez website will run on this port
##
@@ -107,7 +107,7 @@ postgresql:
## @param image.tag PostgreSQL image tag (immutable tags are recommended)
##
image:
- tag: 12.1.0
+ tag: 0.29.0
## Authentication parameters
## ref: https://github.com/bitnami/bitnami-docker-postgresql/blob/master/README.md#setting-the-root-password-on-first-run
## ref: https://github.com/bitnami/bitnami-docker-postgresql/blob/master/README.md#creating-a-database-on-first-run
4 changes: 2 additions & 2 deletions clients/java/README.md
@@ -10,14 +10,14 @@ Maven:
<dependency>
<groupId>io.github.marquezproject</groupId>
<artifactId>marquez-java</artifactId>
- <version>0.28.0</version>
+ <version>0.29.0</version>
</dependency>
```

or Gradle:

```groovy
- implementation 'io.github.marquezproject:marquez-java:0.28.0
+ implementation 'io.github.marquezproject:marquez-java:0.29.0
```

## Usage
47 changes: 5 additions & 42 deletions clients/java/src/main/java/marquez/client/MarquezClient.java
@@ -44,6 +44,7 @@
import marquez.client.models.Namespace;
import marquez.client.models.NamespaceMeta;
import marquez.client.models.Node;
+ import marquez.client.models.NodeId;
import marquez.client.models.Run;
import marquez.client.models.RunMeta;
import marquez.client.models.RunState;
@@ -115,50 +116,12 @@ public enum SortDirection {
@Getter public final String value;
}

- public Lineage getColumnLineageByDataset(
- @NonNull String namespaceName, @NonNull String datasetName) {
- return getColumnLineageByDataset(
- namespaceName, datasetName, DEFAULT_LINEAGE_GRAPH_DEPTH, false);
- }
-
- public Lineage getColumnLineageByDataset(
- @NonNull String namespaceName, @NonNull String datasetName, @NonNull String field) {
- return getColumnLineageByDatasetField(
- namespaceName, datasetName, field, DEFAULT_LINEAGE_GRAPH_DEPTH, false);
- }
-
- public Lineage getColumnLineageByDataset(
- @NonNull String namespaceName,
- @NonNull String datasetName,
- int depth,
- boolean withDownstream) {
- final String bodyAsJson =
- http.get(
- url.toColumnLineageUrlByDataset(namespaceName, datasetName, depth, withDownstream));
- return Lineage.fromJson(bodyAsJson);
+ public Lineage getColumnLineage(NodeId nodeId) {
+ return getColumnLineage(nodeId, DEFAULT_LINEAGE_GRAPH_DEPTH, false);
}

- public Lineage getColumnLineageByDatasetField(
- @NonNull String namespaceName,
- @NonNull String datasetName,
- @NonNull String field,
- int depth,
- boolean withDownstream) {
- final String bodyAsJson =
- http.get(
- url.toColumnLineageUrlByDatasetField(
- namespaceName, datasetName, field, depth, withDownstream));
- return Lineage.fromJson(bodyAsJson);
- }
-
- public Lineage getColumnLineageByJob(@NonNull String namespaceName, @NonNull String jobName) {
- return getColumnLineageByJob(namespaceName, jobName, DEFAULT_LINEAGE_GRAPH_DEPTH, false);
- }
-
- public Lineage getColumnLineageByJob(
- @NonNull String namespaceName, @NonNull String jobName, int depth, boolean withDownstream) {
- final String bodyAsJson =
- http.get(url.toColumnLineageUrlByJob(namespaceName, jobName, depth, withDownstream));
+ public Lineage getColumnLineage(NodeId nodeId, int depth, boolean withDownstream) {
+ final String bodyAsJson = http.get(url.toColumnLineageUrl(nodeId, depth, withDownstream));
return Lineage.fromJson(bodyAsJson);
}

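This change replaces six per-type client methods with two `getColumnLineage` overloads keyed by `NodeId`. The node type now travels inside the identifier rather than in the method name. The sketch below illustrates that design in self-contained Java (16+, for records), and every specific in it is hypothetical:

- The `NodeId` record and its `ofDataset`/`ofDatasetField`/`ofJob` factories stand in for the real `NodeId.of(new DatasetId(...))` calls used in the tests.
- The `dataset:`/`datasetField:`/`job:` prefixes, the `/api/v1/column-lineage` path and the default depth of 20 are assumptions, not values confirmed by this commit.
- The base URL reuses `API_PORT=5000` from `.env.example`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the consolidated client API: one NodeId-based builder replaces
// per-type overloads. NodeId format, endpoint path, and default depth are assumptions.
public class ColumnLineageUrlSketch {
  // Assumed value; the real DEFAULT_LINEAGE_GRAPH_DEPTH in marquez-java may differ.
  static final int DEFAULT_DEPTH = 20;

  // Minimal stand-in for marquez.client.models.NodeId: the node type is encoded in the value.
  record NodeId(String value) {
    static NodeId ofDataset(String namespace, String dataset) {
      return new NodeId("dataset:" + namespace + ":" + dataset);
    }

    static NodeId ofDatasetField(String namespace, String dataset, String field) {
      return new NodeId("datasetField:" + namespace + ":" + dataset + ":" + field);
    }

    static NodeId ofJob(String namespace, String job) {
      return new NodeId("job:" + namespace + ":" + job);
    }
  }

  // A single URL builder serves datasets, dataset fields, and jobs alike.
  static String toColumnLineageUrl(
      String baseUrl, NodeId nodeId, int depth, boolean withDownstream) {
    return baseUrl
        + "/api/v1/column-lineage?nodeId="
        + URLEncoder.encode(nodeId.value(), StandardCharsets.UTF_8)
        + "&depth="
        + depth
        + "&withDownstream="
        + withDownstream;
  }

  // Mirrors the short overload: default depth, upstream only.
  static String toColumnLineageUrl(String baseUrl, NodeId nodeId) {
    return toColumnLineageUrl(baseUrl, nodeId, DEFAULT_DEPTH, false);
  }

  public static void main(String[] args) {
    String base = "http://localhost:5000";
    // Formerly getColumnLineageByDataset("namespace", "dataset_b")
    System.out.println(toColumnLineageUrl(base, NodeId.ofDataset("namespace", "dataset_b")));
    // Formerly getColumnLineageByDatasetField("namespace", "dataset_c", "col_d", 1, false)
    System.out.println(
        toColumnLineageUrl(
            base, NodeId.ofDatasetField("namespace", "dataset_c", "col_d"), 1, false));
    // Formerly getColumnLineageByJob("namespace", "job1")
    System.out.println(toColumnLineageUrl(base, NodeId.ofJob("namespace", "job1")));
  }
}
```

In this design, supporting another node kind (for example a dataset version, as in the point-in-time requests added here) only needs a new `NodeId` value, not another family of client overloads.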