From 8cf856a793cbdddb96992c1ebdee953f633e957e Mon Sep 17 00:00:00 2001
From: Evan Anderson
Date: Fri, 30 Aug 2019 15:03:53 -0700
Subject: [PATCH 1/5] Update AVRO mappings to match current event schema and type system.

Signed-off-by: Evan Anderson
---
 avro-format.md | 120 +++++++++++++++++++------------------------------
 spec.avsc      |   8 +++-
 2 files changed, 52 insertions(+), 76 deletions(-)

diff --git a/avro-format.md b/avro-format.md
index d151d66c0..ac14530ed 100644
--- a/avro-format.md
+++ b/avro-format.md
@@ -2,8 +2,8 @@

 ## Abstract

-The Avro Format for CloudEvents defines how events attributes are expressed
-in the [Avro 1.9.0 Specification][avro-spec].
+The Avro Format for CloudEvents defines how events attributes are expressed in
+the [Avro 1.9.0 Specification][avro-spec].

 ## Status of this document

@@ -13,7 +13,8 @@ This document is a working draft.

 1. [Introduction](#1-introduction)
 2. [Attributes](#2-attributes)
-3. [Examples](#3-examples)
+3. [Data](#3-data)
+4. [Examples](#4-examples)

 ## 1. Introduction

@@ -47,108 +48,81 @@ The CloudEvents type system MUST be mapped to Avro types as follows.

 | CloudEvents   | Avro                                                                   |
 | ------------- | ---------------------------------------------------------------------- |
+| Boolean       | [boolean][avro-primitives]                                             |
 | Integer       | [int][avro-primitives]                                                 |
 | String        | [string][avro-primitives]                                              |
 | Binary        | [bytes][avro-primitives]                                               |
 | URI           | [string][avro-primitives] following [RFC 3986 §4.3][rfc3986-section43] |
 | URI-reference | [string][avro-primitives] following [RFC 3986 §4.1][rfc3986-section41] |
 | Timestamp     | [string][avro-primitives] following [RFC 3339][rfc3339] (ISO 8601)     |
-| Any           | See [2.2](#22-mapping-any-typed-attributes)                            |

 Extension specifications MAY define diverging mapping rules for the values of
 attributes they define.

-### 2.2 Mapping Any-typed Attributes
-
-`Any`-typed CloudEvents values can either hold a `String`, or a `Binary` value,
-or a `Map`, or any of all other types. Avro type system satisfies this requirement by employing a recursive reference,
-where a `record` type is referenced as a value inside of its own `map`.
-
-Example:
-
-```json
-{
-  "type":"record",
-  "name":"MyRecord",
-  "fields":[
-    {
-      "name":"wow",
-      "type":{
-        "type":"map",
-        "values":[
-          "null",
-          "string",
-          "MyRecord"
-        ]
-      }
-    }
-  ]
-}
-```
-
 ### 2.3 OPTIONAL Attributes

-CloudEvents Spec defines OPTIONAL attributes. The Avro format defines that
-these fields MUST use the `null` type and the actual type through
-the [union][avro-unions].
+CloudEvents Spec defines OPTIONAL attributes. The Avro format defines that these
+fields MUST use the `null` type and the actual type through the
+[union][avro-unions].

 Example:

 ```json
-[
-  "null",
-  "string"
-]
+["null", "string"]
 ```

 ### 2.4 Definition

-Users of Avro MUST use a message whose binary encoding is identical
-to the one described by the [CloudEvent Avro Schema](./spec.avsc):
+Users of Avro MUST use a message whose binary encoding is identical to the one
+described by the [CloudEvent Avro Schema](./spec.avsc):

 ```json
 {
-  "namespace":"io.cloudevents",
-  "type":"record",
-  "name":"CloudEvent",
-  "version":"0.4-wip",
-  "doc":"Avro Event Format for CloudEvents",
-  "fields":[
+  "namespace": "io.cloudevents",
+  "type": "record",
+  "name": "CloudEvent",
+  "version": "0.4-wip",
+  "doc": "Avro Event Format for CloudEvents",
+  "fields": [
     {
-      "name":"attribute",
-      "type":{
-        "type":"map",
-        "values":[
-          "null",
-          "int",
-          "string",
-          "bytes",
-          "CloudEvent"
-        ]
+      "name": "attribute",
+      "type": {
+        "type": "map",
+        "values": ["null", "boolean", "int", "string", "bytes"]
       }
+    },
+    {
+      "name": "data",
+      "type": "bytes"
     }
   ]
 }
 ```

-## 3 Examples
+## 3 Data
+
+The `data` of the CloudEvent should be encoded in a top-level field called
+`data` of type `bytes`. No additional encoding should be done, regardless of the
+`contenttype` attribute.
+
+## 4 Examples

 The following table shows exemplary mappings:

-| CloudEvents  | Type      | Exemplary Avro Value                           |
-| ------------ | --------- | ---------------------------------------------- |
-| type         | string    | `"com.example.someevent"`                      |
-| specversion  | string    | `"0.4-wip`                                     |
-| source       | string    | `"/mycontext"`                                 |
-| id           | string    | `"7a0dc520-c870-4193c8"`                       |
-| time         | string    | `"2019-06-05T23:45:00Z"`                       |
-| dataschema   | string    | `"http://registry.com/schema/v1/much.json"`    |
-| contenttype  | string    | `"application/json"`                           |
-| data         | string    | `"{"much":{"wow":"json"}}"`                    |
-||||
-| dataschema   | string    | `"http://registry.com/subjects/ce/versions/1"` |
-| contenttype  | string    | `"application/avro"`                           |
-| data         | string    | `"Q2xvdWRFdmVudHM="`                           |
+| CloudEvents | Type   | Exemplary Avro Value                           |
+| ----------- | ------ | ---------------------------------------------- |
+| type        | string | `"com.example.someevent"`                      |
+| specversion | string | `"0.4-wip`                                     |
+| source      | string | `"/mycontext"`                                 |
+| id          | string | `"7a0dc520-c870-4193c8"`                       |
+| time        | string | `"2019-06-05T23:45:00Z"`                       |
+| dataschema  | string | `"http://registry.com/schema/v1/much.json"`    |
+| contenttype | string | `"application/json"`                           |
+| data        | bytes  | `"{"much":{"wow":"json"}}"`                    |
+|             |        |                                                |
+| dataschema  | string | `"http://registry.com/subjects/ce/versions/1"` |
+| contenttype | string | `"application/avro"`                           |
+| data        | string | `"Q2xvdWRFdmVudHM="`                           |

 ## References

@@ -158,9 +132,7 @@ The following table shows exemplary mappings:
 [avro-primitives]: http://avro.apache.org/docs/1.9.0/spec.html#schema_primitive
 [avro-logical-types]: http://avro.apache.org/docs/1.9.0/spec.html#Logical+Types
 [avro-unions]: http://avro.apache.org/docs/1.9.0/spec.html#Unions
-
 [ce]: ./spec.md
-
 [rfc2119]: https://tools.ietf.org/html/rfc2119
 [rfc3986-section41]: https://tools.ietf.org/html/rfc3986#section-4.1
 [rfc3986-section43]: https://tools.ietf.org/html/rfc3986#section-4.3
diff --git a/spec.avsc b/spec.avsc
index 7b638288f..2e3c712e3 100644
--- a/spec.avsc
+++ b/spec.avsc
@@ -11,12 +11,16 @@
         "type":"map",
         "values":[
           "null",
+          "boolean",
           "int",
           "string",
-          "bytes",
-          "CloudEvent"
+          "bytes"
         ]
       }
+    },
+    {
+      "name": "data",
+      "type": "bytes"
     }
   ]
 }

From 597453de46947add52b77ed93ca1fe31807d87d1 Mon Sep 17 00:00:00 2001
From: Evan Anderson
Date: Tue, 3 Sep 2019 14:18:06 -0700
Subject: [PATCH 2/5] Address @duglin feedback.

Signed-off-by: Evan Anderson
---
 avro-format.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/avro-format.md b/avro-format.md
index ac14530ed..5f871d291 100644
--- a/avro-format.md
+++ b/avro-format.md
@@ -101,8 +101,8 @@ described by the [CloudEvent Avro Schema](./spec.avsc):

 ## 3 Data

-The `data` of the CloudEvent should be encoded in a top-level field called
-`data` of type `bytes`. No additional encoding should be done, regardless of the
+The `data` of the CloudEvent MUST be encoded in a top-level field called
+`data` of type `bytes`. Additional encoding MUST NOT be done, regardless of the
 `contenttype` attribute.

 ## 4 Examples

From 5d8285d0e47f2858aecb41286ced281fdfcec686 Mon Sep 17 00:00:00 2001
From: Evan Anderson
Date: Tue, 10 Sep 2019 17:20:14 -0700
Subject: [PATCH 3/5] Make AVRO spec support nested JSON object data translated to structured form

Signed-off-by: Evan Anderson
---
 spec.avsc | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/spec.avsc b/spec.avsc
index 2e3c712e3..6c1cf7e88 100644
--- a/spec.avsc
+++ b/spec.avsc
@@ -20,7 +20,45 @@
     },
     {
       "name": "data",
-      "type": "bytes"
+      "type": [
+        "bytes",
+        "null",
+        "boolean",
+        {
+          "type": "map",
+          "values": [
+            "null",
+            "boolean",
+            {
+              "type": "record",
+              "name": "CloudEventData",
+              "doc": "Representation of a JSON Value",
+              "fields": [
+                {
+                  "name": "value",
+                  "type": {
+                    "type": "map",
+                    "values": [
+                      "null",
+                      "boolean",
+                      { "type": "map", "values": "CloudEventData" },
+                      { "type": "array", "items": "CloudEventData" },
+                      "double",
+                      "string"
+                    ]
+                  }
+                }
+              ]
+            },
+            "double",
+            "string"
+          ]
+        },
+        { "type": "array", "items": "CloudEventData" },
+        "double",
+        "string"
+      ]
     }
   ]
 }
+

From 98862b975572240216b911a0fcfb63d2ce4140af Mon Sep 17 00:00:00 2001
From: Evan Anderson
Date: Tue, 10 Sep 2019 17:25:10 -0700
Subject: [PATCH 4/5] Update Avro format definition to match avsc file.

Signed-off-by: Evan Anderson
---
 avro-format.md | 52 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/avro-format.md b/avro-format.md
index 5f871d291..e67f87705 100644
--- a/avro-format.md
+++ b/avro-format.md
@@ -93,7 +93,44 @@ described by the [CloudEvent Avro Schema](./spec.avsc):
     },
     {
       "name": "data",
-      "type": "bytes"
+      "type": [
+        "bytes",
+        "null",
+        "boolean",
+        {
+          "type": "map",
+          "values": [
+            "null",
+            "boolean",
+            {
+              "type": "record",
+              "name": "CloudEventData",
+              "doc": "Representation of a JSON Value",
+              "fields": [
+                {
+                  "name": "value",
+                  "type": {
+                    "type": "map",
+                    "values": [
+                      "null",
+                      "boolean",
+                      { "type": "map", "values": "CloudEventData" },
+                      { "type": "array", "items": "CloudEventData" },
+                      "double",
+                      "string"
+                    ]
+                  }
+                }
+              ]
+            },
+            "double",
+            "string"
+          ]
+        },
+        { "type": "array", "items": "CloudEventData" },
+        "double",
+        "string"
+      ]
     }
   ]
 }
@@ -101,9 +138,16 @@ described by the [CloudEvent Avro Schema](./spec.avsc):

 ## 3 Data

-The `data` of the CloudEvent MUST be encoded in a top-level field called
-`data` of type `bytes`. Additional encoding MUST NOT be done, regardless of the
-`contenttype` attribute.
+Before encoding, the AVRO serializer MUST first determine the runtime data type
+of the content. This may be determined by examining the data for invalid UTF-8
+sequences or by consulting the `datacontenttype` attribute.
+
+If the implementation determines that the type of the data is binary, the value
+MUST be stored in the `data` field using the `bytes` type.
+
+For other types (non-binary data without a `datacontenttype` attribute), the
+implementation MUST translate the data value into a representation of the JSON
+value using the union types described for the `data` record.

 ## 4 Examples


From cef0a855cc85b260f0079f948128d00638a5b155 Mon Sep 17 00:00:00 2001
From: Evan Anderson
Date: Fri, 13 Sep 2019 22:18:21 -0700
Subject: [PATCH 5/5] s/may/can/

Signed-off-by: Evan Anderson
---
 avro-format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/avro-format.md b/avro-format.md
index e67f87705..0c1435138 100644
--- a/avro-format.md
+++ b/avro-format.md
@@ -139,7 +139,7 @@ described by the [CloudEvent Avro Schema](./spec.avsc):
 ## 3 Data

 Before encoding, the AVRO serializer MUST first determine the runtime data type
-of the content. This may be determined by examining the data for invalid UTF-8
+of the content. This can be determined by examining the data for invalid UTF-8
 sequences or by consulting the `datacontenttype` attribute.

 If the implementation determines that the type of the data is binary, the value