Update AVRO mappings to match current event schema and type system. #497

Merged
merged 5 commits into from
Sep 14, 2019
164 changes: 90 additions & 74 deletions avro-format.md
@@ -2,8 +2,8 @@

## Abstract

The Avro Format for CloudEvents defines how event attributes are expressed
in the [Avro 1.9.0 Specification][avro-spec].
The Avro Format for CloudEvents defines how event attributes are expressed in
the [Avro 1.9.0 Specification][avro-spec].

## Status of this document

@@ -13,7 +13,8 @@ This document is a working draft.

1. [Introduction](#1-introduction)
2. [Attributes](#2-attributes)
3. [Examples](#3-examples)
3. [Data](#3-data)
4. [Examples](#4-examples)

## 1. Introduction

@@ -47,108 +48,125 @@ The CloudEvents type system MUST be mapped to Avro types as follows.

| CloudEvents | Avro |
| ------------- | ---------------------------------------------------------------------- |
| Boolean | [boolean][avro-primitives] |
| Integer | [int][avro-primitives] |
| String | [string][avro-primitives] |
| Binary | [bytes][avro-primitives] |
| URI | [string][avro-primitives] following [RFC 3986 §4.3][rfc3986-section43] |
| URI-reference | [string][avro-primitives] following [RFC 3986 §4.1][rfc3986-section41] |
| Timestamp | [string][avro-primitives] following [RFC 3339][rfc3339] (ISO 8601) |
| Any | See [2.2](#22-mapping-any-typed-attributes) |

Extension specifications MAY define diverging mapping rules for the values of
attributes they define.
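
As an illustration only, the mapping table above can be transcribed into a lookup. This is a minimal sketch: `CE_TO_AVRO` and `avro_type_for` are hypothetical names that do not appear in the spec, and `Any` is deliberately left out because the table does not map it to a single Avro type.

```python
# CloudEvents type name -> Avro primitive type, transcribed from the table above.
# URI, URI-reference and Timestamp are all carried as Avro strings; the format
# constraints (RFC 3986, RFC 3339) are not checked in this sketch.
CE_TO_AVRO = {
    "Boolean": "boolean",
    "Integer": "int",
    "String": "string",
    "Binary": "bytes",
    "URI": "string",
    "URI-reference": "string",
    "Timestamp": "string",
}


def avro_type_for(ce_type):
    """Return the Avro type for a CloudEvents type name (hypothetical helper)."""
    try:
        return CE_TO_AVRO[ce_type]
    except KeyError:
        raise ValueError(
            "no direct Avro mapping for CloudEvents type %r" % (ce_type,)
        ) from None
```

An extension that defines a diverging mapping for its own attributes would need its own lookup; this sketch covers only the core table.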

### 2.2 Mapping Any-typed Attributes

`Any`-typed CloudEvents values can hold a `String`, a `Binary` value, a `Map`,
or a value of any other type. The Avro type system satisfies this requirement by
employing a recursive reference, where a `record` type is referenced as a value
inside of its own `map`.

Example:

```json
{
"type":"record",
"name":"MyRecord",
"fields":[
{
"name":"wow",
"type":{
"type":"map",
"values":[
"null",
"string",
"MyRecord"
]
}
}
]
}
```

### 2.3 OPTIONAL Attributes

CloudEvents Spec defines OPTIONAL attributes. The Avro format defines that
these fields MUST use the `null` type and the actual type through
the [union][avro-unions].
CloudEvents Spec defines OPTIONAL attributes. The Avro format defines that these
fields MUST use the `null` type and the actual type through the
[union][avro-unions].

Example:

```json
[
"null",
"string"
]
["null", "string"]
```

Contributor Author

I can undo this JSON auto-formatting if preferred.
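
The union rule for OPTIONAL attributes can be sketched as a small helper that builds the schema fragment. This is an assumption-laden illustration: `optional` is a hypothetical name, and putting `"null"` first simply mirrors the example in this section.

```python
import json


def optional(avro_type):
    """Wrap an Avro type in a union with "null" (hypothetical helper).

    Avro unions may not directly contain other unions, so an existing
    union (a list) is extended rather than nested.
    """
    if isinstance(avro_type, list):
        return avro_type if "null" in avro_type else ["null"] + avro_type
    return ["null", avro_type]


# Serializing the result yields the schema fragment for an optional string.
print(json.dumps(optional("string")))
```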

### 2.4 Definition

Users of Avro MUST use a message whose binary encoding is identical
to the one described by the [CloudEvent Avro Schema](./spec.avsc):
Users of Avro MUST use a message whose binary encoding is identical to the one
described by the [CloudEvent Avro Schema](./spec.avsc):

```json
{
"namespace":"io.cloudevents",
"type":"record",
"name":"CloudEvent",
"version":"0.4-wip",
"doc":"Avro Event Format for CloudEvents",
"fields":[
"namespace": "io.cloudevents",
"type": "record",
"name": "CloudEvent",
"version": "0.4-wip",
"doc": "Avro Event Format for CloudEvents",
"fields": [
{
"name":"attribute",
"type":{
"type":"map",
"values":[
"null",
"int",
"string",
"bytes",
"CloudEvent"
Collaborator

Not familiar with Avro... is the "CloudEvent" above replaced by the stuff on line 95? (the data/bytes thingy)

Contributor Author

The "CloudEvent" was a recursive allowed value of the same object, to support Map values.

Without Map, these structures don't need to be recursive in their definition, which makes me much more comfortable.

The actual diff here is +boolean -CloudEvent

]
"name": "attribute",
"type": {
"type": "map",
"values": ["null", "boolean", "int", "string", "bytes"]
}
},
{
"name": "data",
"type": [
"bytes",
"null",
"boolean",
{
"type": "map",
"values": [
"null",
"boolean",
{
"type": "record",
"name": "CloudEventData",
"doc": "Representation of a JSON Value",
"fields": [
{
"name": "value",
"type": {
"type": "map",
"values": [
"null",
"boolean",
{ "type": "map", "values": "CloudEventData" },
{ "type": "array", "items": "CloudEventData" },
"double",
"string"
]
}
}
]
},
"double",
"string"
]
},
{ "type": "array", "items": "CloudEventData" },
"double",
"string"
]
}
]
}
```
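
To make the updated `attribute` map concrete, here is a minimal sketch that picks the union branch (`null`, `boolean`, `int`, `string`, `bytes`) for each attribute value. The helper names are hypothetical, and the choice of Python runtime types for each branch is an assumption of this sketch, not something the schema prescribes.

```python
# Branches allowed for values of the "attribute" map in the updated schema.
ATTRIBUTE_BRANCHES = ["null", "boolean", "int", "string", "bytes"]


def attribute_branch(value):
    """Return the union branch an attribute value would use (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        # Avro "int" is a 32-bit signed integer.
        if not -2**31 <= value < 2**31:
            raise ValueError("value does not fit in an Avro int: %r" % (value,))
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    raise TypeError("unsupported attribute type: %s" % type(value).__name__)


def check_attributes(attributes):
    """Map each attribute name to the union branch its value would take."""
    return {name: attribute_branch(value) for name, value in attributes.items()}
```

Because the union no longer lists `CloudEvent`, a nested map as an attribute value raises `TypeError` here, which matches the removal of Map values discussed in the review thread.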

## 3. Examples
## 3. Data

Before encoding, the Avro serializer MUST first determine the runtime data type
of the content. This can be determined by examining the data for invalid UTF-8
sequences or by consulting the `datacontenttype` attribute.

If the implementation determines that the type of the data is binary, the value
MUST be stored in the `data` field using the `bytes` type.

For other types (non-binary data without a `datacontenttype` attribute), the
implementation MUST translate the data value into a representation of the JSON
value using the union types described for the `data` field.
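
The type determination described in this section could look roughly like the following. It is a sketch under stated assumptions: `data_union_branch` is a hypothetical name, only the UTF-8 check is modeled (the `datacontenttype` attribute is not consulted), byte payloads that decode cleanly are treated as strings, and all Python numbers go to `double` because the `data` union lists no integer type.

```python
def data_union_branch(value):
    """Return the branch of the `data` union a runtime value would use.

    Hypothetical helper; it only classifies the top-level value and does
    not build the nested CloudEventData records.
    """
    if value is None:
        return "null"
    # bool must be tested before numbers, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (bytes, bytearray)):
        try:
            bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            # Invalid UTF-8 sequences: treat the payload as binary.
            return "bytes"
        return "string"
    if isinstance(value, (int, float)):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "map"
    if isinstance(value, (list, tuple)):
        return "array"
    raise TypeError("unsupported data type: %s" % type(value).__name__)
```

For instance, `data_union_branch(b"\xff\xfe")` falls into the `bytes` branch because `0xff` can never start a valid UTF-8 sequence, while a parsed JSON object such as `{"much": {"wow": "json"}}` falls into the `map` branch.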

## 4. Examples

The following table shows exemplary mappings:

| CloudEvents | Type | Exemplary Avro Value |
| ------------ | --------- | ---------------------------------------------- |
| type | string | `"com.example.someevent"` |
| specversion | string | `"0.4-wip"` |
| source | string | `"/mycontext"` |
| id | string | `"7a0dc520-c870-4193c8"` |
| time | string | `"2019-06-05T23:45:00Z"` |
| dataschema | string | `"http://registry.com/schema/v1/much.json"` |
| contenttype | string | `"application/json"` |
| data | string | `"{"much":{"wow":"json"}}"` |
||||
| dataschema | string | `"http://registry.com/subjects/ce/versions/1"` |
| contenttype | string | `"application/avro"` |
| data | string | `"Q2xvdWRFdmVudHM="` |
| CloudEvents | Type | Exemplary Avro Value |
| ----------- | ------ | ---------------------------------------------- |
| type | string | `"com.example.someevent"` |
| specversion | string | `"0.4-wip"` |
| source | string | `"/mycontext"` |
| id | string | `"7a0dc520-c870-4193c8"` |
| time | string | `"2019-06-05T23:45:00Z"` |
| dataschema | string | `"http://registry.com/schema/v1/much.json"` |
| contenttype | string | `"application/json"` |
| data | bytes | `"{"much":{"wow":"json"}}"` |
| | | |
| dataschema | string | `"http://registry.com/subjects/ce/versions/1"` |
| contenttype | string | `"application/avro"` |
| data | string | `"Q2xvdWRFdmVudHM="` |

## References

@@ -158,9 +176,7 @@ The following table shows exemplary mappings:
[avro-primitives]: http://avro.apache.org/docs/1.9.0/spec.html#schema_primitive
[avro-logical-types]: http://avro.apache.org/docs/1.9.0/spec.html#Logical+Types
[avro-unions]: http://avro.apache.org/docs/1.9.0/spec.html#Unions

[ce]: ./spec.md

[rfc2119]: https://tools.ietf.org/html/rfc2119
[rfc3986-section41]: https://tools.ietf.org/html/rfc3986#section-4.1
[rfc3986-section43]: https://tools.ietf.org/html/rfc3986#section-4.3
46 changes: 44 additions & 2 deletions spec.avsc
@@ -11,12 +11,54 @@
"type":"map",
"values":[
"null",
"boolean",
"int",
"string",
"bytes",
"CloudEvent"
"bytes"
]
}
},
{
"name": "data",
"type": [
"bytes",
"null",
"boolean",
{
"type": "map",
"values": [
"null",
"boolean",
{
"type": "record",
"name": "CloudEventData",
"doc": "Representation of a JSON Value",
"fields": [
{
"name": "value",
"type": {
"type": "map",
"values": [
"null",
"boolean",
{ "type": "map", "values": "CloudEventData" },
{ "type": "array", "items": "CloudEventData" },
"double",
"string"
]
}
}
]
},
"double",
"string"
]
},
{ "type": "array", "items": "CloudEventData" },
"double",
"string"
]
}
]
}