Working Draft D. Wirtz
Version 2 July 2013
Protocol JSON - PSON
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (c) 2013 Daniel Wirtz
Abstract
Protocol JSON (PSON) is a lightweight, binary, language-independent
data interchange format. PSON defines a set of encoding rules for
the portable representation of structured data.
1. Introduction
Protocol JSON (PSON) is a binary format for the serialization of
structured data. It is derived from JavaScript Object Notation
(JSON), as defined in [RFC4627].
PSON can represent five primitive types (strings, numbers, booleans,
null and raw bytes) and two structured types (objects and arrays).
A string is a sequence of zero or more Unicode characters [UNICODE],
encoded as UTF-8.
An object is an unordered collection of zero or more name/value
pairs, where a name is a string and a value is a string, number,
boolean, null, raw bytes, object or array.
An array is an ordered sequence of zero or more values.
The terms "object" and "array" come from the conventions of
JavaScript.
A variable length integer (varint) is a base 128 variable length
integer as described in the Encoding section of the Protocol Buffers
(protobuf) developer guide.
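The varint layout referred to above can be sketched as follows. This is
a minimal illustration, not part of the specification: the function
names encode_varint and decode_varint are hypothetical, and the sketch
assumes the usual protobuf convention of seven payload bits per byte,
least significant group first, with the high bit set on every byte
except the last.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint."""
    if value < 0:
        raise ValueError("an unsigned varint requires a non-negative integer")
    out = bytearray()
    while True:
        group = value & 0x7F          # lowest seven bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more groups follow
        else:
            out.append(group)         # last group, continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0):
    """Decode a varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

For example, the value 300 occupies two bytes (%xAC %x02), while any
value below 128 occupies a single byte.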
PSON's design goals were for it to be small, portable, binary and
a superset of JSON.
1.1. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
The grammatical rules in this document are to be interpreted as
described in [RFC4234].
2. PSON Grammar
PSON data is a sequence of tokens, varints and arbitrary bytes.
2.1. Values
A PSON value MUST start with a token and MAY be followed by
a zig-zag encoded varint
or
a little endian 32 bit or 64 bit floating point value
or
an unsigned varint determining the length of arbitrary byte data
following
or
an unsigned varint determining the number of PSON values
following.
No other combinations are allowed.
There are 256 tokens:
ZERO = %x00 ; 0
NEGONE = %x01 ; -1
ONE = %x02 ; +1
...
MAX = %xEF ; -120
NULL = %xF0
TRUE = %xF1
FALSE = %xF2
EOBJECT = %xF3
EARRAY = %xF4
ESTRING = %xF5
OBJECT = %xF6 ; + varint32 + key/values
ARRAY = %xF7 ; + varint32 + values
INTEGER = %xF8 ; + varint32
LONG = %xF9 ; + varint64
FLOAT = %xFA ; + float32
DOUBLE = %xFB ; + float64
STRING = %xFC ; + varint32 + bytes
STRING_ADD = %xFD ; + varint32 + bytes
STRING_GET = %xFE ; + varint32
BINARY = %xFF ; + varint32 + bytes
2.2. Boolean
A boolean value evaluating to true MUST be encoded as the token:
TRUE = %xF1
A boolean value evaluating to false MUST be encoded as the token:
FALSE = %xF2
2.2. Numbers
2.2.1. Integer
Integer values greater than or equal to -120 and less than or equal
to 119 SHOULD be encoded as a single byte token in the range from
ZERO = %x00 ; 0
to
MAX = %xEF ; -120
where the token equals the value's zig-zag encoded representation.
Otherwise, and for all values less than -120 or greater than 119,
the integer MUST be encoded as the token
INTEGER = %xF8
followed by its value as a zig-zag encoded 32 bit varint.
If an integer value requires more than 32 bits and thus does not
fit into a zig-zag encoded 32 bit varint, it SHOULD be encoded as
the token
LONG = %xF9
followed by its value as a zig-zag encoded 64 bit varint. If it is
not encoded this way, it MUST be reduced to 32 bits, in which case
the encoder MAY raise a warning.
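The integer rules above can be sketched as follows. This is an
illustration under stated assumptions rather than a reference
implementation: zigzag, encode_varint and encode_integer are
hypothetical names, and the sketch always chooses the LONG token for
values beyond 32 bits instead of reducing them.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... become 0, 1, 2, 3, ..."""
    return n * 2 if n >= 0 else -n * 2 - 1


def encode_varint(value: int) -> bytes:
    """Base 128 varint, least significant group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_integer(n: int) -> bytes:
    if -120 <= n <= 119:
        # Small values: the token itself is the zig-zag value (%x00..%xEF).
        return bytes([zigzag(n)])
    if -2**31 <= n < 2**31:
        return b"\xF8" + encode_varint(zigzag(n))   # INTEGER
    if -2**63 <= n < 2**63:
        return b"\xF9" + encode_varint(zigzag(n))   # LONG
    raise OverflowError("value does not fit into 64 bits")
```

With this sketch, -120 encodes to the single byte %xEF, whereas 120
needs the INTEGER token and encodes to %xF8 %xF0 %x01.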
2.2.2. Floating point
A 32 bit float SHOULD be encoded as the token
FLOAT = %xFA
followed by the little endian 32 bit float value.
Otherwise, and for any 64 bit double precision float, the value MUST
be encoded as the token
DOUBLE = %xFB
followed by the little endian 64 bit float value.
If a 64 bit float can be converted to a 32 bit float without losing
any information, it SHOULD be encoded as a 32 bit float instead.
If a float can be converted to an integer without losing any
information, it SHOULD be encoded as an integer.
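The choice between FLOAT and DOUBLE can be sketched as follows. The
names fits_float32 and encode_float are hypothetical, and the sketch
makes two simplifying assumptions: a round trip through a 32 bit float
is used as the test for "without losing any information", and the
conversion of integral floats to integers is left to the caller.

```python
import struct


def fits_float32(x: float) -> bool:
    """True if x survives a round trip through a 32 bit float unchanged."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0] == x
    except OverflowError:
        # Magnitude too large for a 32 bit float.
        return False


def encode_float(x: float) -> bytes:
    if fits_float32(x):
        return b"\xFA" + struct.pack("<f", x)   # FLOAT + little endian float32
    return b"\xFB" + struct.pack("<d", x)       # DOUBLE + little endian float64
```

Under these assumptions 0.5 is encoded in five bytes using FLOAT, while
0.1, which has no exact 32 bit representation, takes nine bytes using
DOUBLE. NaN never compares equal to itself and is therefore always
written as DOUBLE by this sketch.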
2.2. Arrays
An array with zero elements SHOULD be encoded as the token
EARRAY = %xF4
Otherwise, and for all arrays with one or more elements, the array
MUST be encoded as the token
ARRAY = %xF7
followed by the number of elements as an unsigned 32 bit varint
followed by each element encoded as a PSON value.
If an element evaluates to the JavaScript constant
undefined
it MUST instead be encoded as the token:
NULL = %xF0
2.3. Objects
An object evaluating to the JavaScript constant
null
MUST be encoded as the token:
NULL = %xF0
An object with zero key/value pairs SHOULD be encoded as the token:
EOBJECT = %xF3
Otherwise, and for all objects with one or more key/value pairs, the
object MUST be encoded as the token
OBJECT = %xF6
followed by the number of key/value pairs as an unsigned 32 bit
varint followed by the alternating keys and values as PSON encoded
values.
If a value inside of an object evaluates to the JavaScript constant
undefined
the corresponding key/value pair MUST be omitted.
Order of key/value pairs SHOULD be preserved if supported by the
language runtime.
2.4. Strings
A string with zero characters SHOULD be encoded as the token:
ESTRING = %xF5
Otherwise, and for all strings with one or more characters, the
string MUST be encoded as the token
STRING = %xFC
followed by the number of raw bytes as an unsigned 32 bit varint
followed by the UTF-8 encoded raw bytes.
2.5. Binary data
Binary data MUST be encoded as the token
BINARY = %xFF
followed by the number of raw bytes as an unsigned 32 bit varint
followed by the raw bytes.
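Strings and binary data share the same layout of token, byte length and
payload, which can be sketched as follows. The names encode_varint,
encode_string and encode_binary are hypothetical. Note that the length
counts UTF-8 bytes, not characters.

```python
def encode_varint(value: int) -> bytes:
    """Base 128 varint, least significant group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_string(s: str) -> bytes:
    if s == "":
        return b"\xF5"                                # ESTRING
    raw = s.encode("utf-8")
    return b"\xFC" + encode_varint(len(raw)) + raw    # STRING + length + bytes


def encode_binary(raw: bytes) -> bytes:
    return b"\xFF" + encode_varint(len(raw)) + raw    # BINARY + length + bytes
```

For instance, the one-character string "é" is two bytes in UTF-8 and is
therefore written as %xFC %x02 %xC3 %xA9.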
2.6. undefined
In PSON there is no token for a value that equals the JavaScript
constant
undefined
and a value evaluating to undefined MUST either be skipped if it is
a value inside of an object or, otherwise, be encoded as if it were
equal to the JavaScript constant:
null
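The array, object and undefined rules can be sketched together as a
small recursive encoder. This is an illustration only: the names
UNDEFINED, encode_varint and encode are hypothetical, UNDEFINED is a
stand-in for the JavaScript constant (which has no Python equivalent),
and the sketch deliberately handles only small integers, booleans,
null, strings, arrays and objects.

```python
UNDEFINED = object()  # hypothetical stand-in for JavaScript's undefined


def encode_varint(value: int) -> bytes:
    """Base 128 varint, least significant group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode(value) -> bytes:
    if value is None or value is UNDEFINED:
        return b"\xF0"                                   # NULL
    if value is True:
        return b"\xF1"                                   # TRUE
    if value is False:
        return b"\xF2"                                   # FALSE
    if isinstance(value, int) and -120 <= value <= 119:
        return bytes([value * 2 if value >= 0 else -value * 2 - 1])
    if isinstance(value, str):
        if not value:
            return b"\xF5"                               # ESTRING
        raw = value.encode("utf-8")
        return b"\xFC" + encode_varint(len(raw)) + raw   # STRING
    if isinstance(value, list):
        if not value:
            return b"\xF4"                               # EARRAY
        body = b"".join(encode(item) for item in value)  # undefined becomes NULL
        return b"\xF7" + encode_varint(len(value)) + body
    if isinstance(value, dict):
        # Pairs whose value is undefined are omitted entirely.
        pairs = [(k, v) for k, v in value.items() if v is not UNDEFINED]
        if not pairs:
            return b"\xF3"                               # EOBJECT
        body = b"".join(encode(k) + encode(v) for k, v in pairs)
        return b"\xF6" + encode_varint(len(pairs)) + body
    raise TypeError("type not covered by this sketch")
```

An undefined array element keeps its position and is written as NULL,
whereas an undefined object value removes its pair and lowers the pair
count.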
3. Dictionaries
3.1. Progressive substitution
In addition to encoding strings as defined in 2.4, strings SHOULD
also be stored in a dictionary if requested by the application on
the encoding side.
A string that is not yet present in the dictionary SHOULD be added
to the dictionary on the encoding side. If a string is added to the
dictionary on the encoding side, it MUST be assigned an index equal
to the number of elements contained in the dictionary before the
string was added and, instead of being encoded as in 2.4, MUST be
encoded as the token
STRING_ADD = %xFD
followed by the number of raw bytes as an unsigned 32 bit varint
followed by the UTF-8 encoded raw bytes.
When the decoding side decodes a string that has been encoded in
this way, it MUST add the string to its dictionary and assign it an
index equal to the number of elements contained in the dictionary
before the string was added.
A string that has previously been added to the dictionary SHOULD,
instead of being encoded as in 2.4, be encoded as the token
STRING_GET = %xFE
followed by the previously assigned index as an unsigned 32 bit
varint.
When the decoding side decodes a string that has been encoded
in this way, it MUST look up the index in the dictionary and
return the remembered string value instead.
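Progressive substitution can be sketched as a pair of stateful helpers,
one per side. The class and function names below are hypothetical, and
the sketch assumes that the application has requested dictionary use
for every string passed in.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(data: bytes, offset: int):
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


class DictionaryEncoder:
    def __init__(self):
        self.index = {}                        # string -> assigned index

    def encode_string(self, s: str) -> bytes:
        if s in self.index:                    # seen before: STRING_GET + index
            return b"\xFE" + encode_varint(self.index[s])
        self.index[s] = len(self.index)        # index = size before adding
        raw = s.encode("utf-8")
        return b"\xFD" + encode_varint(len(raw)) + raw   # STRING_ADD


class DictionaryDecoder:
    def __init__(self):
        self.entries = []                      # position in list = index

    def decode_string(self, data: bytes, offset: int = 0):
        token = data[offset]
        if token == 0xFD:                      # STRING_ADD: read and remember
            length, pos = decode_varint(data, offset + 1)
            s = data[pos:pos + length].decode("utf-8")
            self.entries.append(s)
            return s, pos + length
        if token == 0xFE:                      # STRING_GET: look up by index
            idx, pos = decode_varint(data, offset + 1)
            return self.entries[idx], pos
        raise ValueError("not a dictionary string token")
```

The first occurrence of "id" costs four bytes (%xFD %x02 "id"); every
later occurrence costs two (%xFE %x00). Both sides stay in sync only if
the decoder processes every STRING_ADD in the order it was written.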
3.2. Static substitution
In addition to adding string values to the dictionary as defined in
3.1, the initial dictionary MAY be negotiated between the encoding
and the decoding side prior to encoding/decoding any values.
If the encoding side uses static substitution, the decoding side MUST
use the same dictionary entries in the same order.
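Static substitution can be sketched with a dictionary that both sides
hold before any data is exchanged. The list SHARED and the function
names are hypothetical, and falling back to a plain STRING token for
strings outside the dictionary is an assumption of this sketch.

```python
SHARED = ["id", "name", "email"]   # hypothetical pre-negotiated dictionary


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(data: bytes, offset: int):
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_key(s: str, dictionary) -> bytes:
    if s in dictionary:
        # Known from the start: STRING_GET even on first use.
        return b"\xFE" + encode_varint(dictionary.index(s))
    raw = s.encode("utf-8")
    return b"\xFC" + encode_varint(len(raw)) + raw   # assumed fallback: STRING


def decode_key(data: bytes, dictionary) -> str:
    if data[0] != 0xFE:
        raise ValueError("expected STRING_GET")
    index, _ = decode_varint(data, 1)
    return dictionary[index]
```

Because only the index travels over the wire, a decoder holding the
same entries in a different order would silently return the wrong
string, which is why the order requirement is a MUST.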
4. Encoding
All floating point values MUST be encoded in little endian byte
order.
All string values MUST be encoded as UTF-8.
5. Decoding
A decoder MUST be able to process all data types defined in this
document. It SHOULD return the corresponding values if available in
the language runtime and MAY raise a warning otherwise.
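A decoder covering every token can be sketched as a single recursive
function. The names decode_varint, unzigzag and decode are hypothetical.
The sketch assumes well-formed input, maps binary data to Python bytes,
and expects the caller to pass the same dictionary list on every call
if dictionary tokens are in use.

```python
import struct


def decode_varint(data, offset):
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def unzigzag(n):
    """Inverse of zig-zag: 0, 1, 2, 3, ... become 0, -1, 1, -2, ..."""
    return (n >> 1) ^ -(n & 1)


def decode(data, offset=0, dictionary=None):
    """Decode one PSON value; return (value, next_offset)."""
    if dictionary is None:
        dictionary = []          # fresh dictionary unless the caller supplies one
    token = data[offset]
    offset += 1
    if token <= 0xEF:            # small integer stored in the token itself
        return unzigzag(token), offset
    if token == 0xF0:
        return None, offset
    if token == 0xF1:
        return True, offset
    if token == 0xF2:
        return False, offset
    if token == 0xF3:
        return {}, offset
    if token == 0xF4:
        return [], offset
    if token == 0xF5:
        return "", offset
    if token in (0xF8, 0xF9):    # INTEGER, LONG
        raw, offset = decode_varint(data, offset)
        return unzigzag(raw), offset
    if token == 0xFA:            # FLOAT
        return struct.unpack_from("<f", data, offset)[0], offset + 4
    if token == 0xFB:            # DOUBLE
        return struct.unpack_from("<d", data, offset)[0], offset + 8
    if token == 0xFE:            # STRING_GET
        index, offset = decode_varint(data, offset)
        return dictionary[index], offset
    # Remaining tokens (OBJECT, ARRAY, STRING, STRING_ADD, BINARY)
    # are all followed by an unsigned varint count or length.
    count, offset = decode_varint(data, offset)
    if token == 0xF6:
        obj = {}
        for _ in range(count):
            key, offset = decode(data, offset, dictionary)
            value, offset = decode(data, offset, dictionary)
            obj[key] = value
        return obj, offset
    if token == 0xF7:
        items = []
        for _ in range(count):
            item, offset = decode(data, offset, dictionary)
            items.append(item)
        return items, offset
    raw = bytes(data[offset:offset + count])
    offset += count
    if token == 0xFF:
        return raw, offset
    text = raw.decode("utf-8")
    if token == 0xFD:            # STRING_ADD: remember for later STRING_GET
        dictionary.append(text)
    return text, offset
```

Returning the next offset alongside the value lets the same function
walk nested arrays and objects without a separate cursor object.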
6. MIME media type
The MIME media type for PSON is application/octet-stream.
Author's Address
Daniel Wirtz
dcode.io
EMail: dcode@dcode.io
Full Copyright Statement
Copyright 2013 Daniel Wirtz <dcode@dcode.io>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.