This repository contains a command-line tool and a library for handling the JSONLine format.
- preserve order of keys
- format values to valid JSON types: string, numeric, boolean
- handle specific formats that are not part of the JSON specification: binary, datetime, time, timestamp
- read JSON lines with a specific underlying variable type (e.g. store the binary string read from JSON in an int64)
- validate format of input lines
Order keys and enforce format of JSON lines.
Usage:
jl [flags]
Examples:
jl -t '{"first":"string","second":"string"}' <dirty.jsonl
Flags:
-t, --template string row template definition (-t {"name":"format"} or -t {"name":"format(type)"} or -t {"name":"format(type):format"})
possible formats : string, numeric, boolean, binary, datetime, time, timestamp, auto, hidden
possible types : int, int64, int32, int16, int8, uint, uint64, uint32, uint16, uint8, float64, float32, bool, byte, rune, string, []byte, time.Time, json.Number (default "{}")
-f, --filename string name of row template filename (default "./row.yml")
-v, --verbosity string set level of log verbosity : none (0), error (1), warn (2), info (3), debug (4), trace (5) (default "error")
--debug add debug information to logs (very slow)
--log-json output logs in JSON format
--color string use colors in log outputs : yes, no or auto (default "auto")
-h, --help help for jl
--version version for jl
Consider the following file, movies.jsonl.
{"title":"Jurassic Park", "year":1993, "release-date": 739828800}
{"year":1999, "release-date": "922910400", "title":"The Matrix", "running-time":136}
{"title":"Titanic", "running-time":"195", "release-date": "1997-12-19T08:00:00-04:00", "director":"James Cameron"}
Let's define a template in a configuration file named row.yml; it will help organize columns.
columns:
- name: "title"
- name: "director"
- name: "year"
output: "numeric"
- name: "running-time"
output: "numeric"
- name: "release-date"
output: "datetime"
Use the jl command line to enforce the line format.
$ jl <movies.jsonl
{"title":"Jurassic Park","director":null,"year":1993,"running-time":null,"release-date":"1993-06-11T22:00:00+02:00"}
{"title":"The Matrix","director":null,"year":1999,"running-time":136,"release-date":"1999-03-31T22:00:00+02:00"}
{"title":"Titanic","director":"James Cameron","year":null,"running-time":195,"release-date":"1997-12-19T08:00:00-04:00"}
Finally, let's improve output display a bit with mlr.
$ jl <movies.jsonl | mlr --j2p --barred cat
+---------------+---------------+------+--------------+---------------------------+
| title | director | year | running-time | release-date |
+---------------+---------------+------+--------------+---------------------------+
| Jurassic Park | - | 1993 | - | 1993-06-11T22:00:00+02:00 |
| The Matrix | - | 1999 | 136 | 1999-03-31T22:00:00+02:00 |
| Titanic | James Cameron | - | 195 | 1997-12-19T08:00:00-04:00 |
+---------------+---------------+------+--------------+---------------------------+
Column definitions can also be passed as a command-line argument, using the -t flag (or --template).
# gives the same result as the previous command
jl -t '{"title":"","director":"","year":"numeric","running-time":"numeric","release-date":"datetime"}' <movies.jsonl
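The reordering shown above can be sketched with the Go standard library alone. This is a minimal illustration, not how jl is implemented: the `reorder` helper is hypothetical, it performs no format conversion (values are copied as-is), and it drops fields that are not listed in the columns.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// reorder rewrites one JSON line so that the listed columns appear in the
// given order, writing null for any column missing from the input.
// Hypothetical helper for illustration only; it is not part of jl.
func reorder(line string, columns []string) (string, error) {
	// RawMessage keeps each value exactly as written in the input.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal([]byte(line), &fields); err != nil {
		return "", err
	}

	var out bytes.Buffer
	out.WriteByte('{')
	for i, name := range columns {
		if i > 0 {
			out.WriteByte(',')
		}
		key, err := json.Marshal(name) // quote and escape the key
		if err != nil {
			return "", err
		}
		out.Write(key)
		out.WriteByte(':')
		if raw, ok := fields[name]; ok {
			out.Write(bytes.TrimSpace(raw))
		} else {
			out.WriteString("null")
		}
	}
	out.WriteByte('}')
	return out.String(), nil
}

func main() {
	columns := []string{"title", "director", "year", "running-time", "release-date"}
	line := `{"year":1999, "release-date": "922910400", "title":"The Matrix", "running-time":136}`

	ordered, err := reorder(line, columns)
	if err != nil {
		fmt.Println("ERROR:", err)
		return
	}
	fmt.Println(ordered)
}
```

Unlike jl, this sketch leaves "release-date" as the string it found in the input; enforcing the datetime format is a separate step.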
A row definition can contain sub rows.
columns:
- name: "title"
- name: "director"
# this is a sub-row definition, it will be added if missing from the input
- name: "producer"
columns:
- name: "first-name"
- name: "last-name"
# template version
jl -t '{"title":"string","director":"","producer":{"first-name":"","last-name":""}}' <movies.jsonl
Consider this file, file.jsonl, which stores int64 integers in binary format.
{"value":"AgAAAAAAAAA="}
{"value":"KgAAAAAAAAA="}
{"value":"aGVsbG8="}
But one of the lines is invalid: the 3rd line can't be a 64-bit integer because it holds only 5 bytes.
$ # this command doesn't catch the invalid value
$ jl -t '{"value":"binary"}' < file.jsonl
{"value":"AgAAAAAAAAA="}
{"value":"KgAAAAAAAAA="}
{"value":"aGVsbG8="}
$ # this command will catch the invalid value because the value will be cast to int64
$ jl -t '{"value":"binary(int64)"}' < file.jsonl
{"value":"AgAAAAAAAAA="}
{"value":"KgAAAAAAAAA="}
3:54PM ERR failed to process JSON line error="can't import type []uint8 to int64 format: unable to cast value to int64: []uint8([104 101 108 108 111])" line-number=2
# same effect but with YAML configuration
columns:
- name: "value"
output: "binary(int64)"
Valid types are: int, int64, int32, int16, int8, uint, uint64, uint32, uint16, uint8, float64, float32, bool, byte, rune, string, []byte, time.Time, json.Number.
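To see why the third line of the binary(int64) example above is rejected, the values can be decoded by hand. The sketch below is a standalone illustration using only the Go standard library. The `decodeInt64` helper is hypothetical, and the little-endian byte order is an assumption made for this example, not a documented property of jl.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// decodeInt64 decodes a base64 string and interprets the bytes as an int64.
// Assumption: exactly 8 bytes, little-endian. Hypothetical helper, not part of jl.
func decodeInt64(s string) (int64, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return 0, err
	}
	if len(raw) != 8 {
		return 0, fmt.Errorf("need 8 bytes for an int64, got %d", len(raw))
	}
	return int64(binary.LittleEndian.Uint64(raw)), nil
}

func main() {
	for _, s := range []string{"AgAAAAAAAAA=", "KgAAAAAAAAA=", "aGVsbG8="} {
		n, err := decodeInt64(s)
		if err != nil {
			fmt.Println("ERROR:", err)
			continue
		}
		fmt.Println(n)
	}
}
```

Under that assumption, the first two values decode to 2 and 42, while "aGVsbG8=" is the 5-byte string "hello" and cannot fill an int64.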
columns:
# this column will be read as a datetime, and written as a timestamp
- name: "release-date"
input: "datetime"
output: "timestamp"
{"release-date": 739828800}
{"release-date": "922910400"}
{"release-date": "1997-12-19T08:00:00-04:00"}
$ jl <movies.jsonl
{"release-date":739828800}
{"release-date":922910400}
{"release-date":882532800}
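The datetime-to-timestamp conversion above can be reproduced with the standard `time` package. This is a sketch under assumptions: the `toTimestamp` helper is hypothetical, it treats any integer (quoted or not) as Unix seconds, and it accepts only RFC3339 for textual dates, which may be narrower or wider than what jl accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// toTimestamp converts a value read as a datetime into Unix seconds.
// Integers are taken as already being Unix seconds; anything else must be
// an RFC3339 date. Hypothetical helper, not part of jl.
func toTimestamp(value string) (int64, error) {
	if secs, err := strconv.ParseInt(value, 10, 64); err == nil {
		return secs, nil
	}
	t, err := time.Parse(time.RFC3339, value)
	if err != nil {
		return 0, err
	}
	return t.Unix(), nil
}

func main() {
	for _, v := range []string{"739828800", "922910400", "1997-12-19T08:00:00-04:00"} {
		ts, err := toTimestamp(v)
		if err != nil {
			fmt.Println("ERROR:", err)
			continue
		}
		fmt.Println(ts)
	}
}
```

The last value is 08:00 at UTC-4, that is 12:00 UTC on 1997-12-19, which is 882532800 seconds after the Unix epoch.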
Check the examples folder.
// Add in your go file
import "github.com/cgi-fr/jsonline/pkg/jsonline"
A row is like a map: it stores key/value pairs, but you decide how it will output the values in JSON. Different format options are available: String, Numeric, DateTime, Time, Timestamp, Binary, ...
row := jsonline.NewRow()
row.Set("address", jsonline.NewValueString("123 Main Street, New York, NY 10030"))
row.Set("last-update", jsonline.NewValueDateTime(time.Now()))
But, unlike a map, the keys will always appear in the order of insertion. Dates and times will be formatted to RFC3339.
// Will print : {"address":"123 Main Street, New York, NY 10030","last-update":"2021-09-25T08:51:10+02:00"}
fmt.Println(row)
A template defines a JSONLine structure.
template := jsonline.NewTemplate().WithString("name").WithNumeric("age").WithDateTime("birthdate")
The template can create rows for you from either a map or a slice. Key order and format will be enforced for every row created from the template.
person1, err := template.CreateRow([]interface{}{"Dorothy", 30, time.Date(1991, time.September, 24, 21, 21, 0, 0, time.UTC)})
if err == nil {
fmt.Println(person1) // {"name":"Dorothy","age":30,"birthdate":"1991-09-24T21:21:00Z"}
}
Templates can contain sub-templates, to nest JSON objects.
template = template.WithRow("house", jsonline.NewTemplate().WithString("address").WithDateTime("last-update"))
person1.Set("house", row)
fmt.Println(person1) // {"name":"Dorothy","age":30,"birthdate":"1991-09-24T21:21:00Z","house":{"address":"123 Main Street, New York, NY 10030","last-update":"2021-09-25T09:22:54+02:00"}}
The standard Go interfaces json.Marshaler and json.Unmarshaler are supported.
b, err := person1.MarshalJSON()
fmt.Println(string(b)) // same result as fmt.Println(person1)
person2 := jsonline.NewRow()
person2.UnmarshalJSON(b)
fmt.Println(person2) // same result as fmt.Println(person1)
Extra fields that are not defined in the template will appear at the end of the JSONLine.
person3, err := template.CreateRow(map[string]interface{}{"name":"Alice", "extra":true, "age":17, "birthdate":time.Date(2004, time.June, 15, 21, 8, 47, 0, time.UTC)})
if err == nil {
fmt.Println(person3) // {"name":"Alice","age":17,"birthdate":"2004-06-15T21:08:47Z","extra":true}
} else {
fmt.Println("ERROR:", err)
}
An exporter will write objects as JSON lines into an io.Writer.
exporter := jsonline.NewExporter(os.Stdout).WithTemplate(template) // or template.GetExporter(os.Stdout)
exporter.Export([]interface{}{"Dorothy", 30, time.Date(1991, time.September, 24, 21, 21, 0, 0, time.UTC)})
exporter.Export([]interface{}{"Alice", 17, time.Date(2004, time.June, 15, 21, 8, 47, 0, time.UTC)})
An importer will read JSON lines from an io.Reader.
for importer := template.GetImporter(os.Stdin); importer.Import(); { // or importer := jsonline.NewImporter(os.Stdin).WithTemplate(template)
row, err := importer.GetRow()
if err != nil {
fmt.Println("an error occurred!", err)
} else {
fmt.Println(row)
}
}
A streamer will process JSON lines from an io.Reader to an io.Writer.
importer := template.GetImporter(os.Stdin)
exporter := template.GetExporter(os.Stdout)
streamer := jsonline.NewStreamer(importer, exporter)
streamer.Stream()
Copyright (C) 2021 CGI France
JL is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
JL is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
See the LICENSE file for more information.
Some files contain a GPL linking exception to allow linking of modules that are not derived from or based on this library (everything under the pkg folder).