-
Notifications
You must be signed in to change notification settings - Fork 31
Items
An item is the basic unit of data processing in Catmandu. Items are data structures build of key-value-pairs (aka objects), lists (aka arrays), strings, numbers, and null-values. All items can be expressed in JSON and YAML, among other formats.
Internally all data processing by Catmandu is using a generic data format not unlike JSON. If one imports MARC, XML, Excel, OAI-PMH, SPARQL, data from a database or any other format, everything can be expressed as JSON.
For example:
- JSON/YAML - when importing a large JSON/YAML collections as an array, every item is a Catmandu item.
- Text - for text import every line of text is one Catmandu item.
- MARC - when importing MARC data, every record in a MARC file is one Catmandu item.
- XLS,CSV - for tabular formats such as Excel, CSV and TSV, each row in a table is one Catmandu item
- RDF - for linked data formats such as RDF/XML, RDF/nTriples, RDF/Turtle each triple is one Catmandu item
- SPARQL - for a result set of a SPARQL or LDF query, every result (with the variable bindings) is one Catmandu item
- MongoDB,ElasticSearch,Solr,DBI - for databases every record in the database is one Catmandu item
To transform items with the Fix language one points to the fields in items with a JSONPath expression (Catmandu uses an extension of JSONPath actually). The fixes provided to a catmandu command operate on all individual items.
For instance, the command below will upcase the publisher field for every item (row) in the data.xls file:
$ catmandu convert XLS --fix 'upcase(publisher)' < data.xls
This command will select only the JSON items that contain 'Tsjechov' in a nested authors field:
$ catmandu convert XLS --fix 'select any_match(authors.*,"Tsjechov.*")' < data.json
This command will delete all the uppercase A characters from a Text file:
$ catmandu convert Text to Text --fix 'replace_all(A,"")' < data.txt
To see the internal representation of a MARC file in Catmandu, transform it for instance to YAML
$ catmandu convert MARC to YAML < data.mrc
One will see that a MARC record is treated as an array of arrays for each item.