Workflow
The NIAS workflow uses a chain of microservices, receiving content from a REST call, and passing it through different topics in Kafka. The chain of calls is summarised below.
- HTTP POST xml to /:topic/:stream/bulk >> Kafka topic sifxml.bulkingest, message: TOPIC topic.stream + xml-fragment1, xml-fragment2, xml-fragment3 ...
- HTTP POST xml to /:topic/:stream >> Kafka topic sifxml.ingest, message: TOPIC topic.stream + xml
- Kafka topic sifxml.bulkingest, message: TOPIC topic.stream + xml-fragment1, xml-fragment2, xml-fragment3 ... >> Kafka topic sifxml.validated, messages: TOPIC topic.stream 1:n:doc-id + xml-node1, TOPIC topic.stream 2:n:doc-id + xml-node2, TOPIC topic.stream 3:n:doc-id + xml-node3 ...
- Kafka topic sifxml.ingest, message: TOPIC topic.stream + xml >> Kafka topic sifxml.validated, messages: TOPIC topic.stream 1:n:doc-id + xml-node1, TOPIC topic.stream 2:n:doc-id + xml-node2, TOPIC topic.stream 3:n:doc-id + xml-node3 ...
- Kafka topic sifxml.validated, message m >> Kafka topic sifxml.processed, message m
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic topic.stream.none, message: xml-node
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic topic.stream.low, message: redacted_extreme(xml-node)
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic topic.stream.medium, message: redacted_high(redacted_extreme(xml-node))
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic topic.stream.high, message: redacted_medium(redacted_high(redacted_extreme(xml-node)))
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic topic.stream.extreme, message: redacted_low(redacted_medium(redacted_high(redacted_extreme(xml-node))))
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Kafka topic sms.indexer, message: tuple representing graph information of xml-node
- Kafka topic sms.indexer, message: tuple representing graph information of xml-node >> Redis: sets representing the contribution of xml-node to the graph of nodes
- Kafka topic sifxml.processed, message: TOPIC topic.stream m:n:doc-id + xml-node >> Moneta key: xml-node RefId, value: xml-node
- HTTP POST staffcsv to /naplan/csv_staff >> Kafka topic naplan.csv_staff, message: staffcsv >> Kafka topic sifxml.ingest, message: TOPIC naplan.sifxmlout_staff + csv2sif(staffcsv) >> Kafka topic naplan.sifxmlout_staff.none, message: csv2sif(staffcsv) >> Moneta key: naplan.csv_staff::LocalId(staffcsv), value: staffcsv
- HTTP POST studentcsv to /naplan/csv >> Kafka topic naplan.csv, message: studentcsv >> Kafka topic sifxml.ingest, message: TOPIC naplan.sifxmlout + csv2sif(studentcsv) >> Kafka topic sifxml.processed, message: TOPIC naplan.sifxmlout + csv2sif(studentcsv) + Platform Student Identifier >> Kafka topic naplan.sifxmlout.none, message: csv2sif(studentcsv) + Platform Student Identifier >> Moneta key: naplan.csv::LocalId(studentcsv), value: studentcsv >> Kafka topic naplan.filereport, message: report on contents of file
- HTTP POST staffxml to /naplan/sifxml_staff >> Kafka topic naplan.sifxml_staff.none, message: staffxml >> Kafka topic naplan.csvstaff_out, message: sif2csv(staffxml) >> Moneta key: RefId(staffxml), value: staffxml
- HTTP POST studentxml to /naplan/sifxml >> Kafka topic sifxml.processed, message: studentxml + Platform Student Identifier >> Kafka topic naplan.sifxml.none, message: studentxml + Platform Student Identifier >> Kafka topic naplan.csv_out, message: sif2csv(studentxml) >> Moneta key: RefId(studentxml), value: studentxml >> Kafka topic naplan.filereport, message: report on contents of file

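The `TOPIC topic.stream m:n:doc-id` envelope used throughout the chain can be sketched in Ruby as below. The helper names and the SHA1-based doc-id are illustrative, not taken from the NIAS codebase:

```ruby
require 'digest'

# Build one enveloped message for the m-th of n child nodes of a payload.
def build_envelope(topic, stream, m, n, payload, node)
  doc_id = Digest::SHA1.hexdigest(payload)[0, 8] # stands in for the doc-id hash
  "TOPIC: #{topic}.#{stream} #{m}:#{n}:#{doc_id}\n#{node}"
end

# Split an enveloped message back into its header fields and body.
def parse_envelope(message)
  header, body = message.split("\n", 2)
  topic_stream, counter = header.sub('TOPIC: ', '').split(' ', 2)
  m, n, doc_id = counter.split(':')
  { topic: topic_stream, m: m.to_i, n: n.to_i, doc_id: doc_id, body: body }
end
```

Each consumer in the chain can then route on the topic.stream prefix and reassemble or count records using m:n without re-parsing the XML payload.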
ssf/ssf_server.rb
- Input: HTTP POST xml to /:topic/:stream
  - Format: xml : XML, assumed to be SIF collection
- Output: Kafka stream sifxml.ingest
  - Content: xml with header TOPIC: topic.stream
  - Constraints: XML less than 1 MB

ssf/ssf_server.rb
- Input: HTTP POST json to /:topic/:stream
  - Format: json : JSON
- Output 1: Kafka stream topic.stream
  - Content: json
- Output 2: Kafka stream json.storage
  - Content: json with header TOPIC: topic.stream
  - Constraints: JSON less than 1 MB

ssf/ssf_server.rb
- Input: HTTP POST csv to /:topic/:stream
  - Format: CSV
- Output 1: Kafka stream csv.errors
  - Content: csv errors
  - Constraint: CSV is malformed
- Output 2: Kafka stream csv.errors
  - Content: csv errors
  - Constraint: CSV violates its CSV schema, for specific topics with defined schemas (naplan.csv_staff, naplan.csv)
- Output 3: Kafka stream topic.stream
  - Content: csv parsed into JSON
  - Constraint: CSV is valid
- Output 4: Kafka stream json.storage
  - Content: csv parsed into JSON with header TOPIC: topic.stream
  - Constraint: CSV is valid
  - Constraints: CSV less than 1 MB

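The parse-or-report split above can be sketched with Ruby's bundled CSV library. The helper name is illustrative, and the real server additionally applies the per-topic schemas mentioned above:

```ruby
require 'csv'
require 'json'

# Parse posted CSV into one JSON object per row (headers become keys).
# On malformed CSV, return the error that would be published to csv.errors.
def csv_to_json_records(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row.to_h.to_json }
rescue CSV::MalformedCSVError => e
  { error: e.message }
end
```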
ssf/ssf_server.rb
- Input: HTTP POST xml to /:topic/:stream/bulk
  - Format: XML, assumed to be SIF collection
- Output: Kafka stream sifxml.bulkingest
  - Content: xml with header TOPIC: topic.stream, split up into blocks of 950 KB, each but the last terminating in ===snip nn===
  - Constraints: XML less than 500 MB

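The block-splitting described above might look like this sketch. Helper names are illustrative, and the sketch splits on character counts where the real service works to Kafka's message size limit:

```ruby
BLOCK_SIZE = 950 * 1024

# Split a large payload into blocks of at most block_size characters,
# terminating each block but the last with a ===snip nn=== marker.
def snip_blocks(xml, block_size = BLOCK_SIZE)
  blocks = []
  (0...xml.length).step(block_size) { |i| blocks << xml[i, block_size] }
  blocks.each_with_index.map do |block, i|
    i == blocks.size - 1 ? block : block + "===snip #{i + 1}==="
  end
end

# Reassemble the original payload by stripping the snip markers.
def unsnip(blocks)
  blocks.map { |b| b.sub(/===snip \d+===\z/, '') }.join
end
```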
ssf/ssf_server.rb
- Input: HTTP GET from /:topic/:stream
- Output: Stream of messages from Kafka stream topic.stream
  - Content: JSON objects with keys data, key, consumer_offset, hwm (= highwater mark), restart_from (= next offset)

ssf/ssf_server.rb
- Input: HTTP GET from /csverrors
- Output: Stream of messages from Kafka streams csv.errors, sifxml.errors, naplan.srm_errors
  - Content: Header m:n:i type, followed by the error message
    - type: type of error reported
    - m: ordinal number of message within the error type
    - n: total number of error messages within the error type
    - i: record id, if the errors are being reported per record rather than for an entire file

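Parsing the `m:n:i type` error header described above is straightforward; this illustrative helper is not part of the codebase:

```ruby
# Split an error message into its m:n:i counters, error type, and body.
def parse_error_header(message)
  header, body = message.split("\n", 2)
  counters, type = header.split(' ', 2)
  m, n, i = counters.split(':')
  { m: m.to_i, n: n.to_i, record_id: i, type: type, error: body }
end
```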
cons-prod-sif-bulk-ingest-validate.rb
- Input: Kafka stream sifxml.bulkingest
  - Format: sequence of XML fragments xml-fragment1, xml-fragment2, xml-fragment3 ..., all but the last terminating in ===snip nn===; the whole message has header TOPIC: topic.stream
- Output 1: Kafka stream sifxml.errors
  - Content: All well-formedness or validation errors from parsing xml, the concatenation of xml-fragment1, xml-fragment2, xml-fragment3 ... (with ===snip nn=== stripped out)
  - Constraint: xml is malformed or invalid
- Output 2: Kafka stream sifxml.validated
  - Content: xml-node1, xml-node2, xml-node3 ... : one message for each child node of xml, each message with header TOPIC: topic.stream m:n:doc-id (m-th record out of n, with a hash used to identify the document that the messages came from)
  - Constraint: xml is valid
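The validate-and-split step above can be sketched with REXML, which ships with Ruby. The helper is illustrative; the real service also performs schema validation, which this sketch omits (it checks well-formedness only):

```ruby
require 'rexml/document'
require 'digest'

# Parse a reassembled collection; on success emit one enveloped message per
# child node of the root, on failure return the error that would go to
# sifxml.errors.
def split_collection(xml, topic_stream)
  doc = REXML::Document.new(xml)
  children = doc.root.elements.to_a
  doc_id = Digest::SHA1.hexdigest(xml)[0, 8]
  children.each_with_index.map do |node, i|
    "TOPIC: #{topic_stream} #{i + 1}:#{children.size}:#{doc_id}\n#{node}"
  end
rescue REXML::ParseException => e
  { error: e.message }
end
```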
cons-prod-sif-ingest-validate.rb
- Input: Kafka stream sifxml.ingest
  - Format: SIF/XML collection xml
- Output 1: Kafka stream sifxml.errors
  - Content: All well-formedness or validation errors from parsing xml. The original CSV line is added if the SIF document originates in CSV.
  - Constraint: xml is malformed or invalid
- Output 2: Kafka stream sifxml.validated
  - Content: xml-node1, xml-node2, xml-node3 ... : one message for each child node of xml, each message with header TOPIC: topic.stream m:n:doc-id (m-th record out of n, with a hash used to identify the payload that the messages came from)
  - Constraint: xml is valid

cons-prod-sif-process.rb
- Input: Kafka stream sifxml.validated
  - Format: SIF/XML collection xml
- Output: Kafka stream sifxml.processed
  - Content: xml subject to any needed processing

cons-prod-privacyfilter.rb
- Input: Kafka stream sifxml.processed
  - Format: SIF/XML object xml-node with header TOPIC: topic.stream m:n:doc-id
- Output 1: Kafka stream topic.stream.none
  - Content: xml-node, with no content redacted
- Output 2: Kafka stream topic.stream.low
  - Content: xml-node with all content matching xpaths in privacyfilters/extreme.xpath redacted
- Output 3: Kafka stream topic.stream.medium
  - Content: xml-node with all content matching xpaths in privacyfilters/extreme.xpath, privacyfilters/high.xpath redacted
- Output 4: Kafka stream topic.stream.high
  - Content: xml-node with all content matching xpaths in privacyfilters/extreme.xpath, privacyfilters/high.xpath, privacyfilters/medium.xpath redacted
- Output 5: Kafka stream topic.stream.extreme
  - Content: xml-node with all content matching xpaths in privacyfilters/extreme.xpath, privacyfilters/high.xpath, privacyfilters/medium.xpath, privacyfilters/low.xpath redacted
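Redaction by xpath list, as described above, can be sketched with REXML. The placeholder token and helper name are illustrative; a real deployment reads the xpath lists from the privacyfilters/*.xpath files:

```ruby
require 'rexml/document'

# Blank out the text of every node matching any of the given xpaths.
def redact(xml, xpaths, token = 'REDACTED')
  doc = REXML::Document.new(xml)
  xpaths.each do |xp|
    REXML::XPath.each(doc, xp) { |node| node.text = token }
  end
  doc.to_s
end
```

Because each sensitivity level applies the union of the more extreme levels' lists, the five output streams can be produced by calling `redact` with a cumulatively larger xpath set.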
cons-prod-sif-parser.rb
- Input: Kafka stream sifxml.processed
  - Format: SIF/XML object xml-node with header TOPIC: topic.stream m:n:doc-id
- Output: Kafka stream sms.indexer
  - Content: JSON tuple used to represent graphable information about xml-node:
    - type => xml-node name
    - id => xml-node RefId
    - links => other GUIDs in xml-node
    - label => human-readable label of xml-node
    - otherids => any alternate identifiers in xml-node
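The core of the tuple extraction above can be sketched as follows. It assumes the SIF convention that the object's GUID is a RefId attribute and that links to other objects appear in elements whose names end in RefId; the label and otherids handling of the real parser is omitted:

```ruby
require 'rexml/document'
require 'json'

# Extract type, id, and linked GUIDs from a SIF/XML object.
def graph_tuple(xml_node)
  doc = REXML::Document.new(xml_node)
  links = []
  doc.root.each_recursive do |el|
    links << el.text if el.name.end_with?('RefId') && el.text
  end
  { type: doc.root.name, id: doc.root.attributes['RefId'], links: links }.to_json
end
```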
cons-sms-indexer.rb
- Input: Kafka stream sms.indexer
  - Format: JSON tuple used to represent graphable information about xml-node
- Output: Redis database
  - Content: Sets representing the contribution of xml-node to the graph of nodes

cons-sms-storage.rb
- Input: Kafka stream sifxml.processed
  - Format: SIF/XML object xml-node with header TOPIC: topic.stream m:n:doc-id
- Output: Moneta key/value database
  - Content: Key: xml-node RefId; Value: xml-node

cons-sms-json-storage.rb
- Input: Kafka stream json.storage
  - Format: JSON object json with header TOPIC: topic.stream m:n:doc-id
- Output: Moneta key/value database
  - Content: Key: topic.stream::id (where id is extracted from the record); Value: json

cons-prod-csv2sif-staffpersonal-naplanreg-parser.rb
- Input: Kafka stream naplan.csv_staff
  - Format: CSV following NAPLAN specification for staff records
- Output 1: Kafka stream sifxml.ingest
  - Content: TOPIC: naplan.sifxmlout_staff + CSV line {linenumber} + XML following NAPLAN specification for staff records
  - Constraint: CSV record has NAPLAN staff header LocalStaffId
  - Note: this payload will be validated like all other SIF payloads. If valid, it will end up on naplan.sifxmlout_staff.none
- Output 2: Kafka stream csv.errors
  - Content: Alert that the wrong record type was uploaded
  - Constraint: CSV record has NAPLAN student header LocalId
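The csv2sif mapping for staff rows can be sketched as below. The element names and the comment format are illustrative, not the full NAPLAN mapping; the LocalStaffId header check mirrors the constraint above:

```ruby
require 'csv'
require 'securerandom'

# Map each NAPLAN staff CSV row to a minimal SIF/XML fragment, tagged with
# the originating CSV line number (header is line 1, first data row line 2).
def staff_csv_to_sif(csv_text)
  CSV.parse(csv_text, headers: true).each_with_index.map do |row, i|
    raise 'wrong record type: no LocalStaffId header' unless row['LocalStaffId']
    <<~XML
      <!-- CSV line #{i + 2} -->
      <StaffPersonal RefId="#{SecureRandom.uuid}">
        <LocalId>#{row['LocalStaffId']}</LocalId>
      </StaffPersonal>
    XML
  end
end
```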
cons-prod-csv2sif-studentpersonal-naplanreg-parser.rb
- Input: Kafka stream naplan.csv
  - Format: CSV following NAPLAN specification for student records
- Output 1: Kafka stream sifxml.ingest
  - Content: TOPIC: naplan.sifxmlout + CSV line {linenumber} + XML following NAPLAN specification for student records
  - Constraint: JSON of CSV validates against sms/services/naplan.student.json
  - Note: this payload will be validated like all other SIF payloads. If valid, it will end up on naplan.sifxmlout.none
- Output 2: Kafka stream csv.errors
  - Content: Alert that the wrong record type was uploaded
  - Constraint: CSV record has NAPLAN staff header LocalStaffId

cons-prod-sif2csv-staffpersonal-naplanreg-parser.rb
- Input: Kafka stream naplan.sifxml_staff.none
  - Format: XML following NAPLAN specification for staff records
- Output: Kafka stream naplan.csvstaff_out
  - Content: CSV following NAPLAN specification for staff records

cons-prod-sif2csv-studentpersonal-naplanreg-parser.rb
- Input: Kafka stream naplan.sifxml.none
  - Format: XML following NAPLAN specification for student records
- Output: Kafka stream naplan.csv_out
  - Content: CSV following NAPLAN specification for student records

cons-prod-studentpersonal-naplanreg-unique-ids-storage.rb
- Input: Kafka stream sifxml.processed
  - Format: XML following NAPLAN specification for student records
- Output 1: Redis database
  - Content: Key: school+localid::(ASL School Id + Local Id); Value: XML RefId
  - Constraint: Key is unique in Redis
- Output 2: Redis database
  - Content: Key: school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth); Value: XML RefId
  - Constraint: Key is unique in Redis
- Output 3: Kafka stream naplan.srm_errors
  - Content: Error report on duplicate key
  - Constraint: Key school+localid::(ASL School Id + Local Id) is not unique in Redis, or key school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth) is not unique in Redis, and xml has not been converted from a CSV file
- Output 4: Kafka stream naplan.srm_errors
  - Content: Error report on duplicate key
  - Constraint: Key school+localid::(ASL School Id + Local Id) is not unique in Redis, or key school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth) is not unique in Redis, and xml has been converted from a CSV file, containing a comment with the CSV line number
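The duplicate detection above can be sketched as follows. The key formats follow the README; the in-memory hash stands in for Redis, so the helpers are illustrative:

```ruby
# Build the two uniqueness keys for one student record.
def uniqueness_keys(rec)
  [
    "school+localid::#{rec[:asl_school_id]}#{rec[:local_id]}",
    "school+name+dob::#{rec[:asl_school_id]}#{rec[:given_name]}#{rec[:last_name]}#{rec[:dob]}",
  ]
end

# Register a record's keys against the index; return the list of keys that
# were already taken (these would be reported on naplan.srm_errors).
def register(index, rec, ref_id)
  dups = uniqueness_keys(rec).select { |k| index.key?(k) }
  return dups unless dups.empty?
  uniqueness_keys(rec).each { |k| index[k] = ref_id }
  []
end
```

Against real Redis, the same check-then-set would typically be done atomically (e.g. SETNX) rather than in two steps.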
cons-prod-naplan-studentpersonal-process-sif.rb
- Input: Kafka stream sifxml.validated
  - Format: XML following NAPLAN specification for student records
- Output: Kafka stream sifxml.processed
  - Content: XML following NAPLAN specification for student records + Platform Student Identifier (if not already supplied)
  - Constraint: XML source topic is naplan.sifxml or naplan.sifxmlout

cons-prod-sif2csv-SRM-validate.rb
- Input: Kafka stream sifxml.processed
  - Format: XML following NAPLAN specification for student records
- Output: Kafka stream naplan.srm_errors
  - Content: Any instances where the XML violates the Student Registration Management system's validation rules
  - Constraint: XML source topic is naplan.sifxml or naplan.sifxml_staff

cons-prod-file-report.rb
- Input: Kafka stream sifxml.processed
  - Format: XML following NAPLAN specification for student records
- Output: Kafka stream naplan.filereport
  - Content: Report of the number of schools, students, and students per year level, for each new doc-id seen (representing a new payload loaded into NIAS)
  - Constraint: XML source topic is naplan.sifxml or naplan.sifxml_staff
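The per-payload tally described above can be sketched as below, assuming the school and year level have already been extracted from each record; the field names are illustrative:

```ruby
# Summarise one payload's records (all sharing a doc-id) into the counts
# the file report carries: schools, students, and students per year level.
def file_report(records)
  {
    schools: records.map { |r| r[:school] }.uniq.size,
    students: records.size,
    per_year_level: records.group_by { |r| r[:year_level] }
                           .transform_values(&:size),
  }
end
```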