Skip to content

Latest commit

 

History

History
58 lines (55 loc) · 1.77 KB

sql-data-validator.md

File metadata and controls

58 lines (55 loc) · 1.77 KB

The SqlDataValidator is for identifying invalid records in a given view, and returns a new view with all valid records.

  • Either the validWhere or invalidWhere property needs to be defined in order to identify valid or invalid records. The validWhere/invalidWhere is a standard sql statement in where clause.
  • The action property must be one of the following values:
    • error: if invalid records found, error out the process
    • staging: all invalid records needs to be staged:
      • staging.uri: location where to stage invalid records
      • staging.format: the file format in which invalid records are staged.
      • staging.options: the options for staging invalid records.
    • ignore: discard all invalid records, and move forward the process with all valid records.
  • The view property defines the dataframe to be validated.

Actor Class: com.qwshen.etl.validation.SqlDataValidator

The definition of the SqlDataValidator:

  • in YAML
  actor:
    type: sql-data-validator
    properties:
      validWhere: gender in ('M', 'Male', 'F', 'Female')
      action: staging
      staging:
        uri: /tmp/bad-data/users
        format: parquet
      view: users
  • in JSON
  {
    "actor": {
      "type": "sql-data-validator",
      "properties": {
        "invalidWhere": "gender not in ('M', 'Male', 'F', 'Female')",
        "actor": "staging",
        "staging": {
          "uri": "/tmp/bad-data/users",
          "format": "csv",
          "options": {
            "header": "true",
            "delimiter": ","
          }
        },
        "view": "users"
      }
    }
  }
  • in XML
  <actor type="sql-data-validator">
    <properties>
      <validWhere>gender in ('M', 'Male', 'F', 'Female')</validWhere>
      <action>error</action>
    </properties>
  </actor>