-
Notifications
You must be signed in to change notification settings - Fork 41
Execute JDBC
Allows the execution arbitrary SQL statements through JDBC and user specified connection. Any valid JDBC statements for the specified driver can be used.
The list of statements are not transactional - each statement will be committed if successful. Due to this, it is highly recommended that each statement be idempotent (able to repeat itself without errors).
As a prerequisite, Herd must be allowed connection to the target database and the JDBC driver must be available to Herd's web container.
- This can be achieved by adding the datamgt-jdbc-support security group to your Redshift cluster to allow Herd to talk to your cluster.
Optionally, you may provide a S3 ARN to a Java properties file to be used to store sensitive information. See section "Security Considerations" for details on usage.
-
This service is not meant to be used as a mechanism for data processing. This service should be used to query for data required workflow logic. Therefore, Herd will impose some limits on the size of the requests and responses.
-
The number of statements per request are limited to 5 statements.
-
The number of rows returned per QUERY statement is limited to 10 rows.
-
If there are valid reasons to increase these limits, the limits are configurable as documented in Herd Web Application Configuration Options](TODO).
Name | Version | Driver Class Name |
---|---|---|
Redshift | redshift-1.6-3.el6.x86_64.rpm | com.amazon.redshift.jdbc41.Driver |
Oracle | ??? | oracle.jdbc.OracleDriver |
Postgres | ??? | org.postgresql.Driver |
org.finra.dm.service.activiti.task.ExecuteJdbc
Name | Required | Accepted Values | Default Value | Case Sensitive | Description |
---|---|---|---|---|---|
contentType | Y | xml, json | N | Determines whether the content type of the request parameter is XML or JSON. | |
jdbcExecutionRequest | Y | The JDBC execution request. See the JDBC Execution Request specification for details. |
The request can be defined as either XML or JSON.
Name | Required | Accepted Values | Default Value | Case Sensitive | Velocity Evaluated | Description |
---|---|---|---|---|---|---|
connection | Y | The JDBC connection information. | ||||
♦ url | Y | Y | Y | The URL to the database. It must be the full JDBC URL. ex. jdbc:oracle:thin:@localhost:1521:test | ||
♦ username | Y | Y | Y | The username to the database. | ||
♦ password | Y | Y | Y | The password to the database. | ||
♦ databaseType | Y | REDSHIFT, ORACLE, POSTGRES | Y | N | The type of database the query should be run against. | |
statements | Y | Not empty list | List of statements to execute. The statements will be executed in the order given in this list. | |||
♦ statement | ||||||
♦ ♦ type | Y | UPDATE, QUERY | Y | N | The type of JDBC statement execution to perform. The type mainly defines whether the result should be a scalar or a result set. | |
♦ ♦ sql | Y | Y | Y | The SQL to execute. The SQL should match the type of the statement to execute, that is, UPDATE should not specify a select statement. Otherwise, the result behavior is undefined. | ||
♦ ♦ continueOnError | N | true, false | false | N | N | Indicates whether the entire execution should fail if this statement fails to execute. "true" indicates that the statement failure will not stop execution, but will continue with the next statement. "false" indicates that the execution will stop if this statement fails. |
s3PropertiesLocation | N | Y | The full S3 location to the Java properties file which will be used for value replacement in the request. Herd MUST have access to this bucket and any related KMS encryption keys for this feature to work. See section "Security Considerations" for details on usage. | |||
♦ bucketName | Y | Y | N | The S3 bucket name to the properties file location. | ||
♦ key | Y | Y | N | The S3 object key to the properties file location. |
<serviceTask id="id" name="name" activiti:class="org.finra.dm.service.activiti.task.ExecuteJdbc">
<extensionElements>
<activiti:field name="contentType" stringValue="xml" />
<activiti:field name="jdbcExecutionRequest" expression="${jdbcExecutionRequest}" />
</extensionElements>
</serviceTask>
<jdbcExecutionRequest>
<connection>
<url>jdbc:redshift://examplecluster.us-east-1.redshift.amazonaws.com:5439/test</url>
<username>${username}</username>
<password>${password}</password>
<databaseType>REDSHIFT</databaseType>
</connection>
<statements>
<statement>
<type>UPDATE</type>
<sql><![CDATA[insert into FOO values ('BAR')]]></sql>
<continueOnError>true</continueOnError>
</statement>
<statement>
<type>UPDATE</type>
<sql><![CDATA[update FOO set BAR = 'BAZ']]></sql>
</statement>
</statements>
<s3PropertiesLocation>
<bucketName>my-secret-bucket</bucketName>
<key>path/to/properties/file.properties</key>
</s3PropertiesLocation>
</jdbcExecutionRequest>
{
"connection" : {
"url" : "jdbc:redshift://examplecluster.us-east-1.redshift.amazonaws.com:5439/test",
"username" : "${username}",
"password" : "${password}",
"databaseType" : "REDSHIFT"
},
"statements" : [ {
"type" : "UPDATE",
"sql" : "insert into FOO values ('BAR')",
"continueOnError" : true
}, {
"type" : "UPDATE",
"sql" : "update FOO set BAR = 'BAZ'"
} ],
"s3PropertiesLocation" : {
"bucketName" : "my-secret-bucket",
"key" : "path/to/properties/file.properties"
}
}
The response contains the information specified in the request in addition to the result of each statement, and the completion status.
Name | Description |
---|---|
{taskID}_jsonResponse | The response of the JDBC execution request in JSON format. See JDBC Execution Response specifications for details. This value will not be specified if an ERROR occurs due to parameter validation or connection error. However, this value will be set if a statement fails to execute. |
{taskId}_taskStatus | SUCCESS if no errors, or ERROR if there were parameter validation errors, connection errors, or any of the statement fails to execute. |
{taskId}_taskErrorMessage | A message describing the error if status is ERROR. |
Name | Description |
---|---|
statements | List of statements that have been executed. |
♦ statement | |
♦ ♦ type | The type of JDBC statement execution that have been performed. |
♦ ♦ sql | The SQL that have been executed. |
♦ ♦ continueOnError | Indicates whether this statement's failure would cause execution to stop. |
♦ ♦ status | The completion status of the statement. One of SUCCESS, ERROR, or SKIPPED. SUCCESS - the statement executed successfully and the result is expected to be a scalar value. ERROR - the statement execution failed due to errors, and the error message is set to the SQL exception message. Result is not set. SKIPPED - the statement was given in the request, but was never executed due to previous errors, and the failed statement was marked to stop on errors. |
♦ ♦ result | The result of the executed UPDATE type statement. This is typically the number of rows affected by the statement on SUCCESS. The result will only be set if status is SUCCESS. |
♦ ♦ resultSet | The result of the executed QUERY type statement. |
♦ ♦ ♦ columnNames | The list of column names of the result set. |
♦ ♦ ♦ ♦ columnName | The column name. |
♦ ♦ ♦ rows | The list of rows of the result set. |
♦ ♦ ♦ ♦ row | A row in the result. |
♦ ♦ ♦ ♦ ♦ column | The data at each column in the row. The order is the same as the order of the column names. |
♦ ♦ errorMessage | The error message if any error occurs during execution of the statement. The message will only be set if the status is ERROR. The error message will typically be the SQLException message thrown by the JDBC driver. |
{
"statements" : [ {
"type" : "UPDATE",
"sql" : "insert into FOO values ('BAR')",
"status" : "SUCCESS",
"result" : "1"
}, {
"type" : "UPDATE",
"sql" : "update FOO set BAR = 'BAZ'",
"status" : "SUCCESS",
"result" : "1"
} ]
}
{
"statements" : [ {
"type" : "UPDATE",
"sql" : "insert into FOO values ('BAR')",
"status" : "SUCCESS",
"result" : "1"
}, {
"type" : "UPDATE",
"sql" : "update FOO set BAR = 'BAZ'",
"status" : "ERROR",
"errorMessage" : "java.sql.SQLSyntaxErrorException: ORA-00903: invalid table name",
"continueOnError" : false
}, {
"type" : "UPDATE",
"sql" : "update FOO set BAR = 'BAZ'",
"status" : "SKIPPED"
} ]
}
{
"statements":
[
{
"type": "QUERY",
"sql": "select * from FOO",
"status": "SUCCESS",
"resultSet":
{
"columnNames": ["COL1", "COL2", "COL3"],
"rows":
[
{
"columns": ["A", "B", "C"]
},
{
"columns": ["D", "E", "F"]
},
{
"columns": ["G", "H", "I"]
}
]
}
}
]
}
You may optionally provide an S3 key as part of the request to store sensitive information, which then can be used to replace parts of the request values.
The S3 object must be in the Java properties format. The values then may contain Velocity template expressions to be evaluated when the service is executed.
- When using velocity template expressions within an activiti expression, be sure to escape () the $ so that activiti ignores the velocity expression.
The S3 bucket and key must be accessible by Herd application. The application's instance role must be given permissions to read from the provided location, and also allowed kms:decrypt if KMS encryption is enabled.
It is highly recommended that this approach be used to hide connection credentials (DB username, passwords) and statement including sensitive information (such as COPY command, which contain AWS keys)
hostname=examplecluster.us-east-1.redshift.amazonaws.com
username=foo
password=bar
accessKey=AKABC
secretKey=ABC123
{
"connection" : {
"url" : "jdbc:redshift://${hostname}:5439/test",
"username" : "${username}",
"password" : "${password}",
"databaseType" : "REDSHIFT"
},
"statements" : [ {
"type" : "UPDATE",
"sql" : "copy favoritemovies from 'dynamodb://ProductCatalog' credentials 'aws_access_key_id=${accessKey};aws_secret_access_key=${secretKey}' "
} ],
"s3PropertiesLocation" : {
"bucketName" : "my-secret-bucket",
"key" : "path/to/properties/file.properties"
}
}
The ${} expressions are Velocity variables, which will be replaced with the corresponding values provided in the properties file.
In case of SQL statement errors, the response error message will mask the sensitive data with 4 asterisks (****) to protect the data being exposed by the exception message.
Please note: This feature will be deprecated in favor of the S3 properties approach.
Due to security issues with specifying passwords in the Activiti workflow definition, it is highly recommended that the password be passed in as a workflow parameter instead of hard-coding in the definition. By specifying the parameter name with a name which contains the word "password" in Run Data Processing Job Post, the value of the parameter will be masked before being logged in the application logs.
<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:activiti="http://activiti.org/bpmn" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:omgdc="http://www.omg.org/spec/DD/20100524/DC" xmlns:omgdi="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsd="http://www.w3.org/2001/XMLSchema" typeLanguage="http://www.w3.org/2001/XMLSchema" expressionLanguage="http://www.w3.org/1999/XPath" targetNamespace="http://www.activiti.org/test">
<process id="test.test" name="test.test" isExecutable="true">
<startEvent id="startEvent" name="startEvent"></startEvent>
<scriptTask id="script" name="script" scriptFormat="javascript" activiti:autoStoreVariables="false">
<script>
var jdbcExecutionRequest = {
connection : {
url : 'jdbc:oracle:thin:@test_host:1521:test_sid',
username : 'test_username',
password : execution.getVariable('dbPassword'),
driverClassName : 'oracle.jdbc.OracleDriver'
},
statements : [ {
type : 'UPDATE',
sql : 'insert into FOO values ('BAR')'
} ]
};
execution.setVariable('jdbcExecutionRequest', JSON.stringify(jdbcExecutionRequest));
</script>
</scriptTask>
<serviceTask id="serviceTask" name="serviceTask" activiti:class="org.finra.dm.service.activiti.task.ExecuteJdbc">
<extensionElements>
<activiti:field name="contentType" stringValue="json" />
<activiti:field name="jdbcExecutionRequest" expression="${jdbcExecutionRequest}" />
</extensionElements>
</serviceTask>
<sequenceFlow id="sequenceFlow1" sourceRef="startEvent" targetRef="script"></sequenceFlow>
<sequenceFlow id="sequenceFlow2" sourceRef="script" targetRef="serviceTask"></sequenceFlow>
<sequenceFlow id="sequenceFlow3" sourceRef="serviceTask" targetRef="endEvent"></sequenceFlow>
<endEvent id="endEvent" name="endEvent"></endEvent>
</process>
</definitions>
Plese note: Since the parameter name contains "password," the value "test_password" will be masked during logging.
<jobCreateRequest>
<namespace>test</namespace>
<jobName>test</jobName>
<parameters>
<parameter>
<name>dbPassword</name>
<value>test_password</value>
</parameter>
</parameters>
</jobCreateRequest>
- Data Registration and Availability
- DDL Generation
- Cluster Management
- Workflow Control/Debugging