Direct Data API allows Veeva customers to replicate their data in external data warehouses and/or data lakes.
This project is a custom connector between Vault and AWS Redshift that performs the following:
- List and download Direct Data files from Vault using Direct Data API
- Place Direct Data files to an S3 bucket
- Load Direct Data into an AWS Redshift database
The steps below outline how to create and configure resources in an AWS account to use the Direct Data API Connector.
Note: All resources should be created in the same AWS Region.
- Navigate to the ECR service in the AWS Console
- Under **Private registry**, select **Repositories**
- Select **Create repository**
- Configure the repository with the following settings:
  - General settings: **Private**
  - Repository name: `cf-direct-data`
- All other settings are default. Click through to create the repository
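If you prefer to work from the terminal, the same repository can be created with the AWS CLI; this is an optional sketch rather than part of the documented setup, with `{REGION}` as a placeholder:

```
aws ecr create-repository --repository-name cf-direct-data --region {REGION}
```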
- Configure and authenticate the AWS CLI for initial setup
- Using the AWS CLI, authenticate Docker to the ECR repo:
  ```
  aws ecr get-login-password --region {REGION} | docker login --username AWS --password-stdin {ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com
  ```
- Using Docker, pull the following two images from the Veeva ECR Public Repo:
  ```
  docker pull public.ecr.aws/u0l6k3p6/direct-data:lambda-latest
  docker pull public.ecr.aws/u0l6k3p6/direct-data:batch-latest
  ```
- Tag the images with the `cf-direct-data` ECR repo URI that was created above:
  ```
  docker tag public.ecr.aws/u0l6k3p6/direct-data:lambda-latest {PRIVATE_REPO_URI}:lambda-latest
  docker tag public.ecr.aws/u0l6k3p6/direct-data:batch-latest {PRIVATE_REPO_URI}:batch-latest
  ```
- Push the images to the ECR repo:
  ```
  docker push {PRIVATE_REPO_URI}:lambda-latest
  docker push {PRIVATE_REPO_URI}:batch-latest
  ```
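To confirm both tags landed in the private repository, you can optionally list the images (a quick sanity check, not a required step):

```
aws ecr describe-images --repository-name cf-direct-data --region {REGION} --query 'imageDetails[].imageTags'
```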
- Navigate to the IAM service in the AWS Console
- Select **Roles** and **Create role**
- Configure the role with the following settings:
  - Step 1:
    - Trusted entity type: **AWS account**
    - An AWS account: **This account**
    - Use case: **CloudFormation**
  - Step 2: Attach the following AWS managed policies. These are required to access CloudFormation and create the Direct Data resources.
    - AmazonAPIGatewayAdministrator
    - AmazonEC2ContainerRegistryFullAccess
    - AmazonRedshiftFullAccess
    - AmazonS3FullAccess
    - AmazonVPCFullAccess
    - AWSBatchFullAccess
    - AWSCloudFormationFullAccess
    - AWSLambda_FullAccess
    - IAMFullAccess
    - SecretsManagerReadWrite
  - Step 3:
    - Give the role an appropriate name
- All other settings are default. Click through to create the role
- Download the Direct Data CloudFormation Template
- Navigate to the CloudFormation service in the AWS Console
- Select **Create stack** and **With new resources (standard)**
- Configure the stack with the following settings:
  - Step 1:
    - Prepare template: **Choose an existing template**
    - Template source: **Upload a template file**
    - Click **Choose file** and select the Direct Data CloudFormation Template
  - Step 2:
    - Give the stack an appropriate name
  - Step 3:
    - Permissions: Select the IAM role created in the previous step
- All other settings are default. Click through to create the stack. When the job completes, select the **Resources** tab to view the resources created
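If you prefer the CLI, the stack can be created with an equivalent command. This sketch assumes the template was saved locally as `direct-data-template.yaml` (a placeholder filename) and uses the ARN of the role created above; the `--capabilities` flags acknowledge that the stack creates IAM resources:

```
aws cloudformation create-stack \
  --stack-name {STACK_NAME} \
  --template-body file://direct-data-template.yaml \
  --role-arn arn:aws:iam::{ACCOUNT_ID}:role/{CLOUDFORMATION_ROLE_NAME} \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
```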
- Navigate to the Lambda service in the AWS Console
- Search for and select the Lambda function named `cf-direct-data`
- Select **Add trigger** from the function screen
- Configure the trigger with the following settings:
  - Trigger configuration: **API Gateway**
  - Intent: **Create a new API**
  - API type: **HTTP API**
  - Security: **Open**
- All other settings are default. Click through to create the trigger
- Copy the **API endpoint** value of the trigger and note it down separately. This will be used to invoke the integration
- Navigate to the S3 service in the AWS Console
- Search for and select the S3 bucket named `{ACCOUNT_ID}-{REGION}-cf-direct-data`
- Copy the S3 bucket name and note it down separately. This will be used in the Direct Data configuration file
- Create a folder at the root of the bucket named `direct-data`
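The folder can also be created from the CLI by putting a zero-byte object whose key ends in a trailing slash, which is how the S3 console represents folders:

```
aws s3api put-object --bucket {ACCOUNT_ID}-{REGION}-cf-direct-data --key direct-data/
```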
- Navigate to the Redshift service in the AWS Console
- Search for and select the Redshift cluster named `cf-direct-data`
- From the cluster screen, copy the **Endpoint** value and note it down separately. This will be used in the Direct Data configuration file
- Note: The following step is not required, but is recommended for security
- From the **Actions** dropdown, select **Change admin user password**
- Navigate to the IAM service in the AWS Console
- Select **Roles**
- Search for and select the role named `cf-direct-data-redshift-role-{REGION}`
- Copy the **ARN** value and note it down separately. This will be used in the Direct Data configuration file
- Navigate to the Secrets Manager service in the AWS Console
- Search for and select the secret named `direct-data-config.ini`
- Select **Retrieve secret value**, then **Edit**. Update the following values under the `[demo]` section (a sketch of the completed section follows this list):
  - `vault_username`
  - `vault_password`
  - `vault_dns`
  - `redshift_host` (Use the previously copied Redshift endpoint. Do not include the port number or database name)
  - `redshift_iam_redshift_s3_read` (Use the previously copied ARN for `cf-direct-data-redshift-role-{REGION}`)
  - `redshift_password` (If updated in the previous step)
  - `s3_bucket_name`
- Additional sections can be added with different Vault and/or AWS services specified for multiple Vault and database functionality.
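For reference, a completed `[demo]` section might look like the sketch below. Every value is a placeholder; the actual key set is defined by the secret provisioned by the CloudFormation template, so only edit the keys that already exist there:

```
[demo]
vault_username = integration.user@example.com
vault_password = {VAULT_PASSWORD}
vault_dns = myvault.veevavault.com
redshift_host = cf-direct-data.abc123xyz456.us-east-1.redshift.amazonaws.com
redshift_password = {REDSHIFT_PASSWORD}
redshift_iam_redshift_s3_read = arn:aws:iam::{ACCOUNT_ID}:role/cf-direct-data-redshift-role-us-east-1
s3_bucket_name = {ACCOUNT_ID}-us-east-1-cf-direct-data
```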
- Navigate to the VPC service in the AWS Console
- Select **Route tables** from the left navigation
- Select the route table ID associated with the `cf-direct-data` VPC
- Select **Edit routes**
- Select **Add route**
- Add the following route:
  - Destination: `0.0.0.0/0`
  - Target: **Internet Gateway** (Select the gateway associated with the VPC)
- Save the route table
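The same route can be added from the CLI if preferred; the route table and internet gateway IDs below are placeholders for the ones associated with the `cf-direct-data` VPC:

```
aws ec2 create-route --route-table-id {ROUTE_TABLE_ID} --destination-cidr-block 0.0.0.0/0 --gateway-id {INTERNET_GATEWAY_ID}
```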
- Download the Direct Data Connector Postman Collection
- Import the collection into Postman
- Open the List and Download Direct Data Files to S3 endpoint
- Update the URL to the previously noted **API endpoint** from the Lambda trigger
- Update the body parameters with the following JSON payload:
  ```
  {
    "step": "retrieve",
    "start_time": "2000-01-01T00:00Z",
    "stop_time": "2024-04-19T00:00Z", //Update this value to the current date
    "extract_type": "full",
    "continue_processing": true,
    "secret": "demo"
  }
  ```
- Click **Send**
- When manually invoking the `full` or `log` extract type process, the List and Download Direct Data Files to S3 call will respond with the AWS Batch job name `Starting AWS Batch Job with ID: cf-direct-data-retrieve`, and the Unzip Files in S3 call will respond with the AWS Batch job name `Starting AWS Batch Job with ID: cf-direct-data-unzip`.
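As an alternative to Postman, the same request can be issued from the command line. This sketch assumes the endpoint accepts a POST with a JSON body; replace the URL with the exact API endpoint value copied from the Lambda trigger and update `stop_time` to the current date:

```
curl -X POST "{API_ENDPOINT}" \
  -H "Content-Type: application/json" \
  -d '{"step": "retrieve", "start_time": "2000-01-01T00:00Z", "stop_time": "2024-04-19T00:00Z", "extract_type": "full", "continue_processing": true, "secret": "demo"}'
```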
Once both AWS Batch jobs have completed, confirm the following:
- The zipped file and the unzipped contents are present in the previously created S3 bucket
- All the tables were created in the specified Redshift schema and the data was loaded. This can be confirmed using Redshift Query Editor v2.
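As an alternative to Query Editor v2, the Redshift Data API can run the same check from the CLI; the database, admin user, and schema names below are placeholders:

```
aws redshift-data execute-statement \
  --cluster-identifier cf-direct-data \
  --database {DATABASE} \
  --db-user {ADMIN_USER} \
  --sql "SELECT table_name FROM svv_all_tables WHERE schema_name = '{SCHEMA}';"

aws redshift-data get-statement-result --id {STATEMENT_ID}
```

The second command retrieves the rows once the statement has finished; `{STATEMENT_ID}` is the `Id` value returned by `execute-statement`.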
These schedules should be created after the initial `full` extract is invoked.
- Navigate to the Amazon EventBridge service in the AWS Console
- Select **Schedules** from the left navigation under the **Scheduler** section
- Select **Create schedule**
- Configure the schedule with the following settings:
  - Schedule name: `direct-data-incremental-schedule`
  - Occurrence: **Recurring schedule**
  - Schedule type: **Rate-based schedule**
  - Rate expression: `15 minutes`
  - Flexible time window: **Off**
  - Start date and time: Insert the target start time for when this schedule should run
- Select **Next**
- Configure the next page with the following settings:
  - Template targets: **AWS Lambda Invoke**
  - Lambda function: `cf-direct-data`
  - Payload:
    ```
    {
      "step": "retrieve",
      "extract_type": "incremental",
      "continue_processing": true,
      "secret": "demo"
    }
    ```
- Select **Next**
- On the next page, select **Next**
- Select **Create schedule**
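For reference, an equivalent schedule can be created with the EventBridge Scheduler CLI; this sketch assumes an execution role that allows `lambda:InvokeFunction` on the target function (the console creates one automatically), and all ARNs are placeholders:

```
aws scheduler create-schedule \
  --name direct-data-incremental-schedule \
  --schedule-expression "rate(15 minutes)" \
  --flexible-time-window Mode=OFF \
  --target '{"Arn":"arn:aws:lambda:{REGION}:{ACCOUNT_ID}:function:cf-direct-data","RoleArn":"arn:aws:iam::{ACCOUNT_ID}:role/{SCHEDULER_ROLE}","Input":"{\"step\":\"retrieve\",\"extract_type\":\"incremental\",\"continue_processing\":true,\"secret\":\"demo\"}"}'
```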
- Follow the same steps as the Incremental Schedule, except change the following fields:
  - Rate expression: `24 hours`
  - Start date and time: Tomorrow's date at 12 AM
  - Payload:
    ```
    {
      "step": "retrieve",
      "extract_type": "log",
      "continue_processing": true,
      "secret": "demo"
    }
    ```
If errors are encountered, the logs for the Lambda function can be found in CloudWatch, while the AWS Batch job logs can be viewed from the details of the job that ran.
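Recent Lambda logs can also be tailed from the CLI; this assumes the default log group name that Lambda creates for the `cf-direct-data` function:

```
aws logs tail /aws/lambda/cf-direct-data --since 1h
```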
Questions, enhancement requests, or issues should be posted in the Vault for Developers community on Veeva Connect. Partners should discuss these topics with their Veeva counterparts.