-
Notifications
You must be signed in to change notification settings - Fork 6
Create A Custom Dashboard From a Redshift View
You want to see how many users confirmed their registration emails while signing up for login.gov and you wanted to sort the confirmation rates by the Top 5 most popular email domains(e.g gmail, yahoo, etc.)
These instructions assume that you have the following setup:
-
GSA VPN or CIDR Block + PIV card
Make sure you are either on the GSA Network or on the GSA VPN, otherwise you will not be able to log into the jumphost. For instructions on how to setup your machine, checkout identity-devops docs.
-
AWS CLI
Make sure you also have aws configured with both your
access_key_id
and and youraws_secret_access_key
foridentitysandbox
andprod
environments. The identity-devops wiki has instructions on how to as well -
Redshift cluster address
You’ll need this to authenticate once on the jumphost
**TODO: automate this part **
You’ll need to have psql
install on the jumphost as well, otherwise check with #identity-devops to see if you can either manual install it via
apt-get install postgres-client-9.5
find the prod redshift cluster
aws redshift describe-clusters | grep Address | grep prod
You’ll get something like this
"Address": "tf-prod-redshift-cluster.<redshift_arn>.us-west-2.redshift.amazonaws.com”
Once in the jumphost, open redshift like so
psql -h tf-prod-redshift-cluster.<redshift_arn>.us-west-2.redshift.amazonaws.com -U awsuser -d analytics -p 5439
Usually, by now you should a SQL query that returns the data/format you desire. For email confirmation success the query would look something this. Wrap the query into a SQL view to use later during analysis
first, run this query to create the view/virtual table
CREATE VIEW email_domain_return_rate
AS
(SELECT Count(DISTINCT u2) / Count(DISTINCT u1) :: FLOAT AS return_rate,
Count(*) AS raw_count,
time1 :: DATE,
domain_name
FROM (SELECT e2.user_id AS u2,
e.user_id AS u1,
e2.name,
ee.name,
e.TIME AS time1,
e2.TIME AS time2,
ee.domain_name
FROM events_email ee
join EVENTS e
ON e.id = ee.id
left outer join (SELECT *
FROM EVENTS
WHERE name = 'Email Confirmation') e2
ON e2.user_id = e.user_id
WHERE ( Datediff(days, e2.TIME :: DATE, e.TIME :: DATE) < 2
OR e2.TIME IS NULL )
AND e.user_id != 'anonymous-uuid'
AND e.TIME :: DATE > '2018-01-01'
AND ee.domain_name IN ( 'gmail.com', 'yahoo.com', 'aol.com', 'hotmail.com', 'comcast.net', 'verizon.net' )) t
GROUP BY 4,
3
ORDER BY 3 DESC);
Query the resulting virtual table to make sure it returns the data you want
SELECT * FROM email_domain_return_rate LIMIT 10;
Login into the QuickSight dashboard in the AWS Console. Make sure you are in the US-West-2(Oregon) region.
- Click connect to another data source or upload a file
- Choose Redshift( auto-discovered) and use your regular credentials with username
awsuser
and click Choose a Cluster ID to choose the cluster you are trying to connect to this time it should betf-prod-redshift-cluster
- Note down the
Data Source Name
as you’ll need it in your analysis(this should be the same name as the view name from redshift)
- On the main QuickSight page, click All dashboards
- Choose the most recent dashboard( in this case login-gov-prod-april)
- Click Save As then give your analysis a name
- Click Add at the top (next to the Quicksight logo) and then add visual
- Under Visualize, click the events and then edit datasets
- Choose your dataset(view name in redshift) or if it’s not on the list click add dataset to add it
Hopefully at this point, things should be straighforward. Choose the Visualize
tabe and for email_domain_return_rate
:
- add time 1 to the x-axis
- add return rate to the y-axis and make sure to show PCT(%) instead of SUM.
- add domain_name as color to distinguish individual domain names return rate performance
You can also look at the Filter
tab and add specific filters like choose to show gmail only etc. it’s up to you!
Congratulations! You can now add your analysis to an existing dashboard or create a new one. Just Choose Create Dashboard
on the top right corner.
You can also share your analysis with specific Users(in the login-gov IAM group), just choose Share Analysis
and follow instructions.