Skip to content
This repository has been archived by the owner on Mar 26, 2021. It is now read-only.

Create A Custom Dashboard From a Redshift View

Jean-Paul Mugizi edited this page Apr 27, 2018 · 2 revisions

You want to see how many users confirmed their registration emails while signing up for login.gov and you wanted to sort the confirmation rates by the Top 5 most popular email domains(e.g gmail, yahoo, etc.)

0. Prerequisites and Setup

These instructions assume that you have the following setup:

  • GSA VPN or CIDR Block + PIV card

    Make sure you are either on the GSA Network or on the GSA VPN, otherwise you will not be able to log into the jumphost. For instructions on how to setup your machine, checkout identity-devops docs.

  • AWS CLI

    Make sure you also have aws configured with both your access_key_id and and your aws_secret_access_key for identitysandbox and prod environments. The identity-devops wiki has instructions on how to as well

  • Redshift cluster address

    You’ll need this to authenticate once on the jumphost

    **TODO: automate this part **

You’ll need to have psql install on the jumphost as well, otherwise check with #identity-devops to see if you can either manual install it via

apt-get install postgres-client-9.5

find the prod redshift cluster

aws redshift describe-clusters | grep Address | grep prod

You’ll get something like this

"Address": "tf-prod-redshift-cluster.<redshift_arn>.us-west-2.redshift.amazonaws.com”

Once in the jumphost, open redshift like so

psql -h tf-prod-redshift-cluster.<redshift_arn>.us-west-2.redshift.amazonaws.com -U awsuser -d analytics -p 5439

1. Create a View in Redshift

Usually, by now you should a SQL query that returns the data/format you desire. For email confirmation success the query would look something this. Wrap the query into a SQL view to use later during analysis

first, run this query to create the view/virtual table

CREATE VIEW email_domain_return_rate 
AS 
  (SELECT Count(DISTINCT u2) / Count(DISTINCT u1) :: FLOAT AS return_rate, 
          Count(*)                                         AS raw_count, 
          time1 :: DATE, 
          domain_name 
   FROM   (SELECT e2.user_id AS u2, 
                  e.user_id  AS u1, 
                  e2.name, 
                  ee.name, 
                  e.TIME     AS time1, 
                  e2.TIME    AS time2, 
                  ee.domain_name 
           FROM   events_email ee 
                  join EVENTS e 
                    ON e.id = ee.id 
                  left outer join (SELECT * 
                                   FROM   EVENTS 
                                   WHERE  name = 'Email Confirmation') e2 
                               ON e2.user_id = e.user_id 
           WHERE  ( Datediff(days, e2.TIME :: DATE, e.TIME :: DATE) < 2 
                     OR e2.TIME IS NULL ) 
                  AND e.user_id != 'anonymous-uuid' 
                  AND e.TIME :: DATE > '2018-01-01' 
                  AND ee.domain_name IN ( 'gmail.com',        'yahoo.com', 'aol.com', 'hotmail.com', 'comcast.net', 'verizon.net' )) t 
   GROUP  BY 4, 
             3 
   ORDER  BY 3 DESC);

Query the resulting virtual table to make sure it returns the data you want

SELECT * FROM email_domain_return_rate LIMIT 10;

2. Create An analysis in QuickSight

Login into the QuickSight dashboard in the AWS Console. Make sure you are in the US-West-2(Oregon) region.

Load the dataset into quicksight

  • Click connect to another data source or upload a file
  • Choose Redshift( auto-discovered) and use your regular credentials with username awsuser and click Choose a Cluster ID to choose the cluster you are trying to connect to this time it should be tf-prod-redshift-cluster
  • Note down the Data Source Name as you’ll need it in your analysis(this should be the same name as the view name from redshift)

Create A New Analysis

  • On the main QuickSight page, click All dashboards
  • Choose the most recent dashboard( in this case login-gov-prod-april)
  • Click Save As then give your analysis a name

Load data set into your new analysis

  • Click Add at the top (next to the Quicksight logo) and then add visual
  • Under Visualize, click the events and then edit datasets
  • Choose your dataset(view name in redshift) or if it’s not on the list click add dataset to add it

Create vizualization

Hopefully at this point, things should be straighforward. Choose the Visualize tabe and for email_domain_return_rate:

  • add time 1 to the x-axis
  • add return rate to the y-axis and make sure to show PCT(%) instead of SUM.
  • add domain_name as color to distinguish individual domain names return rate performance

You can also look at the Filter tab and add specific filters like choose to show gmail only etc. it’s up to you!

3. Add Your Analysis To The Main Dashboard

Congratulations! You can now add your analysis to an existing dashboard or create a new one. Just Choose Create Dashboard on the top right corner.

You can also share your analysis with specific Users(in the login-gov IAM group), just choose Share Analysis and follow instructions.