# 202105 update docs #1891

Merged · 3 commits · May 28, 2021
**README.md**: 47 changes (26 additions, 21 deletions)
@@ -1,6 +1,6 @@
# Global.Health List

[Global.Health](https://global.health)'s mission is to enable rapid sharing of trusted and open public health data to
[Global.health](https://global.health)'s mission is to enable rapid sharing of trusted and open public health data to
advance the response to infectious diseases.

This repository contains the servers and scripts that support its data curation efforts.
@@ -15,44 +15,47 @@ You can tell if a case was imported from the manually created spreadsheets data

## Frontends

- [dev](https://dev-curator.ghdsi.org)
- [prod](https://curator.ghdsi.org)
- [dev](https://dev-data.covid-19.global.health)
- [prod](https://data.covid-19.global.health)

## Daily exports of case data

- [output](data/README.md)
- [data dictionary](data-serving/scripts/export-data/case_fields.yaml)
- [script](data-serving/scripts/export-data/README.md)
A daily export of case data can be downloaded from the data portal. The data
is generated using this [script](data-serving/scripts/export-data/README.md), with
this [data dictionary](data-serving/scripts/export-data/functions/01-split/fields.txt).

## CI/CD status

[All actions](https://github.com/globaldothealth/list/actions)
* [**All actions**](https://github.com/globaldothealth/list/actions)

![Curator Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Curator%20Service%20Github%20Packages%20Push/badge.svg)
* **Docker images**

![Export prod case data](https://github.com/globaldothealth/list/workflows/Export%20prod%20case%20data/badge.svg)
![Curator Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Curator%20Service%20Github%20Packages%20Push/badge.svg)

![Suggest python scripts](https://github.com/globaldothealth/list/workflows/Suggest%20python%20scripts/badge.svg)
![Data Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Data%20Service%20Github%20Packages%20Push/badge.svg)

![Update case data in dev](https://github.com/globaldothealth/list/workflows/Update%20case%20data%20in%20dev/badge.svg)
![Geocoding Service Github Packages Push](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-package.yml/badge.svg)

![Data Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Data%20Service%20Github%20Packages%20Push/badge.svg)
![Ingestion functions deploy](https://github.com/globaldothealth/list/actions/workflows/ingestion-functions-deploy.yml/badge.svg)

![Geocoding location-service Python CI](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-python.yml/badge.svg)
* **Tests**

![Integration Tests CI](https://github.com/globaldothealth/list/workflows/Integration%20Tests%20CI/badge.svg)
![Geocoding location-service Python CI](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-python.yml/badge.svg)

![Ingestion functions AWS SAM deploy](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20AWS%20SAM%20deploy/badge.svg)
![Integration Tests CI](https://github.com/globaldothealth/list/workflows/Integration%20Tests%20CI/badge.svg)

![Ingestion functions AWS SAM build](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20AWS%20SAM%20build/badge.svg)
![Ingestion functions Python CI](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20Python%20CI/badge.svg)

![Ingestion functions Python CI](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20Python%20CI/badge.svg)
![Data service Node.js CI](https://github.com/globaldothealth/list/workflows/Data%20service%20Node.js%20CI/badge.svg)

![Data service Node.js CI](https://github.com/globaldothealth/list/workflows/Data%20service%20Node.js%20CI/badge.svg)
![Curator UI Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20UI%20Node.js%20CI/badge.svg)

![Curator UI Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20UI%20Node.js%20CI/badge.svg)
![Curator API Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20API%20Node.js%20CI/badge.svg)

![Suggest python scripts](https://github.com/globaldothealth/list/workflows/Suggest%20python%20scripts/badge.svg)

* **Monitoring** ![Data monitoring notifications](https://github.com/globaldothealth/list/actions/workflows/monitoring-notify.yml/badge.svg)

![Curator API Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20API%20Node.js%20CI/badge.svg)

## Components

@@ -61,6 +64,8 @@ You can tell if a case was imported from the manually created spreadsheets data
- [The curator service](verification/curator-service/api) in `verification/curator-service/api` serves as the backend
for the curator portal, which enables curators to view, enter, update, and verify cases; manage data sources and their
ingestion; and manage portal access.
- [The geocoding service](geocoding/location-service) geocodes locations and is
used by the data service, but can be used standalone as well.
- [The curator UI](verification/curator-service/ui) in `verification/curator-service/ui` is the frontend for the curator
portal.

@@ -93,7 +98,7 @@ You can tell if a case was imported from the manually created spreadsheets data

### API docs

- [Curator service](HTTPS://curator.ghdsi.org/api-docs)
- [Curator service](https://data.covid-19.global.health/api-docs)
- [Local curator service](http://localhost:3001/api-docs)
- [Local data service](http://localhost:3000/api-docs)
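
As a quick smoke test, you can fetch the OpenAPI docs from locally running services (a minimal sketch, assuming the curator and data services are already running on their default ports):

```shell
# Curator service docs (default port 3001).
curl -fsS http://localhost:3001/api-docs | head

# Data service docs (default port 3000).
curl -fsS http://localhost:3000/api-docs | head
```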

**aws/README.md**: 92 changes (52 additions, 40 deletions)
@@ -1,6 +1,6 @@
# Production infrastructure

This directory contains the configuration files for the production infrastructure of the Global Health project.
This directory contains the configuration files for the production infrastructure of the Global.health project.

## One-time setup

@@ -32,52 +32,60 @@ The basic deployment/pods/services configuration looks like:

```
kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
curator-dev 1/1 1 1 19h
curator-prod 1/1 1 1 19h
data-dev 1/1 1 1 19h
data-prod 1/1 1 1 19h
------------------------------------------------------
NAME READY UP-TO-DATE AVAILABLE AGE
curator-dev 1/1 1 1 357d
curator-prod 2/2 2 2 357d
data-dev 1/1 1 1 2d6h
data-prod 2/2 2 2 357d
location-dev 1/1 1 1 10d
location-prod 2/2 2 2 10d

kubectl get pods
NAME READY STATUS RESTARTS AGE
curator-dev-69d6f94954-qrc2v 1/1 Running 0 14m
curator-prod-dfb49646-qz5zp 1/1 Running 0 14m
data-dev-6f686ffdb6-jt6tl 1/1 Running 0 14m
data-prod-bd57576d8-p8wp4 1/1 Running 0 14m
-----------------------------------------------------------------
NAME READY STATUS RESTARTS AGE
curator-dev-7c65fdd7bc-jkfnn 1/1 Running 0 26h
curator-prod-94ccbc565-qtmph 1/1 Running 1 26h
curator-prod-94ccbc565-v7rvs 1/1 Running 0 26h
data-dev-66676656f6-897p8 1/1 Running 0 26h
data-prod-7d479ddbbc-fdkgb 1/1 Running 0 26h
data-prod-7d479ddbbc-qq256 1/1 Running 0 26h
location-dev-5986bf6f77-f8mmb 1/1 Running 0 26h
location-prod-6d674b7b47-lxj2z 1/1 Running 0 26h
location-prod-6d674b7b47-zfgk2 1/1 Running 0 26h

kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
curator-dev ClusterIP 10.100.222.22 <none> 80/TCP 110m
curator-prod ClusterIP 10.100.43.67 <none> 80/TCP 3h7m
data-dev ClusterIP 10.100.204.152 <none> 80/TCP 3h7m
data-prod ClusterIP 10.100.59.189 <none> 80/TCP 3h7m
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3h25m

kubectl get ingress
NAME HOSTS ADDRESS PORTS AGE
cm-acme-http-solver-k9bzb curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80 31s
cm-acme-http-solver-mvg2s dev-curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80 31s
curator dev-curator.ghdsi.org,curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80, 443 34s
--------------------------------------------------------------------------------------------------------------------------------------------------
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
curator-dev ClusterIP 10.100.222.22 <none> 80/TCP 357d
curator-dev-aws-1 LoadBalancer 10.100.175.60 a1fc85431766047dda91ffcb74696fa4-156204500.us-east-1.elb.amazonaws.com 443:32233/TCP 105d
curator-prod ClusterIP 10.100.43.67 <none> 80/TCP 357d
curator-prod-aws-1 LoadBalancer 10.100.165.255 ae37d40a64ae44960b885564ddee19cf-419585441.us-east-1.elb.amazonaws.com 443:31681/TCP 105d
data-dev ClusterIP 10.100.204.152 <none> 80/TCP 357d
data-prod ClusterIP 10.100.59.189 <none> 80/TCP 357d
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 357d
location-dev ClusterIP 10.100.234.147 <none> 80/TCP 10d
location-prod ClusterIP 10.100.68.234 <none> 80/TCP 10d
```
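
To drill into any one of these resources, `kubectl describe` shows replica status and recent events (a sketch; substitute the deployment you care about):

```shell
# Describe a single deployment, including rollout state and events.
kubectl describe deployment data-prod
```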

We use a deployment file each for the data service and the curator service; check out `data.yaml` and `curator.yaml`.

To update the deployments, first do a dry run:

```shell
kubectl apply -f data.yaml -f curator.yaml --dry-run=server
kubectl apply -f data.yaml -f curator.yaml -f location.yaml --dry-run=server
```

For more verbose output use:

```shell
kubectl apply -f data.yaml -f curator.yaml --dry-run=server --output=yaml
kubectl apply -f data.yaml -f curator.yaml -f location.yaml --dry-run=server --output=yaml
```
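
Another way to preview changes (assuming a reasonably recent `kubectl`) is `kubectl diff`, which prints what would change on the server without applying anything:

```shell
kubectl diff -f data.yaml -f curator.yaml -f location.yaml
```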

Then once changes look good, use:

```shell
kubectl apply -f data.yaml -f curator.yaml
kubectl apply -f data.yaml -f curator.yaml -f location.yaml
```

To confirm changes occurred, use:
@@ -91,17 +99,22 @@ To read the server logs first find the pod whose logs you want to read.

```shell
kubectl get pods
NAME READY STATUS RESTARTS AGE
curator-dev-6cff5859df-dddbw 1/1 Running 0 148m
curator-prod-5bf5c88f58-g2489 1/1 Running 0 139m
data-dev-566fb67694-xfzkj 1/1 Running 0 148m
data-prod-5b78bdc66d-dwf2k 1/1 Running 4 139m
NAME READY STATUS RESTARTS AGE
curator-dev-7c65fdd7bc-jkfnn 1/1 Running 0 26h
curator-prod-94ccbc565-qtmph 1/1 Running 1 26h
curator-prod-94ccbc565-v7rvs 1/1 Running 0 26h
data-dev-66676656f6-897p8 1/1 Running 0 26h
data-prod-7d479ddbbc-fdkgb 1/1 Running 0 26h
data-prod-7d479ddbbc-qq256 1/1 Running 0 26h
location-dev-5986bf6f77-f8mmb 1/1 Running 0 26h
location-prod-6d674b7b47-lxj2z 1/1 Running 0 26h
location-prod-6d674b7b47-zfgk2 1/1 Running 0 26h
```

Then call logs on the pod you want to read from.

```shell
kubectl logs data-prod-5b78bdc66d-dwf2k
kubectl logs data-prod-7d479ddbbc-qq256
```
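
To stream logs rather than take a snapshot, `kubectl logs` accepts a few useful flags (a sketch; substitute the pod names from your cluster):

```shell
# Follow new log lines as they are written, starting from the last 100.
kubectl logs -f --tail=100 data-prod-7d479ddbbc-qq256

# Or target the deployment and let kubectl pick one of its pods.
kubectl logs deployment/data-prod --since=1h
```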

## Getting access to the cluster
@@ -192,7 +205,7 @@ If for some reason a secret has been compromised or if you want to perform a rot

5. Change reference to new secret in deployment configs.

6. Apply configuration changes. (`kubectl apply -f curator.yaml -f data.yaml`)
6. Apply configuration changes. (`kubectl apply -f curator.yaml -f data.yaml -f location.yaml`)

7. Verify new deployment works as intended.

@@ -253,11 +266,10 @@ In a few seconds the push should be complete.

You can list the existing tags/versions with `git tag` or on the [github repo](https://github.com/globaldothealth/list/releases).
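
For example, to see the most recent version tags first (a small sketch using git's version-aware sort):

```shell
git tag --sort=-v:refname | head
```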

### `main` image tag for dev
### `latest` image tag for dev

Dev instances of curator and data services are using the `main` image tag, that's not best practice as the images are not reloaded automatically - better approach is pending
[Flux-based deployment](https://github.com/globaldothealth/list/issues/673). The latest image with the `main`
tag is fetched when a deployment is updated in kubernetes. To update dev to the `main` image, do:
Dev instances of curator and data services are using the `latest` image tag. Using this tag lets Kubernetes know
that the image should always be fetched when restarting the service, which can be done using the following:

```shell
kubectl rollout restart deployment/curator-dev
```

@@ -299,14 +311,14 @@ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/cont
Our ingress routes to the dev and prod curator services were installed with:

```shell
kubectl apply -f curator-ingress.yam
kubectl apply -f curator-ingress.yaml
kubectl apply -f curator-ingress-config-map.yaml -n ingress-nginx
```
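
To check that the routes were picked up, listing the ingress resources is a quick sanity check (a sketch; add `-n <namespace>` if your ingress lives outside the default namespace):

```shell
kubectl get ingress --all-namespaces
```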

The curator services are exposed here:

- [dev](https://dev-curator.ghdsi.org)
- [prod](https://curator.ghdsi.org)
- [dev](https://dev-data.covid-19.global.health)
- [prod](https://data.covid-19.global.health)

## Kubernetes dashboard

**ingestion/functions/README.md**: 56 changes (55 additions, 1 deletion)
@@ -3,7 +3,7 @@
## Overview

This directory contains the parsing functions
used in the Global Health ingestion system.
used in the Global.health ingestion system.

The objective of the ingestion system is to facilitate a semi-automated
workflow accomplishing the retrieval of epidemiological source data, the
@@ -95,6 +95,60 @@ python3.8 -m pip install -r requirements.txt

*NB:* Be sure you're using Python 3.8, which corresponds to the runtime of the job definitions run using Batch.

#### Manual ingestion

Ingestion is normally run from the curator UI. The manual process described here
exists as a fallback for when the UI triggers for ingestion are not working.

1. You'll need AWS access; follow the steps in the previous section.
2. Once you've got AWS set up, run the following in `ingestion/functions` after switching
to the virtualenv:

source venv/bin/activate
python aws.py jobdefs

This should show existing **job definitions**. Job definitions are templates that
tell AWS Batch which parser to run and in which environment (dev or prod). If this
command doesn't work, contact the engineering team to set up access.

3. Check whether the ingestion you want to run already has an associated job definition
corresponding to the environment you want to run in. For example,
`python aws.py jobdefs | grep 'colombia.*prod'` searches for Colombia ingestion
in prod, which gives

ACTIVE colombia-colombia-ingestor-prod

4. If step 3 shows that a job definition is available, you can **submit** a job:

python aws.py submit colombia-colombia-ingestor-prod

Check the submit help options with `python aws.py submit --help`. The most common
option is `-t` (or `--timeout`), which sets the maximum number of *minutes*
the ingestion is allowed to run. The default is 60 minutes, which is fine for
daily ingestion, but might not be enough time to run a backfill.

To run a **backfill**, use the `-s` (`--start-date`) and `-e` (`--end-date`)
flags to delimit the backfill duration (see the example submission after this list).
You can now skip to step 6.

5. If there's no existing job definition for a source, you'll need to **register** one.
Registration creates a new job definition which can be used to submit jobs.

First, add the parser to the source in the curator UI. The parser must be named
as follows: `<subfolder>-<parser>-ingestor-<env>`. So for example, if you wanted
to add a job definition corresponding to the parser `parsing/peru/peru.py` in
the *dev* instance, you would set the parser name to `peru-peru-ingestor-dev`.
Then, run the following:

python aws.py register -e prod|dev <source_id> <parser>

Here `<parser>` has to be of the form `subfolder.parser` such as `peru.peru` under
the parsing folder. This will check that the parser corresponds to the one in the
curator UI, and create the job definition. You can then submit a job.

6. Once your job has been submitted, you can view its logs through the CloudWatch
AWS portal (or from the terminal; see below). The logs are stored in the
`/aws/batch/job` log group. Once the job is finished, the status of the upload
will also be updated in the curator UI.
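
A hypothetical backfill submission combining the flags from step 4 might look like the following (the job definition name and dates are illustrative, and the exact date format is an assumption):

    python aws.py submit colombia-colombia-ingestor-prod -t 180 -s 2021-01-01 -e 2021-03-31

If you prefer the terminal to the CloudWatch console, the job logs mentioned in step 6 can also be tailed with the AWS CLI (v2):

    aws logs tail /aws/batch/job --follow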

### Writing and editing functions

For the most part, writing functions is writing standard Python business logic. Each parsing function has boilerplate code at the end that allows it to be invoked from retrieval with a JSON object describing the parameters for the parsing function: