# 202105 update docs #1891

Merged · 3 commits · May 28, 2021
**README.md**: 47 changes (26 additions, 21 deletions)
@@ -1,6 +1,6 @@
# Global.Health List

[Global.Health](https://global.health)'s mission is to enable rapid sharing of trusted and open public health data to
[Global.health](https://global.health)'s mission is to enable rapid sharing of trusted and open public health data to
advance the response to infectious diseases.

This repository contains the servers and scripts that support its data curation efforts.
@@ -15,44 +15,47 @@ You can tell if a case was imported from the manually created spreadsheets data

## Frontends

- [dev](https://dev-curator.ghdsi.org)
- [prod](https://curator.ghdsi.org)
- [dev](https://dev-data.covid-19.global.health)
- [prod](https://data.covid-19.global.health)

## Daily exports of case data

- [output](data/README.md)
- [data dictionary](data-serving/scripts/export-data/case_fields.yaml)
- [script](data-serving/scripts/export-data/README.md)
A daily export of case data can be downloaded from the data portal. The data
is generated using this [script](data-serving/scripts/export-data/README.md), with
this [data dictionary](data-serving/scripts/export-data/functions/01-split/fields.txt).

## CI/CD status

[All actions](https://github.com/globaldothealth/list/actions)
* [**All actions**](https://github.com/globaldothealth/list/actions)

![Curator Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Curator%20Service%20Github%20Packages%20Push/badge.svg)
* **Docker images**

![Export prod case data](https://github.com/globaldothealth/list/workflows/Export%20prod%20case%20data/badge.svg)
![Curator Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Curator%20Service%20Github%20Packages%20Push/badge.svg)

![Suggest python scripts](https://github.com/globaldothealth/list/workflows/Suggest%20python%20scripts/badge.svg)
![Data Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Data%20Service%20Github%20Packages%20Push/badge.svg)

![Update case data in dev](https://github.com/globaldothealth/list/workflows/Update%20case%20data%20in%20dev/badge.svg)
![Geocoding Service Github Packages Push](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-package.yml/badge.svg)

![Data Service Github Packages Push](https://github.com/globaldothealth/list/workflows/Data%20Service%20Github%20Packages%20Push/badge.svg)
![Ingestion functions deploy](https://github.com/globaldothealth/list/actions/workflows/ingestion-functions-deploy.yml/badge.svg)

![Geocoding location-service Python CI](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-python.yml/badge.svg)
* **Tests**

![Integration Tests CI](https://github.com/globaldothealth/list/workflows/Integration%20Tests%20CI/badge.svg)
![Geocoding location-service Python CI](https://github.com/globaldothealth/list/actions/workflows/geocoding-service-python.yml/badge.svg)

![Ingestion functions AWS SAM deploy](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20AWS%20SAM%20deploy/badge.svg)
![Integration Tests CI](https://github.com/globaldothealth/list/workflows/Integration%20Tests%20CI/badge.svg)

![Ingestion functions AWS SAM build](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20AWS%20SAM%20build/badge.svg)
![Ingestion functions Python CI](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20Python%20CI/badge.svg)

![Ingestion functions Python CI](https://github.com/globaldothealth/list/workflows/Ingestion%20functions%20Python%20CI/badge.svg)
![Data service Node.js CI](https://github.com/globaldothealth/list/workflows/Data%20service%20Node.js%20CI/badge.svg)

![Data service Node.js CI](https://github.com/globaldothealth/list/workflows/Data%20service%20Node.js%20CI/badge.svg)
![Curator UI Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20UI%20Node.js%20CI/badge.svg)

![Curator UI Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20UI%20Node.js%20CI/badge.svg)
![Curator API Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20API%20Node.js%20CI/badge.svg)

![Suggest python scripts](https://github.com/globaldothealth/list/workflows/Suggest%20python%20scripts/badge.svg)

* **Monitoring** ![Data monitoring notifications](https://github.com/globaldothealth/list/actions/workflows/monitoring-notify.yml/badge.svg)

![Curator API Node.js CI](https://github.com/globaldothealth/list/workflows/Curator%20API%20Node.js%20CI/badge.svg)

## Components

@@ -61,6 +64,8 @@ You can tell if a case was imported from the manually created spreadsheets data
- [The curator service](verification/curator-service/api) in `verification/curator-service/api` serves as the backend
for the curator portal, which enables curators to view, enter, update, and verify cases; manage data sources and their
ingestion; and manage portal access.
- [The geocoding service](geocoding/location-service) geocodes locations and is
used by the data service, but can be used standalone as well.
- [The curator UI](verification/curator-service/ui) in `verification/curator-service/ui` is the frontend for the curator
portal.

@@ -93,7 +98,7 @@ You can tell if a case was imported from the manually created spreadsheets data

### API docs

- [Curator service](HTTPS://curator.ghdsi.org/api-docs)
- [Curator service](https://data.covid-19.global.health/api-docs)
- [Local curator service](http://localhost:3001/api-docs)
- [Local data service](http://localhost:3000/api-docs)
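
As a quick smoke test, you can fetch the OpenAPI docs from locally running services (a minimal sketch, assuming the curator and data services are already running on their default ports):

```shell
# Curator service docs (default port 3001).
curl -fsS http://localhost:3001/api-docs | head

# Data service docs (default port 3000).
curl -fsS http://localhost:3000/api-docs | head
```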

**aws/README.md**: 92 changes (52 additions, 40 deletions)
@@ -1,6 +1,6 @@
# Production infrastructure

This directory contains the configuration files for the production infrastructure of the Global Health project.
This directory contains the configuration files for the production infrastructure of the Global.health project.

## One-time setup

@@ -32,52 +32,60 @@ The basic deployment/pods/services configuration looks like:

```
kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
curator-dev 1/1 1 1 19h
curator-prod 1/1 1 1 19h
data-dev 1/1 1 1 19h
data-prod 1/1 1 1 19h
------------------------------------------------------
NAME READY UP-TO-DATE AVAILABLE AGE
curator-dev 1/1 1 1 357d
curator-prod 2/2 2 2 357d
data-dev 1/1 1 1 2d6h
data-prod 2/2 2 2 357d
location-dev 1/1 1 1 10d
location-prod 2/2 2 2 10d

kubectl get pods
NAME READY STATUS RESTARTS AGE
curator-dev-69d6f94954-qrc2v 1/1 Running 0 14m
curator-prod-dfb49646-qz5zp 1/1 Running 0 14m
data-dev-6f686ffdb6-jt6tl 1/1 Running 0 14m
data-prod-bd57576d8-p8wp4 1/1 Running 0 14m
-----------------------------------------------------------------
NAME READY STATUS RESTARTS AGE
curator-dev-7c65fdd7bc-jkfnn 1/1 Running 0 26h
curator-prod-94ccbc565-qtmph 1/1 Running 1 26h
curator-prod-94ccbc565-v7rvs 1/1 Running 0 26h
data-dev-66676656f6-897p8 1/1 Running 0 26h
data-prod-7d479ddbbc-fdkgb 1/1 Running 0 26h
data-prod-7d479ddbbc-qq256 1/1 Running 0 26h
location-dev-5986bf6f77-f8mmb 1/1 Running 0 26h
location-prod-6d674b7b47-lxj2z 1/1 Running 0 26h
location-prod-6d674b7b47-zfgk2 1/1 Running 0 26h

kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
curator-dev ClusterIP 10.100.222.22 <none> 80/TCP 110m
curator-prod ClusterIP 10.100.43.67 <none> 80/TCP 3h7m
data-dev ClusterIP 10.100.204.152 <none> 80/TCP 3h7m
data-prod ClusterIP 10.100.59.189 <none> 80/TCP 3h7m
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 3h25m

kubectl get ingress
NAME HOSTS ADDRESS PORTS AGE
cm-acme-http-solver-k9bzb curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80 31s
cm-acme-http-solver-mvg2s dev-curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80 31s
curator dev-curator.ghdsi.org,curator.ghdsi.org ad9f94057436541e5a5d6b4f9b2deec0-1e4e71b2092ca07c.elb.us-east-1.amazonaws.com 80, 443 34s
--------------------------------------------------------------------------------------------------------------------------------------------------
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
curator-dev ClusterIP 10.100.222.22 <none> 80/TCP 357d
curator-dev-aws-1 LoadBalancer 10.100.175.60 a1fc85431766047dda91ffcb74696fa4-156204500.us-east-1.elb.amazonaws.com 443:32233/TCP 105d
curator-prod ClusterIP 10.100.43.67 <none> 80/TCP 357d
curator-prod-aws-1 LoadBalancer 10.100.165.255 ae37d40a64ae44960b885564ddee19cf-419585441.us-east-1.elb.amazonaws.com 443:31681/TCP 105d
data-dev ClusterIP 10.100.204.152 <none> 80/TCP 357d
data-prod ClusterIP 10.100.59.189 <none> 80/TCP 357d
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 357d
location-dev ClusterIP 10.100.234.147 <none> 80/TCP 10d
location-prod ClusterIP 10.100.68.234 <none> 80/TCP 10d
```
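
To drill into any one of these resources, `kubectl describe` shows replica status and recent events (a sketch; substitute the deployment you care about):

```shell
# Describe a single deployment, including rollout state and events.
kubectl describe deployment data-prod
```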

We use a deployment file each for the data service and the curator service; check out `data.yaml` and `curator.yaml`.

To update the deployments, first do a dry run:

```shell
kubectl apply -f data.yaml -f curator.yaml --dry-run=server
kubectl apply -f data.yaml -f curator.yaml -f location.yaml --dry-run=server
```

For more verbose output use:

```shell
kubectl apply -f data.yaml -f curator.yaml --dry-run=server --output=yaml
kubectl apply -f data.yaml -f curator.yaml -f location.yaml --dry-run=server --output=yaml
```
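
Another way to preview changes (assuming a reasonably recent `kubectl`) is `kubectl diff`, which prints what would change on the server without applying anything:

```shell
kubectl diff -f data.yaml -f curator.yaml -f location.yaml
```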

Then once changes look good, use:

```shell
kubectl apply -f data.yaml -f curator.yaml
kubectl apply -f data.yaml -f curator.yaml -f location.yaml
```

To confirm changes occurred, use:
@@ -91,17 +99,22 @@ To read the server logs first find the pod whose logs you want to read.

```shell
kubectl get pods
NAME READY STATUS RESTARTS AGE
curator-dev-6cff5859df-dddbw 1/1 Running 0 148m
curator-prod-5bf5c88f58-g2489 1/1 Running 0 139m
data-dev-566fb67694-xfzkj 1/1 Running 0 148m
data-prod-5b78bdc66d-dwf2k 1/1 Running 4 139m
NAME READY STATUS RESTARTS AGE
curator-dev-7c65fdd7bc-jkfnn 1/1 Running 0 26h
curator-prod-94ccbc565-qtmph 1/1 Running 1 26h
curator-prod-94ccbc565-v7rvs 1/1 Running 0 26h
data-dev-66676656f6-897p8 1/1 Running 0 26h
data-prod-7d479ddbbc-fdkgb 1/1 Running 0 26h
data-prod-7d479ddbbc-qq256 1/1 Running 0 26h
location-dev-5986bf6f77-f8mmb 1/1 Running 0 26h
location-prod-6d674b7b47-lxj2z 1/1 Running 0 26h
location-prod-6d674b7b47-zfgk2 1/1 Running 0 26h
```

Then call logs on the pod you want to read from.

```shell
kubectl logs data-prod-5b78bdc66d-dwf2k
kubectl logs data-prod-7d479ddbbc-qq256
```
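
To stream logs rather than take a snapshot, `kubectl logs` accepts a few useful flags (a sketch; substitute the pod names from your cluster):

```shell
# Follow new log lines as they are written, starting from the last 100.
kubectl logs -f --tail=100 data-prod-7d479ddbbc-qq256

# Or target the deployment and let kubectl pick one of its pods.
kubectl logs deployment/data-prod --since=1h
```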

## Getting access to the cluster
@@ -192,7 +205,7 @@ If for some reason a secret has been compromised or if you want to perform a rot

5. Change reference to new secret in deployment configs.

6. Apply configuration changes. (`kubectl apply -f curator.yaml -f data.yaml`)
6. Apply configuration changes. (`kubectl apply -f curator.yaml -f data.yaml -f location.yaml`)

7. Verify new deployment works as intended.

@@ -253,11 +266,10 @@ In a few seconds the push should be complete.

You can list the existing tags/versions with `git tag` or on the [github repo](https://github.com/globaldothealth/list/releases).
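
For example, to see the most recent version tags first (a small sketch using git's version-aware sort):

```shell
git tag --sort=-v:refname | head
```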

### `main` image tag for dev
### `latest` image tag for dev

Dev instances of curator and data services are using the `main` image tag, that's not best practice as the images are not reloaded automatically - better approach is pending
[Flux-based deployment](https://github.com/globaldothealth/list/issues/673). The latest image with the `main`
tag is fetched when a deployment is updated in kubernetes. To update dev to the `main` image, do:
Dev instances of curator and data services are using the `latest` image tag. Using this tag lets Kubernetes know
that the image should always be fetched when restarting the service, which can be done using the following:

```shell
kubectl rollout restart deployment/curator-dev
```

@@ -299,14 +311,14 @@ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/cont
Our ingress routes to the dev and prod curator services were installed with:

```shell
kubectl apply -f curator-ingress.yam
kubectl apply -f curator-ingress.yaml
kubectl apply -f curator-ingress-config-map.yaml -n ingress-nginx
```
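
To check that the routes were picked up, listing the ingress resources is a quick sanity check (a sketch; add `-n <namespace>` if your ingress lives outside the default namespace):

```shell
kubectl get ingress --all-namespaces
```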

The curator services are exposed here:

- [dev](https://dev-curator.ghdsi.org)
- [prod](https://curator.ghdsi.org)
- [dev](https://dev-data.covid-19.global.health)
- [prod](https://data.covid-19.global.health)

## Kubernetes dashboard

**ingestion/functions/README.md**: 56 changes (55 additions, 1 deletion)
@@ -3,7 +3,7 @@
## Overview

This directory contains the parsing functions
used in the Global Health ingestion system.
used in the Global.health ingestion system.

The objective of the ingestion system is to facilitate a semi-automated
workflow accomplishing the retrieval of epidemiological source data, the
@@ -95,6 +95,60 @@ python3.8 -m pip install -r requirements.txt

*NB:* Be sure you're using Python 3.8, which corresponds to the runtime of the job definitions run using Batch.

#### Manual ingestion

Ingestion is normally run from the curator UI. The manual process described here
exists as a fallback for when the UI triggers for ingestion are not working.

1. You'll need AWS access; follow the steps in the previous section.
2. Once you've got AWS set up, run the following in `ingestion/functions` after switching
to the virtualenv:

source venv/bin/activate
python aws.py jobdefs

This should show existing **job definitions**. Job definitions are templates that
tell AWS Batch which parser to run and in which environment (dev or prod). If this
command doesn't work, contact the engineering team to set up access.

3. Check whether the ingestion you want to run already has an associated job definition
corresponding to the environment you want to run in. For example,
`python aws.py jobdefs | grep 'colombia.*prod'` searches for Colombia ingestion
in prod, which gives

ACTIVE colombia-colombia-ingestor-prod

4. If step 3 shows that a job definition is available, you can **submit** a job:

python aws.py submit colombia-colombia-ingestor-prod

Check the submit help options with `python aws.py submit --help`. The most common
option is `-t` (or `--timeout`), which sets the maximum number of *minutes*
the ingestion is allowed to run. The default is 60 minutes, which is fine for
daily ingestion, but might not be enough time to run a backfill.

To run a **backfill**, use the `-s` (`--start-date`) and `-e` (`--end-date`)
flags to delimit the backfill duration (see the example submission after this list).
You can now skip to step 6.

5. If there's no existing job definition for a source, you'll need to **register** one.
Registration creates a new job definition which can be used to submit jobs.

First, add the parser to the source in the curator UI. The parser must be named
as follows: `<subfolder>-<parser>-ingestor-<env>`. So for example, if you wanted
to add a job definition corresponding to the parser `parsing/peru/peru.py` in
the *dev* instance, you would set the parser name to `peru-peru-ingestor-dev`.
Then, run the following:

python aws.py register -e prod|dev <source_id> <parser>

Here `<parser>` has to be of the form `subfolder.parser` such as `peru.peru` under
the parsing folder. This will check that the parser corresponds to the one in the
curator UI, and create the job definition. You can then submit a job.

6. Once your job has been submitted, you can view its logs through the CloudWatch
AWS portal (or from the terminal; see below). The logs are stored in the
`/aws/batch/job` log group. Once the job is finished, the status of the upload
will also be updated in the curator UI.
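
A hypothetical backfill submission combining the flags from step 4 might look like the following (the job definition name and dates are illustrative, and the exact date format is an assumption):

    python aws.py submit colombia-colombia-ingestor-prod -t 180 -s 2021-01-01 -e 2021-03-31

If you prefer the terminal to the CloudWatch console, the job logs mentioned in step 6 can also be tailed with the AWS CLI (v2):

    aws logs tail /aws/batch/job --follow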

### Writing and editing functions

For the most part, writing functions is writing standard Python business logic. Each parsing function has boilerplate code at the end that allows it to be invoked from retrieval with a JSON object describing the parameters for the parsing function: