2712 parameterise data service #2727

Merged (4 commits) on Jun 14, 2022
6 changes: 3 additions & 3 deletions .github/workflows/aggregate-ci.yml
@@ -5,19 +5,19 @@ on:
branches: [main, '*-stable']
paths:
- '.github/workflows/aggregate-ci.yml'
- - 'data-serving/scripts/aggregate/aggregate/**'
+ - 'data-serving/scripts/aggregate-covid19/aggregate/**'
pull_request:
paths:
- '.github/workflows/aggregate-ci.yml'
- - 'data-serving/scripts/aggregate/aggregate/**'
+ - 'data-serving/scripts/aggregate-covid19/aggregate/**'
workflow_dispatch:

jobs:
ci:
runs-on: ubuntu-20.04
defaults:
run:
- working-directory: data-serving/scripts/aggregate/aggregate
+ working-directory: data-serving/scripts/aggregate-covid19/aggregate
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
6 changes: 3 additions & 3 deletions .github/workflows/aggregate-deploy.yml
@@ -5,8 +5,8 @@ on:
branches: [main]
paths:
- '.github/workflows/aggregate-deploy.yml'
- - 'data-serving/scripts/aggregate/**'
- - '!data-serving/scripts/aggregate/README.md'
+ - 'data-serving/scripts/aggregate-covid19/**'
+ - '!data-serving/scripts/aggregate-covid19/README.md'
workflow_dispatch:

jobs:
@@ -26,7 +26,7 @@ jobs:
uses: aws-actions/amazon-ecr-login@v1

- name: Build, tag, and push image to Amazon ECR (latest)
- working-directory: data-serving/scripts/aggregate/aggregate
+ working-directory: data-serving/scripts/aggregate-covid19/aggregate
env:
REGISTRY: ${{ steps.login-ecr.outputs.registry }}
REPO: gdh-map-aggregator
9 changes: 5 additions & 4 deletions data-serving/README.md
@@ -4,15 +4,16 @@

G.h case data, as well as the source, user, and session data for the curator portal, is stored in a MongoDB database.

- We have multiple instances of MongoDB, ranging from local instances for development, to dev (for
- https://dev-data.covid-19.global.health/), to prod (for https://data.covid-19.global.health/).
+ We have multiple instances of MongoDB, ranging from local instances for development, to dev and qa (for
+ https://dev-data.covid-19.global.health/ and https://qa-data.covid-19.global.health in the case of COVID-19 data),
+ and prod (for https://data.covid-19.global.health/ for COVID-19).

Each instance has a `covid19` database, which in turn has collections for each type of data, e.g. `cases`, `users`, etc.

## Case data

The data in the `cases` collection is the primary data that G.h collects, verifies, and shares. Each document in the
- `cases` collection represents a single COVID-19 case.
+ `cases` collection represents a single disease case.

### Shape of the data

@@ -33,7 +34,7 @@ indexes the id of the `case` and its revision for quick lookups.

### Importing cases

- G.h has millions of case records that predate the new curator portal. These are exported to a
+ G.h has millions of COVID-19 case records that predate the new curator portal. These are exported to a
[gzipped CSV](https://github.com/beoutbreakprepared/nCoV2019/tree/master/latest_data) in a
[separate repo](https://github.com/beoutbreakprepared/nCoV2019).

@@ -1,9 +1,13 @@
# Aggregating data

- This is a AWS batch script built to aggregate data that is used by Map.
+ This is a AWS batch script built to aggregate data that is used by COVID-19 Map.
Previously Lambda functions were used for data export, that is now handled
by AWS Batch in [../export-data](../export-data/README.md)

The Batch job (actually two; one for the [dev map](https://dev-map.covid-19.global.health) and one for [prod](https://map.covid-19.global.health)) pulls the code from an image on the Amazon container repo. That image is prepared from the `main` branch by a github action.

**Note for Country level**: Mapbox requires that the lat/long coordinates remain static in order to return the shapes for the choropleth, so this is standardized using the [Google Canonical DSPL countries.csv](https://developers.google.com/public-data/docs/canonical/countries_csv) dataset.
+
+ ### COVID-19 specific aggregation
+
+ This script is only appropriate for aggregating COVID-19 data, as it merges in information from Johns Hopkins University to report on completeness.
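
As an illustration of the completeness reporting described above, here is a minimal Python sketch. It assumes completeness means the ratio of G.h case counts to the Johns Hopkins counts per country. The helper name `completeness_by_country` and that ratio definition are hypothetical and are not taken from the aggregation script; only the input shapes follow the `transform_counts` signature shown in this PR.

```python
from typing import Any, Optional


def completeness_by_country(
    counts: list[dict[str, Any]],
    jhu_counts: Optional[dict[str, int]] = None,
) -> dict[str, Optional[float]]:
    """Hypothetical helper, not part of the aggregation script.

    counts     -- records of the form {"_id": "two-letter country code", "casecount": N}
    jhu_counts -- reference counts keyed by the same country code (assumed shape)

    Returns the fraction casecount / reference count per country, or None
    when no non-zero reference count is available for that country.
    """
    reference_counts = jhu_counts or {}
    result: dict[str, Optional[float]] = {}
    for record in counts:
        code = record["_id"]
        reference = reference_counts.get(code)
        if not reference:
            # Missing or zero reference count: completeness is undefined.
            result[code] = None
        else:
            result[code] = record["casecount"] / reference
    return result
```

Because the reference counts in this sketch are COVID-19 figures, the same calculation would not carry over to another disease without a different reference dataset, which is consistent with the note that the script is COVID-19 specific.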
@@ -200,7 +200,7 @@ def transform_counts(
counts: list[dict[str, Any]], jhu_counts: dict[str, int] = None
) -> list[dict[str, Any]]:
"""
- Transforms aggregated casecount data for map
+ Transforms aggregated COVID-19 casecount data for map

counts -- Case counts as a list of records of the form
[{"_id": "two-letter country code", "casecount": N}, ... ]
@@ -1,6 +1,6 @@
# Working with line-list data

- This directory contains scripts for converting, ingesting, and otherwise munging line-list data.
+ This directory contains scripts for converting, ingesting, and otherwise munging COVID-19 line-list data.

## Prerequisites

2 changes: 1 addition & 1 deletion data-serving/scripts/export-data/README.md
@@ -63,7 +63,7 @@ data-export/
└── latest
```

- Currently the corresponding S3 buckets according to environment are:
+ Currently the corresponding S3 buckets for the COVID-19 data according to environment are:
* **prod**: covid-19-country-export, covid-19-data-export
* **dev**: covid-19-country-export-dev, covid-19-data-export-dev
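
The environment-to-bucket mapping above could be expressed as a small lookup, sketched here in Python. The bucket names come from the list above; the names `EXPORT_BUCKETS` and `buckets_for` are hypothetical and do not appear in the export scripts.

```python
# Bucket names as listed in the README; the (country, data) ordering is a
# convention chosen for this sketch only.
EXPORT_BUCKETS: dict[str, tuple[str, str]] = {
    "prod": ("covid-19-country-export", "covid-19-data-export"),
    "dev": ("covid-19-country-export-dev", "covid-19-data-export-dev"),
}


def buckets_for(env: str) -> tuple[str, str]:
    """Return (country export bucket, data export bucket) for an environment."""
    try:
        return EXPORT_BUCKETS[env]
    except KeyError:
        known = ", ".join(sorted(EXPORT_BUCKETS))
        raise ValueError(f"unknown environment {env!r}; expected one of: {known}") from None
```

Failing loudly on an unknown environment avoids silently writing COVID-19 exports to a bucket intended for another deployment.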
