Spencer Lyon edited this page Jul 8, 2020 · 6 revisions

Welcome to the cmdc-tools wiki!

This hosts the developer documentation.

Start here if you'd like to contribute to the project -- we'd love to have you!

Developer setup

To set up your system for contributing to the project, we recommend the following:

  • Follow the instructions in the README for installing Python
  • Install Docker
  • Get the Docker image for the project:
    • If you made changes to any file(s) in db/schema, run the following from the project root: docker build -t valorumdata/cmdc-tools-pg:latest db (NOTE: EACH time you change one of the schema files you must rebuild the image, then stop and remove the running container and start a new one)
    • Otherwise, pull the latest from Docker Hub: docker pull valorumdata/cmdc-tools-pg:latest
  • Start a Docker container for the PostgreSQL instance: docker run --name cmdc-tools-pg -e POSTGRES_PASSWORD=password -p 5432:5432 valorumdata/cmdc-tools-pg:latest

Then you can make edits to your scraper; suppose it is named XYZ.

To run only the tests for the scraper you are developing, execute: PG_CONN_STR="postgresql://postgres:password@localhost:5432" pytest -v -k XYZ

Creating a scraper

Let's talk about scrapers

To create a scraper, do the following:

  • Clone the repository and create a branch for your scraper. The branch name should indicate the geography or data source that is being scraped
  • Create a Python file for the scraper.
    • Find a home for your scraper
      • If you are scraping a US county or state government dashboard, please check src/cmdc_tools/datasets/official and see if there is a folder for the county's state. If there is, please use that folder. If not, please create a new folder using the state's two letter abbreviation and add an __init__.py file to that new directory
      • If you are scraping another data source please create a directory in src/cmdc_tools/datasets that represents your data source. See the other directory names in the datasets directory for examples. Create an __init__.py file in your new directory
    • Create the Python file: the exact file name doesn't matter much. We've either put code directly in the __init__.py file or in a file named data.py
  • Create a class for your scraper
    • Determine the parent class you should use. The file src/cmdc_tools/datasets/base.py has two main classes:
      1. DatasetBaseNoDate: use this parent class if your scraper is getting whatever data is available when the scraper runs.
      2. DatasetBaseNeedsDate: use this parent class if your scraper can obtain data as of a date in the past. For example, if you are downloading files from a government website and the files have dates
    • Fill in class level attributes:
      • source: str -- A url to the source's website
      • data_type: str -- The type of data that is being scraped. If you are collecting information on COVID related counts (such as cases, hospitalizations, tests, etc.) use "covid". Otherwise use "general" and we will help you determine the appropriate type
      • For a US county scraper, you must include the following class level attributes:
        • state_fips: int -- An integer containing the state's FIPS code. We use the us library to look this up for us based on the state name (see the example below)
        • has_fips: bool -- A boolean indicating if the scraper produces a DataFrame with a column named fips containing the fips code for geography. If this is False, the scraper must have a column named county containing county names
    • Create the get method. This method is responsible for fetching the data and returning a DataFrame.
      • For COVID data scrapers, the DataFrame must include the following columns:
        • vintage: Timestamp -- this must be pd.Timestamp.utcnow().normalize()
        • dt: Timestamp -- The date for which the data is valid
        • county: str OR fips: int -- The indicator for the geography. See discussion of has_fips above
        • variable: str -- A string containing variable names. For a list of valid variable names, see the covid_us endpoint of the api
        • value: number -- The value of the variable in the county/fips on date dt as of vintage
      • For other scrapers, please reach out to us for help structuring the data
  • Add your scraper to the correct namespaces.
    • Your scraper must be listed in src/cmdc_tools/datasets/__init__.py
    • If your dataset is inside one or more subdirectories of datasets, it must also be listed in each subdirectory's __init__.py file. See other scrapers as examples
  • Make sure tests pass. At the root of the repository run the following:
    • black src
    • pytest src
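
The steps above can be sketched as a minimal scraper skeleton. Everything here is hypothetical -- the class name, URL, and data are placeholders, and for brevity the real parent class (DatasetBaseNoDate from src/cmdc_tools/datasets/base.py) is omitted so the snippet runs standalone:

```python
import pandas as pd


class XYZ:  # in the real project: class XYZ(DatasetBaseNoDate)
    # Required class level attributes (placeholder values)
    source = "https://example.com/dashboard"
    data_type = "covid"
    state_fips = 42  # in real code, look this up with the `us` library
    has_fips = False  # so the DataFrame carries a `county` column

    def get(self):
        # Fabricated rows standing in for scraped data
        raw = pd.DataFrame(
            {
                "county": ["Adams", "Allegheny"],
                "cases_total": [10, 250],
                "deaths_total": [1, 12],
            }
        )
        # Reshape wide -> long: one row per (county, variable) pair
        out = raw.melt(id_vars=["county"], var_name="variable", value_name="value")
        vintage = pd.Timestamp.utcnow().normalize()
        return out.assign(dt=vintage, vintage=vintage)
```

Calling XYZ().get() returns a long-form DataFrame with the vintage, dt, county, variable, and value columns described above.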

Other Notes

  • If you are scraping an ArcGIS dashboard, please use the ArcGIS class found in src/cmdc_tools/datasets/official/base.py as a parent class. Please use the methods on that class when writing your get method. NOTE that usage of this class requires setting some more class level attributes
  • If you are adding an entirely new datasource, we will have to create PostgreSQL table(s) to store the data. Please work with the core team to do this.

Example

Let's see an example scraper

Here is the source (as of 2020-07-01) for the Pennsylvania scraper:

import textwrap
import pandas as pd
import us

# Parent classes
from ...base import DatasetBaseNoDate
from ..base import ArcGIS


# class name is `Pennsylvania` indicating geography for scraper
class Pennsylvania(DatasetBaseNoDate, ArcGIS):
    # Using ArcGIS, so we need to set this class attribute
    ARCGIS_ID = "xtuWQvb2YQnp0z3F"

    # Other required class level attributes as described above
    source = (
        "https://www.arcgis.com/apps/opsdashboard/"
        "index.html#/85054b06472e4208b02285b8557f24cf"
    )
    state_fips = int(us.states.lookup("Pennsylvania").fips)
    has_fips: bool = False

    def get(self):
        # Using `ArcGIS` parent class method to get data
        df = self.get_all_sheet_to_df(
            service="County_Case_Data_Public", sheet=0, srvid=2
        )

        # dict to have columns match the schema -- see note about `covid_us` endpoint above
        column_map = {
            "COUNTY_NAM": "county",
            "Cases": "cases_total",
            "Deaths": "deaths_total",
            "AvailableBedsAdultICU": "available_icu_beds",
            "AvailableBedsMedSurg": "available_other_beds",
            "AvailableBedsPICU": "available_picu_beds",
            "COVID19Hospitalized": "hospital_beds_in_use_covid_confirmed",
            "TotalVents": "ventilators_capacity_count",
            "VentsInUse": "ventilators_in_use_any",
            "COVID19onVents": "ventilators_in_use_covid_confirmed",
        }
        renamed = df.rename(columns=column_map)

        # the column we used was non-covid, need to add covid to get total
        renamed["ventilators_in_use_any"] += renamed[
            "ventilators_in_use_covid_confirmed"
        ]

        renamed = renamed.loc[:, list(column_map.values())]
        # reshape from wide to long form
        out = renamed.melt(
            id_vars=["county"], var_name="variable_name", value_name="value"
        )

        # add the `dt` and `vintage` columns
        dt = pd.Timestamp.utcnow().normalize()
        return out.assign(dt=dt, vintage=dt)

This code is in the file src/cmdc_tools/datasets/official/PA/data.py
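
The dt/vintage pattern at the end of the example is worth a closer look: normalize() keeps the timezone but truncates the timestamp to midnight, so every run on a given UTC day records the same vintage. A small illustration (the fixed timestamp is just for reproducibility):

```python
import pandas as pd

now = pd.Timestamp("2020-07-01 13:45:00", tz="UTC")
print(now.normalize())  # 2020-07-01 00:00:00+00:00
```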

The Pennsylvania class is added to the following namespace files:

# src/cmdc_tools/datasets/official/PA/__init__.py
from .data import Pennsylvania
# src/cmdc_tools/datasets/official/__init__.py
from .PA import Pennsylvania
# src/cmdc_tools/datasets/__init__.py
from .official import (
    # many other scrapers
    Pennsylvania,
    # even more scrapers
)