Commit

Merge branch 'dev' into pandas-1.5
zaneselvans committed Sep 15, 2022
2 parents fe6fb2f + f13904e commit 1f3ba5d
Showing 65 changed files with 1,666 additions and 1,713 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build-deploy-pudl.yml
@@ -107,7 +107,7 @@ jobs:

- name: Post to a pudl-deployments channel
id: slack
uses: slackapi/slack-github-action@v1.21.0
uses: slackapi/slack-github-action@v1.22.0
with:
channel-id: "C03FHB9N0PQ"
slack-message: "build-deploy-pudl status: ${{ job.status }}\n${{ env.ACTION_SHA }}-${{ env.GITHUB_REF }}"
24 changes: 13 additions & 11 deletions .github/workflows/tox-pytest.yml
@@ -8,22 +8,24 @@ jobs:
runs-on: ubuntu-latest
strategy:
fail-fast: false
defaults:
run:
shell: bash -l {0}

steps:
- uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Set up conda environment for testing
uses: conda-incubator/setup-miniconda@v2.1.1
- name: Install Conda environment using mamba
uses: mamba-org/provision-with-micromamba@v13
with:
mamba-version: "*"
channels: conda-forge,defaults
channel-priority: true
python-version: "3.10"
activate-environment: pudl-test
environment-file: test/test-environment.yml
- shell: bash -l {0}
cache-env: true
channels: conda-forge,defaults
channel-priority: strict

- name: Log environment details
run: |
conda info
conda list
@@ -52,8 +54,8 @@ jobs:
- name: Log SQLite3 version
run: |
conda run -n pudl-test which sqlite3
conda run -n pudl-test sqlite3 --version
which sqlite3
sqlite3 --version
- name: Set default gcp credentials
id: gcloud-auth
@@ -65,7 +67,7 @@ jobs:
env:
API_KEY_EIA: ${{ secrets.API_KEY_EIA }}
run: |
conda run -n pudl-test tox -- --gcs-cache-path gs://zenodo-cache.catalyst.coop
tox -- --gcs-cache-path gs://zenodo-cache.catalyst.coop
- name: Log post-test Zenodo datastore contents
run: find ~/pudl-work/data/
5 changes: 3 additions & 2 deletions README.rst
@@ -63,10 +63,11 @@ PUDL currently integrates data from:
* `EIA Form 860 <https://www.eia.gov/electricity/data/eia860/>`__: 2001-2021 (2021 is
early release - use with caution)
* `EIA Form 860m <https://www.eia.gov/electricity/data/eia860m/>`__: 2022-06
* `EIA Form 861 <https://www.eia.gov/electricity/data/eia861/>`__: 2001-2020
* `EIA Form 861 <https://www.eia.gov/electricity/data/eia861/>`__: 2001-2021 (2021 is
early release - use with caution)
* `EIA Form 923 <https://www.eia.gov/electricity/data/eia923/>`__: 2001-2021 (2021 is
early release - use with caution)
* `EPA Continuous Emissions Monitoring System (CEMS) <https://ampd.epa.gov/ampd/>`__: 1995-2021
* `EPA Continuous Emissions Monitoring System (CEMS) <https://campd.epa.gov/>`__: 1995-2021
* `FERC Form 1 <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-1-electric-utility-annual>`__: 1994-2020
* `FERC Form 714 <https://www.ferc.gov/industries-data/electric/general-information/electric-industry-forms/form-no-714-annual-electric/data>`__: 2006-2020
* `US Census Demographic Profile 1 Geodatabase <https://www.census.gov/geographies/mapping-files/2010/geo/tiger-data.html>`__: 2010
56 changes: 36 additions & 20 deletions devtools/eia-etl-debug.ipynb
@@ -15,7 +15,7 @@
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"%autoreload 3\n",
"import pudl\n",
"import logging\n",
"import sys\n",
@@ -64,15 +64,24 @@
"from pudl.metadata.classes import DataSource\n",
"\n",
"eia860_data_source = DataSource.from_id(\"eia860\")\n",
"eia860_years = eia860_data_source.working_partitions[\"years\"]\n",
"#eia860_years = [2020]\n",
"eia860_settings = Eia860Settings(years=eia860_years)\n",
"eia860_settings = Eia860Settings(\n",
"# Limit the years as needed if you're testing only a few of them. E.g.:\n",
" years=[2021],\n",
"# years=eia860_data_source.working_partitions[\"years\"]\n",
"# By default all of the tables will be processed.\n",
"# Select the relevant tables as needed if you're testing only a few of them. E.g.:\n",
"# tables=[\"generation_fuel_nuclear_eia923\", \"generation_fuel_eia923\"]\n",
")\n",
"\n",
"# Uncomment to use all available years:\n",
"eia923_data_source = DataSource.from_id(\"eia923\")\n",
"eia923_years = eia923_data_source.working_partitions[\"years\"]\n",
"#eia923_years = [2020]\n",
"eia923_settings = Eia923Settings(years=eia923_years)\n",
"eia923_settings = Eia923Settings(\n",
"# Limit the years as needed if you're testing only a few of them. E.g.:\n",
" years = [2021]\n",
" # years = eia923_data_source.working_partitions[\"years\"]\n",
"# By default all of the tables will be processed.\n",
"# Select the relevant tables as needed if you're testing only a few of them. E.g.:\n",
"# tables=[\"generation_fuel_nuclear_eia923\", \"generation_fuel_eia923\"]\n",
")\n",
"\n",
"eia_settings = EiaSettings(eia860=eia860_settings, eia923=eia923_settings)"
]
@@ -116,10 +125,12 @@
"source": [
"%%time\n",
"eia860_extractor = pudl.extract.eia860.Extractor(ds)\n",
"eia860_raw_dfs = eia860_extractor.extract(year=eia860_settings.years)\n",
"eia860_raw_dfs = eia860_extractor.extract(settings=eia860_settings)\n",
"\n",
"eia860m_extractor = pudl.extract.eia860m.Extractor(ds)\n",
"if eia860_settings.eia860m:\n",
" eia860m_raw_dfs = pudl.extract.eia860m.Extractor(ds).extract(\n",
" year_month=eia860_settings.eia860m_date\n",
" eia860m_raw_dfs = eia860m_extractor.extract(\n",
" settings=eia860_settings\n",
" )\n",
" eia860_raw_dfs = pudl.extract.eia860m.append_eia860m(\n",
" eia860_raw_dfs=eia860_raw_dfs,\n",
@@ -143,7 +154,7 @@
"%%time\n",
"eia860_transformed_dfs = pudl.transform.eia860.transform(\n",
" eia860_raw_dfs,\n",
" eia860_tables=eia860_settings.tables,\n",
" eia860_settings=eia860_settings,\n",
")"
]
},
@@ -169,7 +180,7 @@
"source": [
"%%time\n",
"eia923_extractor = pudl.extract.eia923.Extractor(ds)\n",
"eia923_raw_dfs = eia923_extractor.extract(year=eia923_settings.years)"
"eia923_raw_dfs = eia923_extractor.extract(settings=eia_settings.eia923)"
]
},
{
@@ -188,7 +199,7 @@
"%%time\n",
"eia923_transformed_dfs = pudl.transform.eia923.transform(\n",
" eia923_raw_dfs,\n",
" eia923_tables=eia923_settings.tables,\n",
" eia923_settings=eia923_settings,\n",
")"
]
},
@@ -224,14 +235,12 @@
" \n",
"entities_dfs, eia_transformed_dfs = pudl.transform.eia.transform(\n",
" eia_transformed_dfs,\n",
" eia860_years=eia860_settings.years,\n",
" eia923_years=eia923_settings.years,\n",
" eia860m=eia860_settings.eia860m,\n",
" eia_settings=eia_settings,\n",
")\n",
"\n",
"# Assign appropriate types to new entity tables:\n",
"entities_dfs = {\n",
" name: pudl.helpers.apply_pudl_dtypes(df, group=\"eia\")\n",
" name: pudl.metadata.fields.apply_pudl_dtypes(df, group=\"eia\")\n",
" for name, df in entities_dfs.items()\n",
"}\n",
"\n",
@@ -242,10 +251,17 @@
" .encode(entities_dfs[table])\n",
" )\n",
"\n",
"out_dfs = pudl.etl._read_static_tables_eia()\n",
"out_dfs = pudl.etl._read_static_encoding_tables(etl_group=\"static_eia\")\n",
"out_dfs.update(entities_dfs)\n",
"out_dfs.update(eia_transformed_dfs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -264,7 +280,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
"version": "3.10.6"
}
},
"nbformat": 4,
Binary file not shown.
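
The notebook changes above replace per-argument calls such as ``extract(year=...)`` and ``transform(..., eia860_tables=...)`` with calls that receive a whole settings object. A minimal sketch of that calling pattern is given here. ``DemoSettings``, ``demo_extract``, and ``demo_transform`` are hypothetical stand-ins invented for illustration; they are not PUDL's ``Eia860Settings`` or its extract and transform functions, whose real signatures are only partly visible in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class DemoSettings:
    """Hypothetical stand-in for a per-dataset settings object."""

    years: list[int] = field(default_factory=lambda: [2021])
    tables: list[str] = field(default_factory=lambda: ["generators", "plants"])


def demo_extract(settings: DemoSettings) -> dict[str, list[int]]:
    """Return a placeholder 'raw' mapping of table name to requested years."""
    return {table: list(settings.years) for table in settings.tables}


def demo_transform(
    raw: dict[str, list[int]], settings: DemoSettings
) -> dict[str, int]:
    """Count requested years per table, keeping only the selected tables."""
    return {
        name: len(years) for name, years in raw.items() if name in settings.tables
    }


# One object carries both the year and table selections through each step.
settings = DemoSettings(years=[2020, 2021], tables=["plants"])
raw = demo_extract(settings)
result = demo_transform(raw, settings)
```

Passing one object keeps the year and table selections together, so narrowing a debugging run means editing a single constructor call rather than several keyword arguments.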
2 changes: 2 additions & 0 deletions docs/dev/run_the_etl.rst
@@ -163,6 +163,8 @@ You've changed the settings and renamed the file to CUSTOM_ETL.yml
$ pudl_etl settings/CUSTOM_ETL.yml
.. _add-cems-later:

Processing EPA CEMS Separately
------------------------------
As mentioned above, CEMS takes a while to process. Luckily, we've designed PUDL so that
73 changes: 62 additions & 11 deletions docs/release_notes.rst
@@ -2,27 +2,45 @@
PUDL Release Notes
=======================================================================================

.. _release-v2022.08.XX:
.. _release-v2022.09.XX:

---------------------------------------------------------------------------------------
2022.08.XX
2022.09.XX
---------------------------------------------------------------------------------------

Data Coverage
^^^^^^^^^^^^^
* Added archives of the bulk EIA electricity API data to our datastore, since the API
itself is too unreliable for production use. This is part of :issue:`1763`. The code
for this new data is ``eia_bulk_elec`` and the data comes as a single 200MB zipped
JSON file. :pr:`1922` updates the datastore to include
`this archive on Zenodo <https://zenodo.org/record/7067367>`__ but most of the work
happened in the
`pudl-scrapers <https://github.com/catalyst-cooperative/pudl-scrapers>`__ and
`pudl-zenodo-storage <https://github.com/catalyst-cooperative/pudl-zenodo-storage>`__
repositories. See issue :issue:`catalyst-cooperative/pudl-zenodo-storage#29`.
* Incorporated 2021 data from the :doc:`data_sources/epacems` dataset. See :pr:`1778`
* Incorporated 2021 data from the :doc:`data_sources/eia860` and
:doc:`data_sources/eia923`. Early Release. Early release data is EIA's preliminary
annual release and should be used with caution. We also integrated a ``data_maturity``
column and related ``data_maturities`` table into most of the EIA data tables in
order to alter users to the level of finality of the data. :pr:`1834` :pr:`1855`
* Incorporated Early Release 2021 data from the :doc:`data_sources/eia860`,
:ref:`data-eia861`, and :doc:`data_sources/eia923`. Early release data is EIA's
preliminary annual release and should be used with caution. We also integrated a
``data_maturity`` column and related ``data_maturities`` table into most of the EIA
data tables in order to alert users to the level of finality of the data. See
:pr:`1834,1855,1915,1921`
* Incorporated 2022 data from the :doc:`data_sources/eia860` monthly update from June
2022. See :pr:`1834`. This included adding new ``energy_storage_capacity_mwh`` (for
batteries) and ``net_capacity_mwdc`` (for behind-the-meter solar PV) attributes to the
:ref:`generators_eia860` table, as they appear in the :doc:`data_sources/eia860`
monthly updates for 2022.
* We've integrated several new columns into the EIA 860 and EIA 923 including several
* Integrated several new columns into the EIA 860 and EIA 923 including several
codes with coding tables (See :doc:`data_dictionaries/codes_and_labels`). :pr:`1836`
* Added the `EPACAMD-EIA Crosswalk <https://github.com/USEPA/camd-eia-crosswalk>`__ to
the database. Previously, the crosswalk was a csv stored in ``package_data/glue``,
but now it has its own scraper
:pr:`https://github.com/catalyst-cooperative/pudl-scrapers/pull/20`, archiver,
:pr:`https://github.com/catalyst-cooperative/pudl-zenodo-storage/pull/20`
and place in the PUDL db. For now there's a ``epacamd_eia`` output table you can use
to merge CEMS and EIA data yourself :pr:`1692`. Eventually we'll work these crosswalk
values into an output table combining CEMS and EIA.
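
The ``epacamd_eia`` note above says the table can be used to merge CEMS and EIA data yourself. A rough sketch of such a lookup, written with plain Python records, might look like the following. The field names ``plant_id_epa`` and ``generator_id``, the sample values, and the helper ``attach_eia_ids`` are assumptions made for illustration, and the sketch assumes at most one crosswalk row per emissions unit, which the real crosswalk may not guarantee.

```python
# Hypothetical records; field names and values are illustrative only.
crosswalk = [
    {"plant_id_epa": 10, "emissions_unit_id_epa": "1",
     "plant_id_eia": 10, "generator_id": "G1"},
    {"plant_id_epa": 20, "emissions_unit_id_epa": "A",
     "plant_id_eia": 25, "generator_id": "GA"},
]
cems = [
    {"plant_id_epa": 10, "emissions_unit_id_epa": "1", "gross_load_mw": 50.0},
    {"plant_id_epa": 20, "emissions_unit_id_epa": "A", "gross_load_mw": 75.0},
    {"plant_id_epa": 30, "emissions_unit_id_epa": "X", "gross_load_mw": 5.0},
]


def attach_eia_ids(cems_rows, crosswalk_rows):
    """Left-join CEMS rows to the crosswalk on (plant, emissions unit)."""
    lookup = {
        (row["plant_id_epa"], row["emissions_unit_id_epa"]): row
        for row in crosswalk_rows
    }
    merged = []
    for row in cems_rows:
        match = lookup.get((row["plant_id_epa"], row["emissions_unit_id_epa"]))
        merged.append({
            **row,
            # Unmatched units keep None rather than a guessed identifier.
            "plant_id_eia": match["plant_id_eia"] if match else None,
            "generator_id": match["generator_id"] if match else None,
        })
    return merged


merged = attach_eia_ids(cems, crosswalk)
```

Rows without a crosswalk match are kept with ``None`` identifiers, so gaps in the crosswalk stay visible instead of silently dropping CEMS records.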

Nightly Data Builds
^^^^^^^^^^^^^^^^^^^
@@ -77,9 +95,30 @@ Database Schema Changes

* Renamed ``grid_voltage_kv`` to ``grid_voltage_1_kv`` in the :ref:`plants_eia860`
table, to follow the pattern of many other multiply reported values.

Date Merge Helper Function
^^^^^^^^^^^^^^^^^^^^^^^^^^
* Added a :ref:`balancing_authorities_eia` coding table mapping BA codes found in the
:doc:`data_sources/eia860` and :doc:`data_sources/eia923` to their names, cleaning up
non-standard codes, and fixing some reporting errors for ``PACW`` vs. ``PACE``
(PacifiCorp West vs. East) based on the state associated with the plant reporting the
code. Also added backfilling for codes in years before 2013 when BA Codes first
started being reported, but only in the output tables. See: :pr:`1906,1911`
* Renamed and removed some columns in the :doc:`data_sources/epacems` dataset.
``unitid`` was changed to ``emissions_unit_id_epa`` to clarify the type of unit it
represents. ``unit_id_epa`` was removed because it is a unique identifier for
``emissions_unit_id_epa`` and not otherwise useful or transferable to other datasets.
``facility_id`` was removed because it is specific to EPA's internal database and does
not aid in connection with other data. :pr:`1692`
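
The ``PACW`` vs. ``PACE`` correction described above depends on the state of the reporting plant. The sketch below shows the general shape such a rule could take. The state groupings and the function ``fix_pacificorp_code`` are assumptions for illustration only; the commit does not show which states PUDL assigns to each code or how the fix is implemented.

```python
# Assumed state groupings, for illustration only.
PACW_STATES = {"CA", "OR", "WA"}
PACE_STATES = {"ID", "UT", "WY"}


def fix_pacificorp_code(ba_code, state):
    """Return a PacifiCorp BA code consistent with the plant's state."""
    if ba_code not in {"PACW", "PACE"}:
        return ba_code  # Other balancing authorities are left untouched.
    if state in PACW_STATES:
        return "PACW"
    if state in PACE_STATES:
        return "PACE"
    return ba_code  # Unknown state: keep the reported code.


corrected = [
    fix_pacificorp_code(code, state)
    for code, state in [("PACE", "OR"), ("PACW", "UT"), ("CISO", "CA")]
]
```

Leaving codes unchanged when the state is not in either set avoids overwriting a reported value with a guess.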

Data Accuracy
^^^^^^^^^^^^^
* Retain NA values for :doc:`data_sources/epacems` fields ``gross_load_mw`` and
``heat_content_mmbtu``. Previously, these fields converted NA to 0, but this is not
accurate, so we removed this step.
* Update the ``plant_id_eia`` field from :doc:`data_sources/epacems` with values from
the newly integrated ``epacamd_eia`` crosswalk as not all EPA's ORISPL codes are
correct.
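
To see why converting NA to 0 is not accurate for fields like ``gross_load_mw``, consider a small worked example. The values and helper names below are made up for illustration and use ``None`` to stand in for a missing report; this is not the PUDL transform code.

```python
import math

# Three hypothetical hourly readings; None marks a missing report.
hourly_gross_load_mw = [100.0, None, 300.0]


def mean_skipping_missing(values):
    """Average only the values that were actually reported."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else math.nan


def mean_with_zero_fill(values):
    """Average after treating every missing value as a true zero."""
    if not values:
        return math.nan
    filled = [0.0 if v is None else v for v in values]
    return sum(filled) / len(filled)


kept_na = mean_skipping_missing(hourly_gross_load_mw)    # 200.0
zero_filled = mean_with_zero_fill(hourly_gross_load_mw)  # about 133.3
```

Zero-filling pulls the average down by a third here, because a missing hour is counted as an hour with no load.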

Helper Function Updates
^^^^^^^^^^^^^^^^^^^^^^^
* Replaced the PUDL helper function ``clean_merge_asof`` that merged two dataframes
reported on different temporal granularities, for example monthly vs yearly data.
The reworked function, :mod:`pudl.helpers.date_merge`, is more encapsulating and
@@ -94,6 +133,10 @@ Date Merge Helper Function
makes this function optionally used to generate the MCOE table that includes a full
monthly timeseries even in years when annually reported generators don't have
matching monthly data. See :pr:`1550`
* Updated the ``fix_leading_zero_gen_ids`` function by changing the name to
``remove_leading_zeros_from_numeric_strings`` because it's used to fix more than just
the ``generator_id`` column. Included a new argument to specify which column you'd
like to fix.
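
As a rough illustration of what removing leading zeros from numeric strings in a chosen column can involve, here is a small standalone sketch. ``strip_leading_zeros`` is a hypothetical helper operating on plain dictionaries, not the PUDL function ``remove_leading_zeros_from_numeric_strings``; in particular, leaving alphanumeric IDs untouched is an assumption about the intended behavior.

```python
import re

# Match one or more zeros at the start, as long as a digit follows them.
_LEADING_ZEROS = re.compile(r"^0+(?=\d)")


def strip_leading_zeros(records, column):
    """Return copies of records with zero-padding removed from digit-only IDs."""
    cleaned = []
    for rec in records:
        new_rec = dict(rec)
        value = new_rec.get(column)
        # Only purely numeric ASCII strings are changed; "01A" stays as-is.
        if isinstance(value, str) and value.isascii() and value.isdigit():
            new_rec[column] = _LEADING_ZEROS.sub("", value)
        cleaned.append(new_rec)
    return cleaned


rows = [
    {"generator_id": "0001"},
    {"generator_id": "01A"},
    {"generator_id": "000"},
    {"generator_id": "10"},
]
fixed = [r["generator_id"] for r in strip_leading_zeros(rows, "generator_id")]
```

The lookahead keeps one digit in place, so an all-zero ID collapses to ``"0"`` rather than an empty string.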

Plant Parts List Module Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -122,6 +165,14 @@ Metadata
* Used the data source metadata class added in release 0.6.0 to dynamically generate
the data source documentation (See :doc:`data_sources/index`). :pr:`1532`

Documentation
^^^^^^^^^^^^^
* Fixed broken links in the documentation since the Air Markets Program Data (AMPD)
changed to Clean Air Markets Data (CAMD).
* Added graphics and clearer descriptions of EPA data and reporting requirements to the
:doc:`data_sources/epacems` page. Also included information about the ``epacamd_eia``
crosswalk.

Bug Fixes
^^^^^^^^^
* `Dask v2022.4.2 <https://docs.dask.org/en/stable/changelog.html#v2022-04-2>`__