diff --git a/.github/ISSUE_TEMPLATE/new_dataset.md b/.github/ISSUE_TEMPLATE/new_dataset.md new file mode 100644 index 0000000000..267c23b2bb --- /dev/null +++ b/.github/ISSUE_TEMPLATE/new_dataset.md @@ -0,0 +1,21 @@ +--- +name: New dataset +about: Provide information about a new dataset you'd like to see in PUDL +title: '' +labels: new-dataset +assignees: '' +--- + +### Overview + +What is this dataset? Why do you want it in PUDL? Is it already partially in +PUDL, or do we need to start from scratch? + +### Where is it? + +Is this dataset publicly available? Where can one download the actual data? + +### What do you know about it so far? + +What have you done with this dataset so far? Have you run into any problems with +it yet? diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 325f1bcb8a..788386f562 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -1,49 +1,26 @@ +# Overview -# PR Overview +Closes #XXXX. - +How did you make sure this worked? How can a reviewer verify this? -# PR Checklist - -- [ ] Merge the most recent version of the branch you are merging into (probably `dev`). -- [ ] All CI checks are passing. [Run tests locally to debug failures](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#running-tests-with-tox) -- [ ] Make sure you've included good docstrings. +```[tasklist] +# Remaining work +- [ ] Make sure full ETL runs & `make pytest-integration-full` passes locally - [ ] For major data coverage & analysis changes, [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation) -- [ ] Include unit tests for new functions and classes. -- [ ] Defensive data quality/sanity checks in analyses & data processing functions. -- [ ] Update the [release notes](https://catalystcoop-pudl.readthedocs.io/en/latest/release_notes.html) and reference reference the PR and related issues. 
-- [ ] Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively. +- [ ] If updating analyses or data processing functions: write data quality checks +- [ ] Update the [release notes](../docs/release_notes.rst): reference the PR and related issues. +- [ ] Review the PR yourself and call out any questions or issues you have +``` + diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst new file mode 100644 index 0000000000..c35bb7e3bf --- /dev/null +++ b/CONTRIBUTING.rst @@ -0,0 +1,62 @@ +-------------------- +Contributing to PUDL +-------------------- + +Help get more data into PUDL! + + +.. IMPORTANT:: Already have a dataset in mind? + + If you **need data that's not in PUDL**, + `open an issue `__. + + If you've **already written some code to wrangle a dataset**, find us at + `office hours `__ and we + can talk through next steps for how to get that into PUDL. + +Your first contribution +----------------------- + +**Setup** + +You'll need to fork this repository and get the +`dev environment set up `__. + +**Pick an issue** + +* Look for issues with the `good first issue + `__ + tag. These are issues that don't require a ton of PUDL-specific context, and + are relatively tightly scoped to boot. + +* Comment on the issue and tag ``@com-dev`` (our Community Development Team) to + let us know you're working on it. Feel free to ask any questions you might + have! + +* Once you have an idea of how you want to tackle this issue, write out your + plan so we can guide you around obstacles in your way. + +**Work on it!** + +* Make a branch on your fork and open a draft PR early so we can discuss + concrete code! Please don't wait until it's all polished up - it's much easier + for us to help you when we can see the code evolve over time. 
+ +* Please make sure to write tests and documentation for your code - if you run + into trouble with writing tests, let us know in the comments and we can help! + +* Please try to keep your changes relatively small: stuff happens, and one's + bandwidth for volunteer work can fluctuate frequently. If you make a bunch of + small changes, it's much easier to pause on a project without losing a ton of + context. + +**Get it merged in!** + +* Turn the draft PR into a normal PR and ping ``@com-dev``. We'll try to get + back to you within a few days. + +**Next contributions** + +Any issues with the `community +`__ +tag are up for grabs! Follow the same process as above. diff --git a/docs/CONTRIBUTING.rst b/docs/CONTRIBUTING.rst index 616e3d0149..2c15c8da80 100644 --- a/docs/CONTRIBUTING.rst +++ b/docs/CONTRIBUTING.rst @@ -2,52 +2,59 @@ Contributing to PUDL =============================================================================== + Welcome! We're excited that you're interested in contributing to the Public Utility -Data Liberation effort! The work is currently being coordinated by the members of the -`Catalyst Cooperative `__. PUDL is meant to serve a wide -variety of public interests including academic research, climate advocacy, data -journalism, and public policy making. This open source project has been supported by -a combination of volunteer contributions, grant funding from the `Alfred P. Sloan -Foundation `__, and reinvestment of net income from the -cooperative's client projects. +Data Liberation effort! + +If you're interested in contributing directly to the PUDL database, see +:ref:`direct-contribs`. + +It can also be very helpful to provide :ref:`user-feedback`, or +help :ref:`connect-orgs` that we can work with. + +--------------- +Code of Conduct +--------------- Please make sure you review our :doc:`code of conduct `, which is based on the `Contributor Covenant `__. 
We want to make the PUDL project welcoming to contributors with different levels of experience and diverse personal backgrounds. -------------------------------------------------------------------------------- -How to Get Involved -------------------------------------------------------------------------------- - -There are several areas in which we would welcome your help! Many of these -require a GitHub account, since that is where we manage the project. `Signing -up for a GitHub account `__ (even if you don't intend -to write code) will allow you to participate in online discussions and track -projects that you're interested in. - -First is *user feedback* - if you use PUDL, we would love to talk to you and -understand what your use cases and problems are. This helps us steer the -project towards greater usefulness! Here are some avenues to get in touch: - -* If you need help, someone else might need it too - ask for help in `Github - Discussions - `__ - and maybe the ensuing discussion will be useful to other people too! -* Suggest new features, dataset integrations, structural changes, or just give - us feedback on overall usability using `GitHub Discussions - `__. -* If something went wrong, `file a bug report - `__ - on Github. - -* Help us plan the future of PUDL by telling us what you're using it for! - hello@catalyst.coop works great to get in touch. - -Second is *networking/growth* - for PUDL to be a go-to source of public -information about the US energy system, and help advocates with the clean -energy transition, we need to grow our community and business. Here's how you -can help: +.. _direct-contribs: + +.. include:: ../CONTRIBUTING.rst + +.. _user-feedback: + +------------- +User feedback +------------- + +PUDL's goal is to help people use data to make change in the US energy landscape. + +As such, it's critical that we understand our users' needs! 
+ +We'd love to hear about: + +* what data you're looking for that we don't have +* what you're trying to do with PUDL data +* what issues you're running into with data access or interpretation +* any problems you find in our data +* anything you find confusing in our documentation + +`GitHub Discussions `__ +is a great place to do this, but `emailing us `__ +works too! + +.. _connect-orgs: + +----------------------------------- +Connect us with other organizations +----------------------------------- + +For PUDL to make a bigger impact, we need to find more people who need the data. +Here's how you can help: * Cite PUDL using `DOIs from Zenodo `__ if you use the @@ -61,66 +68,3 @@ can help: * `Hire Catalyst `__ to do analysis for your organization using the PUDL data -- contract work helps us self-fund ongoing open source development. -* And of course... we also appreciate `financial contributions - `__. - -Third is *direct contributions to the technical system* - code and -documentation! This is the most hands-on, in-the-weeds way to contribute, and -obviously helps us make the whole system more capable! - -* Check out the `Code contribution process`_ section below for a process - overview. -* See the :doc:`developer setup ` for technical details -* We also welcome documentation updates, which follow the general code - contribution process! - - -------------------------------------------------------------------------------- -Code contribution process -------------------------------------------------------------------------------- - -Our goals for you are: - -* contribute to something important -* not accidentally end up on the critical path for a time-sensitive task and - end up working a second shift to finish something -* not flounder in a sea of high-context tasks - -To support this, we've set up a `GitHub Projects view -`__ which we -update on a rolling basis. 
It includes a handful of tasks that are: - -* important and non-urgent -* clearly scoped -* owned by a Catalyst employee who can be your buddy - -If you have an idea for some work you'd like to do that's not on the board, you -should absolutely find/create a new issue or post a Github Discussion - then we -can talk about how Catalyst can support that work! - -We envision a flow like this: - -1. You go to the GitHub Projects "community" view and poke around at the - backlog until you find something you find interesting. -2. You ask some questions about the scope and we attempt to clarify what needs - doing. -3. If you still want to take the task on, assign the issue to yourself and - you're off to the races! We'll probably bother you for updates occasionally. -4. You put up an early draft PR for feedback. -5. Eventually, you convert the draft to a standard PR, we do a thorough review, - and it gets merged! Go back to #1. - -Some guidelines: - -* small PRs: we understand that stuff happens, and one's bandwidth for - volunteer work can fluctuate frequently. One way to make that feel a little - better for both the contributor and the project is to ship many small - changes, so there's never a ton of dangling work. - -* early drafts: our system has evolved over several years and can be quite - confusing. Pushing up an early draft PR will help Catalyst members guide you - gently away from pitfalls. - -* write tests and documentation: this is critical for expressing what - the software "should" do, which is helpful both in development and in - maintenance. If you haven't done much of this before, we can help! 
diff --git a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py index e1a5f89032..c60ecedf3f 100644 --- a/src/pudl/metadata/resources/ferc1_eia_record_linkage.py +++ b/src/pudl/metadata/resources/ferc1_eia_record_linkage.py @@ -23,8 +23,8 @@ Because generators are often owned by multiple utilities, another dimension of this plant part table involves generating two records for each owner: one for the portion of the plant part they own and one for the plant part as a whole. The -portion records are labeled in the "ownership_record_type" column as "owned" -and the total records are labeled as "total". +portion records are labeled in the ``ownership_record_type`` column as ``owned`` +and the total records are labeled as ``total``. This table includes A LOT of duplicative information about EIA plants. It is primarily meant for use as an input into the record linkage between FERC1 plants and EIA.""",