Reorganize contributing docs + add process description.
jdangerx committed Dec 5, 2023
1 parent 8005fd0 commit 3d2c405
Showing 6 changed files with 197 additions and 129 deletions.
30 changes: 30 additions & 0 deletions .github/ISSUE_TEMPLATE/new_dataset.md
@@ -0,0 +1,30 @@
---
name: New dataset
about: Provide information about a new dataset you'd like to see in PUDL
title: ''
labels: new-data
assignees: ''
---

### Overview

What is this dataset?

Why do you want it in PUDL?

Is it already partially in PUDL, or do we need to start from scratch?

### Logistics

Is this dataset publicly available?

Where can one download the actual data?

How often does this dataset get updated?

What licensing restrictions apply?

### What do you know about it so far?

What have you done with this dataset so far? Have you run into any problems with
it yet?
52 changes: 14 additions & 38 deletions .github/pull_request_template.md
@@ -1,49 +1,25 @@
<!--
Making a PUDL Pull Request
Before making a PR you may want to check out our:
Resources:
* contributing guidelines: https://catalystcoop-pudl.readthedocs.io/en/latest/CONTRIBUTING.html
* code of conduct: https://catalystcoop-pudl.readthedocs.io/en/latest/code_of_conduct.html
* development process: https://catalystcoop-pudl.readthedocs.io/en/latest/dev/index.html
## PR Process Overview
* PRs have to get an approving review before merging into their development branch.
* Most PRs should be made against the `dev` branch, unless they are part of some larger ongoing refactoring, in which case there will be a persistent development branch for that work.
* It is much easier to do timely code reviews on smaller chunks of code. We try to keep PRs under 500 lines of code.
* Draft PRs are a good way to get early feedback on designs or several incremental commits that will add up to larger changes. If you want a review of a draft PR, make sure you contact the reviewer directly or mention their username in the PR comment, so they get a notification.
* How quickly we can review a PR will depend on how large and complex it is, and how busy we are, but ideally we strive to get an initial review done within a week. If there are going to be delays, we should at least comment on the PR to let you know the situation.
* If you believe you've addressed a reviewer's comments, respond with a brief note and mark the comment resolved. If further discussion is required, respond and do not resolve the comment.
* Before a PR is merged all reviewer comments should be resolved. If a reviewer doesn't feel that their comment has been sufficiently addressed, they may unresolve a comment.
* Be careful not to accidentally "start a review" when responding to comments! If this does happen, don't forget to submit the review you've started so the other PR participants can see your comments (they are invisible to others if marked "Pending").
* In the period after an initial review when there is significant back-and-forth with the reviewer deciding what changes should actually be made, there should probably be daily interaction. If significant changes are required, it's usually best to request another review after those changes have been made.
Feel free to delete the commented-out parts of the template before submitting the PR.
-->
# Overview

# PR Overview
Closes #XXXX.

<!--
What problem does this address?

Include a short narrative summary of what's going on in the PR. This can be a bulleted list. You might want to include:
What did you change?

* What are you changing and why?
* Are there any known unsolved problems remaining in the PR?
* Is there anything that you want a reviewer to pay particular attention to?
* What kind of feedback are you looking for on the PR?
-->
# Testing

# PR Checklist
How did you make sure this worked? How can a reviewer verify this?

- [ ] Merge the most recent version of the branch you are merging into (probably `dev`).
- [ ] All CI checks are passing. [Run tests locally to debug failures](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#running-tests-with-tox)
- [ ] Make sure you've included good docstrings.
```[tasklist]
# To-do list
- [ ] Make sure full ETL runs & `make pytest-integration-full` passes locally
- [ ] For major data coverage & analysis changes, [run data validation tests](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/testing.html#data-validation)
- [ ] Include unit tests for new functions and classes.
- [ ] Defensive data quality/sanity checks in analyses & data processing functions.
- [ ] Update the [release notes](https://catalystcoop-pudl.readthedocs.io/en/latest/release_notes.html) and reference the PR and related issues.
- [ ] Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.
- [ ] If updating analyses or data processing functions: write data quality checks
- [ ] Update the [release notes](../docs/release_notes.rst): reference the PR and related issues.
- [ ] Review the PR yourself and call out any questions or issues you have
```
1 change: 1 addition & 0 deletions .github/workflows/build-deploy-pudl.yml
@@ -139,4 +139,5 @@ jobs:
channel-id: "C03FHB9N0PQ"
slack-message: "build-deploy-pudl status: ${{ job.status }}\n${{ env.COMMIT_TIME}}-${{ env.SHORT_SHA }}-${{ env.COMMIT_BRANCH }}"
env:
channel-id: "C03FHB9N0PQ"
SLACK_BOT_TOKEN: ${{ secrets.PUDL_DEPLOY_SLACK_TOKEN }}
97 changes: 97 additions & 0 deletions CONTRIBUTING.rst
@@ -0,0 +1,97 @@
--------------------
Contributing to PUDL
--------------------

Welcome! We're so glad you're interested in contributing to PUDL! We would love
some help making PUDL data as complete as possible.

.. _after-intro:

.. IMPORTANT:: Already have a dataset in mind?

If you **need data that's not in PUDL**,
`open an issue <https://github.com/catalyst-cooperative/pudl/issues/new?assignees=&labels=new-data&projects=&template=new_dataset.md&title=>`__
to tell us more about it!

If you've **already written some code to wrangle a dataset**, find us at
`office hours <https://calend.ly/catalyst-cooperative/pudl-office-hours>`__ and we
can talk through next steps.


Your first contribution
-----------------------

**Setup**

You'll need to fork this repository and get the
`dev environment set up <https://catalystcoop-pudl.readthedocs.io/en/latest/dev/dev_setup.html>`__.

**Pick an issue**

* Look for issues with the `good first issue
<https://github.com/catalyst-cooperative/pudl/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-issue>`__
tag in our `Community Kanban Board
<https://github.com/orgs/catalyst-cooperative/projects/9/views/19>`__. These
are issues that don't require a ton of PUDL-specific context, and are
relatively tightly scoped.

* Comment on the issue and tag ``@com-dev`` (our Community Development Team) to
let us know you're working on it. Feel free to ask any questions you might
have!

* Once you have an idea of how you want to tackle this issue, write out your
plan so we can guide you around obstacles in your way! Post a comment outlining:
* what steps have you broken this down into?
* what is the output of each step?
* how will one know that each step is working?

**Work on it!**

* Make a branch on your fork and open a draft pull request (PR) early so we can
discuss concrete code! **Set the base branch to ``dev`` unless there's a good
reason otherwise.** Please don't wait until it's all polished up - it's much
easier for us to help you when we can see the code evolve over time.

* Please make sure to write tests and documentation for your code - if you run
into trouble with writing tests, let us know in the comments and we can help!
We automatically run the test suite for all PRs, but some of those will have
to be manually approved by Catalyst members for safety reasons.

* **Try to keep your changes relatively small:** stuff happens, and one's
bandwidth for volunteer work can fluctuate frequently. If you make a bunch of
small changes, it's much easier to pause on a project without losing a ton of
context. We try to keep PRs to **less than 500 lines of code.**

**Get it merged in!**

* Turn the draft PR into a normal PR and ping ``@com-dev``. We'll try to get
back to you within a few days - the smaller/simpler the PR, the faster we'll
be able to get back to you.

* The reviewer will leave comments - if they request changes, address their
concerns and re-request review.

* There will probably be some back-and-forth until your PR is approved - this
is normal and a sign of good communication on your part! Don't be shy about
asking us for updates and re-requesting review!

* Don't accidentally "start a review" when responding to comments! If this does
happen, don't forget to submit the review you've started so the other PR
participants can see your comments (they are invisible to others if marked
"Pending").

Next contributions
------------------

Hooray! You made your first contribution! To find another issue to tackle, check
out the `Community Kanban board
<https://github.com/orgs/catalyst-cooperative/projects/9/views/19>`__ where
we've picked out some issues that:

* are useful to work on

* are unlikely to become super time-sensitive

* have some context, success criteria, and next steps information.

Pick one of these and follow the contribution flow above!
142 changes: 53 additions & 89 deletions docs/CONTRIBUTING.rst
@@ -2,111 +2,75 @@
Contributing to PUDL
===============================================================================


Welcome! We're excited that you're interested in contributing to the Public Utility
Data Liberation effort! The work is currently being coordinated by the members of the
`Catalyst Cooperative <https://catalyst.coop>`__. PUDL is meant to serve a wide
variety of public interests including academic research, climate advocacy, data
journalism, and public policy making. This open source project has been supported by
a combination of volunteer contributions, grant funding from the `Alfred P. Sloan
Foundation <https://sloan.org>`__, and reinvestment of net income from the
cooperative's client projects.
Data Liberation effort!

We need lots of help with :ref:`user-feedback`, we welcome :ref:`code-contribs`, and
it would be great to :ref:`connect-orgs` that we can work with.

---------------
Code of Conduct
---------------

Please make sure you review our :doc:`code of conduct <code_of_conduct>`, which is
based on the `Contributor Covenant <https://www.contributor-covenant.org/>`__. We
want to make the PUDL project welcoming to contributors with different levels of
experience and diverse personal backgrounds.

-------------------------------------------------------------------------------
How to Get Involved
-------------------------------------------------------------------------------
.. _user-feedback:

We welcome just about any kind of contribution to the project. Alone, we'll never be
able to understand every use case or integrate all the available data. The project
will serve the community better if other folks get involved.
-------------
User feedback
-------------

There are lots of ways to contribute -- it's not all about code!
PUDL's goal is to help people use data to make change in the US energy landscape.
As such, it's critical that we understand our users' needs! `GitHub Discussions
<https://github.com/orgs/catalyst-cooperative/discussions>`__ is our main forum
for all this. Since it's publicly readable, any conversation here can
potentially benefit other users too!

* If you need help, someone else might need it too - ask for help in `Github
Discussions
We'd love it if you could:

* Tell us what problems you're running into, in the `Help Me!
<https://github.com/orgs/catalyst-cooperative/discussions/categories/help-me>`__
and maybe the ensuing discussion will be useful to other people too!
* `Suggest new data and features <https://github.com/catalyst-cooperative/pudl/issues/new?template=feature_request.md>`__ that would be useful.
discussion board
* Tell us about what data you're looking for by opening an `issue
<https://github.com/catalyst-cooperative/pudl/issues/new?assignees=&labels=new-data&projects=&template=new_dataset.md&title=>`__
* Tell us what you're trying to do with PUDL data in `this thread
<https://github.com/orgs/catalyst-cooperative/discussions/3105>`__
* `File bug reports <https://github.com/catalyst-cooperative/pudl/issues/new?template=bug_report.md>`__ on GitHub.
* Help expand and improve the documentation, or create new
`example notebooks <https://github.com/catalyst-cooperative/pudl-examples/>`__
* Help us create more and better software :doc:`test cases <dev/testing>`.
* Give us feedback on overall usability using `GitHub Discussions
* Tell us what you'd like to see in PUDL in the `Ideas
<https://github.com/orgs/catalyst-cooperative/discussions/categories/ideas>`__
-- what's confusing?
* Tell us a story about how you're using the data.
* Point us at interesting publications related to open energy data, open source energy
system modeling, how energy policy can be affected by better data, or open source
tools we should check out.
* Cite PUDL using
`DOIs from Zenodo <https://zenodo.org/communities/catalyst-cooperative/>`__
if you use the software or data in your own published work.
discussion board

.. _code-contribs:

--------------------
Code contributions
--------------------

.. include:: ../CONTRIBUTING.rst
:start-after: after-intro:

.. _connect-orgs:

-----------------------------------
Connect us with other organizations
-----------------------------------

For PUDL to make a bigger impact, we need to find more people who need the data.
Here's how you can help:

* Cite PUDL using `DOIs from Zenodo
<https://zenodo.org/communities/catalyst-cooperative/>`__ if you use the
software or data in your own published work.
* Point us toward appropriate grant funding opportunities and meetings where
we might present our work.
* Point us at interesting publications related to open energy data, open source
energy system modeling, how energy policy can be affected by better data, or
open source tools we should check out.
* Share your Jupyter notebooks and other analyses that use PUDL.
* `Hire Catalyst <https://catalyst.coop/hire-catalyst/>`__ to do analysis for
your organization using the PUDL data -- contract work helps us self-fund
ongoing open source development.
* Contribute code via
`pull requests <https://help.github.com/en/articles/about-pull-requests>`__.
See the :doc:`developer setup <dev/dev_setup>` for more details.
* And of course... we also appreciate
`financial contributions <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=PZBZDFNKBJW5E&source=url>`__.

.. seealso::

* :doc:`dev/dev_setup` for instructions on how to set up the PUDL
development environment.

-------------------------------------------------------------------------------
Find us on GitHub
-------------------------------------------------------------------------------
GitHub is the primary platform we use to manage the project, integrate
contributions, write and publish documentation, answer user questions, automate
testing & deployment, etc.
`Signing up for a GitHub account <https://github.com/join>`__
(even if you don't intend to write code) will allow you to participate in
online discussions and track projects that you're interested in.

Asking (and answering) questions is a valuable contribution! As noted in `How to
support open-source software and stay sane
<https://www.nature.com/articles/d41586-019-02046-0>`__, it's much more efficient to
ask and answer questions in a public forum because then other users and contributors
who are having the same problem can find answers without having to re-ask the same
question. The forum we're using is our `GitHub Discussions
<https://github.com/catalyst-cooperative/discussions>`__.

Even if you feel like you have a basic question, we want you to feel
comfortable asking for help in public -- we (Catalyst) only recently came to
this data work from being activists and policy wonks -- so it's easy for us to
remember when it all seemed frustrating and alien! Sometimes it still does. We
want people to use the software and data to do good things in the world. We
want you to be able to access it. Using a public forum also enables the
community of users to help each other!

Don't hesitate to post a discussion with a `feature request
<https://github.com/catalyst-cooperative/discussions/categories/ideas>`__,
a pointer to energy data that needs liberating, or a reference to documentation
that's out of date, unclear, or missing. Understanding how people are using the
software, and how they would *like* to be using the software, is very valuable and
will help us make it more useful and usable.

-------------------------------------------------------------------------------
Our design process
-------------------------------------------------------------------------------

We do our technical design out in the open, so that community members can weigh
in. Here's the process we usually follow:

1. Someone has a problem they'd like to solve. They post in the `Ideas
<https://github.com/orgs/catalyst-cooperative/discussions/categories/ideas>`__
forum with their problem and some context.

2. Discussion ensues.

3. When the open questions are answered, we create an issue from the discussion,
which holds the conclusions of the discussion.
4 changes: 2 additions & 2 deletions src/pudl/metadata/resources/ferc1_eia_record_linkage.py
@@ -23,8 +23,8 @@
Because generators are often owned by multiple utilities, another dimension of
this plant part table involves generating two records for each owner: one for the
portion of the plant part they own and one for the plant part as a whole. The
portion records are labeled in the "ownership_record_type" column as "owned"
and the total records are labeled as "total".
portion records are labeled in the ``ownership_record_type`` column as ``owned``
and the total records are labeled as ``total``.
This table includes A LOT of duplicative information about EIA plants. It is primarily
meant for use as an input into the record linkage between FERC1 plants and EIA.""",
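
The two-records-per-owner layout described in this docstring can be illustrated with a small, self-contained sketch. This is not PUDL's implementation: only the ``ownership_record_type`` column and its ``owned`` / ``total`` labels come from the text above, while `PlantPartRecord`, `expand_ownership`, `plant_part_id`, `utility_id`, and `capacity_mw` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlantPartRecord:
    """Hypothetical, simplified stand-in for one row of the plant parts table."""

    plant_part_id: str
    utility_id: int
    ownership_record_type: str  # "owned" or "total", as described in the docstring
    capacity_mw: float  # illustrative attribute, assumed to scale with ownership


def expand_ownership(
    plant_part_id: str,
    total_capacity_mw: float,
    fraction_owned_by_utility: dict[int, float],
) -> list[PlantPartRecord]:
    """Generate two records per owner: the owned portion and the whole plant part."""
    records = []
    for utility_id, fraction in fraction_owned_by_utility.items():
        # Record for the portion of the plant part this utility owns.
        records.append(
            PlantPartRecord(
                plant_part_id, utility_id, "owned", total_capacity_mw * fraction
            )
        )
        # Record for the plant part as a whole, repeated for each owner.
        records.append(
            PlantPartRecord(plant_part_id, utility_id, "total", total_capacity_mw)
        )
    return records


# A 100 MW generator split 75/25 between two (made-up) utilities.
records = expand_ownership("gen_1", 100.0, {1: 0.75, 2: 0.25})
```

Repeating the ``total`` record once per owner is what makes the table so duplicative, which matches the warning in the docstring.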