Skip to content

Commit

Permalink
Add documentation information
Browse files Browse the repository at this point in the history
  • Loading branch information
joecummings authored Apr 28, 2024
1 parent a94803f commit 40f3575
Showing 1 changed file with 53 additions and 51 deletions.
104 changes: 53 additions & 51 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# Contributing to TorchTune
# Contributing to torchtune
We want to make contributing to this project as easy and transparent as possible.

 

## Dev install
In order to contribute to Torchtune, you should first fork
You should first [fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) the torchtune repository
and then clone your forked repository.

```git clone https://github.com/<YOUR_GITHUB_USER>/torchtune.git```
Expand All @@ -18,8 +18,19 @@ pip install -e ".[dev]"

&nbsp;

## Contributing workflow
We actively welcome your pull requests.

1. Create your new branch from `main` in your forked repo, with a name describing the work you're completing e.g. `add-feature-x`.
2. If you've added code that should be tested, add tests. Ensure all tests pass. See the [testing section](#testing) for more information.
3. If you've changed APIs, [update the documentation](#updating-documentation).
4. Make sure your [code lints](#coding-style).
5. If you haven't already, complete the [Contributor License Agreement ("CLA")](#contributor-license-agreement-cla)

&nbsp;

## Testing
TorchTune contains three different types of tests: unit tests, recipe test, and regression tests. These tests are distinguished by their complexity and the resources they require to run. Recipe tests and regression tests are explicitly marked via pytest.mark decorators and both require S3 access to download the requisite assets.
torchtune contains three different types of tests: unit tests, recipe tests, and regression tests. These tests are distinguished by their complexity and the resources they require to run. Recipe tests and regression tests are explicitly marked via pytest.mark decorators and both require S3 access to download the requisite assets.

- **Unit tests**
- These should be minimal tests runnable without remote access. (No large models, no downloading weights). Unit tests should be under [tests/torchtune](https://github.com/pytorch/torchtune/tree/main/tests/torchtune).
Expand All @@ -35,52 +46,43 @@ TorchTune contains three different types of tests: unit tests, recipe test, and
- Regression tests are found under [tests/regression_tests](https://github.com/pytorch/torchtune/tree/main/tests/regression_tests) and should be marked with the `@pytest.mark.slow_integration_test` decorator.
- To run only regression tests, you can use the command `pytest tests -m slow_integration_test`.

Whenever running tests in TorchTune, favor using the command line flags as much as possible (e.g. run `pytest tests -m integration_test` over `pytest tests/recipes`). This is because (a) the default behavior is to run unit tests only (so you will miss recipe tests without the flag), and (b) using the flags ensures pytest will automatically download any remote assets needed for your test run.
Whenever running tests in torchtune, favor using the command line flags as much as possible (e.g. run `pytest tests -m integration_test` over `pytest tests/recipes`). This is because (a) the default behavior is to run unit tests only (so you will miss recipe tests without the flag), and (b) using the flags ensures pytest will automatically download any remote assets needed for your test run.

Note that the above flags can be combined with other pytest flags, so e.g. `pytest tests -m integration_test -k 'test_loss'` will run only recipe tests matching the substring `test_loss`.

&nbsp;

## Pull Requests
We actively welcome your pull requests.
## Updating documentation
Each API and class should be clearly documented. Well-documented code is easier to review and understand/extend. All documentation is contained in the [docs directory](docs/source):

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
* All files following the pattern `api_ref_*` document top-level APIs.
* All files under the [deep dives directory](docs/source/deep_dives) contain "deep-dive" tutorials
* All files under the [tutorials directory](docs/source/tutorials) contain regular tutorials

&nbsp;
Documentation is written in [RST](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) format.

## Coding Style
`torchtune` uses pre-commit hooks to ensure style consistency and prevent common mistakes. Enable it by:
### Adding a new class/method to the API References
Once you've added an API that is meant to be exposed publically, you should add it to the appropriate rst file. For example, any new API within the [configs/](torchtune/configs)
directory should be added to `api_ref_configs.rst`, [data/](torchtune/data) should be added to `api_ref_data.rst`, [datasets](torchtune/datasets) should be added to
`api_ref_datasets.rst`, and so on. To add, it's as simple as adding the name of the exposed API somewhere in the appropriate RST file.

```
pre-commit install
```
All code written within the docstring of the class or method will be correctly rendered there.

After this pre-commit hooks will be run before every commit.
> Note: Our RST theme expects code to be specified using double backticks instead of single. Eg: ``hidden_dim``. Single backticks will be rendered as italics instead of as "code".
You can also run this manually on every file using:
### Building docs

```
pre-commit run --all-files
```
All documentation is built for each PR and contains a preview on the PR. However, this takes awhile (~8 minutes) and you should first build docs from your local machine.

&nbsp;
From the [docs/](docs) directory:

## Build docs

From the `docs` folder:

Install dependencies:
1. Install dependencies:

```
pip install -r requirements.txt
```

Then:
2. Run make command:

```
make html
Expand All @@ -94,9 +96,7 @@ make html`.

If the doc build starts failing for a weird reason, try `make clean`.

&nbsp;

#### Iterate and Serve docs locally
#### Serving docs locally (if building from a GPU env)

If you're developing locally, you can just open the generated `index.html` file in your browser.

Expand All @@ -118,31 +118,33 @@ Now, you can navigate to `localhost:9000` on your local machine to view the rend

&nbsp;

## Best Practices
## Coding Style
`torchtune` uses pre-commit hooks to ensure style consistency and prevent common mistakes. Enable it by:

This section captures some best practices for contributing code to TorchTune. Following these will make PR reviews easier.
```
pre-commit install
```

### Code
- Modular Blocks instead of Monolithic Classes. Stuffing all of the logic into a single class limits readability and makes it hard to reuse logic. Think about breaking the implementation into self-contained blocks which can be used independently from a given model. For example, attention mechanisms, embedding classes, transformer layers etc.
- Say no to Implementation Inheritance. You really don’t need it AND it makes the code much harder to understand or refactor since the logic is spread across many files/classes. Where needed, consider using Protocols.
- Clean Interfaces. There’s nothing more challenging than reading through functions/constructors with ~100 parameters. Think carefully about what needs to be exposed to the user and don’t hesitate to hard-code parameters until there is a need to make them configurable.
- Intrusive Configs. Config objects should not intrude into the class implementation. Configs should interact with these classes through cleanly defined builder functions which convert the config into flat parameters needed to instantiate an object.
- Limit Generalization. Attempting to generalize code before this is needed unnecessarily complicates implementations - you are anticipating use cases you don’t know a lot about. When you actually need to generalize a component, think about whether it’s worth it to complicate a given interface to stuff in more functionality. Don’t be afraid of code duplication if it makes things easier to read.
- Value Checks and Asserts. Don’t check values in higher level modules - defer the checks to the modules where the values are actually used. This helps reduce the number of raise statements in code which generally hurts readability, but are critical for correctness.
After this pre-commit hooks will be run before every commit.

### Docstrings
You can also run this manually on every file using:

Each API and class should be clearly documented. Well-documented code is easier to review and understand/extend.
```
pre-commit run --all-files
```

- TorchTune docs are written in rst, and the pytorch-sphinx-theme expects code to be specified using double backticks instead of single. Eg: ``hidden_dim``. Single backticks will be rendered as italics instead of as "code".
- For parameters that have a default value, specify that they're optional in the docstring.
&nbsp;

### Tests
## Best Practices

Every API and class should also have well-defined Tests. TorchTune uses PyTest for testing. TODO: Link to testing README when this is ready.
This section captures some best practices for contributing code to torchtune. Following these will make PR reviews easier.

- Use PyTest's autouse fixture to prevent the RNG of each test to leak into the other tests.
- Small comments about what a test is doing and what it's checking go a long way.
- **Modular Blocks instead of Monolithic Classes**. Stuffing all of the logic into a single class limits readability and makes it hard to reuse logic. Think about breaking the implementation into self-contained blocks which can be used independently from a given model. For example, attention mechanisms, embedding classes, transformer layers etc.
- **Say no to Implementation Inheritance**. You really don’t need it AND it makes the code much harder to understand or refactor since the logic is spread across many files/classes. Where needed, consider using Protocols.
- **Clean Interfaces**. There’s nothing more challenging than reading through functions/constructors with ~100 parameters. Think carefully about what needs to be exposed to the user and don’t hesitate to hard-code parameters until there is a need to make them configurable.
- **Intrusive Configs**. Config objects should not intrude into the class implementation. Configs should interact with these classes through cleanly defined builder functions which convert the config into flat parameters needed to instantiate an object.
- **Limit Generalization**. Attempting to generalize code before this is needed unnecessarily complicates implementations - you are anticipating use cases you don’t know a lot about. When you actually need to generalize a component, think about whether it’s worth it to complicate a given interface to stuff in more functionality. Don’t be afraid of code duplication if it makes things easier to read.
- **Value Checks and Asserts**. Don’t check values in higher level modules - defer the checks to the modules where the values are actually used. This helps reduce the number of raise statements in code which generally hurts readability, but are critical for correctness.

&nbsp;

Expand All @@ -154,7 +156,7 @@ Meta has a [bounty program](https://www.facebook.com/whitehat/) for the safe dis
&nbsp;

## License
By contributing to TorchTune, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.
By contributing to torchtune, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.

&nbsp;

Expand Down

0 comments on commit 40f3575

Please sign in to comment.