Note
If you are developing on Stanford's sapling cluster, instead see the instructions here. If you don't know what this means, you're not using sapling so you should just continue reading.
- FlexFlow Train uses nix to manage dependencies and the development environment. There exist a number of ways to install nix, but we recommend one of the following:
  - If you have root permissions: DeterminateSystems/nix-installer
  - If you don't have root permissions: DavHau/nix-portable. Note that nix-portable does not work particularly well if the nix store is in NFS or another distributed file system, so if you are running on an HPC cluster where the home directory is mounted via a distributed file system, we recommend setting the NP_LOCATION environment variable to /tmp or some other non-NFS location. While you should at least skim nix-portable's setup instructions, you'll probably end up doing something like this:
$ USERBIN="${XDG_BIN_HOME:-$HOME/.local/bin}"
$ wget 'https://github.com/DavHau/nix-portable/releases/download/v010/nix-portable' -O "$USERBIN/nix-portable"
...
$ chmod u+x "$USERBIN/nix-portable"
...
$ ln -sf "$USERBIN/nix-portable" "$USERBIN/nix"
...
$ echo "export PATH=\"$USERBIN:\$PATH\"" >> ~/.bashrc
...
Now, if everything is set up properly, running nix --version should print something like the following (don't worry if the version number is slightly different):
$ nix --version
nix (Nix) 2.20.6
- Clone the FlexFlow Train repository (or, if you'd prefer, follow the alternative setup instructions in the ff-dev section)
$ FF_DIR="$HOME/flexflow-train" # or wherever else you want to put the repository
$ git clone --recursive git@github.com:flexflow/flexflow-train.git "$FF_DIR"
...
- Enter the nix-provided default development environment (aka "dev shell")
$ cd "$FF_DIR"
$ nix develop --accept-flake-config
- Build and run the non-GPU-required tests (systems that have access to CUDA GPUs can also run the GPU-mandatory tests by following the instructions here)
(ff) $ proj cmake
...
(ff) $ proj test --skip-gpu-tests
...
If everything is correctly configured, you should see a bunch of build messages followed by something like
(ff) $ proj test --skip-gpu-tests
421/421 Test #441: get_transformer_computation_graph
100% tests passed, 0 tests failed out of 421
Label Time Summary:
compiler-tests = 6.13 sec*proc (19 tests)
local-execution-tests = 0.13 sec*proc (3 tests)
models-tests = 0.05 sec*proc (4 tests)
op-attrs-tests = 0.48 sec*proc (59 tests)
pcg-tests = 0.33 sec*proc (33 tests)
substitution-generator-tests = 0.06 sec*proc (2 tests)
substitutions-tests = 0.10 sec*proc (9 tests)
utils-tests = 1.20 sec*proc (293 tests)
Total Test time (real) = 8.64 sec
If you don't, or if you see any tests failing, please double check that you have followed the instructions above. If you have and are still encountering an issue, please contact us with a detailed description of your platform and the commands you have run.
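If the trouble traces back to nix-portable on an HPC cluster, the nix store location mentioned in the first step is worth re-checking. The following is a minimal sketch, not part of FlexFlow Train or nix-portable, of choosing a value for NP_LOCATION from the file system type of the home directory. The helper names (is_network_fs, fs_type_of, suggest_np_location) and the list of file system type names are assumptions made for illustration, and fs_type_of relies on GNU coreutils stat.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not provided by nix-portable.

# Succeeds if the given file system type name looks like a network or
# distributed file system. The list of names is an assumption; extend it
# for the file systems used on your cluster.
is_network_fs() {
  case "$1" in
    nfs*|lustre|gpfs|ceph|cifs|afs) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the file system type of a path (requires GNU coreutils stat).
fs_type_of() {
  stat -f -c %T "$1"
}

# Prints the home directory unless it sits on a network file system,
# in which case the fallback directory (default: /tmp) is printed instead.
suggest_np_location() {
  local home_dir="$1"
  local fallback="${2:-/tmp}"
  if is_network_fs "$(fs_type_of "$home_dir")"; then
    echo "$fallback"
  else
    echo "$home_dir"
  fi
}
```

With these definitions sourced, something like export NP_LOCATION="$(suggest_np_location "$HOME" /tmp)" in ~/.bashrc would apply the recommendation before nix-portable is first run.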
If you are developing on a machine with one or more CUDA GPUs, you can also run the tests that require a GPU by entering the gpu devshell instead of the default devshell:
$ NIXPKGS_ALLOW_UNFREE=1 nix develop .#gpu --accept-flake-config --impure
and then running
(ff) $ proj test
...
You should see the additional GPU tests run. If you instead see a message like
Error: ... Pass --skip-gpu-tests to skip running tests that require a GPU
double check that you are in the gpu devshell, not the default devshell.
If you've confirmed that you are in the correct devshell and are still encountering issues, contact us
with a detailed description of your platform and the commands you have run.
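To make the platform description easier to assemble, here is a small sketch of a helper that prints a few basic details. The function name describe_platform and the choice of fields are assumptions; the maintainers may want different or additional information.

```shell
#!/usr/bin/env bash
# Sketch only: describe_platform is a hypothetical helper, not part of proj.

# Prints the kernel, the nix version, and the visible NVIDIA GPUs (if any).
describe_platform() {
  echo "kernel: $(uname -srm)"
  if command -v nix >/dev/null 2>&1; then
    echo "nix: $(nix --version)"
  else
    echo "nix: not found on PATH"
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "gpus:"
    # -L lists one line per GPU; report a failure instead of aborting.
    nvidia-smi -L || echo "nvidia-smi failed to list GPUs"
  else
    echo "gpus: nvidia-smi not found on PATH"
  fi
}
```

Running describe_platform and pasting its output alongside the exact commands you ran gives a reasonable starting point for a report.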
Many of the FlexFlow Train developers use an additional set of scripts called ff-dev to automate many common git operations associated with FlexFlow Train development.
To set up ff-dev, run TODO (tracked in #1573).
If you installed nix system-wide (e.g., using DeterminateSystems/nix-installer), you can use direnv to automatically enter the FlexFlow Train development environment when you cd into the repository, rather than having to manually run nix develop. direnv will also automatically exit the environment when you cd out of the repository, and (if configured using nix-direnv) will even automatically reload the environment if the flake.nix file changes.
You can find the installation instructions for direnv here, and if you would like automatic environment reloading you can also install nix-direnv using the instructions here.
Once you have direnv (and optionally nix-direnv) installed, cd into the root of your cloned FlexFlow Train repository and run
$ echo 'use flake . --accept-flake-config' > .envrc
You should see a message that the .envrc file you just created is blocked. Run the command shown in the error message (i.e., direnv allow), and direnv should automatically place you in the environment.
For more information on using direnv with nix, see here.
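The echo command above overwrites any .envrc that already exists. As a hedged alternative, the sketch below writes the same line only when no .envrc is present. The function name write_envrc is an assumption for illustration and is not part of direnv or FlexFlow Train.

```shell
#!/usr/bin/env bash
# Sketch only: write_envrc is a hypothetical helper.

# Creates <repo_dir>/.envrc with the flake directive used above,
# refusing to overwrite an existing file.
write_envrc() {
  local repo_dir="$1"
  local envrc="$repo_dir/.envrc"
  if [ -e "$envrc" ]; then
    echo "refusing to overwrite existing $envrc" >&2
    return 1
  fi
  echo 'use flake . --accept-flake-config' > "$envrc"
}
```

After write_envrc "$FF_DIR" succeeds, direnv allow is still needed, as described above.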
Most operations you'll want to perform while developing FlexFlow Train are provided through a small Python utility called proj. proj is automatically pulled in by nix when you enter the dev shell, so you should be able to run
(ff) $ proj -h
and see the full list of operations that proj supports. proj commands can be run from anywhere in the repository (i.e., they do not have to be run from the root).
To help you get started, however, a list of common command invocations is included here:
- To build FlexFlow Train:
(ff) $ proj build
- To build and run FlexFlow Train tests (without a GPU):
(ff) $ proj test --skip-gpu-tests
- To build and run FlexFlow Train tests (with a GPU):
(ff) $ proj test
- To regenerate CMake files (necessary any time you switch branches or modify the CMake source; if you ever run into weird build issues, try running this to see if it fixes things):
(ff) $ proj cmake
- To format all of the FlexFlow Train source files:
(ff) $ proj format
- To build the FlexFlow Train Doxygen docs:
(ff) $ proj doxygen
You can also add the --browser flag to automatically open the built docs in your default browser if you are working on your local machine.
The bulk of the FlexFlow source code is stored in the following folders:
- lib: The C++ code that makes up FlexFlow's core, split up into a number of libraries. You can find a description of each library here.
- bin: Command-line interfaces for FlexFlow and associated tools (all in C++). Generally, these are just thin wrappers that parse command-line arguments and then call out to functions defined in lib for the actual processing/logic. You can find a description of each binary here.
- bindings: Python (or any additional languages added in the future) bindings for FlexFlow Train.
- docs: Config files for documentation generators and code for generating diagrams. The actual documentation itself is included in the source directories/files as either .md files or inline in the language's documentation syntax (i.e., Doxygen for C++ and Sphinx for Python).
- cmake: CMake configuration for building FlexFlow Train. Note that unless you're modifying the build configuration (e.g., adding a library, additional dependencies, etc.), you generally should use proj instead of interacting with CMake directly.
- deps: Third-party dependencies included as submodules. Note that since FlexFlow Train moved to nix for managing dependencies, many (but not all) of these are no longer used in the default configuration.
We currently implement CI testing using GitHub Workflows. Each workflow is defined by its corresponding YAML file in the .github/workflows folder of the repo. We currently have the following workflows:
- tests: Builds and runs GPU and non-GPU unit tests for all of the code under lib and bin. Also uploads coverage numbers to codecov.io.
- clang-format-check.yml: Ensures that the source code is properly formatted using clang-format. To format your code locally, run proj format (see here for more information on proj).
- shell-check.yml: Runs shellcheck on all bash scripts in the repo.
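For catching shellcheck findings before pushing, a rough local approximation can be sketched as follows. Selecting scripts by the .sh extension is an assumption, since the way the CI workflow picks files is not described here, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: an approximation of the CI check, not a copy of it.

# Lists files ending in .sh under the given directory, skipping .git.
list_shell_scripts() {
  find "$1" -type f -name '*.sh' ! -path '*/.git/*' | sort
}

# Runs shellcheck on every such file; the exit status is non-zero
# if shellcheck reports a problem in any of them.
run_shellcheck() {
  find "$1" -type f -name '*.sh' ! -path '*/.git/*' -exec shellcheck {} +
}
```

From the repository root, run_shellcheck . would then check every matching script in one pass, assuming shellcheck is available on PATH.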
GPU machines for CI are managed using runs-on.
We actively welcome your pull requests. Note that we may already be working on the feature/fix you're looking for, so we suggest searching through the open issues, open PRs, and contacting us to make sure you're not duplicating existing effort!
The steps for getting changes merged into FlexFlow are relatively standard:
- Fork the repo and either create a new branch based on master, or just modify master directly.
- If you've added code that should be tested, add tests. The process for adding tests for code under lib is documented here. Adding tests for other parts of the code is currently undocumented, so you will need to contact us for information on how to do it.
- Ensure the code builds (i.e., run proj build).
- Ensure the test suite passes (i.e., run proj test).
- Format the code (i.e., run proj format).
- Create a new PR from your modified branch to the master branch in FlexFlow Train. Provide a brief description of the changes you've made and link any related/closed issues.
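The build, test, and format steps above can be chained so that a failure stops the sequence early. This is only a sketch: pre_pr_check and the PROJ_CMD override are hypothetical names introduced here for illustration, and only the proj build, proj test, and proj format subcommands come from this guide.

```shell
#!/usr/bin/env bash
# Sketch only: pre_pr_check and PROJ_CMD are hypothetical, not part of proj.

# Runs build, test, and format in order, stopping at the first failure.
# Extra arguments (e.g., --skip-gpu-tests) are forwarded to the test step.
pre_pr_check() {
  local proj_cmd="${PROJ_CMD:-proj}"
  "$proj_cmd" build || return 1
  "$proj_cmd" test "$@" || return 1
  "$proj_cmd" format || return 1
  echo "build, tests, and formatting all succeeded"
}
```

On a machine without a GPU, pre_pr_check --skip-gpu-tests would run the same sequence with the GPU tests skipped. Because the format step may modify files, it is worth checking git status afterwards and committing any changes before opening the PR.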
Code review is done using Reviewable. If you haven't used Reviewable before, please read through (or at least skim) the "Reviews" section of the Reviewable documentation.
Either create an issue or join the FlexFlow Zulip instance. For any reported bugs, please ensure that your description is clear and has sufficient information for us to reproduce the issue.
By contributing to FlexFlow Train, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.
Footnotes
- aka "dev shell"