AntonOresten committed Mar 23, 2024
1 parent 5798869 commit 168bd98
Showing 4 changed files with 90 additions and 5 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -17,6 +17,14 @@ Conflux.jl is a toolkit designed to enable data parallelism for [Flux.jl](https:

See the documentation for more details, examples, and important caveats.

## Installation

The package can be installed with the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```julia
pkg> add https://github.com/MurrellGroup/Conflux.jl#main
```

## Example usage

```julia
3 changes: 3 additions & 0 deletions docs/make.jl
@@ -7,13 +7,16 @@ makedocs(;
    modules=[Conflux],
    authors="Anton Oresten <anton.oresten42@gmail.com>",
    sitename="Conflux.jl",
    doctest=false,
    format=Documenter.HTML(;
        edit_link="main",
        assets=String[],
    ),
    pages=[
        "Home" => "index.md",
        "API Reference" => "API.md",
    ],
    checkdocs=:all,
)

deploydocs(
79 changes: 74 additions & 5 deletions docs/src/index.md
@@ -4,11 +4,80 @@ CurrentModule = Conflux

# Conflux

Documentation for [Conflux](https://github.com/anton083/Conflux.jl).
[![Latest Release](https://img.shields.io/github/release/MurrellGroup/Conflux.jl.svg)](https://github.com/MurrellGroup/Conflux.jl/releases/latest)
[![MIT license](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/license/MIT)
[![Documentation](https://img.shields.io/badge/docs-stable-blue.svg)](https://MurrellGroup.github.io/Conflux.jl/stable/)
[![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://MurrellGroup.github.io/Conflux.jl/dev/)
[![Status](https://github.com/MurrellGroup/Conflux.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/MurrellGroup/Conflux.jl/actions/workflows/CI.yml?query=branch%3Amain)
[![Coverage](https://codecov.io/gh/MurrellGroup/Conflux.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/MurrellGroup/Conflux.jl)

```@index
```
Conflux.jl is a toolkit designed to enable data parallelism for [Flux.jl](https://github.com/FluxML/Flux.jl) models by simplifying the process of replicating them across multiple GPUs on a single node, and by leveraging [NCCL.jl](https://github.com/JuliaGPU/NCCL.jl) for efficient inter-GPU communication. This package aims to provide a straightforward and intuitive interface for multi-GPU training, requiring minimal changes to existing code and training loops.

## Features

- Easy replication of objects across multiple GPUs with the **replicate** function
- Efficient synchronization of models and averaging of gradients with the **allreduce!** function, which takes an operator (e.g. `+`, `*`, `avg`) and a set of replicas, and reduces all their parameters with the given operator, leaving the replicas identical.
- A **withdevices** function that allows you to run code on each device asynchronously.
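
To make the effect of an averaging all-reduce concrete, here is a minimal CPU-only sketch on plain arrays. `allreduce_avg!` is a hypothetical helper written for illustration only; it is not Conflux.jl's `allreduce!`, which the list above describes as reducing the parameters of replicas held on multiple GPUs.

```julia
# Conceptual sketch only: `allreduce_avg!` is a hypothetical name, not a Conflux.jl function.
# It overwrites every array with the elementwise mean taken across all of them.
function allreduce_avg!(replicas::AbstractArray...)
    averaged = sum(replicas) ./ length(replicas)  # elementwise mean across replicas
    for r in replicas
        r .= averaged                             # in-place write, so all replicas end up identical
    end
    return replicas
end

# Two stand-in "gradients", one per imaginary device
grads = [Float32[1, 2, 3], Float32[3, 4, 5]]
allreduce_avg!(grads...)
```

After the call, both arrays hold the same averaged values, which is the property the real reduction is described as providing for replicas.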

See the documentation for more details, examples, and important caveats.

## Installation

```@autodocs
Modules = [Conflux]
The package can be installed with the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```julia
pkg> add https://github.com/MurrellGroup/Conflux.jl#main
```

## Example usage

```julia
# Specify the default devices to use
ENV["CUDA_VISIBLE_DEVICES"] = "0,1"

using Conflux

using Flux, Optimisers

model = Chain(Dense(1 => 256, tanh), Dense(256 => 512, tanh), Dense(512 => 256, tanh), Dense(256 => 1))

# This uses all available devices. To use specific devices, pass them as a second argument.
models = replicate(model)

opt = Optimisers.Adam(0.001f0)

# Instantiate the optimiser states on each device
states = Conflux.withdevices() do (i, device)
    Optimisers.setup(opt, model) |> device
end

# A single batch, stored on the CPU. A more sophisticated mechanism could distribute multiple batches.
# Float32 matches the element type of the model's parameters.
X = rand(Float32, 1, 16)
Y = X .^ 2

loss(y, Y) = sum(abs2, y .- Y)

losses = []
for epoch in 1:10
    # Get the gradients for each batch on each device
    ∇models = Conflux.withdevices() do (i, device)
        x, y = device(X), device(Y)
        # The second return value is a tuple because `Flux.withgradient` takes `args...`, and the model is the first argument.
        l, (∇model,) = Flux.withgradient(m -> loss(m(x), y), models[i])
        push!(losses, l)
        ∇model
    end

    # Average the gradients across devices
    allreduce!(avg, ∇models...)

    # Update the models on each device
    Conflux.withdevices() do (i, device)
        Optimisers.update!(states[i], models[i], ∇models[i])
    end

    # Optionally synchronize the models and optimiser states, in case the parameters diverge
    #allreduce!(avg, models...)
    #allreduce!(avg, states...)
end
```
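
The example above sends the same batch to every device. One possible way to give each device its own share, sketched here under the assumption that samples are stored as columns, is to split the batch on the CPU first. `split_batch` is a hypothetical helper for illustration and is not part of Conflux.jl.

```julia
# Hypothetical helper (not a Conflux.jl function): split the columns (samples)
# of `X` into contiguous chunks, one per device.
function split_batch(X::AbstractMatrix, n::Integer)
    chunk = cld(size(X, 2), n)  # ceiling division so no sample is dropped
    # Note: if the sample count is not divisible by `n`, the last chunk is smaller,
    # and fewer than `n` chunks may be returned.
    return [X[:, idx] for idx in Iterators.partition(axes(X, 2), chunk)]
end

X = rand(Float32, 1, 16)
shards = split_batch(X, 2)  # two 1×8 matrices
```

Inside the per-device block, each device could then move `shards[i]` instead of the full `X`, assuming `i` is the 1-based device index as used for `models[i]` in the example.
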
5 changes: 5 additions & 0 deletions docs/src/reference.md
@@ -0,0 +1,5 @@
# Reference

```@autodocs
Modules = [Conflux]
```
