
[nnx] grad accepts argnums #3798

Merged: 1 commit merged into main on Apr 2, 2024
Conversation

@cgarciae (Collaborator) commented Mar 29, 2024

What does this PR do?

nnx.grad now accepts argnums, and multiple graph nodes can be passed as differentiable arguments.
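
For illustration, a minimal usage sketch based on the tests added in this PR (the import path reflects the experimental nnx location at the time; the shapes are only an example):

import jax
import jax.numpy as jnp
from flax.experimental import nnx

rngs = nnx.Rngs(0)
m1 = nnx.Linear(2, 3, rngs=rngs)
m2 = nnx.Linear(3, 3, rngs=rngs)

def loss_fn(m1, m2, x, y):
  return jnp.mean((m2(m1(x)) - y) ** 2)

# Differentiate with respect to the Param state of both graph nodes,
# selected by their positions via argnums.
grad_fn = nnx.grad(loss_fn, argnums=(0, 1), wrt=nnx.Param)

x = jax.random.uniform(rngs(), (1, 2))
y = jnp.ones((1, 3))
grads_m1, grads_m2 = grad_fn(m1, m2, x, y)  # one gradient State per graph node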

@cgarciae force-pushed the nnx-improve-grad branch 3 times, most recently from e6253a9 to 7321765, on March 30, 2024 10:14
@cgarciae marked this pull request as ready for review on March 30, 2024 10:15
Base automatically changed from nnx-static-goes-first to main on April 1, 2024 19:26
@codecov-commenter commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 93.42105%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 60.51%. Comparing base (cc740d4) to head (3bebf97).

Files                                      Patch %   Lines
flax/experimental/nnx/nnx/transforms.py    83.33%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3798      +/-   ##
==========================================
+ Coverage   60.34%   60.51%   +0.17%     
==========================================
  Files         101      101              
  Lines       12862    12908      +46     
==========================================
+ Hits         7761     7811      +50     
+ Misses       5101     5097       -4     


Comment on lines +427 to +464

def test_multiple_graph_nodes(self):
  rngs = nnx.Rngs(0)
  m1 = nnx.Linear(2, 3, rngs=rngs)
  m2 = nnx.Linear(3, 3, rngs=rngs)
  loss_fn = lambda m1, m2, x, y: jnp.mean((m2(m1(x)) - y) ** 2)
  grad_fn = nnx.grad(loss_fn, argnums=(0, 1), wrt=nnx.Param)
  x = jax.random.uniform(rngs(), (1, 2))
  y = jnp.ones((1, 3))
  grads_m1, grads_m2 = grad_fn(m1, m2, x, y)

  assert 'kernel' in grads_m1
  assert grads_m1.kernel.raw_value.shape == (2, 3)
  assert 'bias' in grads_m1
  assert grads_m1.bias.raw_value.shape == (3,)
  assert 'kernel' in grads_m2
  assert grads_m2.kernel.raw_value.shape == (3, 3)
  assert 'bias' in grads_m2
  assert grads_m2.bias.raw_value.shape == (3,)

def test_multiple_graph_nodes_mix_positions(self):
  rngs = nnx.Rngs(0)
  m1 = nnx.Linear(2, 3, rngs=rngs)
  m2 = nnx.Linear(3, 3, rngs=rngs)
  loss_fn = lambda x, m1, y, m2: jnp.mean((m2(m1(x)) - y) ** 2)
  grad_fn = nnx.grad(loss_fn, argnums=(1, 3), wrt=nnx.Param)
  x = jax.random.uniform(rngs(), (1, 2))
  y = jnp.ones((1, 3))
  grads_m1, grads_m2 = grad_fn(x, m1, y, m2)

  assert 'kernel' in grads_m1
  assert grads_m1.kernel.raw_value.shape == (2, 3)
  assert 'bias' in grads_m1
  assert grads_m1.bias.raw_value.shape == (3,)
  assert 'kernel' in grads_m2
  assert grads_m2.kernel.raw_value.shape == (3, 3)
  assert 'bias' in grads_m2
  assert grads_m2.bias.raw_value.shape == (3,)
@chiamp (Collaborator) commented Apr 1, 2024

Suggested change (replace the two tests above with a single parameterized test):

@parameterized.parameters(
  {'loss_fn': lambda m1, m2, x, y: jnp.mean((m2(m1(x)) - y) ** 2), 'argnums': (0, 1)},
  {'loss_fn': lambda x, m1, y, m2: jnp.mean((m2(m1(x)) - y) ** 2), 'argnums': (1, 3)},
)
def test_multiple_graph_nodes(self, loss_fn, argnums):
  rngs = nnx.Rngs(0)
  m1 = nnx.Linear(2, 3, rngs=rngs)
  m2 = nnx.Linear(3, 3, rngs=rngs)
  grad_fn = nnx.grad(loss_fn, argnums=argnums, wrt=nnx.Param)
  x = jax.random.uniform(rngs(), (1, 2))
  y = jnp.ones((1, 3))
  grads_m1, grads_m2 = grad_fn(m1, m2, x, y)
  assert 'kernel' in grads_m1
  assert grads_m1.kernel.raw_value.shape == (2, 3)
  assert 'bias' in grads_m1
  assert grads_m1.bias.raw_value.shape == (3,)
  assert 'kernel' in grads_m2
  assert grads_m2.kernel.raw_value.shape == (3, 3)
  assert 'bias' in grads_m2
  assert grads_m2.bias.raw_value.shape == (3,)

The two tests seem similar enough that we could combine them using parameterized.parameters

@cgarciae (Collaborator, Author) commented Apr 1, 2024

I think you can do it, but the suggested implementation is not quite right: you also need to parametrize argnums, and loss_fn is being redefined.

@chiamp (Collaborator)

ah yes good catch, I fixed it now

@cgarciae (Collaborator, Author)

Still doesn't work, because you need to pass the arguments to grad_fn in different positions for each case.
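
For example, a hedged sketch of how the call arguments could also be parameterized so each case passes them in the order its loss_fn expects (the arg_order key is hypothetical, not part of the suggestion above):

@parameterized.parameters(
  {
    'loss_fn': lambda m1, m2, x, y: jnp.mean((m2(m1(x)) - y) ** 2),
    'argnums': (0, 1),
    'arg_order': lambda m1, m2, x, y: (m1, m2, x, y),
  },
  {
    'loss_fn': lambda x, m1, y, m2: jnp.mean((m2(m1(x)) - y) ** 2),
    'argnums': (1, 3),
    'arg_order': lambda m1, m2, x, y: (x, m1, y, m2),
  },
)
def test_multiple_graph_nodes(self, loss_fn, argnums, arg_order):
  rngs = nnx.Rngs(0)
  m1 = nnx.Linear(2, 3, rngs=rngs)
  m2 = nnx.Linear(3, 3, rngs=rngs)
  grad_fn = nnx.grad(loss_fn, argnums=argnums, wrt=nnx.Param)
  x = jax.random.uniform(rngs(), (1, 2))
  y = jnp.ones((1, 3))
  # Reorder the call arguments to match each loss_fn's signature.
  grads_m1, grads_m2 = grad_fn(*arg_order(m1, m2, x, y))
  assert grads_m1.kernel.raw_value.shape == (2, 3)
  assert grads_m2.kernel.raw_value.shape == (3, 3)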

@cgarciae (Collaborator, Author)

merging for now

@copybara-service bot merged commit 2b257b4 into main on Apr 2, 2024 (21 checks passed)
@copybara-service bot deleted the nnx-improve-grad branch on April 2, 2024 11:35