Fix vae #143
Conversation
Thanks for fixing the rng bug, @jheek!
I left a bunch of comments, most of which aren't actually related to your improvements, but are rather requests for further improvements to the example. I'm happy to leave them for a later PR, if you prefer.
I also noticed that the result this VAE gets doesn't actually match the paper. We should be getting ~100, but I am not sure after how many training epochs.
examples/vae/main.py
Outdated
return {
    'bce': jnp.mean(bce_loss),
    'kld': jnp.mean(kld_loss),
    'loss': jnp.mean(bce_loss + kld_loss)
Is it not better (in terms of op count) to sum the averaged bce and kld loss? Although it doesn't really matter on such a small-scale example.
done
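A minimal sketch of the suggested change, assuming the bce_loss and kld_loss arrays from the snippet above; the helper name compute_metrics is illustrative:

```python
import jax.numpy as jnp

def compute_metrics(bce_loss, kld_loss):
  # Average each per-example term once, then add the two scalars,
  # instead of summing the full arrays and averaging the result.
  bce = jnp.mean(bce_loss)
  kld = jnp.mean(kld_loss)
  return {'bce': bce, 'kld': kld, 'loss': bce + kld}
```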
examples/vae/main.py
Outdated
'learning_rate', default=1e-3,
help=('The leanring rate for the Adam optimizer')
Add a dot at the end of the help string. Here and in other flags.
done
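For reference, a sketch of the flag with the trailing period added, assuming the example defines it with absl.flags (the typo in the help string is also fixed here):

```python
from absl import flags

flags.DEFINE_float(
    'learning_rate', default=1e-3,
    help='The learning rate for the Adam optimizer.')
```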
examples/vae/main.py
Outdated
@@ -127,10 +124,10 @@ def loss_fn(model):

 @jax.jit
-def eval(model, eval_ds, z):
+def eval(model, eval_ds, z, z_rng):
   xs = eval_ds['image'] / 255.0
Should this not be a part of the input pipeline?
also part of the input pipeline now
examples/vae/main.py
Outdated
for epoch in range(FLAGS.num_epochs):
  for batch in tfds.as_numpy(train_ds):
    rng, key = random.split(rng)
    batch['image'] = batch['image'].reshape(-1, 784) / 255.0
Why are images flattened and then unflattened again in eval? If flattening serves a purpose, consider moving the first flattening op into the IO pipeline.
flattening is now part of the input pipeline
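A sketch of what moving the normalisation and flattening into the input pipeline could look like; prepare_image is a hypothetical helper and the batch size is illustrative, this is the usual TFDS pattern rather than the exact code in the PR:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

def prepare_image(x):
  image = tf.cast(x['image'], tf.float32) / 255.0  # normalise to [0, 1]
  return tf.reshape(image, (-1,))                  # flatten 28x28x1 -> 784

train_ds = tfds.load('mnist', split='train')
train_ds = train_ds.map(prepare_image).batch(128)
```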
examples/vae/main.py
Outdated
for epoch in range(FLAGS.num_epochs):
  for batch in tfds.as_numpy(train_ds):
    rng, key = random.split(rng)
What's the reason for manually handling key splitting instead of using the stochastic context? Using the latter would showcase more of Flax's features.
I like to use the simple and explicit JAX APIs as much as possible. nn.stochastic was introduced to avoid having to pass around rngs through a complex model. But within a single function I think manually splitting rngs is still better.
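The explicit splitting pattern in isolation, as a self-contained sketch (the loop body is a stand-in for the actual training step):

```python
from jax import random

rng = random.PRNGKey(0)
samples = []
for step in range(3):
  # Split off a fresh key each step so every step sees new noise;
  # reusing `rng` directly would produce the same samples every iteration.
  rng, key = random.split(rng)
  samples.append(random.normal(key, (2,)))
```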
examples/vae/main.py
Outdated
for epoch in range(FLAGS.num_epochs):
  for batch in tfds.as_numpy(train_ds):
    rng, key = random.split(rng)
    batch['image'] = batch['image'].reshape(-1, 784) / 255.0
After dividing by 255, the data becomes "continuous". You need to dynamically binarise it, both for train and eval. Many VAEs report results on dynamically binarised MNIST and on statically binarised MNIST (some fixed dataset, which I believe isn't a part of TFDS).
Scrap half of the previous comment: TFDS does have the binarised MNIST dataset. It's called binarized_mnist :)
What is your recommendation here? To eval on both?
Just stick to binarized_mnist in training and eval. This should be equivalent to the "static MNIST" setup from VAE literature.
done
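A sketch of what the switch could look like in the pipeline; the map and batch size are illustrative, but binarized_mnist is the actual TFDS dataset name and it has no labels, so only the image feature is used:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# binarized_mnist ships images already thresholded to {0, 1},
# so the /255 normalisation and any dynamic binarisation go away.
train_ds = tfds.load('binarized_mnist', split='train')
train_ds = train_ds.map(
    lambda x: tf.reshape(tf.cast(x['image'], tf.float32), (-1,))).batch(128)
```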
@@ -12,6 +12,10 @@ python main.py
GitHub doesn't let me comment on parts that are not considered different. But could you please remove the semicolon after ## Examples?
examples/vae/main.py
Outdated
vae = nn.Model(VAE, params)

optimizer = optim.Adam(learning_rate=FLAGS.learning_rate).create(vae)

rng, z_key, eval_rng = random.split(rng, 3)
z = random.normal(z_key, (64, 20))
64 -> FLAGS.batch_size?
Can we also make 20 (latent dimensionality) a parameter? I don't expect it to be used a lot, but giving it a name would improve readability IMO.
Latents are a flag now. The 64 stays because the visualization is an 8x8 grid.
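Roughly what that ends up looking like, as a hedged sketch; the flag name latents matches the reply above, while the helper name and default are illustrative:

```python
from absl import flags
from jax import random

FLAGS = flags.FLAGS

flags.DEFINE_integer(
    'latents', default=20,
    help='Number of latent dimensions.')

def make_fixed_inputs(rng):
  # 64 samples because the generated sample sheet is an 8x8 grid.
  rng, z_key, eval_rng = random.split(rng, 3)
  z = random.normal(z_key, (64, FLAGS.latents))
  return rng, z, eval_rng
```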
examples/vae/main.py
Outdated
 xs = eval_ds['image'] / 255.0
 xs = xs.reshape(-1, 784)
-recon_xs, mean, logvar = model(xs)
+recon_xs, mean, logvar = model(xs, z_rng)

 comparison = jnp.concatenate([xs[:8].reshape(-1, 28, 28, 1),
Return all samples+reconstructions and subsample+concat outside of eval before saving the image?
There are a lot of reconstructions (10,000), so I think pre-processing the data for the comparison image is not a bad idea.
I didn't realise this. I thought it was going to be just the regular batch size.
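For context, a sketch of how subsampling before building the comparison sheet might look; the helper name and the choice of 8 pairs are illustrative, not the exact code from the PR:

```python
import jax.numpy as jnp

def build_comparison(xs, recon_xs, n=8):
  # Keep only the first n originals and reconstructions, reshape the flat
  # 784-vectors back to 28x28 images, and stack them for a side-by-side sheet.
  originals = xs[:n].reshape(-1, 28, 28, 1)
  reconstructions = recon_xs[:n].reshape(-1, 28, 28, 1)
  return jnp.concatenate([originals, reconstructions])
```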
- Fixes bug causing fixed noise during training
- Switch to 30 epochs to better reproduce the paper
- Switch to binarized MNIST
- Add reconstruction image to the README
LGTM.
Thank you for re-working this example, @jheek!