Port Cifar Augmented pipeline and its nodes #247

shivaram · 2016-03-12T03:20:51Z

Fixes #242

shivaram · 2016-03-12T03:23:42Z

Some notes:

This lacks any unit tests right now. I wanted to get this out early to get some feedback on interfaces. I will work on unit tests soon.
To do augmentation I was forced to use FunctionNode -- I don't think we have a better way to do this? Also, not being able to chain function nodes in the middle of a pipeline is a pain.
It'll be good to see if some of these APIs like RandomFlipper can be generalized like @ericmjonas mentioned in the other thread.

etrain · 2016-03-12T18:34:35Z

src/main/scala/evaluation/AugmentedExamplesEvaluator.scala

+import org.apache.spark.rdd.RDD
+import nodes.util.MaxClassifier
+
+object AugmentedExamplesEvaluator extends Serializable {


Instead of AugmentedExamplesEvaluator this seems like it could just as easily be called EnsembleEvaluator or something.

In the traditional ensemble setting, a hypothetical EnsembleEstimator could be an Estimator[T,Seq[V]], or if people wanted to construct ensembles themselves, they could use a Pipeline.gather to collate the results of running several models on an input data item.

Here we have a case that's slightly different, but I think the basic idea still holds.

This is now tracked in #250 and #249

etrain · 2016-03-12T18:55:49Z

Alright - code-wise, this mostly looks good. The higher level question is how we want to represent this pipeline conceptually. Is it a flatmap with some key attached to the features that we can later group on (this is messy because it assumes labels and features have identical partitioning, etc.), or would we rather think of this in more general terms with some kind of ensembling? I think this is one of those situations where a whiteboard and an example pipeline might be useful; then we can decide if we have sufficient tools to represent that thing or not.
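To make the "flatmap with a key, then group" option concrete, here is a minimal sketch on plain Scala collections rather than RDDs. The names AugmentedPrediction and AugmentedVote are hypothetical, and averaging the scores before taking the max is an assumption for illustration, not necessarily what AugmentedExamplesEvaluator does.

```scala
// Hypothetical record: the class scores predicted for one augmented copy of an image.
// imageId is the key attached during augmentation so copies can be regrouped later.
case class AugmentedPrediction(imageId: String, scores: Array[Double])

object AugmentedVote {
  // Group the copies by imageId, average their score vectors element-wise,
  // and return the index of the highest average score for each image.
  def predict(preds: Seq[AugmentedPrediction]): Map[String, Int] =
    preds.groupBy(_.imageId).map { case (id, group) =>
      val summed = group.map(_.scores).reduce { (a, b) =>
        a.zip(b).map { case (x, y) => x + y }
      }
      val avg = summed.map(_ / group.size)
      id -> avg.indexOf(avg.max)
    }
}
```

On an RDD the same shape would need the labels keyed the same way, which is the partitioning concern raised above.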


shivaram · 2016-03-12T20:02:29Z

Cool - we can chat about this on Monday. I don't think we have the right machinery to do this kind of augment-ensembles cleanly yet, but we can try to use this pipeline to design things.

Vaishaal · 2016-03-14T06:43:24Z

src/main/scala/nodes/images/RandomFlipper.scala

+  */
+case class RandomFlipper(flipChance: Double, seed: Long = 12334L) extends Transformer[Image, Image] {
+
+  val rnd = new scala.util.Random(seed)


scala.util.Random isn't serializable...

Caused by: java.io.NotSerializableException: scala.util.Random
Serialization stack:
	- object not serializable (class: scala.util.Random, value: scala.util.Random@39652a30)
	- field (class: nodes.images.RandomPatcher, name: rnd, type: class scala.util.Random)
	- object (class nodes.images.RandomPatcher, <function1>)
	- field (class: nodes.images.RandomPatcher$$anonfun$apply$1, name: $outer, type: class nodes.images.RandomPatcher)
	- object (class nodes.images.RandomPatcher$$anonfun$apply$1, <function1>)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
	... 22 more
16/03/13 23:33:37 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@192.168.10.51:48870] <- [ak

Uh - that's a pain. Using java.util.Random now.
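A quick way to check this kind of fix without a cluster is to round-trip the node through Java serialization, which is what the JavaSerializer frames in the trace are doing to the closure. The sketch below uses a hypothetical CoinFlipper stand-in, not the actual RandomFlipper or RandomPatcher code; the only library fact it relies on is that java.util.Random implements java.io.Serializable.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializationSketch {
  // Hypothetical stand-in for a randomized node. Case classes are Serializable,
  // so the instance serializes as long as every field does too.
  case class CoinFlipper(flipChance: Double, seed: Long = 12334L) {
    // java.util.Random is java.io.Serializable, so this field does not break serialization.
    val rnd = new java.util.Random(seed)
    def shouldFlip(): Boolean = rnd.nextDouble() < flipChance
  }

  // Write the value to an in-memory buffer with Java serialization and read it back.
  // Throws java.io.NotSerializableException if any reachable field is not serializable.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

Calling SerializationSketch.roundTrip on a node in a unit test would surface this class of error before the pipeline is submitted to Spark.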


shivaram · 2016-03-14T17:41:46Z

I've fixed all the comments other than the discussion about ensembling. I also introduced a new class RandomImageTransformer instead of the RandomFlipper per @ericmjonas's request.
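For readers following along, the generalization amounts to taking the transformation as a parameter instead of hard-coding the flip. The sketch below is a simplified illustration under stated assumptions: an "image" is a grid of Ints rather than KeystoneML's Image, flipHorizontal is assumed to mirror each row left to right, and RandomTransformerSketch is a hypothetical name, not the class added in this PR.

```scala
object FlipSketch {
  // Hypothetical simplification: an image is a row-major grid of pixel values.
  type Grid = Vector[Vector[Int]]

  // Assumed meaning of a horizontal flip: mirror each row left to right.
  def flipHorizontal(img: Grid): Grid = img.map(_.reverse)

  // Applies `transform` with probability `chance`, otherwise passes the input through.
  case class RandomTransformerSketch[T](chance: Double, transform: T => T, seed: Long = 12334L) {
    // java.util.Random rather than scala.util.Random, following the serialization fix above.
    private val rnd = new java.util.Random(seed)
    def apply(in: T): T = if (rnd.nextDouble() < chance) transform(in) else in
  }
}
```

With this shape, a random flipper is just RandomTransformerSketch(flipChance, flipHorizontal), and other augmentations can reuse the same wrapper.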

etrain · 2016-03-16T03:32:57Z

src/main/scala/pipelines/images/cifar/RandomPatchCifarAugmented.scala

+        ((unnormFilters(::, *) / (twoNorms + 1e-10)) * whitener.whitener.t, whitener)
+    }
+
+    val trainImagesAugmented = RandomImageTransformer(flipChance, ImageUtils.flipHorizontal).apply(


is this written this way because FunctionNode isn't a first-class citizen?

Yeah - you can't chain anything after function nodes right now

shivaram · 2016-03-21T00:12:43Z

Alright, added unit tests for the two patcher nodes and one for flipHorizontal. I'm going to run the entire pipeline on the local cluster to make sure we get the same results we got for the paper. @etrain Let me know if you have any other comments on this.

etrain · 2016-03-21T01:27:19Z

src/main/scala/pipelines/images/cifar/RandomPatchCifarAugmented.scala

+    val trainImages = ImageExtractor(trainData)
+
+    val patchExtractor = new Windower(conf.patchSteps, conf.patchSize)
+      .andThen(ImageVectorizer.apply)


style nit - can we move the andThen to the end of the previous line and remove the spurious ., (, ) and apply here? I know you're copy/pasting old code, but I like that style better.
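The two styles can be compared side by side with plain Scala functions. This is only an illustration of the syntax: windower and vectorizer here are hypothetical Function1 values, not the KeystoneML Windower and ImageVectorizer nodes, and both compositions behave identically.

```scala
object StyleSketch {
  // Hypothetical stand-ins for the two pipeline stages.
  val windower: Seq[Int] => Seq[Seq[Int]] = _.sliding(2).toSeq
  val vectorizer: Seq[Seq[Int]] => Seq[Int] = _.flatten

  // Old style: leading dot on a new line, parentheses, and an explicit .apply.
  val dotted = windower
    .andThen(vectorizer.apply)

  // Requested style: infix andThen at the end of the line, no dot, parentheses, or apply.
  val infix = windower andThen
    vectorizer
}
```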

shivaram · 2016-03-21T16:43:59Z

Other than the performance problem with repeated convolutions, the test error was fine. I got

16/03/21 01:05:04 INFO RandomPatchCifarAugmented: Test error is: 0.1833

from using 5120 filters (i.e. 40k features)

etrain · 2016-03-21T16:46:10Z

src/main/scala/pipelines/images/cifar/RandomPatchCifarAugmented.scala

+
+    val unscaledFeaturizer = 
+      new Convolver(filters, augmentPatchSize, augmentPatchSize, numChannels, Some(whitener), true)
+        .andThen(SymmetricRectifier(alpha=conf.alpha))


The next ten or so lines have the old .andThen( syntax still.

Fixed all the andThens now

shivaram · 2016-03-21T22:17:08Z

@etrain the latest commit 202317e updates the other CIFAR / Mnist pipeline to also use the style described in #258 -- Tests pass, but it would still be good to get another pair of eyes on this commit

etrain · 2016-03-21T22:25:15Z

src/main/scala/pipelines/images/cifar/RandomCifar.scala


-    val predictionPipeline = featurizer andThen model andThen MaxClassifier
+    val predictionPipeline = featurizer andThen MaxClassifier


Can we add the MaxClassifier to the end of what we're calling featurizer and change that name to predictionPipeline?

etrain · 2016-03-22T16:17:00Z

This LGTM, merging. Thanks @shivaram !


shivaram added 3 commits March 11, 2016 19:07

Add CIFAR augmentation pipeline and nodes for it

78e149e

Add random flipper to pipeline

d719bd8

Add evaluator for augmented pipeline

886fc87

etrain reviewed Mar 12, 2016

Merge branch 'master' of https://github.com/amplab/keystone into cifa…

fe17dfe

…r-augmenter

Remove sumPooler function

d0d6da5

Vaishaal reviewed Mar 14, 2016

shivaram added 3 commits March 14, 2016 08:53

Fix code review comments

ebd4998

More code review comments.

9cf6a48

Also introduce a new API RandomImageTransformer

Remove unused RandomFlipper

8b7b8c4

etrain reviewed Mar 16, 2016

Add test cases for new nodes

f40b495

etrain reviewed Mar 21, 2016

shivaram added 2 commits March 20, 2016 21:09

Style fix

5a9547c

Merge remote-tracking branch 'amplab/master' into cifar-augmenter

5368829

Fixes from local cluster run

da2344b

etrain reviewed Mar 21, 2016

Style fixes

e114b52

etrain mentioned this pull request Mar 21, 2016

Certain pipelines are double-caching #258

Closed

shivaram added 3 commits March 21, 2016 11:53

Return a pipeline object

e0d517c

Try to avoid recomputation

3db2b5a

Avoid recomputation in other CIFAR pipelines

202317e

etrain reviewed Mar 21, 2016

Code review fixes

32c3dd0

etrain added a commit that referenced this pull request Mar 22, 2016

Merge pull request #247 from shivaram/cifar-augmenter

4552d2c

Port Cifar Augmented pipeline and its nodes

etrain merged commit 4552d2c into amplab:master Mar 22, 2016
