-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port Cifar Augmented pipeline and its nodes #247
Conversation
Some notes:
|
import org.apache.spark.rdd.RDD | ||
import nodes.util.MaxClassifier | ||
|
||
object AugmentedExamplesEvaluator extends Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of AugmentedExamplesEvaluator
this seems like it could just as easily be called EnsembleEvaluator
or something.
In the traditional ensemble setting, a hypothetical EnsembleEstimator
could be an Estimator[T,Seq[V]]
or if people wanted to construct ensembles themselves, they could use a Pipeline.gather
to coallate the results of running several models on an input data item.
Here we have a case that's slightly different, but I think the basic idea still holds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright - code-wise, this mostly looks good. The higher level question is how we want to represent this pipeline conceptually. Is it a flatmap with some key attached to the features that we can later group on (this is messy because it assumes labels and features have identical partitioning, etc.) or would we rather think of this in more general terms with some kind of ensembling. I think this is one of those situations where a whiteboard and an example pipeline might be useful, then we can decide if we have sufficient tools to represent that thing or not. |
Cool - We can chat about this on Monday. I don't think we have the right machinery to do this kind of augment-ensembles cleanly yet, but we can try to use this pipeline to design things |
*/ | ||
case class RandomFlipper(flipChance: Double, seed: Long = 12334L) extends Transformer[Image, Image] { | ||
|
||
val rnd = new scala.util.Random(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply scala.util.Random isn't serializable...
Caused by: java.io.NotSerializableException: scala.util.Random
Serialization stack:
- object not serializable (class: scala.util.Random, value: scala.util.Random@39652a30)
- field (class: nodes.images.RandomPatcher, name: rnd, type: class scala.util.Random)
- object (class nodes.images.RandomPatcher, <function1>)
- field (class: nodes.images.RandomPatcher$$anonfun$apply$1, name: $outer, type: class nodes.images.RandomPatcher)
- object (class nodes.images.RandomPatcher$$anonfun$apply$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 22 more
16/03/13 23:33:37 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@192.168.10.51:48870] <- [ak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh - thats a pain. Using java.util.Random
now
Also introduce a new API RandomImageTransformer
I've fixed all the comments other the discussion about ensembling. I also introduced a new class |
((unnormFilters(::, *) / (twoNorms + 1e-10)) * whitener.whitener.t, whitener) | ||
} | ||
|
||
val trainImagesAugmented = RandomImageTransformer(flipChance, ImageUtils.flipHorizontal).apply( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this written this way because FunctionNode
isn't a first-class citizen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - you can't chain anything after function nodes right now
Alright added unit tests for the two patcher nodes and one for |
val trainImages = ImageExtractor(trainData) | ||
|
||
val patchExtractor = new Windower(conf.patchSteps, conf.patchSize) | ||
.andThen(ImageVectorizer.apply) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit - can we move the andThen
to the end of the previous line and remove the spurious .
, (
, )
and apply
here? I know you're copy/pasting old code, but I like that style better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Other than the performance problem with repeated convolutions, the test error was fine. I got
from using 5120 filters (i.e. 40k features) |
|
||
val unscaledFeaturizer = | ||
new Convolver(filters, augmentPatchSize, augmentPatchSize, numChannels, Some(whitener), true) | ||
.andThen(SymmetricRectifier(alpha=conf.alpha)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next ten or so lines have the old .andThen(
syntax still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed all the andThen
s now
|
||
val predictionPipeline = featurizer andThen model andThen MaxClassifier | ||
val predictionPipeline = featurizer andThen MaxClassifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the MaxClassfier
to the end of what we're calling featurizer
and change that name to predictionPipeline
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - done
This LGTM, merging. Thanks @shivaram ! |
Port Cifar Augmented pipeline and its nodes
Fixes #242