
SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 #566

Closed
wants to merge 3 commits into from

Conversation

tmalaska

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@tdas tdas mentioned this pull request Apr 28, 2014
@tdas
Contributor

tdas commented Apr 28, 2014

Jenkins, test this please.

@@ -153,3 +181,15 @@ class FlumeReceiver(

override def preferredLocation = Some(host)
}

private[streaming]
class CompressionChannelPipelineFactory() extends ChannelPipelineFactory {
Contributor


No need for () when no parameters are present.

Author


Done

@tdas
Contributor

tdas commented Apr 28, 2014

Except for a few nits, it looks good to me. However, since it's so late in the process of Spark 1.0, I am a little extra afraid of breaking something. If possible, can you run this on a cluster with real data transfer from a producer, to see if it works?

@tmalaska
Author

OK, I have reviewed the commits and I will be making changes this morning. Thanks tdas.

@tdas
Contributor

tdas commented Apr 28, 2014

Jenkins, this okay to test.

@tdas
Contributor

tdas commented Apr 28, 2014

Hey @tmalaska, I pondered the code a bit more, especially the lazy vals. The lazy val in this case is probably not a good idea. The receivers (after #300) are now designed to be restartable multiple times, so onStart() + onStop() could be called multiple times if the receiver decides to restart itself (to handle exceptions). In that case, start() would be called on the netty server after it has been closed, and I am not sure that is possible. So it's best to create a new NettyServer every time onStart() is called, rather than lazily initializing and reusing the netty server.

So it's probably best to do something like this:

FlumeReceiver .... {
  var server: NettyServer = null

  def onStart() {
    synchronized {
      server = initServer()
      server.start()
    }
  }

  def onStop() {
    synchronized {
      if (server != null) {
        server.stop()
      }
    }
  }
  ...
}

@tmalaska
Author

Will do. I will start tomorrow. Shouldn't take long.

@tmalaska
Author

Let me know if the changes are OK. The only difference from what you told me to do is that I added a check to prevent a double start. Let me know if you want me to take it out; if so, I can make the change very quickly.

  if (server == null) {
    server = initServer()
    server.start()
  } else {
    logWarning("Flume receiver being asked to start more than once without close")
  }
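The double-start guard above, combined with the earlier onStart/onStop sketch, gives a restart-safe lifecycle. The following is a minimal, self-contained illustration of that pattern, not the PR's actual code: `FakeServer` is a hypothetical stand-in for Avro's `NettyServer`, and the warning string is illustrative, so the guard logic can be shown without the Flume/Netty dependencies.

```scala
// FakeServer is a hypothetical stand-in for Avro's NettyServer.
class FakeServer {
  @volatile var running = false
  def start(): Unit = { running = true }
  def close(): Unit = { running = false }
}

class RestartableReceiver {
  private var server: FakeServer = null

  def onStart(): Unit = synchronized {
    if (server == null) {
      server = new FakeServer  // fresh server on every (re)start, per the review
      server.start()
    } else {
      println("WARN: receiver asked to start more than once without close")
    }
  }

  def onStop(): Unit = synchronized {
    if (server != null) {
      server.close()
      server = null            // clear the handle so a later restart is clean
    }
  }

  def isRunning: Boolean = synchronized { server != null && server.running }
}
```

Clearing `server` in onStop() is what lets the null check in onStart() distinguish a legitimate restart from a double start.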

@tdas
Contributor

tdas commented Apr 29, 2014

aah, right, makes sense. Please go ahead with it, and test it as well. I am still hopeful that we can squeeze this in for Spark 1.0 :)

@tmalaska
Author

I already updated the code and tested it. Feel free to commit unless you see anything wrong.

If you commit it in the next couple of hours, I can start on SPARK-1642 tonight or tomorrow morning.

@tmalaska
Author

tmalaska commented May 1, 2014

Hey tdas,

How is this JIRA looking? Is there anything I need to do to get it passed?

@tdas
Contributor

tdas commented May 2, 2014

Got side tracked, will take a look asap!

@tmalaska
Author

tmalaska commented May 5, 2014

LOL tdas, how's it going? Just pinging.

pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.

this is the rebased branch, i deleted the old PR

jira:
https://spark-project.atlassian.net/browse/MLLIB-19

Author: Martin Jaggi <m.jaggi@gmail.com>

Closes apache#566 and squashes the following commits:

5f0f31e [Martin Jaggi] line wrap at 100 chars
4e094fb [Martin Jaggi] better description of GradientDescent
1d6965d [Martin Jaggi] remove broken url
ea569c3 [Martin Jaggi] telling what updater actually does
964732b [Martin Jaggi] lambda R() in documentation
a6c6228 [Martin Jaggi] better comments in SGD code for regression
b32224a [Martin Jaggi] new optimization documentation
d5dfef7 [Martin Jaggi] new classification and regression documentation
b07ead6 [Martin Jaggi] correct scaling for MSE loss
ba6158c [Martin Jaggi] use d for the number of features
bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient
@pwendell
Contributor

pwendell commented Jun 4, 2014

@tdas this seems pretty useful - could you take a look?

@tdas
Contributor

tdas commented Jun 4, 2014

Yeah, starting to look at all pending PRs now.


@tmalaska
Author

Hey tdas,

I was going to do 1642 tonight, but I noticed these changes are not in the code yet. What should I do?

Thanks

@tdas
Contributor

tdas commented Jun 20, 2014

Jenkins, test this again.

@tmalaska
Author

Let me know if there is anything I can do to help this go through.

Thanks tdas


class CompressionChannelPipelineFactory extends ChannelPipelineFactory {

def getPipeline() = {
val pipeline = Channels.pipeline()
Contributor


Formatting issue. 2 space indents required.

import org.jboss.netty.channel.ChannelPipelineFactory
import java.util.concurrent.Executors
import org.jboss.netty.channel.Channels
import org.jboss.netty.handler.codec.compression.ZlibDecoder
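For context, the diff fragments above (the imports and getPipeline()) come from the new compression pipeline factory. A sketch of the whole class follows, assuming Netty 3's API (as pulled in transitively by Flume/Avro); the compression level of 6 is an illustrative default, not necessarily what the final PR used, and the code needs the Netty 3 jar on the classpath.

```scala
import org.jboss.netty.channel.{ChannelPipeline, ChannelPipelineFactory, Channels}
import org.jboss.netty.handler.codec.compression.{ZlibDecoder, ZlibEncoder}

// Installs zlib handlers on each new channel so the receiver can talk to a
// Flume Avro sink configured with compression (the FLUME-1915 feature this
// PR targets): incoming data is inflated, outgoing responses are deflated.
class CompressionChannelPipelineFactory extends ChannelPipelineFactory {
  def getPipeline(): ChannelPipeline = {
    val pipeline = Channels.pipeline()
    pipeline.addFirst("deflater", new ZlibEncoder(6))  // 6 = assumed compression level
    pipeline.addFirst("inflater", new ZlibDecoder())
    pipeline
  }
}
```

The factory would then be passed to the NettyServer constructor in initServer() so that every accepted connection gets the compression handlers.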

@tdas
Contributor

tdas commented Jun 20, 2014

Sorry, Ted, that this has been sitting here for so long. Will get it in ASAP.
Other than a few nits, it LGTM. :)

@tmalaska
Author

No worries. I'm starting to free up, so I would love to do more work. I will finish this one up, then the Flume encryption one. Then if you have anything else, let me at it.

Thanks

@tmalaska
Author

I'm going to have to make a new pull request, because I had to drop the repo that belonged to this pull request. I will update the ticket with the information when it's ready.

@tmalaska
Author

New pull request: #1168

@tmalaska tmalaska closed this Jul 9, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
new MLlib documentation for optimization, regression and classification


Conflicts:
	mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
helenyugithub pushed a commit to helenyugithub/spark that referenced this pull request Aug 20, 2019
… ExternalShuffleBlockHandler (apache#566)

More context on https://issues.apache.org/jira/browse/SPARK-27773. Basically gives us a rough indicator of health of the external shuffle service / metric that we can monitor and alert on.
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Update Go version for 1.13 conformance job

Release 1.13 of Kubernetes supports Go 1.12
https://github.com/kubernetes/kubernetes/blob/release-1.13/Godeps/Godeps.json#L3

* Update tag Application:Go
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Feb 8, 2023
…apache#566)

* AL-4757 when refresh InMemoryFileIndex, if recursiveFileLookup is true, use recursiveDirChildrenFiles

* AL-4757 add UT