
SPARK-1478: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 #566

Closed
wants to merge 3 commits into from

Conversation

tmalaska

No description provided.

@AmplabJenkins

Can one of the admins verify this patch?

@tdas tdas mentioned this pull request Apr 28, 2014
@tdas
Contributor

tdas commented Apr 28, 2014

Jenkins, test this please.

@@ -153,3 +181,15 @@ class FlumeReceiver(

override def preferredLocation = Some(host)
}

private[streaming]
class CompressionChannelPipelineFactory() extends ChannelPipelineFactory {
Contributor


No need for () when no parameters are present.

Author


Done

@tdas
Contributor

tdas commented Apr 28, 2014

Except for a few nits, it looks good to me. However, since it's so late in the process of Spark 1.0, I am a little extra afraid of breaking something. If possible, can you run this on a cluster with real data transfer from a producer, to see if it works?

@tmalaska
Author

OK, I have reviewed the commits and I will be making changes this morning. Thanks tdas.

@tdas
Contributor

tdas commented Apr 28, 2014

Jenkins, this okay to test.

@tdas
Contributor

tdas commented Apr 28, 2014

Hey @tmalaska, I pondered the code a bit more, especially the lazy vals. The lazy val in this case is probably not a good idea. The receivers (after #300) are now designed to be restartable multiple times, so onStart() + onStop() could be called multiple times if the receiver decides to restart itself (to handle exceptions). In that case, start() would be called on the netty server after it has been closed, and I am not sure that is possible. So it's best to create a new NettyServer every time onStart() is called, rather than lazily initializing and reusing the netty server.

So it's probably best to do something like this:

FlumeReceiver .... {
  var server: NettyServer = null

  def onStart() {
    synchronized {
      server = initServer()
      server.start()
    }
  }

  def onStop() {
    synchronized {
      if (server != null) {
        server.stop()
      }
    }
  }
  ...
}

@tmalaska
Author

Will do. I will start tomorrow. Shouldn't take long.

@tmalaska
Author

Let me know if the changes are OK. The only difference from what you told me to do is that I added a check to prevent a double start. Let me know if you want me to take it out; if so, I can make the change very quickly.

  if (server == null) {
    server = initServer()
    server.start()
  } else {
    logWarning("Flume receiver being asked to start more than once without close")
  }
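The double-start guard above, combined with the earlier onStart/onStop sketch, gives a restart-safe lifecycle. The following is a minimal, self-contained illustration of that pattern, not the PR's actual code: `FakeServer` is a hypothetical stand-in for Avro's `NettyServer`, and the warning string is illustrative, so the guard logic can be shown without the Flume/Netty dependencies.

```scala
// FakeServer is a hypothetical stand-in for Avro's NettyServer.
class FakeServer {
  @volatile var running = false
  def start(): Unit = { running = true }
  def close(): Unit = { running = false }
}

class RestartableReceiver {
  private var server: FakeServer = null

  def onStart(): Unit = synchronized {
    if (server == null) {
      server = new FakeServer  // fresh server on every (re)start, per the review
      server.start()
    } else {
      println("WARN: receiver asked to start more than once without close")
    }
  }

  def onStop(): Unit = synchronized {
    if (server != null) {
      server.close()
      server = null            // clear the handle so a later restart is clean
    }
  }

  def isRunning: Boolean = synchronized { server != null && server.running }
}
```

Clearing `server` in onStop() is what lets the null check in onStart() distinguish a legitimate restart from a double start.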

@tdas
Contributor

tdas commented Apr 29, 2014

aah, right, makes sense. Please go ahead with it, and test it as well. I am still hopeful that we can squeeze this in for Spark 1.0 :)

@tmalaska
Author

I already updated the code and tested it. Feel free to commit unless you see anything wrong.

If you commit it in the next couple of hours, I can start on SPARK-1642 tonight or tomorrow morning.

@tmalaska
Author

tmalaska commented May 1, 2014

Hey tdas,

How is this JIRA looking? Is there anything I need to do to get it passed?

@tdas
Contributor

tdas commented May 2, 2014

Got side tracked, will take a look asap!

@tmalaska
Author

tmalaska commented May 5, 2014

LOL tdas, how's it going? Just pinging.

pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
new MLlib documentation for optimization, regression and classification

new documentation with tex formulas, hopefully improving usability and reproducibility of the offered MLlib methods.
also did some minor changes in the code for consistency. scala tests pass.

this is the rebased branch, i deleted the old PR

jira:
https://spark-project.atlassian.net/browse/MLLIB-19

Author: Martin Jaggi <m.jaggi@gmail.com>

Closes apache#566 and squashes the following commits:

5f0f31e [Martin Jaggi] line wrap at 100 chars
4e094fb [Martin Jaggi] better description of GradientDescent
1d6965d [Martin Jaggi] remove broken url
ea569c3 [Martin Jaggi] telling what updater actually does
964732b [Martin Jaggi] lambda R() in documentation
a6c6228 [Martin Jaggi] better comments in SGD code for regression
b32224a [Martin Jaggi] new optimization documentation
d5dfef7 [Martin Jaggi] new classification and regression documentation
b07ead6 [Martin Jaggi] correct scaling for MSE loss
ba6158c [Martin Jaggi] use d for the number of features
bab2ed2 [Martin Jaggi] renaming LeastSquaresGradient
@pwendell
Contributor

pwendell commented Jun 4, 2014

@tdas this seems pretty useful - could you take a look?

@tdas
Contributor

tdas commented Jun 4, 2014

Yeah, starting to look at all pending PRs now.


@tmalaska
Author

Hey tdas,

I was going to do 1642 tonight, but I noticed these changes are not in the code yet. What should I do?

Thanks

@tdas
Contributor

tdas commented Jun 20, 2014

Jenkins, test this again.

@tmalaska
Author

Let me know if there is anything I can do to help this go through.

Thanks tdas


class CompressionChannelPipelineFactory extends ChannelPipelineFactory {

def getPipeline() = {
val pipeline = Channels.pipeline()
Contributor


Formatting issue. 2 space indents required.

import org.jboss.netty.channel.ChannelPipelineFactory
import java.util.concurrent.Executors
import org.jboss.netty.channel.Channels
import org.jboss.netty.handler.codec.compression.ZlibDecoder
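For context, the diff fragments above (the imports and getPipeline()) come from the new compression pipeline factory. A sketch of the whole class follows, assuming Netty 3's API (as pulled in transitively by Flume/Avro); the compression level of 6 is an illustrative default, not necessarily what the final PR used, and the code needs the Netty 3 jar on the classpath.

```scala
import org.jboss.netty.channel.{ChannelPipeline, ChannelPipelineFactory, Channels}
import org.jboss.netty.handler.codec.compression.{ZlibDecoder, ZlibEncoder}

// Installs zlib handlers on each new channel so the receiver can talk to a
// Flume Avro sink configured with compression (the FLUME-1915 feature this
// PR targets): incoming data is inflated, outgoing responses are deflated.
class CompressionChannelPipelineFactory extends ChannelPipelineFactory {
  def getPipeline(): ChannelPipeline = {
    val pipeline = Channels.pipeline()
    pipeline.addFirst("deflater", new ZlibEncoder(6))  // 6 = assumed compression level
    pipeline.addFirst("inflater", new ZlibDecoder())
    pipeline
  }
}
```

The factory would then be passed to the NettyServer constructor in initServer() so that every accepted connection gets the compression handlers.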

@tdas
Contributor

tdas commented Jun 20, 2014

Sorry, Ted, that this has been sitting here for so long. Will get it in ASAP.
Other than a few nits, it LGTM. :)

@tmalaska
Author

No worries. I'm starting to free up, so I would love to do more work. I will finish this one up, then the Flume encryption one. Then if you have anything else, let me at it.

Thanks

@tmalaska
Author

I'm going to have to make a new pull request, because I had to drop the repo that belonged to this pull request. I will update the ticket with the information when it's ready.

@tmalaska
Author

New pull request: #1168

@tmalaska tmalaska closed this Jul 9, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
new MLlib documentation for optimization, regression and classification


Conflicts:
	mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
helenyugithub pushed a commit to helenyugithub/spark that referenced this pull request Aug 20, 2019
… ExternalShuffleBlockHandler (apache#566)

More context on https://issues.apache.org/jira/browse/SPARK-27773. Basically gives us a rough indicator of health of the external shuffle service / metric that we can monitor and alert on.
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Update Go version for 1.13 conformance job

Release 1.13 of Kubernetes supports Go 1.12
https://github.com/kubernetes/kubernetes/blob/release-1.13/Godeps/Godeps.json#L3

* Update tag Application:Go
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Feb 8, 2023
…apache#566)

* AL-4757 when refresh InMemoryFileIndex, if recursiveFileLookup is true, use recursiveDirChildrenFiles

* AL-4757 add UT