Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26818][ML] Make MLEvents JSON ser/de safe #23728

Closed
wants to merge 2 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Feb 2, 2019

What changes were proposed in this pull request?

Currently, it looks it's not going to cause any virtually effective problem apparently (if I didn't misread the codes).

I see one place that JSON formatted events are being used.

val eventJson = JsonProtocol.sparkEventToJson(event)

It's okay because it just logs when the exception is ignorable

logError(s"Listener ${Utils.getFormattedClassName(listener)} threw an exception", e)

I guess it should be best to stay safe - I don't want this unstable experimental feature breaks anything in any case. It also disables logEvent in SparkListenerEvent for the same reason.

This is also to match SQL execution events side:

case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long)
extends SparkListenerEvent {
// The name of the execution, e.g. `df.collect` will trigger a SQL execution with name "collect".
@JsonIgnore private[sql] var executionName: Option[String] = None
// The following 3 fields are only accessed when `executionName` is defined.
// The duration of the SQL execution, in nanoseconds.
@JsonIgnore private[sql] var duration: Long = 0L
// The `QueryExecution` instance that represents the SQL execution
@JsonIgnore private[sql] var qe: QueryExecution = null
// The exception object that caused this execution to fail. None if the execution doesn't fail.
@JsonIgnore private[sql] var executionFailure: Option[Exception] = None
}

to make ML events JSON ser/de safe.

How was this patch tested?

Manually tested, and unit tests were added.

@HyukjinKwon
Copy link
Member Author

cc @vanzin, @srowen, and @cloud-fan, would you mind if I ask to take a look please?

@HyukjinKwon
Copy link
Member Author

Let me turn off this to be not logged by EventLoggingListener to be even more conservative.

@SparkQA
Copy link

SparkQA commented Feb 2, 2019

Test build #102016 has finished for PR 23728 at commit 1687844.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class TransformStart() extends MLEvent
  • case class TransformEnd() extends MLEvent
  • case class FitStart[M <: Model[M]]() extends MLEvent
  • case class FitEnd[M <: Model[M]]() extends MLEvent
  • case class LoadInstanceStart[T](path: String) extends MLEvent
  • case class LoadInstanceEnd[T]() extends MLEvent
  • case class SaveInstanceStart(path: String) extends MLEvent
  • case class SaveInstanceEnd(path: String) extends MLEvent

@SparkQA
Copy link

SparkQA commented Feb 2, 2019

Test build #102018 has finished for PR 23728 at commit bd80031.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait MLEvent extends SparkListenerEvent

@HyukjinKwon
Copy link
Member Author

Thanks, @srowen and @viirya. This PR makes it more conservative and safer.

Merged to master.

@asfgit asfgit closed this in dfb8809 Feb 3, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Currently, it looks it's not going to cause any virtually effective problem apparently (if I didn't misread the codes).

I see one place that JSON formatted events are being used.

https://github.com/apache/spark/blob/ec506bd30c2ca324c12c9ec811764081c2eb8c42/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L148

It's okay because it just logs when the exception is ignorable

https://github.com/apache/spark/blob/9690eba16efe6d25261934d8b73a221972b684f3/core/src/main/scala/org/apache/spark/util/ListenerBus.scala#L111

I guess it should be best to stay safe - I don't want this unstable experimental feature breaks anything in any case. It also disables `logEvent` in `SparkListenerEvent` for the same reason.

This is also to match SQL execution events side:

https://github.com/apache/spark/blob/ca545f79410a464ef24e3986fac225f53bb2ef02/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L41-L57

to make ML events JSON ser/de safe.

## How was this patch tested?

Manually tested, and unit tests were added.

Closes apache#23728 from HyukjinKwon/SPARK-26818.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
robert3005 pushed a commit to palantir/spark that referenced this pull request Feb 28, 2019
## What changes were proposed in this pull request?

Currently, it looks it's not going to cause any virtually effective problem apparently (if I didn't misread the codes).

I see one place that JSON formatted events are being used.

https://github.com/apache/spark/blob/ec506bd30c2ca324c12c9ec811764081c2eb8c42/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L148

It's okay because it just logs when the exception is ignorable

https://github.com/apache/spark/blob/9690eba16efe6d25261934d8b73a221972b684f3/core/src/main/scala/org/apache/spark/util/ListenerBus.scala#L111

I guess it should be best to stay safe - I don't want this unstable experimental feature breaks anything in any case. It also disables `logEvent` in `SparkListenerEvent` for the same reason.

This is also to match SQL execution events side:

https://github.com/apache/spark/blob/ca545f79410a464ef24e3986fac225f53bb2ef02/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala#L41-L57

to make ML events JSON ser/de safe.

## How was this patch tested?

Manually tested, and unit tests were added.

Closes apache#23728 from HyukjinKwon/SPARK-26818.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the SPARK-26818 branch March 3, 2020 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants