[jvm-packages] context is stopped in RabitTrackerHandler #4054

Closed
zongyaozhang opened this issue Jan 11, 2019 · 0 comments · Fixed by #4224


I am using the Scala tracker and running XGBoost 0.81 on Spark. Training fails with the error "XGBoostModel training failed". The error stack is in section 1 below, and the log output is in section 2.
I traced the cause to the ml.dmlc.xgboost4j.scala.rabit.handler.RabitTrackerHandler class.
[screenshot]
When I comment out line 178 and rebuild xgboost4j, my test runs successfully.
Please check whether this is a defect.

  1. error stack:
    Caused by: ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.ml$dmlc$xgboost4j$scala$spark$XGBoost$$postTrackerReturnProcessing(XGBoost.scala:283)
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainDistributed$4.apply(XGBoost.scala:240)
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$trainDistributed$4.apply(XGBoost.scala:222)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:221)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.train(XGBoostRegressor.scala:186)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor.train(XGBoostRegressor.scala:48)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)

  2. log info:
    19/01/11 16:24:00 INFO XGBoostSpark: Rabit returns with exit code 3
    [INFO] [01/11/2019 16:24:00.092] [RabitTracker-akka.actor.default-dispatcher-11] [akka://RabitTracker/user/Handler] Message [ml.dmlc.xgboost4j.scala.rabit.handler.RabitTrackerHandler$RequestCompletionFuture$] from Actor[akka://RabitTracker/deadLetters] to Actor[akka://RabitTracker/user/Handler#-1754393715] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    19/01/11 16:24:00 ERROR XGBoost: XGBoostModel training failed
