This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

kmeans on spark error: requires number of clusters greater than one, but does not respond to changing 'k' #15

Open
kaileena1 opened this issue Sep 25, 2018 · 0 comments


kaileena1 commented Sep 25, 2018

I have this code:

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.linalg.Vectors

val fileName = """file:///home/user/data/csv/sessions_sample.csv"""
val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(fileName)
val input1 = df.select("id", "duration", "ip_dist", "txr1", "txr2", "txr3", "txr4").na.fill(3.0)
// Build (id, features); the range 1 until r.size - 1 covers columns 1..5 (duration through txr3), so txr4 is not included
val input2 = input1.map(r => (r.getInt(0), Vectors.dense((1 until r.size - 1).map { i => r.getDouble(i) }.toArray[Double])))
val input3 = input2.toDF("id", "features")
input3.count()

val kmeans = new KMeans().setK(100).setSeed(1L).setFeaturesCol("features").setPredictionCol("prediction")
val model = kmeans.fit(input3) // also tried: kmeans.fit(input3.select("features"))
// Make predictions
val predictions = model.transform(input3) // also tried: model.transform(input3.select("features"))
val evaluator = new ClusteringEvaluator()
// I get the error when I run this line
val silhouette = evaluator.evaluate(predictions)

Error: 

> java.lang.AssertionError: assertion failed: Number of clusters must be greater than one.
>   at scala.Predef$.assert(Predef.scala:170)
>   at org.apache.spark.ml.evaluation.SquaredEuclideanSilhouette$.computeSilhouetteScore(ClusteringEvaluator.scala:416)
>   at org.apache.spark.ml.evaluation.ClusteringEvaluator.evaluate(ClusteringEvaluator.scala:96)
>   ... 49 elided


Of course I tried changing k, but the error is the same for every value. On top of that, some cluster centers have infinite coordinates, and the clusters are not stable for any value of k. Could that be why the silhouette computation fails with this error?
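
One check that might explain why changing k has no effect: the stack trace points into `computeSilhouetteScore`, and the assertion there appears to count the clusters that actually occur in the prediction column, not the configured k. The sketch below counts how many distinct cluster ids received at least one row. The toy data, the object name `DistinctClusterCheck` and the helper `usedClusters` are hypothetical and only stand in for `input3` and `predictions` above.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object DistinctClusterCheck {
  // Hypothetical helper: number of distinct cluster ids assigned to at least one row.
  def usedClusters(predictions: DataFrame, predictionCol: String = "prediction"): Long =
    predictions.select(predictionCol).distinct().count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-cluster-check")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for input3 (assumption: same "id"/"features" layout).
    val data = Seq(
      (1, Vectors.dense(1.0, 2.0)),
      (2, Vectors.dense(1.5, 1.8)),
      (3, Vectors.dense(8.0, 8.0)),
      (4, Vectors.dense(9.0, 9.5))
    ).toDF("id", "features")

    val model = new KMeans().setK(2).setSeed(1L).setFeaturesCol("features").fit(data)
    val predictions = model.transform(data)

    // Compare the requested k with the number of clusters that hold any points.
    println(s"requested k = ${model.getK}, clusters with points = ${usedClusters(predictions)}")

    // Per-cluster sizes: a single huge cluster would show up here.
    predictions.groupBy("prediction").count().orderBy("prediction").show()

    spark.stop()
  }
}
```

If the same count on the real `predictions` comes back as 1, every row landed in a single cluster, which would trigger the assertion regardless of the value passed to `setK`.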

`model.clusterCenters.foreach(println)`

> [3217567.1300936914,145.06533614203505,Infinity,Infinity,Infinity]
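
Since the printed center contains `Infinity`, it may be worth checking whether the input columns themselves hold infinite values. `na.fill` replaces only null and NaN entries, so an infinite value in the CSV would pass through untouched and end up in the feature vectors. The following is a minimal sketch of such a check; the toy data, the object name `NonFiniteCheck` and the helper `infiniteRows` are hypothetical, and whether the real file contains such values is an assumption to verify.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NonFiniteCheck {
  // Hypothetical helper: rows where at least one of the given columns is +/- Infinity.
  def infiniteRows(df: DataFrame, cols: Seq[String]): DataFrame = {
    val anyInfinite = cols
      .map(c => col(c).isin(Double.PositiveInfinity, Double.NegativeInfinity))
      .reduce(_ || _)
    df.filter(anyInfinite)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("non-finite-check")
      .getOrCreate()
    import spark.implicits._

    // Toy data: one clean row, one row with Infinity, one row with NaN.
    val df = Seq(
      (1, 10.0, 0.5),
      (2, Double.PositiveInfinity, 0.7),
      (3, Double.NaN, 0.9)
    ).toDF("id", "duration", "txr1")

    // Same cleaning step as in the question: fills null/NaN, leaves Infinity alone.
    val filled = df.na.fill(3.0)

    val featureCols = Seq("duration", "txr1")
    val bad = infiniteRows(filled, featureCols)
    println(s"rows with infinite features: ${bad.count()}")
    bad.show()

    // One possible follow-up: drop those rows before building the feature vectors.
    val finiteOnly = filled.except(bad)
    println(s"rows kept: ${finiteOnly.count()}")

    spark.stop()
  }
}
```

Dropping the affected rows is only one option; capping or imputing the values would also keep them out of the cluster centers.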


Please advise.


