When using char-dist-features + header features for the domain "dbpedia", we end up with a large number of features (400+). Training the RandomForestClassifier with Spark then fails with the following error:
Cause: org.codehaus.janino.JaninoRuntimeException: Code of method "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
This appears to be a known bug in Spark, and it is not clear whether there is an easy fix:
https://issues.apache.org/jira/browse/SPARK-16845
http://stackoverflow.com/questions/40044779/find-mean-and-corr-of-10-000-columns-in-pyspark-dataframe
https://issues.apache.org/jira/browse/SPARK-17092
SparkTestSpec currently reproduces this error.
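A possible mitigation, along the lines of what is discussed in the linked threads, is to keep the DataFrame schema narrow: assemble the individual feature columns into a single vector column before training and drop the originals, so the generated row comparator only has to handle a couple of columns instead of 400+. A rough sketch (the "label" column name and the assembleFeatures helper are illustrative, not taken from this repo):

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Collapse all feature columns into one vector column and keep only
// the label and the assembled features, so Spark's generated code for
// row ordering stays well below the 64 KB method limit.
def assembleFeatures(df: DataFrame): DataFrame = {
  val featureCols = df.columns.filter(_ != "label")
  val assembler = new VectorAssembler()
    .setInputCols(featureCols)
    .setOutputCol("features")
  assembler.transform(df).select("label", "features")
}

Whether this avoids the JaninoRuntimeException in our pipeline still needs to be verified.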