adam2vcf Fails with Sample not serializable #1100

Closed
jpdna opened this issue Aug 6, 2016 · 0 comments
jpdna (Member) commented Aug 6, 2016

Using ADAM as of commit e7e1adf, I was attempting a round trip from VCF to ADAM Parquet and back to VCF with the following commands:
adam-submit vcf2adam HG00096.vcf HG00096.var.adam
(this worked fine)

Then back to VCF with:
adam-submit adam2vcf HG00096.var.adam outFromAdamHG00096.vcf

The adam2vcf command produced the following error:

adam-submit adam2vcf HG00096.var.adam outFromAdamHG00096.vcf
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/jpr1/work/Hbase_July22/spark1.6.1/spark-1.6.1-bin-hadoop2.6/bin/spark-submit
Command body threw exception:
org.apache.spark.SparkException: Task not serializable
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:742)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:741)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:741)
    at org.bdgenomics.adam.rdd.variation.VariantContextRDD.saveAsVcf(VariantContextRDD.scala:117)
    at org.bdgenomics.adam.cli.ADAM2Vcf.run(ADAM2Vcf.scala:83)
    at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
    at org.bdgenomics.adam.cli.ADAM2Vcf.run(ADAM2Vcf.scala:59)
    at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:131)
    at org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:71)
    at org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.bdgenomics.formats.avro.Sample
Serialization stack:
    - object not serializable (class: org.bdgenomics.formats.avro.Sample, value: {"sampleId": "HG00096", "name": null, "attributes": {}})
    - writeObject data (class: scala.collection.immutable.$colon$colon)
    - object (class scala.collection.immutable.$colon$colon, List({"sampleId": "HG00096", "name": null, "attributes": {}}))
    - field (class: org.bdgenomics.adam.rdd.variation.VariantContextRDD, name: samples, type: interface scala.collection.Seq)
    - object (class org.bdgenomics.adam.rdd.variation.VariantContextRDD, VariantContextRDD(MapPartitionsRDD[4] at map at GenotypeRDD.scala:62,SequenceDictionary{
1->249250621, 0
2->243199373, 1
3->198022430, 2
4->191154276, 3
5->180915260, 4
6->171115067, 5
7->159138663, 6
8->146364022, 7
9->141213431, 8
10->135534747, 9
11->135006516, 10
12->133851895, 11
13->115169878, 12
14->107349540, 13
15->102531392, 14
16->90354753, 15
17->81195210, 16
18->78077248, 17
19->59128983, 18
20->63025520, 19
21->48129895, 20
22->51304566, 21
GL000191.1->106433, 22
GL000192.1->547496, 23
GL000193.1->189789, 24
GL000194.1->191469, 25
GL000195.1->182896, 26
GL000196.1->38914, 27
GL000197.1->37175, 28
GL000198.1->90085, 29
GL000199.1->169874, 30
GL000200.1->187035, 31
GL000201.1->36148, 32
GL000202.1->40103, 33
GL000203.1->37498, 34
GL000204.1->81310, 35
GL000205.1->174588, 36
GL000206.1->41001, 37
GL000207.1->4262, 38
GL000208.1->92689, 39
GL000209.1->159169, 40
GL000210.1->27682, 41
GL000211.1->166566, 42
GL000212.1->186858, 43
GL000213.1->164239, 44
GL000214.1->137718, 45
GL000215.1->172545, 46
GL000216.1->172294, 47
GL000217.1->172149, 48
GL000218.1->161147, 49
GL000219.1->179198, 50
GL000220.1->161802, 51
GL000221.1->155397, 52
GL000222.1->186861, 53
GL000223.1->180455, 54
GL000224.1->179693, 55
GL000225.1->211173, 56
GL000226.1->15008, 57
GL000227.1->128374, 58
GL000228.1->129120, 59
GL000229.1->19913, 60
GL000230.1->43691, 61
GL000231.1->27386, 62
GL000232.1->40652, 63
GL000233.1->45941, 64
GL000234.1->40531, 65
GL000235.1->34474, 66
GL000236.1->41934, 67
GL000237.1->45867, 68
GL000238.1->39939, 69
GL000239.1->33824, 70
GL000240.1->41933, 71
GL000241.1->42152, 72
GL000242.1->43523, 73
GL000243.1->43341, 74
GL000244.1->39929, 75
GL000245.1->36651, 76
GL000246.1->38154, 77
GL000247.1->36422, 78
GL000248.1->39786, 79
GL000249.1->38502, 80
MT->16569, 81
NC_007605->171823, 82
X->155270560, 83
Y->59373566, 84
hs37d5->35477943, 85},List({"sampleId": "HG00096", "name": null, "attributes": {}})))
    - field (class: org.bdgenomics.adam.rdd.variation.VariantContextRDD$$anonfun$4, name: $outer, type: class org.bdgenomics.adam.rdd.variation.VariantContextRDD)
    - object (class org.bdgenomics.adam.rdd.variation.VariantContextRDD$$anonfun$4, <function2>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 25 more
Aug 6, 2016 12:02:13 PM INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 6
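The serialization stack above shows the chain: the function passed to mapPartitionsWithIndex (VariantContextRDD$$anonfun$4) holds its enclosing VariantContextRDD in the $outer field, so serializing the closure also serializes the samples field (a Seq of Sample), and Sample is not java.io.Serializable. The following is a minimal, self-contained Java sketch of the same failure mode. Sample, Holder, PartitionFn and the method names are hypothetical stand-ins, not ADAM classes. It also shows one common workaround, copying only the needed value into a local before building the closure; this is illustrative only and not necessarily how ADAM resolved the issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class OuterCaptureDemo {

    // Hypothetical stand-in for the Avro-generated Sample: it does NOT implement Serializable.
    static final class Sample {
        final String sampleId;
        Sample(String sampleId) { this.sampleId = sampleId; }
    }

    // A serializable function type, playing the role of the closure Spark must ship to executors.
    interface PartitionFn extends Serializable {
        String apply(int partitionIndex);
    }

    // Hypothetical stand-in for VariantContextRDD: Serializable itself, but it holds
    // a list of non-serializable samples.
    static final class Holder implements Serializable {
        final List<Sample> samples;

        Holder(List<Sample> samples) { this.samples = samples; }

        // Reads the `samples` field, so the lambda captures `this` (analogous to
        // Scala's $outer field) and serializing it drags in every Sample.
        PartitionFn capturingOuter() {
            return idx -> idx + ":" + samples.get(0).sampleId;
        }

        // Copies only the needed String into a local first, so `this` is not captured.
        PartitionFn capturingLocal() {
            final String firstId = samples.get(0).sampleId;
            return idx -> idx + ":" + firstId;
        }
    }

    // Java-serializes `o`, as Spark's closure check does with its Java serializer;
    // returns null on success, or the offending class name carried by NotSerializableException.
    static String trySerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample("HG00096"));
        Holder holder = new Holder(samples);
        System.out.println("outer capture: " + trySerialize(holder.capturingOuter()));
        System.out.println("local capture: " + trySerialize(holder.capturingLocal()));
    }
}
```

Running main should report the stand-in Sample class name for the function that captures its enclosing object, and null (success) for the one that captures only a String, mirroring why the $outer reference in the trace pulls Sample into the closure.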

jpdna added the bug label Aug 6, 2016
fnothaft self-assigned this Aug 6, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Aug 6, 2016
Resolves bigdatagenomics#1100. Registered `Sample` class with the `AvroSerializer` in
`ADAMKryoRegistrator`.
fnothaft added a commit to fnothaft/adam that referenced this issue Aug 7, 2016
Resolves bigdatagenomics#1100. Registered `Sample` class with the `AvroSerializer` in
`ADAMKryoRegistrator`.