ingestion-sink undeliverable message: java.lang.NullPointerException: null #1294

Closed
whd opened this issue May 25, 2020 · 2 comments · Fixed by #1297

@whd
Member

whd commented May 25, 2020

/CC @relud, there appears to be a persistent issue with the telemetry live sink that will need to be addressed this week (or the message dropped and recovered from pbd, but we don't have an automated mechanism for that):

{
 insertId: "ul5b6261edy7hygbk"
 jsonPayload: {
  endOfBatch: false
  instant: {
   epochSecond: 1590375717
   nanoOfSecond: 429000000
  }
  level: "ERROR"
  loggerFqcn: "org.apache.logging.slf4j.Log4jLogger"
  loggerName: "com.mozilla.telemetry.ingestion.sink.io.Pubsub$Read"
  message: "failed to deliver message"
  thread: "Gax-2"
  threadId: 25
  threadPriority: 5
  thrown: {
   commonElementCount: 0
   extendedStackTrace: "java.lang.NullPointerException: null
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.coerceToBqType(PubsubMessageToObjectNode.java:508) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.processField(PubsubMessageToObjectNode.java:395) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.transformForBqSchema(PubsubMessageToObjectNode.java:286) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.processField(PubsubMessageToObjectNode.java:374) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.transformForBqSchema(PubsubMessageToObjectNode.java:286) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.processField(PubsubMessageToObjectNode.java:374) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.transformForBqSchema(PubsubMessageToObjectNode.java:286) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.processField(PubsubMessageToObjectNode.java:342) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.transformForBqSchema(PubsubMessageToObjectNode.java:286) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.processField(PubsubMessageToObjectNode.java:342) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.transformForBqSchema(PubsubMessageToObjectNode.java:286) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload.apply(PubsubMessageToObjectNode.java:243) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.transform.PubsubMessageToObjectNode$Payload$WithOpenCensusMetrics.apply(PubsubMessageToObjectNode.java:114) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.io.Gcs$Write$Ndjson.encodeInput(Gcs.java:51) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.io.Gcs$Write$Ndjson.encodeInput(Gcs.java:34) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.util.BatchWrite.apply(BatchWrite.java:148) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.util.BatchWrite.apply(BatchWrite.java:26) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.config.SinkConfig$Output.apply(SinkConfig.java:128) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.mozilla.telemetry.ingestion.sink.config.SinkConfig$Output.apply(SinkConfig.java:109) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2137) ~[?:1.8.0_252]
	at com.mozilla.telemetry.ingestion.sink.io.Pubsub$Read.lambda$new$2(Pubsub.java:46) ~[ingestion-sink-0.1-SNAPSHOT.jar:?]
	at com.google.cloud.pubsub.v1.MessageDispatcher$4.run(MessageDispatcher.java:379) [google-cloud-pubsub-1.102.1.jar:1.102.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
"
   name: "java.lang.NullPointerException"
  }
 }

From my read, it appears coerceToBqType is receiving a Java null as its JsonNode o argument and o.isNull() is not having any of it (calling it on a null reference is what throws). /CC @jklukas as well since he modified the code most recently.
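
To illustrate the failure mode described above, here is a minimal sketch. It is not the ingestion-sink code and not necessarily what the eventual fix does: `Node` is a hypothetical stand-in for a JSON tree node, and `coerceUnguarded` / `coerceGuarded` are made-up names. The point is only that an explicit JSON null is an object you can call `isNull()` on, while an absent field can surface as a Java null reference, and calling any method on that throws.

```java
import java.util.HashMap;
import java.util.Map;

final class NullGuardSketch {

  /** Hypothetical stand-in for a JSON tree node; NOT Jackson's JsonNode. */
  static final class Node {
    private final Object value; // a null value here models an explicit JSON null

    Node(Object value) {
      this.value = value;
    }

    boolean isNull() {
      return value == null;
    }

    String asText() {
      return String.valueOf(value);
    }
  }

  /** Unguarded: dereferences o, so a Java null reference throws NullPointerException. */
  static String coerceUnguarded(Node o) {
    if (o.isNull()) {
      return null;
    }
    return o.asText();
  }

  /** Guarded: a Java null reference is handled the same way as an explicit JSON null. */
  static String coerceGuarded(Node o) {
    if (o == null || o.isNull()) {
      return null;
    }
    return o.asText();
  }

  public static void main(String[] args) {
    Map<String, Node> fields = new HashMap<>();
    fields.put("explicitNull", new Node(null));
    fields.put("count", new Node(3));

    // Map.get returns a Java null reference for a key that is not present.
    Node absent = fields.get("absent");

    System.out.println(coerceGuarded(fields.get("count")));
    System.out.println(coerceGuarded(fields.get("explicitNull")));
    System.out.println(coerceGuarded(absent));
    try {
      coerceUnguarded(absent);
    } catch (NullPointerException e) {
      System.out.println("unguarded call threw NullPointerException");
    }
  }
}
```

Whether the right fix is a guard like this or making sure the caller never passes a Java null depends on the real code around PubsubMessageToObjectNode.java:508, which is not shown in this issue.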

@relud
Contributor

relud commented May 26, 2020

I don't think it would have been worth the effort, but in case it is ever needed, it's possible to recover from an undeliverable message by this process:

  1. stop the decoder
  2. wait for the sink to deliver all deliverable messages, i.e. wait for unacked messages to stabilize for 5 minutes (sink) or 30 minutes (loader)
  3. stop the sink
  4. download and remove outstanding message(s) from the sink subscription, e.g. gcloud pubsub subscriptions pull SUBSCRIPTION --auto-ack --format=json > bad_messages.json
  5. start the decoder and the sink
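
The "wait for unacked messages to stabilize" check in step 2 could be sketched roughly as below. This is only an illustration under assumptions: `Sample` and `isStable` are hypothetical names, and how the unacked counts are actually collected (for example from subscription metrics) is left out entirely.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

final class BacklogStabilitySketch {

  /** One observation of a subscription's unacked message count (hypothetical type). */
  static final class Sample {
    final Instant at;
    final long unacked;

    Sample(Instant at, long unacked) {
      this.at = at;
      this.unacked = unacked;
    }
  }

  /**
   * Returns true when the most recent samples all report the same unacked count
   * and together span at least the given window. Samples must be in time order.
   */
  static boolean isStable(List<Sample> samples, Duration window) {
    if (samples.isEmpty()) {
      return false;
    }
    Sample last = samples.get(samples.size() - 1);
    Instant earliestSame = last.at;
    // Walk backwards while the count matches the latest observation.
    for (int i = samples.size() - 1; i >= 0; i--) {
      Sample s = samples.get(i);
      if (s.unacked != last.unacked) {
        break;
      }
      earliestSame = s.at;
    }
    return Duration.between(earliestSame, last.at).compareTo(window) >= 0;
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2020-05-26T00:00:00Z");
    List<Sample> samples = Arrays.asList(
        new Sample(t0, 10),
        new Sample(t0.plus(Duration.ofMinutes(1)), 3),
        new Sample(t0.plus(Duration.ofMinutes(2)), 1),
        new Sample(t0.plus(Duration.ofMinutes(4)), 1),
        new Sample(t0.plus(Duration.ofMinutes(7)), 1));

    // 5 minutes for the sink, 30 minutes for the loader, per the steps above.
    System.out.println(isStable(samples, Duration.ofMinutes(5)));
    System.out.println(isStable(samples, Duration.ofMinutes(30)));
  }
}
```

Note that the count settles at the number of undeliverable messages, not at zero, which is why the check compares against the latest value instead of waiting for an empty backlog.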

@whd
Member Author

whd commented May 26, 2020

We could also use a dead letter queue if 100 delivery attempts is sufficient to consider a message undeliverable. In any case I don't plan on automating any mechanisms around handling undeliverable messages since I agree it's not worth the effort.
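
For readers unfamiliar with the idea, the dead letter behaviour can be sketched as a small in-memory simulation. This is not how Pub/Sub implements it (there it is configured on the subscription, not in application code); `DeadLetterSketch`, `drain`, and the handler are hypothetical, and the limit of 100 is simply the figure from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class DeadLetterSketch {

  /** Outcome of a drain: messages that were handled and messages that were set aside. */
  static final class Result {
    final List<String> delivered = new ArrayList<>();
    final List<String> deadLettered = new ArrayList<>();
  }

  /**
   * Tries each message up to maxDeliveryAttempts times. A handler that returns false
   * or throws counts as a failed attempt; messages that never succeed are dead-lettered
   * instead of being retried forever.
   */
  static Result drain(List<String> messages, Predicate<String> handler, int maxDeliveryAttempts) {
    Result result = new Result();
    for (String message : messages) {
      boolean ok = false;
      for (int attempt = 1; attempt <= maxDeliveryAttempts && !ok; attempt++) {
        try {
          ok = handler.test(message);
        } catch (RuntimeException e) {
          ok = false; // e.g. the NullPointerException from this issue
        }
      }
      if (ok) {
        result.delivered.add(message);
      } else {
        result.deadLettered.add(message);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Predicate<String> handler = m -> {
      if (m.equals("bad")) {
        throw new NullPointerException();
      }
      return true;
    };
    Result result = drain(Arrays.asList("a", "bad", "c"), handler, 100);
    System.out.println("delivered: " + result.delivered);
    System.out.println("dead-lettered: " + result.deadLettered);
  }
}
```

The trade-off is the one raised in the comment: a message that fails deterministically will exhaust any attempt limit, so the limit only matters for distinguishing transient failures from permanent ones.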
