
Druid 30.0.0 ingestion task failure research #595

Closed
maltesander opened this issue Jul 19, 2024 · 4 comments · Fixed by stackabletech/docker-images#804
maltesander commented Jul 19, 2024

We planned to include Druid 30.0.0 in the 24.7 release, but backed out of that and marked 30.0.0 as experimental instead.

The reason was a set of problems with integration tests and demos, especially around data ingestion.

Definition of Done

Ideally the problem is identified and fixed.
However, one possible outcome of this issue is that we are not able to identify and fix the problem within 2-3 days. In that case we at least want to create an upstream issue at Druid with instructions to reproduce the problem.

Problem description

In the HDFS deep storage integration test, for example, the ingestion task fails with a NullPointerException:

2024-07-18T07:46:00,275 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2024-07-18T07:46:00,275 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2024-07-18T07:46:01,280 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Found intervals and shardSpecs in 2,510ms
2024-07-18T07:46:01,281 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - Using time chunk lock since forceTimeChunkLock is [true] and mode is [REPLACE_LEGACY].
2024-07-18T07:46:04,183 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing [1] segments in background
2024-07-18T07:46:04,184 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing segments: [wikipedia-2015-09-12_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2024-07-18T07:46:01.359Z]
2024-07-18T07:46:04,186 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Persisted rows[39,244] and (estimated) bytes[27,119,548]
2024-07-18T07:46:04,644 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:1028) ~[druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:519) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:1241) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:573) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592) ~[guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:571) ~[guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[guava-32.0.1-jre.jar:?]
	at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.pushAndClear(BatchAppenderatorDriver.java:152) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.pushAllAndClear(BatchAppenderatorDriver.java:135) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:136) ~[druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:910) ~[druid-indexing-service-30.0.0.jar:30.0.0]
	... 13 more
Caused by: java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-8.5.13.jar:?]
	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-8.5.13.jar:?]
	at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1559) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.access$100(AppenderatorImpl.java:120) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl$2.call(AppenderatorImpl.java:667) ~[druid-server-30.0.0.jar:30.0.0]
	... 6 more
2024-07-18T07:46:04,661 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - handler[index_parallel_wikipedia-2015-09-12_aalngccn_2024-07-18T07:45:51.005Z] not currently registered, ignoring.
2024-07-18T07:46:04,664 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_parallel_wikipedia-2015-09-12_aalngccn_2024-07-18T07:45:51.005Z",
  "status" : "FAILED",
  "duration" : 5905,
  "errorMsg" : "java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException\n...",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}

@xeniape reported another null pointer exception during the nifi-kafka-druid-earthquake demo test:

2024-07-17T12:03:35,185 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:619) ~[druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:277) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:153) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
	Suppressed: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:248) ~[druid-server-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:790) ~[druid-indexing-service-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:277) [druid-indexing-service-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:153) [druid-indexing-service-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
		at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
		at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
		at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
		at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
	Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
		at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592) ~[guava-32.0.1-jre.jar:?]
		at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:571) ~[guava-32.0.1-jre.jar:?]
		at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[guava-32.0.1-jre.jar:?]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:240) ~[druid-server-30.0.0.jar:30.0.0]
		... 12 more
	Caused by: java.lang.NullPointerException
		at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
		at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-core-8.5.13.jar:?]
		at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-core-8.5.13.jar:?]
		at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistHydrant(StreamAppenderator.java:1655) ~[druid-server-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.access$200(StreamAppenderator.java:119) ~[druid-server-30.0.0.jar:30.0.0]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderator$2.call(StreamAppenderator.java:680) ~[druid-server-30.0.0.jar:30.0.0]
		... 6 more
Caused by: java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-core-8.5.13.jar:?]
	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-core-8.5.13.jar:?]
	at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistHydrant(StreamAppenderator.java:1655) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.access$200(StreamAppenderator.java:119) ~[druid-server-30.0.0.jar:30.0.0]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderator$2.call(StreamAppenderator.java:680) ~[druid-server-30.0.0.jar:30.0.0]
	... 6 more

along with these failure logs:

2024-07-17T14:43:01,124 ERROR [[index_kafka_earthquakes_c6d22f2137390e3_kmennhje]-appenderator-persist] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Persist failed, dying
2024-07-17T14:43:01,131 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.

whereas a working task persists its segments:

2024-07-18T06:04:36,902 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1960-01-01T00:00:00.000Z_1961-01-01T00:00:00.000Z_2024-07-17T10:00:10.461Z] spill[0] to disk in [137] ms (1,513 rows).
2024-07-18T06:04:37,038 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1980-01-01T00:00:00.000Z_1981-01-01T00:00:00.000Z_2024-07-17T10:00:24.050Z] spill[0] to disk in [66] ms (2,297 rows).
2024-07-18T06:04:37,378 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2024-07-17T10:00:23.160Z] spill[0] to disk in [337] ms (18,953 rows).

@xeniape also found that unit tests fail when building 30.0.0 from source:

3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesForceVirtualContextCannotVectorize
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesLong
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueString
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueStringNotDictionaryEncoded
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueStringUnknown
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesSingleValueString
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesSingleValueStringNotDictionaryEncoded
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesTrueVirtualContextDefault
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.junit.runner.manipulation.Filter.null
3638.1 [ERROR]   Run 1: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionTimeseriesAggregation]/[test:%5B1%5D false (testNumericEvolutionTimeseriesAggregation)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testHyperUniqueEvolutionTimeseries]/[test:%5B1%5D false (testHyperUniqueEvolutionTimeseries)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testHyperUniqueEvolutionTimeseries]/[test:%5B0%5D true (testHyperUniqueEvolutionTimeseries)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionFiltering]/[test:%5B0%5D true (testNumericEvolutionFiltering)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionFiltering]/[test:%5B1%5D false (testNumericEvolutionFiltering)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionTimeseriesAggregation]/[test:%5B0%5D true (testNumericEvolutionTimeseriesAggregation)] from org.junit.vintage.engine.descriptor.RunnerRequest@230b4acb
3638.1 [ERROR]   Run 2: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:initializationError(org.junit.runner.manipulation.Filter)] from org.junit.vintage.engine.descriptor.RunnerRequest@37fd1b30
3638.1 [ERROR]   Run 3: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:initializationError(org.junit.runner.manipulation.Filter)] from org.junit.vintage.engine.descriptor.RunnerRequest@6b1c6d39
3638.1 [INFO] 
3638.1 [INFO] 
3638.1 [ERROR] Tests run: 252314, Failures: 83, Errors: 82807, Skipped: 17828
3638.1 [INFO] 
3638.1 [INFO] ------------------------------------------------------------------------
3638.1 [INFO] Reactor Summary for Druid 30.0.0:
3638.1 [INFO] 
3638.1 [INFO] Druid .............................................. SUCCESS [01:55 min]
3638.1 [INFO] druid-processing ................................... FAILURE [58:09 min]
3638.1 [INFO] druid-aws-common ................................... SKIPPED
3638.1 [INFO] druid-gcp-common ................................... SKIPPED

Tests run from source with our Maven profile / command:

[INFO] Running org.apache.druid.query.timeboundary.TimeBoundaryQueryRunnerTest
[INFO] Tests run: 72, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.099 s -- in org.apache.druid.query.timeboundary.TimeBoundaryQueryRunnerTest

and the failure in our patched builds:

[ERROR] org.apache.druid.segment.ConciseBitmapIndexMergerV9Test.testPersistNullColumnSkipping[100: metric compression=none, dimension compression=lzf, long encoding=longs, segment write-out medium=org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory@62c229bb] -- Time elapsed: 0.001 s <<< ERROR!
#43 653.2 java.lang.NullPointerException
#43 653.2 	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
#43 653.2 	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273)
#43 653.2 	at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257)
#43 653.2 	at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45)
#43 653.2 	at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533)
#43 653.2 	at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149)
#43 653.2 	at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816)
#43 653.2 	at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216)
#43 653.2 	at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155)
#43 653.2 	at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972)
#43 653.2 	at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876)
#43 653.2 	at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:203)
#43 653.2 	at org.apache.druid.segment.IndexMergerTestBase.testPersistNullColumnSkipping(IndexMergerTestBase.java:1736)
#43 653.2 	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
#43 653.2 	at org.apache.druid.segment.CloserRule$1.evaluate(CloserRule.java:52)

The ingestion tasks seem very flaky, although they fail more often than they succeed.
Ingesting a simple CSV file sometimes works and sometimes doesn't.

The problem centers on segment merging and persisting data and should be researched further before we can support Druid 30.0.0.
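All of the stack traces above bottom out in the same place: fastutil's Object2IntRBTreeMap rejecting a null key via Objects.requireNonNull while SortedDimensionDictionary sorts the dimension dictionary. As a minimal stand-in (using java.util.TreeMap from the JDK instead of fastutil, which behaves the same way for null keys in a comparator-less sorted map), this sketch shows the failure mode we suspect: a null dimension value reaching a sorted map that cannot compare null keys.

```java
import java.util.TreeMap;

public class NullKeyRepro {
    public static void main(String[] args) {
        // Sorted maps with natural ordering reject null keys, just like
        // fastutil's Object2IntRBTreeMap does in the stack traces above.
        TreeMap<String, Integer> dictionary = new TreeMap<>();
        dictionary.put("a", 0); // non-null dimension values are fine

        boolean threw = false;
        try {
            // Mirrors SortedDimensionDictionary inserting a null dimension value.
            dictionary.put(null, 1);
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("null key rejected: " + threw);
    }
}
```

If this hypothesis holds, the question for the upstream issue becomes where the null value enters the dictionary in 30.0.0 when it did not in earlier releases.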

I tried remote debugging but did not spend too much time on it. Just add the line

        -agentlib:jdwp=transport=dt_socket,address=*:5005,server=y,suspend=y

to the JVM config and connect a debugger to localhost on port 5005 to debug the process.
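For completeness, a sketch of a `jvm.config` with the debug agent enabled (the other flags are illustrative placeholders, not our actual settings). Note that `suspend=y` makes the JVM block until a debugger attaches, which is useful here since the task fails within seconds of starting; use `suspend=n` if you want the process to start without waiting.

```
-server
-Duser.timezone=UTC
-agentlib:jdwp=transport=dt_socket,address=*:5005,server=y,suspend=y
```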

@labrenbe found an unrelated error in OIDC callbacks:

2024-08-14T15:01:43,208 WARN [qtp1268743900-74] org.eclipse.jetty.server.HttpChannel - /druid-ext/druid-pac4j/callback
java.lang.NoSuchMethodError: 'net.minidev.json.JSONObject com.nimbusds.jwt.JWTClaimsSet.toJSONObject()'
    at com.nimbusds.openid.connect.sdk.claims.IDTokenClaimsSet.<init>(IDTokenClaimsSet.java:238) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.toIDTokenClaimsSet(IDTokenValidator.java:339) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.validate(IDTokenValidator.java:289) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.validate(IDTokenValidator.java:224) ~[?:?]
    at org.pac4j.oidc.profile.creator.TokenValidator.validate(TokenValidator.java:103) ~[?:?]
    at org.pac4j.oidc.profile.creator.OidcProfileCreator.create(OidcProfileCreator.java:93) ~[?:?]
    at org.pac4j.oidc.profile.creator.OidcProfileCreator.create(OidcProfileCreator.java:45) ~[?:?]
    at org.pac4j.core.client.BaseClient.retrieveUserProfile(BaseClient.java:119) ~[?:?]
    at org.pac4j.core.client.BaseClient.getUserProfile(BaseClient.java:99) ~[?:?]
    at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:88) ~[?:?]
    at org.apache.druid.security.pac4j.Pac4jFilter.doFilter(Pac4jFilter.java:92) ~[?:?]
    at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.apache.druid.security.basic.authentication.BasicHTTPAuthenticator$BasicHTTPAuthenticationFilter.doFilter(BasicHTTPAuthenticator.java:172) ~[?:?]
    at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:77) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]

Related issues (very old):
apache/druid#7662
apache/druid#8835

@labrenbe labrenbe self-assigned this Aug 7, 2024
@sbernauer

Moving back to in progress, as we want to create one upstream Druid issue for the nimbus-jose-jwt problem.

@labrenbe

Created an upstream issue: apache/druid#16920

@lfrancke commented Sep 4, 2024

Thanks. This means we can move to Druid 30 in 24.11?

@sbernauer

I'd say yes :)
