
[BUG] Seeing some backlog pending on Pulsar UI even if Spark consumer has consumed all data. #176

Open
akshay-habbu opened this issue Apr 3, 2024 · 7 comments

akshay-habbu commented Apr 3, 2024

[Disclaimer: I am fairly new to Pulsar, so I might not understand all the Pulsar details, but I have been using Spark for a while now.]
I am using an Apache Spark consumer to consume data from Pulsar on AWS EMR, via the StreamNative pulsar-spark connector.
My version stack looks like this:
Spark version: 3.4.1
Pulsar version: 2.10.0.7
StreamNative connector: pulsar-spark-connector_2.12-3.4.0.3.jar

I created a new Pulsar topic and started a fresh Spark consumer on it. The consumer is able to connect to the topic and consume messages correctly; the only issue I have is with the backlog numbers displayed on the Pulsar admin UI.
(Screenshot: Pulsar admin UI showing a pending backlog on the subscription.)

To Reproduce
Steps to reproduce the behavior:
Create a Spark consumer using the following code:

import scala.collection.mutable
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("pulsar_streaming_test_app")
  .enableHiveSupport()
  .getOrCreate()
spark.sparkContext.setLogLevel("WARN")

val optionsMap: mutable.Map[String, String] = mutable.Map[String, String]()
optionsMap.put("service.url", "pulsar://pulsar-service.url:6650")
// Note: the admin endpoint is served over HTTP on port 8080, not the binary pulsar:// protocol.
optionsMap.put("admin.url", "http://pulsar-admin.url:8080")
optionsMap.put("pulsar.producer.batchingEnabled", "false")
optionsMap.put("topic", "topic-name")
// Latch onto a subscription created manually on the Pulsar side.
optionsMap.put("predefinedSubscription", "existing-subscription-name")
optionsMap.put("subscriptionType", "Exclusive") // "Shared" was also tried
optionsMap.put("startingOffsets", "latest")

val data = spark.readStream.format("pulsar").options(optionsMap).load()

data.writeStream
  .format("parquet")
  .option("checkpointLocation", "checkpoint/path")
  .option("path", "output/path")
  .start()
  .awaitTermination()

Also, a side problem, not very important: it seems Spark does not create a new subscription on its own; the job keeps failing with

Caused by: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"Subscription does not exist","reqId":1663032428812969942, "remote":"pulsar-broker-21/172.31.203.70:6650", "local":"/ip:46010"}

The only way I can make it work is by creating a subscription manually on the Pulsar side and using the predefinedSubscription option in Spark to latch onto that subscription, e.g. as sketched below. I tried passing pulsar.reader.subscriptionName, pulsar.consumer.subscriptionName, and subscriptionName while running the job, but it failed with the same error.
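For reference, the manual pre-creation can also be scripted. A minimal sketch using the Pulsar Java admin client from Scala (the tenant/namespace in the topic name is a placeholder; the pulsar-admin CLI equivalent is pulsar-admin topics create-subscription):

import org.apache.pulsar.client.admin.PulsarAdmin
import org.apache.pulsar.client.api.MessageId

// The admin endpoint is served over HTTP on port 8080.
val admin = PulsarAdmin.builder().serviceHttpUrl("http://pulsar-admin.url:8080").build()
// Pre-create the subscription at the latest position so the Spark job can attach to it.
admin.topics().createSubscription(
  "persistent://tenant/namespace/topic-name",
  "existing-subscription-name",
  MessageId.latest)
admin.close()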

Any help would be much appreciated.

nlu90 (Collaborator) commented Apr 12, 2024

@akshay-habbu

  1. Where do you access the UI? It doesn't look like the StreamNative Cloud Console.

  2. The Subscription does not exist issue may be caused by Pulsar-side configuration, specifically the following broker.conf setting:

# Enable subscription auto creation if new consumer connected (disable auto creation with value false)
allowAutoSubscriptionCreation=true

akshay-habbu (Author) commented

@nlu90

Thanks for responding.

  1. The UI is not the StreamNative Cloud Console; it is the Pulsar admin UI that ships with Pulsar by default. The same backlog is observed in the Pulsar topic stats as well:
"spark-consumer" : {
      "msgRateOut" : 3872.5419823435427,
      "msgThroughputOut" : 9793786.857122486,
      "bytesOutCounter" : 2836729487,
      "msgOutCounter" : 1079519,
      "msgRateRedeliver" : 0.0,
      "messageAckRate" : 0.0,
      "chunkedMessageRate" : 0,
      "msgBacklog" : 249731,
      "backlogSize" : 0,
      "earliestMsgPublishTimeInBacklog" : 0,
      "msgBacklogNoDelayed" : 249731,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "totalMsgExpired" : 0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 0,
      "lastConsumedTimestamp" : 0,
      "lastAckedTimestamp" : 0,
      "lastMarkDeleteAdvancedTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 3872.5419823435427,
        "msgThroughputOut" : 9793786.857122486,
        "bytesOutCounter" : 62749236,
        "msgOutCounter" : 25000,
        "msgRateRedeliver" : 0.0,
        "messageAckRate" : 0.0,
        "chunkedMessageRate" : 0.0,
        "availablePermits" : 0,
        "unackedMessages" : 0,
        "avgMessagesPerEntry" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 0
      } ],
      "isDurable" : true,
      "isReplicated" : false,
      "allowOutOfOrderDelivery" : false,
      "consumersAfterMarkDeletePosition" : { },
      "nonContiguousDeletedMessagesRanges" : 0,
      "nonContiguousDeletedMessagesRangesSerializedSize" : 0,
      "subscriptionProperties" : { },
      "durable" : true,
      "replicated" : false
    },
  2. Yes, the issue was a Pulsar configuration that prevented us from creating a new subscription. Changing the Pulsar config for that namespace helped (sketch below). Thanks!
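For anyone else hitting this, a sketch of the namespace-level override, assuming the 2.10 Java admin API shape (tenant/namespace is a placeholder; the CLI equivalent is pulsar-admin namespaces set-auto-subscription-creation --enable tenant/namespace):

import org.apache.pulsar.client.admin.PulsarAdmin
import org.apache.pulsar.common.policies.data.AutoSubscriptionCreationOverride

val admin = PulsarAdmin.builder().serviceHttpUrl("http://pulsar-admin.url:8080").build()
// Allow consumers to auto-create subscriptions in this namespace only.
admin.namespaces().setAutoSubscriptionCreation(
  "tenant/namespace",
  AutoSubscriptionCreationOverride.builder().allowAutoSubscriptionCreation(true).build())
admin.close()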

nlu90 (Collaborator) commented Apr 16, 2024

@akshay-habbu Just FYI: during Spark job execution, the connector spawns a new consumer/reader to consume messages from the last committed position. That's why you may observe some backlog.

Do you see your job proceeding, and does the backlog change after each micro-batch? One quick way to check is sketched below.
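For example, using the Java admin client from Scala (topic and subscription names are placeholders), compare the numbers across two consecutive micro-batches:

import org.apache.pulsar.client.admin.PulsarAdmin

val admin = PulsarAdmin.builder().serviceHttpUrl("http://pulsar-admin.url:8080").build()
// If the cursor advances, msgBacklog should shrink after each committed micro-batch.
val sub = admin.topics()
  .getStats("persistent://tenant/namespace/topic-name")
  .getSubscriptions.get("existing-subscription-name")
println(s"msgBacklog=${sub.getMsgBacklog} lastConsumed=${sub.getLastConsumedTimestamp}")
admin.close()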

akshay-habbu (Author) commented Apr 16, 2024

@nlu90
Yes, the consumer job is progressing just fine; it is able to process data and write to the output stream.
I have tested the same at scale, and I have seen Spark spawning new temporary readers. When the readers come up, the backlog temporarily drops from ~100k to ~5k, and as soon as the reader goes away the backlog jumps back to ~100k.
I believe the consumer is keeping pace with the topic and running at the latest offset, but the backlog shows a higher number for some reason.
Do you all see similar backlogs? Or are there any other configs that seem to be missing on my end?

nlu90 (Collaborator) commented Apr 16, 2024

@akshay-habbu We haven't heard reports of this issue from other users so far.

One possibility is that these backlogged subscriptions are not the ones actively in use, but left-over subscriptions from a previous round of testing. You can list and drop them; see the sketch below.
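A minimal sketch for that cleanup, assuming the Java admin client from Scala 2.12 (topic and subscription names are placeholders):

import scala.collection.JavaConverters._
import org.apache.pulsar.client.admin.PulsarAdmin

val admin = PulsarAdmin.builder().serviceHttpUrl("http://pulsar-admin.url:8080").build()
// List every subscription on the topic, then drop any that are no longer in use.
admin.topics().getSubscriptions("persistent://tenant/namespace/topic-name").asScala.foreach(println)
admin.topics().deleteSubscription("persistent://tenant/namespace/topic-name", "old-test-subscription")
admin.close()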

akshay-habbu (Author) commented

I have tried with multiple names and different topics; the same behaviour is observed.

sbandaru commented

@akshay-habbu Hello, were you ever able to figure out how to reduce the backlog? I am seeing exactly the same issue on my end. Also, in your experience, does using predefinedSubscription vs. auto-creation have any impact on the backlog at all?
