
Kafka microservice health check does not work #1690

Closed
2 of 4 tasks
jirawat-tokenx opened this issue Feb 20, 2022 · 13 comments

Comments

jirawat-tokenx commented Feb 20, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current behavior

Configuring a microservice ping check, like:

this.kafka.pingCheck('KAFKA', {
  name: configService.getKafkaClientName(),
  transport: Transport.KAFKA,
  options: {
    client: {
      brokers: configService.getKafkaHost(),
    },
    consumer: {
      groupId: configService.getKafkaGroupId(),
    },
  },
});

The check always returns "timeout of x ms exceeded", no matter how large the timeout is set.

But my service still works fine with the same microservice config.

Minimum reproduction code

this.kafka.pingCheck('KAFKA', {
  name: configService.getKafkaClientName(),
  transport: Transport.KAFKA,
  options: {
    client: {
      brokers: configService.getKafkaHost(),
    },
    consumer: {
      groupId: configService.getKafkaGroupId(),
    },
  },
});

Steps to reproduce

No response

Expected behavior

The health check should return an UP status when Kafka can be connected to.

Package version

8.0.4

NestJS version

8.2.6

Node.js version

14.17.3

In which operating systems have you tested?

  • macOS
  • Windows
  • Linux

Other

No response

@jirawat-tokenx (Author)

Hi, if there is anything I can provide for this issue, please let me know.

Tony133 (Contributor) commented Feb 27, 2022

Hi @jirawat-tokenx, I advise you to provide a minimal reproduction in a clonable git repository so that the core team can evaluate the problem you have reported.

fer8a commented Mar 25, 2022

The check always returns "timeout of x ms exceeded", no matter how large the timeout is set.

This is also the case for me out of the box; however, I can get a successful response by increasing the timeout directly in the options (I don't see a timeout in the options you shared, @jirawat-tokenx).

This is what I'm doing to make it work.

const kafkaOptions = {
  transport: kafkaConfig.transport,
  options: kafkaConfig.options,
  timeout: 10000,
};

this.microservice.pingCheck<KafkaOptions>('Kafka', kafkaOptions);

That being said, the response time is unmanageable. A ping check that takes ~10 seconds to respond is far too slow. Is that something that can be looked at, @Tony133?

For context, the DB and HTTP health indicators I'm using (from this same package) have a response time of ~1 second.

Tony133 (Contributor) commented Mar 25, 2022

Hi @fer8a, I think that's something that could be improved; however, if you can create a minimal reproduction in a clonable git repository, the Nest core team can better assess the issue.

fer8a commented Mar 30, 2022

Hello @Tony133, sorry for the late reply. These have been some busy days.

Take a look at the following repository.
It's a simple example exposing the health indicator at the base route /.
You can use it for the tests.

Tony133 (Contributor) commented Mar 31, 2022

Hi all, I tried @fer8a's minimal reproduction. In my tests, setting the timeout to anything up to 5140 ms gives the following error (see screenshot):

[screenshot: health check error response]

Log terminal:
[screenshot: terminal log]

Setting the timeout to 5200 ms or higher, the status check works correctly (see screenshot):

[screenshot: health check success response]

Log terminal:
[screenshot: terminal log]

This is a brief summary of the tests I ran.
We need to work out whether this is a problem specific to the @nestjs/microservices package or whether it also affects the @nestjs/terminus package as a whole, since Terminus internally uses the @nestjs/microservices package for microservice checks.

P.S. For the Kafka service I used a Docker container.

@ohudenkoongage

Hi everyone. I found that the Kafka pingCheck works if the producerOnlyMode: true flag is added. I think this could be helpful for those who are looking for a solution.

Example:

this.microservice.pingCheck('kafka', {
  transport: Transport.KAFKA,
  options: {
    ...this.configService.get('kafka.options'),
    producerOnlyMode: true,
  },
}),
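
For anyone wanting to see this wired into a full endpoint, here is a minimal sketch of a health controller using the flag. The 'kafka.options' config key and producerOnlyMode come from the snippet above; the HealthController class name, the /health route, the 'kafka' key, and the 10000 ms timeout are illustrative assumptions, not part of this thread.

import { Controller, Get } from '@nestjs/common';
import { Transport, KafkaOptions } from '@nestjs/microservices';
import { HealthCheck, HealthCheckService, MicroserviceHealthIndicator } from '@nestjs/terminus';
import { ConfigService } from '@nestjs/config';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly microservice: MicroserviceHealthIndicator,
    private readonly configService: ConfigService,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () =>
        this.microservice.pingCheck<KafkaOptions>('kafka', {
          transport: Transport.KAFKA,
          timeout: 10000, // illustrative; see the timeout discussion above
          options: {
            // Spread whatever client/consumer options the app already uses.
            ...this.configService.get('kafka.options'),
            producerOnlyMode: true,
          },
        }),
    ]);
  }
}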

@jlahtinen

@Tony133 If you have time, please check https://github.com/jlahtinen/nestjskafkaterminus

I think this is about as simple as it can get to reproduce the described problem. I have no idea how to fix this.

If producerOnlyMode: true is the fix, it should be documented, and the consequences of using it should be made clear (for me everything seems to work with it on localhost, but on Kubernetes everything starts to fail again, just as it does without producerOnlyMode).

Tony133 (Contributor) commented Oct 28, 2022

Hi @jlahtinen, I tried your minimal reproduction in a local environment with Docker and managed to get it working, but I had to increase the timeout to 9000 ms or more, using NestJS v9.x and Terminus v9.x (compared with my message above, where I used NestJS v8.x and Terminus v8.x).

I attach two screenshots here:

[screenshot 1]

[screenshot 2]

I hope this has been of some help.

@jlahtinen

@Tony133 Thank you. It works as you described.

Ugzuzg commented Jan 10, 2023

A big problem with this is that the check creates a new connection, causing Kafka to rebalance the consumers, then closes it, which triggers another rebalance.

I think it's an issue with the approach itself: instead of checking whether the existing connection is healthy, the indicator checks whether Kafka is connectable, so we don't really know if the microservice itself is still connected.
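
One way to act on this observation (not something proposed in this thread, just a sketch): a custom health indicator that keeps a single long-lived kafkajs admin connection open and pings over it, so the check itself never creates or tears down a consumer and never triggers a rebalance. It still only tells you the brokers are reachable, not that the microservice's own consumer is healthy. The class name, broker address, and the choice of describeCluster() as the probe are all assumptions.

import { Injectable, OnModuleInit, OnModuleDestroy } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { Admin, Kafka } from 'kafkajs';

@Injectable()
export class KafkaHealthIndicator extends HealthIndicator implements OnModuleInit, OnModuleDestroy {
  // One long-lived admin connection, reused by every check, so the health
  // check never joins or leaves a consumer group (no rebalances).
  private readonly admin: Admin;

  constructor() {
    super();
    this.admin = new Kafka({ brokers: ['localhost:9092'] }).admin();
  }

  async onModuleInit() {
    await this.admin.connect();
  }

  async onModuleDestroy() {
    await this.admin.disconnect();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    try {
      // Lightweight metadata request over the already-open connection.
      await this.admin.describeCluster();
      return this.getStatus(key, true);
    } catch (err) {
      throw new HealthCheckError(
        'Kafka check failed',
        this.getStatus(key, false, { message: (err as Error).message }),
      );
    }
  }
}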

@EugeneKorshenko

Yeah, I agree that it's not actually a Terminus issue; it's the natural state of things for Kafka.
If you want to check both the producer and consumer connections, you have to take into account that it takes some time to join the consumer group and for the consumer connection to become stable.
But you can just use producerOnlyMode to significantly speed things up (as suggested above). It should give you the necessary amount of confidence that your microservice is able to reach the Kafka cluster.

@Tony133 I'd probably consider introducing a default consumer config for the Kafka transport with producerOnlyMode set to true.
Or, if we don't want to mutate the health check config from inside the library, we could think about adding a warning note to the documentation.
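
A rough sketch of what such a default could look like (a hypothetical helper, not the actual Terminus code): merge producerOnlyMode: true underneath whatever the caller passes, so an explicit user value still wins.

import { KafkaOptions } from '@nestjs/microservices';

// Hypothetical helper: apply a producerOnlyMode default to user-supplied
// Kafka options; an explicit user setting takes precedence via the spread.
function withHealthCheckDefaults(options: KafkaOptions['options'] = {}): KafkaOptions['options'] {
  return {
    producerOnlyMode: true,
    ...options,
  };
}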

@BrunnerLivio (Member)

producerOnlyMode will now be set to true by default. We've also added additional e2e tests to ensure this behavior.
If you explicitly set producerOnlyMode to false, it will override the default behavior.
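
For anyone who still wants the old behavior (consumer connection included in the check), the opt-out is an explicit false, e.g. as a sketch using the same pingCheck call shown earlier in the thread; the broker address and groupId are illustrative:

this.microservice.pingCheck<KafkaOptions>('kafka', {
  transport: Transport.KAFKA,
  options: {
    client: { brokers: ['localhost:9092'] },
    consumer: { groupId: 'health-check' },
    // Explicitly opting out of the new default restores the pre-10.1.0 behavior.
    producerOnlyMode: false,
  },
});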

Released with v10.1.0 🎉
