Race condition when reading the service instance - SW Health Indicator #4906

djabarovgeorge

What change does this PR introduce?

Reproduction Steps

There is a race condition in the WS service when it reads the server instance's sockets. The observed error message was similar to: Cannot read properties of sockets of null.

The code part where it is happening:

private async connectionExist(command: ExternalServicesRouteCommand) {
  return !!(await this.wsGateway.server.sockets.in(command.userId).fetchSockets()).length;
}

The issue should be reproducible when spinning up the service (or a few instances of it).

Why was this change needed?

The health check will now validate the WS server's health.

Other information (Screenshots)

linear bot commented Nov 28, 2023

NV-3202 Race condition when reading the service instance - SW Health Indicator

Expected Behaviour

  • We should check for server readiness and only enable workers once the server has bootstrapped. This will be handled in NV-3206, as Pablo suggested.
  • The health check will validate the WS server's health.
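
The readiness idea in the first bullet can be sketched roughly as follows. This is only an illustration of the general technique, not the change planned for NV-3206: `createReadinessGate`, `startWorkersWhenReady`, and the `startWorker` callback are hypothetical names introduced here.

```typescript
// Hypothetical readiness gate: workers wait on a promise that is resolved
// once the server reports that it has finished bootstrapping.
interface ReadinessGate {
  ready: Promise<void>;
  markReady: () => void;
}

function createReadinessGate(): ReadinessGate {
  let markReady: () => void = () => {};
  const ready = new Promise<void>((resolve) => {
    // Expose the resolver so the bootstrap code can signal readiness.
    markReady = () => resolve();
  });

  return { ready, markReady };
}

// Starts the worker only after the gate has been opened.
async function startWorkersWhenReady(
  ready: Promise<void>,
  startWorker: () => void
): Promise<void> {
  await ready;
  startWorker();
}
```

In this sketch the bootstrap code would call `markReady()` after the server is initialized, so no worker can touch the server instance before it exists.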

@@ -18,6 +20,7 @@ export class HealthController {
     const result = await this.healthCheckService.check([
       async () => this.dalHealthIndicator.isHealthy(),
       async () => this.webSocketsQueueHealthIndicator.isHealthy(),
+      async () => this.wsHealthIndicator.isHealthy(),

Now we will check if the WS server is up.

Comment on lines +18 to +32

if (!isOnline) {
  return;
}

if (command.event === WebSocketEventEnum.RECEIVED) {
  await this.processReceivedEvent(command);
}

if (command.event === WebSocketEventEnum.UNSEEN) {
  await this.sendUnseenCountChange(command);
}

if (command.event === WebSocketEventEnum.UNREAD) {
  await this.sendUnreadCountChange(command);

Small refactor, not related to the PR

@@ -127,7 +130,13 @@ export class ExternalServicesRoute {
     }
   }

-  private async connectionExist(command: ExternalServicesRouteCommand) {
+  private async connectionExist(command: ExternalServicesRouteCommand): Promise<boolean | undefined> {
+    if (!this.wsGateway.server) {

If the server is not initialized we want to log it.
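
A minimal, self-contained sketch of the guard described above. It is not the PR's exact code: the `SocketServerLike` and `GatewayLike` interfaces are simplified stand-ins for the Socket.IO server and the gateway, and the standalone `connectionExists` function with an injectable `logError` callback is an assumption made so the example runs on its own.

```typescript
// Simplified stand-ins (assumptions) for the shapes the route relies on.
interface RoomLike {
  fetchSockets(): Promise<unknown[]>;
}

interface SocketServerLike {
  sockets: { in(room: string): RoomLike };
}

interface GatewayLike {
  // The server may still be null while the service is starting up.
  server: SocketServerLike | null;
}

async function connectionExists(
  gateway: GatewayLike,
  userId: string,
  logError: (message: string) => void = (message) => console.error(message)
): Promise<boolean | undefined> {
  // Read the reference once so the null check and the later use agree.
  const server = gateway.server;

  if (!server) {
    // Log instead of throwing "Cannot read properties of ... null".
    logError('No sw server available to send message');

    return undefined;
  }

  const sockets = await server.sockets.in(userId).fetchSockets();

  return sockets.length > 0;
}
```

Returning `undefined` rather than `false` lets a caller tell "server not ready" apart from "user has no open connection", which matches the `Promise<boolean | undefined>` return type in the diff.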

 @WebSocketGateway()
 export class WSGateway implements OnGatewayConnection, OnGatewayDisconnect {
   constructor(private jwtService: JwtService, private subscriberOnlineService: SubscriberOnlineService) {}

   @WebSocketServer()
-  server: Server;
+  server: Server | null;

Now we know that there is an edge case where the Server can be nullish.

Comment on lines +120 to +121
Logger.error('No sw server available to send message', LOG_CONTEXT);


If the server is not initialized we want to log it.

import { WSGateway } from '../ws.gateway';

@Injectable()
export class WSHealthIndicator extends HealthIndicator implements IHealthIndicator {

Will rename this in upcoming PRs to make clear what the health indicator is responsible for.

Suggested change
- export class WSHealthIndicator extends HealthIndicator implements IHealthIndicator {
+ export class WSServerHealthIndicator extends HealthIndicator implements IHealthIndicator {
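
The body of the indicator is not shown in this conversation. As a rough sketch of what such a check could look like, assuming it only reports whether the gateway's server instance exists: `checkWsServerHealth`, the `'ws-server'` key, and the `GatewayWithServer` interface are hypothetical, and the result shape merely imitates the `{ key: { status: 'up' | 'down' } }` convention of health check libraries rather than calling any real one.

```typescript
// Hypothetical result shape, modelled on the common
// { "<key>": { status: "up" | "down" } } health check convention.
type IndicatorStatus = 'up' | 'down';
type IndicatorResult = Record<string, { status: IndicatorStatus }>;

// Simplified stand-in (assumption) for the gateway from the PR.
interface GatewayWithServer {
  server: object | null | undefined;
}

function checkWsServerHealth(
  gateway: GatewayWithServer,
  key: string = 'ws-server' // hypothetical indicator key
): IndicatorResult {
  // The WS server counts as healthy only once its instance has been set.
  const status: IndicatorStatus = gateway.server ? 'up' : 'down';

  return { [key]: { status } };
}
```

A health endpoint could include this result next to its other indicators, so an instance whose WS server has not been initialized yet is reported as down.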

djabarovgeorge and others added 2 commits November 29, 2023 10:07
…n-when-reading-the-service-instance-sw-health-indicator

# Conflicts:
#	apps/ws/src/health/health.module.ts
#	packages/application-generic/src/health/index.ts

@LetItRock LetItRock left a comment

🥇

@djabarovgeorge djabarovgeorge merged commit 5f68810 into next Dec 4, 2023
24 of 26 checks passed
@djabarovgeorge djabarovgeorge deleted the nv-3202-race-condition-when-reading-the-service-instance-sw-health-indicator branch December 4, 2023 14:34