Race condition when reading the service instance - SW Health Indicator #4906
NV-3202 Race condition when reading the service instance - SW Health Indicator
@@ -18,6 +20,7 @@ export class HealthController {
     const result = await this.healthCheckService.check([
       async () => this.dalHealthIndicator.isHealthy(),
       async () => this.webSocketsQueueHealthIndicator.isHealthy(),
+      async () => this.wsHealthIndicator.isHealthy(),
Now we will check if the WS server is up.
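For readers unfamiliar with the pattern, here is a minimal sketch of how a list of indicator callbacks could be combined into one health result. It is not the actual health-check library code: the `check` function and the `IndicatorResult`, `IndicatorFn`, and `CheckResult` types are hypothetical simplifications used only for illustration.

```typescript
// Hypothetical, simplified types; the real service relies on a health-check library.
type IndicatorResult = Record<string, { status: 'up' | 'down' }>;
type IndicatorFn = () => Promise<IndicatorResult>;

interface CheckResult {
  status: 'ok' | 'error';
  details: IndicatorResult;
}

// Runs every indicator and reports 'error' if any of them is down.
async function check(indicators: IndicatorFn[]): Promise<CheckResult> {
  const results = await Promise.all(indicators.map((run) => run()));
  const details: IndicatorResult = Object.assign({}, ...results);
  const allUp = Object.keys(details).every((key) => details[key].status === 'up');

  return { status: allUp ? 'ok' : 'error', details };
}
```

With this shape, adding the WS server indicator to the list is enough for an uninitialized server to flip the overall status to an error.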
if (!isOnline) {
  return;
}

if (command.event === WebSocketEventEnum.RECEIVED) {
  await this.processReceivedEvent(command);
}

if (command.event === WebSocketEventEnum.UNSEEN) {
  await this.sendUnseenCountChange(command);
}

if (command.event === WebSocketEventEnum.UNREAD) {
  await this.sendUnreadCountChange(command);
Small refactor, not related to the PR
@@ -127,7 +130,13 @@ export class ExternalServicesRoute {
     }
   }

-  private async connectionExist(command: ExternalServicesRouteCommand) {
+  private async connectionExist(command: ExternalServicesRouteCommand): Promise<boolean | undefined> {
+    if (!this.wsGateway.server) {
If the server is not initialized we want to log it.
 @WebSocketGateway()
 export class WSGateway implements OnGatewayConnection, OnGatewayDisconnect {
   constructor(private jwtService: JwtService, private subscriberOnlineService: SubscriberOnlineService) {}

   @WebSocketServer()
-  server: Server;
+  server: Server | null;
Now we know that there is an edge case where the Server can be nullish.
+      Logger.error('No sw server available to send message', LOG_CONTEXT);
If the server is not initialized we want to log it.
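A minimal sketch of the guard described here, assuming a simplified gateway shape. The `SocketServer` and `Gateway` interfaces, the `connectionExists` helper, and the injectable `logError` callback are hypothetical stand-ins, not the types or signatures used in the PR.

```typescript
// Hypothetical minimal shapes standing in for the socket server and the gateway.
interface SocketServer {
  sockets: { has(room: string): boolean };
}

interface Gateway {
  server: SocketServer | null;
}

// Returns undefined (and logs) while the server is not initialized,
// otherwise reports whether a connection exists for the given user.
function connectionExists(
  gateway: Gateway,
  userId: string,
  logError: (message: string) => void = console.error
): boolean | undefined {
  if (!gateway.server) {
    logError('No WS server available to send message');
    return undefined;
  }

  return gateway.server.sockets.has(userId);
}
```

Returning `undefined` instead of `false` lets callers tell "server not ready" apart from "user not connected", which appears to be the intent of the `Promise<boolean | undefined>` return type.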
import { WSGateway } from '../ws.gateway';

@Injectable()
export class WSHealthIndicator extends HealthIndicator implements IHealthIndicator {
Will rename in upcoming PRs to make clear what the health indicator is responsible for.
-export class WSHealthIndicator extends HealthIndicator implements IHealthIndicator {
+export class WSServerHealthIndicator extends HealthIndicator implements IHealthIndicator {
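The body of the indicator is not shown in this hunk. As a rough sketch only, an indicator of this kind could report the server as up whenever the gateway holds a server instance. The class name `WSServerHealthIndicatorSketch`, the `GatewayLike` interface, the `HealthStatus` type, and the `'ws-server'` key are all assumptions made for illustration, and the sketch deliberately avoids the framework base class.

```typescript
// Hypothetical result shape: one entry per indicator key.
type HealthStatus = Record<string, { status: 'up' | 'down' }>;

// Hypothetical minimal view of the gateway: only the server reference matters here.
interface GatewayLike {
  server: object | null;
}

class WSServerHealthIndicatorSketch {
  private readonly key = 'ws-server';
  private readonly gateway: GatewayLike;

  constructor(gateway: GatewayLike) {
    this.gateway = gateway;
  }

  // Reports 'up' only once the gateway has a server instance attached.
  async isHealthy(): Promise<HealthStatus> {
    const isUp = Boolean(this.gateway.server);

    return { [this.key]: { status: isUp ? 'up' : 'down' } };
  }
}
```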
…ice-instance-sw-health-indicator
…n-when-reading-the-service-instance-sw-health-indicator # Conflicts: # apps/ws/src/health/health.module.ts # packages/application-generic/src/health/index.ts
🥇
What change does this PR introduce?
Reproduction Steps
There is a race condition in the WS service when reading the server instance's sockets. The observed error message was along the lines of "Cannot read properties of sockets of null".
The code part where it is happening:
The issue should be reproducible when spinning up the service (or a few instances).
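The timing problem can be illustrated with a small standalone sketch, assuming the server reference is only attached some time after the gateway object is created. `GatewaySketch`, `ServerLike`, `countSocketsUnsafe`, and `tryCount` are hypothetical names and do not come from the service code.

```typescript
// Hypothetical minimal shapes for the server and its socket collection.
interface SocketsLike {
  size: number;
}

interface ServerLike {
  sockets: SocketsLike;
}

class GatewaySketch {
  // Stays null until the server is attached during startup.
  server: ServerLike | null = null;

  attachServer(server: ServerLike): void {
    this.server = server;
  }
}

// Unguarded read that assumes the server is already attached;
// it throws a TypeError while server is still null.
function countSocketsUnsafe(gateway: GatewaySketch): number {
  return (gateway.server as ServerLike).sockets.size;
}

// Wraps the unsafe read so the outcome can be inspected without crashing.
function tryCount(gateway: GatewaySketch): string {
  try {
    return `sockets: ${countSocketsUnsafe(gateway)}`;
  } catch (error) {
    return error instanceof TypeError ? 'TypeError' : 'other error';
  }
}
```

A request handled before `attachServer` runs hits the `TypeError` path, while the same request after startup succeeds, which matches the intermittent nature of the report.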
Why was this change needed?
Will validate the WS server health.
Other information (Screenshots)