Health check all of the 3rd party services #1469
Comments
Work completed: created a monitoring package with a health-check class for each service that pings it.
Investigation: we now have a class for each service, but I don't yet know exactly how the instances of those classes will be created. A single method that pings URLs is not enough: some services will be pinged through an HTTP request, others through the Polkadot version method, and others through the RMB ping method. I will handle their implementation later.
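One way the "one class per service, different ping mechanism" idea could look is sketched below. This is only an illustration, not the package's actual code: `ILivenessChecker` and `LiveChecker` are the names used later in this thread, while `HttpGet`, `HttpHealthCheck`, and `ProbeHealthCheck` are hypothetical names made up for the sketch. The HTTP client is passed in so that the class does not depend on a specific library.

```typescript
// Shared contract, named after the ILivenessChecker interface used in this thread.
interface ILivenessChecker {
  LiveChecker(): Promise<boolean>;
}

// Assumption: the only thing this sketch needs from an HTTP response is `ok`.
type HttpGet = (url: string) => Promise<{ ok: boolean }>;

// Hypothetical checker for services that are reachable over plain HTTP.
class HttpHealthCheck implements ILivenessChecker {
  private readonly url: string;
  private readonly httpGet: HttpGet;

  constructor(url: string, httpGet: HttpGet) {
    this.url = url;
    this.httpGet = httpGet;
  }

  async LiveChecker(): Promise<boolean> {
    try {
      return (await this.httpGet(this.url)).ok;
    } catch {
      // Network errors and invalid URLs all count as "not alive".
      return false;
    }
  }
}

// Hypothetical checker for services probed with a custom call,
// for example a chain version query or an RMB ping.
class ProbeHealthCheck implements ILivenessChecker {
  private readonly probe: () => Promise<unknown>;

  constructor(probe: () => Promise<unknown>) {
    this.probe = probe;
  }

  async LiveChecker(): Promise<boolean> {
    try {
      await this.probe();
      return true; // the probe resolved, so the service answered
    } catch {
      return false;
    }
  }
}
```

In an environment with a global `fetch` (a browser or Node 18+), `new HttpHealthCheck(url, u => fetch(u))` would be one way to plug in a real HTTP client.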
Work completed:
import { servicesLiveChecker } from "../src/index";

async function HealthCheck() {
  try {
    // Arguments: GraphQL URL, GridProxy URL (empty, so it is skipped), TFChain URL, RMB relay URL.
    console.log(await servicesLiveChecker("fakeURL", "", "wss://tfchain.dev.grid.tf/ws", "wss://relay.dev.grid.tf"));
    process.exit(0);
  } catch (err) {
    console.log(err);
    // Exit with a failure code so the script does not hang on open connections.
    process.exit(1);
  }
}
HealthCheck();
and the output:
2024-01-21 19:35:09 API/INIT: RPC methods not decorated: transaction_unstable_submitAndWatch, transaction_unstable_unwatch
2024-01-21 19:35:09 API/INIT: Not decorating runtime apis without matching versions: TransactionPaymentApi/4 (1 known), Metadata/2 (1 known)
2024-01-21 19:35:10 API/INIT: RPC methods not decorated: transaction_unstable_submitAndWatch, transaction_unstable_unwatch
2024-01-21 19:35:10 API/INIT: Not decorating runtime apis without matching versions: TransactionPaymentApi/4 (1 known), Metadata/2 (1 known)
disconnecting
{ GraphQl: 'Down', TFChain: 'Alive', RMB: 'Alive' }
I will enhance the code and provide a way to check the liveness of a custom service if needed.
It should be more like
The script mentioned above is only an abstract example; the function it calls is:
export async function servicesLiveChecker(
  GraphQlURL?: string,
  GridProxyURL?: string,
  TFChainURL?: string,
  RMBrelayURL?: string,
) {
  // Maps each checked service to "Alive" or "Down"; services without a URL are skipped.
  const LIVENESS: Record<string, string> = {};
  if (GraphQlURL) {
    // Fixed: the GraphQL check previously received GridProxyURL by mistake.
    LIVENESS["GraphQl"] = (await HealthChecker(new GraphQlHealthCheck(GraphQlURL))) ? "Alive" : "Down";
  }
  if (GridProxyURL) {
    LIVENESS["GridProxy"] = (await HealthChecker(new GridProxyHealthCheck(GridProxyURL))) ? "Alive" : "Down";
  }
  if (TFChainURL) {
    LIVENESS["TFChain"] = (await HealthChecker(new TFChainHealthCheck(TFChainURL))) ? "Alive" : "Down";
  }
  // The RMB check needs both the relay URL and the chain URL.
  if (RMBrelayURL && TFChainURL) {
    LIVENESS["RMB"] = (await HealthChecker(new RMBHealthCheck(RMBrelayURL, TFChainURL, "sr25519"))) ? "Alive" : "Down";
  }
  return LIVENESS;
}

// Retries the check until it succeeds or the retries run out,
// then disconnects if the checker supports it.
async function HealthChecker(checker: ILivenessChecker, retries = 2) {
  let alive = false;
  while (!alive && retries > 0) {
    alive = await checker.LiveChecker();
    retries--;
  }
  if ("disconnectHandler" in checker) checker.disconnectHandler();
  return alive;
}
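As a possible refinement of the retry helper above, each attempt could be given a time limit and a pause could be added between attempts, so that a hanging connection does not block the whole check. The sketch below is an assumption about how that might be done, not the package's implementation; `withTimeout`, `checkWithRetries`, and their parameters are hypothetical names.

```typescript
// Same contract as the ILivenessChecker used in this thread.
interface ILivenessChecker {
  LiveChecker(): Promise<boolean>;
}

// Resolves to false if the probe rejects or does not settle within `ms` milliseconds.
function withTimeout(probe: Promise<boolean>, ms: number): Promise<boolean> {
  return new Promise<boolean>(resolve => {
    const timer = setTimeout(() => resolve(false), ms);
    probe.then(
      alive => {
        clearTimeout(timer);
        resolve(alive);
      },
      () => {
        clearTimeout(timer);
        resolve(false);
      },
    );
  });
}

// Tries the checker up to `attempts` times and stops at the first success.
async function checkWithRetries(
  checker: ILivenessChecker,
  attempts = 2,
  timeoutMs = 5000,
  pauseMs = 0,
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (await withTimeout(checker.LiveChecker(), timeoutMs)) return true;
    // Pause only between attempts, not after the last one.
    if (attempt < attempts && pauseMs > 0) {
      await new Promise(resolve => setTimeout(resolve, pauseMs));
    }
  }
  return false;
}
```

A checker that fails once and then succeeds is reported as alive with the default of two attempts, while one that never answers is reported as down after `attempts * timeoutMs` at most (plus pauses).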
Updated version of the monitoring function:
const HEALTH_CHECK_INTERVAL = 5000;
const MAX_RETRIES = 2;

export async function TFServicesLiveMonitor(services: TFServices, interval = HEALTH_CHECK_INTERVAL): Promise<void> {
  const serviceArray: IServiceMonitor[] = initializeServices(services);
  // Keep monitoring only if there is at least one service to monitor.
  while (serviceArray.length) {
    for (const service of serviceArray) {
      await liveChecker(service);
    }
    // Wait for the configured interval before the next round of checks.
    await new Promise(resolve => setTimeout(resolve, interval));
  }
}

// Builds one monitor instance per configured service.
export function initializeServices(services: TFServices): IServiceMonitor[] {
  const serviceArray: IServiceMonitor[] = [];
  for (const serviceName in services) {
    switch (serviceName) {
      case "graphQL":
        serviceArray.push(new GraphQlMonitor(services.graphQL.LivenessURL));
        break;
      case "gridProxy":
        serviceArray.push(new GridProxyMonitor(services.gridProxy.LivenessURL));
        break;
      case "tfChain":
        serviceArray.push(new TFChainMonitor(services.tfChain.LivenessURL));
        break;
      case "rmb":
        // The RMB monitor also needs the TFChain URL, which may not be configured.
        serviceArray.push(new RMBMonitor(services.rmb.LivenessURL, services.tfChain?.LivenessURL, "sr25519"));
        break;
      default:
        console.warn(`Unknown service: ${serviceName}`);
        break;
    }
  }
  return serviceArray;
}

// Retries the check until it succeeds or the retries run out.
async function liveChecker(service: IServiceMonitor, retries = MAX_RETRIES): Promise<ServiceStatus> {
  let alive = false;
  while (!alive && retries > 0) {
    alive = await service.LiveChecker();
    retries--;
  }
  return { [service.ServiceName]: alive ? "Alive" : "Down" };
}
I am now updating the code to add the ability to check both a service's liveness and its health status.
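The monitoring loop above discards the status returned by `liveChecker` and has no way to stop. A minimal sketch of one alternative, assuming the caller wants a report per round and a stop handle, is shown below. `IServiceMonitor`, `ServiceName`, and `LiveChecker` follow the names in this thread; `checkAll`, `startMonitor`, and the `onReport` callback are hypothetical.

```typescript
type Status = "Alive" | "Down";

// Same shape as the IServiceMonitor used in this thread.
interface IServiceMonitor {
  ServiceName: string;
  LiveChecker(): Promise<boolean>;
}

// Runs one round of checks and returns a single report for all services.
async function checkAll(monitors: IServiceMonitor[]): Promise<Record<string, Status>> {
  const report: Record<string, Status> = {};
  for (const monitor of monitors) {
    let alive = false;
    try {
      alive = await monitor.LiveChecker();
    } catch {
      alive = false; // a throwing check is treated as "Down"
    }
    report[monitor.ServiceName] = alive ? "Alive" : "Down";
  }
  return report;
}

// Runs checkAll repeatedly, passing each report to `onReport`.
// Returns a function that stops the monitoring.
function startMonitor(
  monitors: IServiceMonitor[],
  intervalMs: number,
  onReport: (report: Record<string, Status>) => void,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    onReport(await checkAll(monitors));
    // Schedule the next round only after this one has finished.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next round with `setTimeout` after the previous one finishes means slow checks never overlap, which is the same behaviour as the `while` loop with a sleep.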
Work completed
Work in Progress (WIP): refactor events handler
Investigation and Solution:
Work completed:
To increase user satisfaction, we should implement basic health checks that periodically check the performance of all third-party services we rely on, such as RMB, TFChain, GridProxy, GraphQL, and others. These health checks should run at least once a minute (the interval can be revisited), and any detected performance degradation should be promptly displayed in the user interface and trigger notifications to alert users and system administrators. By proactively monitoring and addressing performance issues, we can minimize disruptions and provide a seamless user experience.
Note: alerting the system administrators will be addressed in a separate issue.
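To show a notification only when something changes, rather than on every check, the UI could compare each new status report with the previous one. The sketch below illustrates that idea under the assumption that a report is a plain map from service name to "Alive" or "Down", as in the outputs posted in this thread; `diffReports`, `StatusChange`, and `describeChange` are hypothetical names.

```typescript
type Status = "Alive" | "Down";
type Report = Record<string, Status>;

interface StatusChange {
  service: string;
  from: Status | undefined; // undefined when the service was not in the previous report
  to: Status;
}

// Lists the services whose status differs between two consecutive reports.
function diffReports(previous: Report, current: Report): StatusChange[] {
  const changes: StatusChange[] = [];
  for (const [service, status] of Object.entries(current)) {
    if (previous[service] !== status) {
      changes.push({ service, from: previous[service], to: status });
    }
  }
  return changes;
}

// Turns a change into a short message that a UI could display.
function describeChange(change: StatusChange): string {
  return change.to === "Down"
    ? `${change.service} is not responding`
    : `${change.service} is back online`;
}
```

A caller would keep the last report, call `diffReports(last, next)` after each round, and show one notification per returned change.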