-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#175014199] Health checks #90
Conversation
Affected stories
Generated by 🚫 dangerJS |
Codecov Report
@@ Coverage Diff @@
## master #90 +/- ##
==========================================
- Coverage 83.94% 82.77% -1.18%
==========================================
Files 44 47 +3
Lines 1489 1556 +67
Branches 124 127 +3
==========================================
+ Hits 1250 1288 +38
- Misses 234 263 +29
Partials 5 5
Continue to review full report at Codecov.
|
.fold<IResponseSuccessJson<IInfo> | IResponseErrorInternal>( | ||
problems => ResponseErrorInternal(problems.join("\n")), | ||
_ => | ||
ResponseSuccessJson({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be useful to return all the statuses even in case of success (to build a status dashboard of all services). we should think about some common vocabulary for status checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like this: https://tools.ietf.org/html/draft-inadarei-api-health-check-04
(but simpler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return all the statuses even in case of success
What d'you mean? We are checking only the current application status - at most, if it can reach an external resources. So I think we just have a single pass/fail status
common vocabulary for status checks.
I agree indeed.
something like this: https://tools.ietf.org/html/draft-inadarei-api-health-check-04
I like this approach, but I notice that the above is a draft. Meanwhile, we are using a problem+json
which is already a standard: https://tools.ietf.org/html/rfc7807.
As for now, we need:
- a machine-readable encoding of the service pass/fail status
|> we use http status code for that - a human-readable report of failures, for throubleshooting
|> we userdetail
section of theProblemJson
schema for that
We can model further and define a proper encoding for failure reports, to make them machine-readable. However, I see no use for that so far.
utils/config.ts
Outdated
@@ -0,0 +1,97 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you point this PR versus the other one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both this PR and #89 have just one file in common, but very different impact. I'd rather handle them separately and take myself the burden of keeping the lone file in sync.
utils/healthcheck.ts
Outdated
/** | ||
* Check the application can connect to an Azure CosmosDb instances | ||
* | ||
* @param dbUri uri of the database | ||
* @param dbUri connection string for the storage | ||
* | ||
* @returns either true or an array of error messages | ||
*/ | ||
export const checkAzureCosmosDbHealth = ( | ||
dbUri: string, | ||
dbKey?: string | ||
): HealthCheck<true> => | ||
tryCatch(() => { | ||
const client = new CosmosClient({ | ||
endpoint: dbUri, | ||
key: dbKey | ||
}); | ||
return client.getDatabaseAccount(); | ||
}, toHealthProblems).map(_ => true); | ||
|
||
/** | ||
* Check the application can connect to an Azure Storage | ||
* | ||
* @param connStr connection string for the storage | ||
* | ||
* @returns either true or an array of error messages | ||
*/ | ||
export const checkAzureStorageHealth = (connStr: string): HealthCheck => | ||
tryCatch( | ||
() => | ||
new Promise<azurestorageCommon.models.ServiceStats>((resolve, reject) => | ||
createBlobService(connStr).getServiceStats((err, result) => | ||
err ? reject(err) : resolve(result) | ||
) | ||
), | ||
toHealthProblems | ||
).map(_ => true); | ||
|
||
/** | ||
* Check a url is reachable | ||
* | ||
* @param url url to connect with | ||
* | ||
* @returns either true or an array of error messages | ||
*/ | ||
export const checkUrlHealth = (_: string): HealthCheck => | ||
// TODO: implement this check | ||
taskEither.of(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions may be moved into io-functions-commons
(or even io-ts-commons)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functions-commons is ok (ts-commons is used by the app as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, apart form minor discussion
utils/config.ts
Outdated
COSMOSDB_KEY: NonEmptyString, | ||
COSMOSDB_NAME: NonEmptyString, | ||
COSMOSDB_URI: NonEmptyString, | ||
|
||
SERVICE_PRINCIPAL_CLIENT_ID: NonEmptyString, | ||
SERVICE_PRINCIPAL_SECRET: NonEmptyString, | ||
SERVICE_PRINCIPAL_TENANT_ID: NonEmptyString, | ||
|
||
AZURE_APIM: NonEmptyString, | ||
AZURE_APIM_HOST: NonEmptyString, | ||
AZURE_APIM_RESOURCE_GROUP: NonEmptyString, | ||
AZURE_SUBSCRIPTION_ID: NonEmptyString, | ||
|
||
ADB2C_CLIENT_ID: NonEmptyString, | ||
ADB2C_CLIENT_KEY: NonEmptyString, | ||
ADB2C_TENANT_ID: NonEmptyString, | ||
|
||
UserDataBackupStorageConnection: NonEmptyString, | ||
|
||
MESSAGE_CONTAINER_NAME: NonEmptyString, | ||
USER_DATA_BACKUP_CONTAINER_NAME: NonEmptyString, | ||
USER_DATA_CONTAINER_NAME: NonEmptyString, | ||
|
||
StorageConnection: NonEmptyString, | ||
SubscriptionFeedStorageConnection: NonEmptyString, | ||
UserDataArchiveStorageConnection: NonEmptyString, | ||
|
||
PUBLIC_API_KEY: NonEmptyString, | ||
PUBLIC_API_URL: NonEmptyString, | ||
|
||
PUBLIC_DOWNLOAD_BASE_URL: NonEmptyString, | ||
|
||
SESSION_API_KEY: NonEmptyString, | ||
SESSION_API_URL: NonEmptyString, | ||
|
||
LOGOS_URL: NonEmptyString, | ||
|
||
SUBSCRIPTIONS_FEED_TABLE: NonEmptyString, | ||
USER_DATA_DELETE_DELAY_DAYS: NonEmptyString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to remember that in case of a global env variable creation or deletion or renaming we have to update this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem in creation - you'll be forced to add it to the model. When deleting it is tricky, and we'd must rely on our discipline imho
* | ||
* @returns either true or an array of error messages | ||
*/ | ||
export const checkAzureStorageHealth = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this method checkAzureBlobStorageHealth
? Because only Blob Storage connection check is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This checks the storage resource itself and if the app can correctly connect with it. However I actually found no better solution than try to connect to a blob storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave the scope of the test to be "the whole storage resource" (thus, I'd leave the name) and I'd rather improve the implementation. Suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could extend the check on lower resources es. for Blob check that the containers exist (api), for table storage that all required tables exist and so on. checkAzureStorageHealth
should regroup all these storage health checks. Attention on autogenerated resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check that the containers exist
That is a deeper level of check which will introduce additional considerations, for example many containers are lazily created.
Attention on autogenerated resources.
What do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
The same as you for "lazily created containers". They don't need to be checked. Must be checked only required resource for application start and run. We can improve this logic later if hard to implements.
utils/__tests__/config.test.ts
Outdated
it("should decode configuration for sendgrid", () => { | ||
const rawConf = { | ||
MAIL_FROM: aMailFrom, | ||
NODE_ENV: "production", | ||
SENDGRID_API_KEY: "a-sg-key" | ||
}; | ||
const result = MailerConfig.decode(rawConf); | ||
|
||
expectRight(result, value => { | ||
expect(value.SENDGRID_API_KEY).toBe("a-sg-key"); | ||
expect(typeof value.MAILUP_USERNAME).toBe("undefined"); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a duplicate test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do yo mean the whole case or just expect(typeof value.MAILUP_USERNAME).toBe("undefined");
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole test is exactly the same as the previous one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, didn't notice that. Good catch!
utils/__tests__/config.test.ts
Outdated
// check types | ||
const _: NonEmptyString = value.MAILUP_SECRET; | ||
const __: NonEmptyString = value.MAILUP_USERNAME; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a canary to catch possible problems when/if we'll edit the config model. I found that several times the programmatic check doesn't ensure the correct type inference (hence, you have green tests but the project will fail to build). Not exhaustive, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand. At runtime we have IConfig.decode({ ...process.env, isProduction: process.env.NODE_ENV === "production" });
to check the correct type of required values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need a type check inside the tests is better to create an explicit type
const MailUpConfig = t.interface({
MAILHOG_HOSTNAME: t.undefined,
MAILUP_SECRET: NonEmptyString,
MAILUP_USERNAME: NonEmptyString,
MAIL_TRANSPORTS: t.undefined,
NODE_ENV: t.literal("production"),
SENDGRID_API_KEY: t.undefined
})
And check the response. We can use these types even inside the logic with MailUpConfig.is(config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the check is all about "let me se if the interface I designed actually results in the correct type once I exit the io-ts world".
In the case above, it happened that MAILUP_USERNAME
could not be correctly narrowed to NonEmptyString
but to string | NonEmptyString
, although the interface was programmatically good.
Maybe I could do NonEmptyString.is(value.MAILUP_USERNAME)
return () => | ||
healthCheck | ||
.fold<IResponseSuccessJson<IInfo> | IResponseErrorInternal>( | ||
problems => ResponseErrorInternal(problems.join("\n\n")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure than any critical information come from healthCheck
is filtered before sending it through the public API Info
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For the case I tested I saw no problems (resource unavailable, wrong credentials). Still it's important to point out, thanks.
Do you see any case which can be problematic? Please also consider that, for what concerns to functions, the endpoint is unauthenticated but not public, it's reachable only from our private network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could anyway return a generic Error message of what type of error occurs problems.map(_ => _.__source);
and logs the complete error for detail. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm as soon as it'll be easy to troubleshoot. @gunzip thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
utils/__tests__/config.test.ts
Outdated
const expectRight = <L, R>(e: Either<L, R>, t: (r: R) => void = noop) => | ||
e.fold( | ||
_ => | ||
fail(`Expecting right, received left. Value: ${JSON.stringify(e.value)}`), | ||
_ => t(_) | ||
); | ||
|
||
const expectLeft = <L, R>(e: Either<L, R>, t: (l: L) => void = noop) => | ||
e.fold( | ||
_ => t(_), | ||
_ => | ||
fail(`Expecting left, received right. Value: ${JSON.stringify(e.value)}`) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice approach. Can we move these methods to a test utility library to share this logic across all the projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to write a jest extension module with custom matchers for io-ts/fp-ts. Maybe one day ;)
There are still quirks with the above functions (failures aren't reported very well)
Info/__tests__/handler.test.ts
Outdated
|
||
describe("InfoHandler", () => { | ||
it("should return an internal error if the application is not healthy", async () => { | ||
const healthCheck: HealthCheck = fromLeft(["failure 1", "failure 2"]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a type error in here, the test fails.
once rebased, is this ready to be merged? |
yes |
Introducing health checks for the application.
utils/healthcheck
module/info
endpoint/info
endpoint to openapi spec