[#175014199] Health checks #90

balanza · 2020-09-30T15:53:20Z

Introducing health checks for the application.

add utils/healthcheck module
check for wrong configuration and unreachable resources (cosmosdb, storage, urls)
add healthcheck on /info endpoint
add /info endpoint to openapi spec
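A minimal sketch of the idea in the list above: every check returns either `true` or a list of human-readable problems, and running them together collects all failures (wrong configuration and unreachable resources) in one report. This is a synchronous simplification under stated assumptions; the actual module is built on fp-ts `TaskEither`, and `runHealthChecks` is a hypothetical name.

```typescript
// Hypothetical, synchronous simplification of the PR's TaskEither-based checks.
// Each check returns `true` when healthy, or a list of human-readable problems.
type HealthResult = true | ReadonlyArray<string>;
type Check = () => HealthResult;

// Runs every check and collects all problems, so one call can report a wrong
// configuration and several unreachable resources together.
const runHealthChecks = (checks: ReadonlyArray<Check>): HealthResult => {
  const problems: string[] = [];
  checks.forEach(check => {
    let result: HealthResult;
    try {
      result = check();
    } catch (err) {
      // A check that throws counts as a failure instead of crashing the endpoint.
      result = [err instanceof Error ? err.message : String(err)];
    }
    if (result !== true) {
      result.forEach(problem => problems.push(problem));
    }
  });
  return problems.length === 0 ? true : problems;
};
```

Collecting every failure, rather than stopping at the first one, is what makes the report useful for troubleshooting a misconfigured deployment in a single pass.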

pagopa-github-bot · 2020-09-30T15:54:40Z

Affected stories

⚙️ #175014199: Design a structured health check mechanism for the functions

Generated by 🚫 dangerJS

codecov-commenter · 2020-09-30T15:55:24Z

Codecov Report

Merging #90 into master will decrease coverage by 1.17%.
The diff coverage is 54.83%.

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   83.94%   82.77%   -1.18%     
==========================================
  Files          44       47       +3     
  Lines        1489     1556      +67     
  Branches      124      127       +3     
==========================================
+ Hits         1250     1288      +38     
- Misses        234      263      +29     
  Partials        5        5

Impacted Files	Coverage Δ
utils/healthcheck.ts	`41.02% <41.02%> (ø)`
utils/config.ts	`75.00% <75.00%> (ø)`
Info/handler.ts	`83.33% <85.71%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3918cfd...ba5b0ed.

gunzip · 2020-09-30T17:38:34Z

Info/handler.ts

+      .fold<IResponseSuccessJson<IInfo> | IResponseErrorInternal>(
+        problems => ResponseErrorInternal(problems.join("\n")),
+        _ =>
+          ResponseSuccessJson({


It would be useful to return all the statuses even in case of success (to build a status dashboard of all services). We should think about a common vocabulary for status checks.

something like this: https://tools.ietf.org/html/draft-inadarei-api-health-check-04
(but simpler)
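A sketch of what a simplified report in the spirit of that draft could look like: an overall `status` ("pass", "fail" or "warn") plus a `checks` map with one entry per component, returned on success as well. The field names follow our reading of the draft and the component keys (e.g. `cosmosdb:connection`) are hypothetical; nothing here is part of the PR.

```typescript
// Simplified health report loosely modelled on draft-inadarei-api-health-check.
type Status = "pass" | "fail" | "warn";

interface ComponentCheck {
  readonly status: Status;
  readonly output?: string; // error detail, usually present only on failure
}

interface HealthReport {
  readonly status: Status;
  // Keys are "component:measurement" pairs, e.g. "cosmosdb:connection" (hypothetical).
  readonly checks: { readonly [component: string]: ReadonlyArray<ComponentCheck> };
}

// Overall status: "fail" if any component fails, else "warn" if any warns, else "pass".
const buildReport = (checks: HealthReport["checks"]): HealthReport => {
  const all: Status[] = [];
  Object.keys(checks).forEach(key => checks[key].forEach(c => all.push(c.status)));
  const status: Status =
    all.indexOf("fail") !== -1 ? "fail" : all.indexOf("warn") !== -1 ? "warn" : "pass";
  return { checks, status };
};
```

Because every component appears in `checks` even when it passes, a dashboard could render all services from the same payload.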

return all the statuses even in case of success

What do you mean? We are only checking the current application status (at most, whether it can reach external resources). So I think we just have a single pass/fail status.

common vocabulary for status checks.

I agree indeed.

something like this: https://tools.ietf.org/html/draft-inadarei-api-health-check-04

I like this approach, but I notice that the above is a draft. Meanwhile, we are using a problem+json which is already a standard: https://tools.ietf.org/html/rfc7807.

As for now, we need:

a machine-readable encoding of the service pass/fail status
|> we use http status code for that

a human-readable report of failures, for troubleshooting
|> we use the detail section of the ProblemJson schema for that

We can model further and define a proper encoding for failure reports, to make them machine-readable. However, I see no use for that so far.
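The two requirements above can be sketched as a single mapping: the HTTP status code carries pass/fail, and on failure an RFC 7807 problem object carries the joined messages in `detail`. The `HttpResponse` and `Info` shapes and the `toInfoResponse` name are assumptions for illustration; the PR itself uses `ResponseSuccessJson` and `ResponseErrorInternal`.

```typescript
// RFC 7807 problem details (subset of members).
interface ProblemJson {
  readonly type: string;
  readonly title: string;
  readonly status: number;
  readonly detail: string;
}

// Hypothetical response and payload shapes, for illustration only.
interface HttpResponse<T> {
  readonly statusCode: number;
  readonly contentType: string;
  readonly body: T;
}

interface Info {
  readonly name: string;
  readonly version: string;
}

// Machine-readable pass/fail via the status code; human-readable failures via `detail`.
const toInfoResponse = (
  result: true | ReadonlyArray<string>,
  info: Info
): HttpResponse<Info | ProblemJson> =>
  result === true
    ? { body: info, contentType: "application/json", statusCode: 200 }
    : {
        body: {
          detail: result.join("\n"),
          status: 500,
          title: "Internal Server Error",
          type: "about:blank"
        },
        contentType: "application/problem+json",
        statusCode: 500
      };
```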

gunzip · 2020-09-30T17:39:04Z

utils/config.ts

@@ -0,0 +1,97 @@
+/**


can you base this PR on the other one?

This PR and #89 have just one file in common, but very different impacts. I'd rather handle them separately and take on the burden of keeping that file in sync myself.

balanza · 2020-10-01T08:50:47Z

utils/healthcheck.ts

+/**
+ * Check the application can connect to an Azure CosmosDb instance
+ *
+ * @param dbUri uri of the database
+ * @param dbKey access key for the database
+ *
+ * @returns either true or an array of error messages
+ */
+export const checkAzureCosmosDbHealth = (
+  dbUri: string,
+  dbKey?: string
+): HealthCheck<true> =>
+  tryCatch(() => {
+    const client = new CosmosClient({
+      endpoint: dbUri,
+      key: dbKey
+    });
+    return client.getDatabaseAccount();
+  }, toHealthProblems).map(_ => true);
+
+/**
+ * Check the application can connect to an Azure Storage
+ *
+ * @param connStr connection string for the storage
+ *
+ * @returns either true or an array of error messages
+ */
+export const checkAzureStorageHealth = (connStr: string): HealthCheck =>
+  tryCatch(
+    () =>
+      new Promise<azurestorageCommon.models.ServiceStats>((resolve, reject) =>
+        createBlobService(connStr).getServiceStats((err, result) =>
+          err ? reject(err) : resolve(result)
+        )
+      ),
+    toHealthProblems
+  ).map(_ => true);
+
+/**
+ * Check a url is reachable
+ *
+ * @param url url to connect with
+ *
+ * @returns either true or an array of error messages
+ */
+export const checkUrlHealth = (_: string): HealthCheck =>
+  // TODO: implement this check
+  taskEither.of(true);
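A hypothetical sketch of an error-to-problems mapper like the `toHealthProblems` passed to `tryCatch` above. Unlike the PR's version, which is passed directly as the rejection handler, this variant is curried on the failing resource so each problem carries a `__source` tag; that tag and the `ProblemSource` values are assumptions, echoing the later review thread.

```typescript
// Hypothetical tags for the kind of resource a problem comes from.
type ProblemSource = "Config" | "AzureCosmosDB" | "AzureStorage" | "Url";

interface HealthProblem {
  readonly __source: ProblemSource;
  readonly message: string;
}

// Converts whatever a failed check rejected with into a list of tagged problems.
const toHealthProblems = (source: ProblemSource) => (
  error: unknown
): ReadonlyArray<HealthProblem> => [
  {
    __source: source,
    message: error instanceof Error ? error.message : String(error)
  }
];
```

With fp-ts this could be used as `tryCatch(() => ..., toHealthProblems("AzureCosmosDB"))`, keeping the resource name next to each failure message.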


These functions may be moved into io-functions-commons (or even io-ts-commons)

io-functions-commons is ok (io-ts-commons is used by the app as well)

AleDore

lgtm, apart from minor discussions

AleDore · 2020-10-05T08:22:43Z

utils/config.ts

+    COSMOSDB_KEY: NonEmptyString,
+    COSMOSDB_NAME: NonEmptyString,
+    COSMOSDB_URI: NonEmptyString,
+
+    SERVICE_PRINCIPAL_CLIENT_ID: NonEmptyString,
+    SERVICE_PRINCIPAL_SECRET: NonEmptyString,
+    SERVICE_PRINCIPAL_TENANT_ID: NonEmptyString,
+
+    AZURE_APIM: NonEmptyString,
+    AZURE_APIM_HOST: NonEmptyString,
+    AZURE_APIM_RESOURCE_GROUP: NonEmptyString,
+    AZURE_SUBSCRIPTION_ID: NonEmptyString,
+
+    ADB2C_CLIENT_ID: NonEmptyString,
+    ADB2C_CLIENT_KEY: NonEmptyString,
+    ADB2C_TENANT_ID: NonEmptyString,
+
+    UserDataBackupStorageConnection: NonEmptyString,
+
+    MESSAGE_CONTAINER_NAME: NonEmptyString,
+    USER_DATA_BACKUP_CONTAINER_NAME: NonEmptyString,
+    USER_DATA_CONTAINER_NAME: NonEmptyString,
+
+    StorageConnection: NonEmptyString,
+    SubscriptionFeedStorageConnection: NonEmptyString,
+    UserDataArchiveStorageConnection: NonEmptyString,
+
+    PUBLIC_API_KEY: NonEmptyString,
+    PUBLIC_API_URL: NonEmptyString,
+
+    PUBLIC_DOWNLOAD_BASE_URL: NonEmptyString,
+
+    SESSION_API_KEY: NonEmptyString,
+    SESSION_API_URL: NonEmptyString,
+
+    LOGOS_URL: NonEmptyString,
+
+    SUBSCRIPTIONS_FEED_TABLE: NonEmptyString,
+    USER_DATA_DELETE_DELAY_DAYS: NonEmptyString,


We have to remember to update this whenever a global env variable is created, deleted, or renamed.

Creation is no problem: you'll be forced to add it to the model. Deletion is trickier, and we must rely on our discipline, imho.
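For illustration, a minimal sketch of the configuration check that this model enables: every required key must be present and non-empty, and all offending keys are reported together, similar to how io-ts decoding accumulates errors. The key list is an excerpt from the model above and `checkConfig` is a hypothetical name; as noted in the thread, a variable that is no longer needed would still not be detected.

```typescript
// Excerpt of the required keys from the configuration model above.
const REQUIRED_KEYS = ["COSMOSDB_KEY", "COSMOSDB_NAME", "COSMOSDB_URI"] as const;

type Env = { readonly [key: string]: string | undefined };

// Reports every missing or empty variable at once (NonEmptyString semantics: length > 0).
const checkConfig = (env: Env): true | ReadonlyArray<string> => {
  const problems = REQUIRED_KEYS.filter(key => {
    const value = env[key];
    return value === undefined || value === "";
  }).map(key => `${key} must be a non-empty string`);
  return problems.length === 0 ? true : problems;
};
```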

BurnedMarshal · 2020-10-05T07:53:45Z

utils/healthcheck.ts

+ *
+ * @returns either true or an array of error messages
+ */
+export const checkAzureStorageHealth = (


Can we rename this method checkAzureBlobStorageHealth? Because only Blob Storage connection check is executed.

Good point. This checks the storage resource itself and whether the app can correctly connect to it. However, I found no better solution than trying to connect to the blob storage.

I'd leave the scope of the test to be "the whole storage resource" (thus, I'd leave the name) and I'd rather improve the implementation. Suggestions?

We could extend the check to lower-level resources, e.g. for Blob storage check that the containers exist (API), for Table storage that all the required tables exist, and so on. checkAzureStorageHealth should group all these storage health checks. Beware of autogenerated resources.

check that the containers exist

That is a deeper level of check, which introduces additional considerations; for example, many containers are lazily created.

Beware of autogenerated resources.

What do you mean?

What do you mean?

The same as your "lazily created containers": they don't need to be checked. Only the resources required for the application to start and run must be checked. We can improve this logic later if it's hard to implement.
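The narrower scope agreed above can be sketched as a pure function: only containers the application needs to start and run are verified, while lazily created ones are skipped. The `lazy` flag, the container names, and how the list of existing containers is obtained are all assumptions for illustration.

```typescript
// Hypothetical description of a container the application depends on.
interface ContainerRequirement {
  readonly name: string;
  readonly lazy: boolean; // created on first use, so its absence is not a failure
}

// Reports only required, non-lazy containers that are missing.
const checkContainers = (
  required: ReadonlyArray<ContainerRequirement>,
  existing: ReadonlyArray<string>
): true | ReadonlyArray<string> => {
  const problems = required
    .filter(c => !c.lazy && existing.indexOf(c.name) === -1)
    .map(c => `missing container: ${c.name}`);
  return problems.length === 0 ? true : problems;
};
```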

BurnedMarshal · 2020-10-05T08:26:47Z

utils/__tests__/config.test.ts

+  it("should decode configuration for sendgrid", () => {
+    const rawConf = {
+      MAIL_FROM: aMailFrom,
+      NODE_ENV: "production",
+      SENDGRID_API_KEY: "a-sg-key"
+    };
+    const result = MailerConfig.decode(rawConf);
+
+    expectRight(result, value => {
+      expect(value.SENDGRID_API_KEY).toBe("a-sg-key");
+      expect(typeof value.MAILUP_USERNAME).toBe("undefined");
+    });
+  });


This is a duplicate test

Do you mean the whole case or just expect(typeof value.MAILUP_USERNAME).toBe("undefined");?

The whole test is exactly the same as the previous one

oops, didn't notice that. Good catch!

BurnedMarshal · 2020-10-05T08:27:49Z

utils/__tests__/config.test.ts

+      // check types
+      const _: NonEmptyString = value.MAILUP_SECRET;
+      const __: NonEmptyString = value.MAILUP_USERNAME;


This can be removed

This is actually a canary to catch possible problems when/if we edit the config model. I've found several times that the programmatic check doesn't guarantee correct type inference (hence, you have green tests but the project fails to build). Not exhaustive, though.

Sorry, I don't understand. At runtime we have IConfig.decode({ ...process.env, isProduction: process.env.NODE_ENV === "production" }); to check the correct type of required values.

If you need a type check inside the tests, it's better to create an explicit type

const MailUpConfig = t.interface({ MAILHOG_HOSTNAME: t.undefined, MAILUP_SECRET: NonEmptyString, MAILUP_USERNAME: NonEmptyString, MAIL_TRANSPORTS: t.undefined, NODE_ENV: t.literal("production"), SENDGRID_API_KEY: t.undefined })

Then check the result. We can use these types even inside the logic, with MailUpConfig.is(config)

Well, the check is all about "let me see if the interface I designed actually results in the correct type once I exit the io-ts world".
In the case above, MAILUP_USERNAME was not narrowed to NonEmptyString but to string | NonEmptyString, although the interface was programmatically correct.
Maybe I could do NonEmptyString.is(value.MAILUP_USERNAME)
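A sketch of the two approaches being compared: a runtime guard (comparable to `NonEmptyString.is`) and a compile-time canary that makes the build fail if a field is inferred more loosely than intended. The brand below only mimics io-ts-types' `NonEmptyString` in spirit, and `expectType` / `checkMailUpConfig` are hypothetical helpers local to this sketch.

```typescript
// Local branded type, standing in for io-ts-types' NonEmptyString.
interface NonEmptyStringBrand {
  readonly __brand: "NonEmptyString";
}
type NonEmptyString = string & NonEmptyStringBrand;

// Runtime check: catches bad values, but says nothing about inferred types.
const isNonEmptyString = (value: unknown): value is NonEmptyString =>
  typeof value === "string" && value.length > 0;

// Compile-time canary: a call fails to compile if the argument is not assignable to T.
function expectType<T>(_value: T): void {
  // intentionally empty: the check happens at compile time
}

interface MailUpConfig {
  readonly MAILUP_SECRET: NonEmptyString;
  readonly MAILUP_USERNAME: NonEmptyString;
}

const checkMailUpConfig = (value: MailUpConfig): void => {
  // Would break the build if MAILUP_USERNAME were inferred as `string | NonEmptyString`.
  expectType<NonEmptyString>(value.MAILUP_USERNAME);
  expectType<NonEmptyString>(value.MAILUP_SECRET);
};
```

The guard protects against bad input at runtime; the canary protects against a model edit that silently widens a type, which is the failure mode described above.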

BurnedMarshal · 2020-10-05T08:38:30Z

Info/handler.ts

+  return () =>
+    healthCheck
+      .fold<IResponseSuccessJson<IInfo> | IResponseErrorInternal>(
+        problems => ResponseErrorInternal(problems.join("\n\n")),


Are we sure that any sensitive information coming from healthCheck is filtered out before it is sent through the public Info API?

Good point. For the case I tested I saw no problems (resource unavailable, wrong credentials). Still it's important to point out, thanks.

Do you see any case that could be problematic? Please also consider that, as far as functions are concerned, the endpoint is unauthenticated but not public: it's reachable only from our private network.

We could return a generic error message anyway, stating only which type of error occurred (problems.map(_ => _.__source)), and log the complete error for details. What do you think?

lgtm as long as it stays easy to troubleshoot. @gunzip thoughts?
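A sketch of the suggestion above: the response exposes only the deduplicated list of failing sources, while the full messages go to the logs for troubleshooting. The `HealthProblem` shape and the injected `log` function are assumptions for illustration.

```typescript
// Assumed problem shape, tagged with the kind of resource that failed.
interface HealthProblem {
  readonly __source: string;
  readonly message: string;
}

// Logs full details server-side and returns a detail string safe to expose.
const toPublicDetail = (
  problems: ReadonlyArray<HealthProblem>,
  log: (line: string) => void
): string => {
  // Full messages (which may contain sensitive details) are only logged.
  problems.forEach(p => log(`[${p.__source}] ${p.message}`));
  // Deduplicated list of failing sources for the response body.
  const sources: string[] = [];
  problems.forEach(p => {
    if (sources.indexOf(p.__source) === -1) {
      sources.push(p.__source);
    }
  });
  return `unhealthy: ${sources.join(", ")}`;
};
```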

BurnedMarshal · 2020-10-05T08:40:16Z

utils/__tests__/config.test.ts

+const expectRight = <L, R>(e: Either<L, R>, t: (r: R) => void = noop) =>
+  e.fold(
+    _ =>
+      fail(`Expecting right, received left. Value: ${JSON.stringify(e.value)}`),
+    _ => t(_)
+  );
+
+const expectLeft = <L, R>(e: Either<L, R>, t: (l: L) => void = noop) =>
+  e.fold(
+    _ => t(_),
+    _ =>
+      fail(`Expecting left, received right. Value: ${JSON.stringify(e.value)}`)
+  );


Nice approach. Can we move these methods to a test utility library to share this logic across all the projects?

I'd like to write a jest extension module with custom matchers for io-ts/fp-ts. Maybe one day ;)
There are still quirks with the above functions (failures aren't reported very well)
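One possible direction for the reporting quirk, sketched under assumptions: helpers that throw an `Error` carrying the unexpected value (instead of calling `fail`) and return the extracted value, so that tests can continue with regular `expect` calls. The local `Either` type, `left` and `right` are defined here for self-containment and are not fp-ts' API, so this is not a drop-in replacement.

```typescript
// Local Either, defined only for this sketch.
type Either<L, R> =
  | { readonly _tag: "Left"; readonly left: L }
  | { readonly _tag: "Right"; readonly right: R };

const left = <L, R = never>(l: L): Either<L, R> => ({ _tag: "Left", left: l });
const right = <R, L = never>(r: R): Either<L, R> => ({ _tag: "Right", right: r });

// Returns the right value, or throws an Error that shows the unexpected left value.
const expectRight = <L, R>(e: Either<L, R>): R => {
  if (e._tag === "Left") {
    throw new Error(`Expecting right, received left: ${JSON.stringify(e.left)}`);
  }
  return e.right;
};

// Returns the left value, or throws an Error that shows the unexpected right value.
const expectLeft = <L, R>(e: Either<L, R>): L => {
  if (e._tag === "Right") {
    throw new Error(`Expecting left, received right: ${JSON.stringify(e.right)}`);
  }
  return e.left;
};
```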

BurnedMarshal · 2020-10-05T08:49:26Z

Info/__tests__/handler.test.ts

+
+describe("InfoHandler", () => {
+  it("should return an internal error if the application is not healthy", async () => {
+    const healthCheck: HealthCheck = fromLeft(["failure 1", "failure 2"]);


There is a type error here; the test fails.

gunzip · 2020-10-06T08:24:25Z

once rebased, is this ready to be merged?

balanza · 2020-10-06T08:47:16Z

once rebased, is this ready to be merged?

yes

balanza added 5 commits September 30, 2020 13:13

add config healthcheck to info endpoint

0f58421

introduce health checks

a2bc002

complete config

76f9bd5

lint fix

832ff21

refactor config

f5e1c75

balanza requested a review from gunzip September 30, 2020 15:53

add info to openapi spec

acb00cf

gunzip reviewed Sep 30, 2020

balanza added 2 commits October 1, 2020 10:09

refactor configuration

dcd19fd

refactor config

1116597

balanza commented Oct 1, 2020

balanza added 3 commits October 1, 2020 16:04

types for health problems

9292871

add url check

a365426

sync config

8433210

AleDore reviewed Oct 5, 2020

BurnedMarshal reviewed Oct 5, 2020

balanza added 6 commits October 5, 2020 13:00

removed useless test

dff6096

fix test

ce4a61c

remove info from openapi

9f833a7

add name info

09211cc

storage

e4c5e67

test align

aefad7c

balanza marked this pull request as ready for review October 5, 2020 15:49

balanza requested a review from francescopersico as a code owner October 5, 2020 15:49

Merge branch 'master' into 175014199-healthcheck

ba5b0ed

gunzip approved these changes Oct 6, 2020

gunzip merged commit 57ed674 into master Oct 6, 2020

gunzip deleted the 175014199-healthcheck branch October 6, 2020 10:06

[#175014199] Health checks #90

[#175014199] Health checks #90

Conversation

balanza commented Sep 30, 2020 • edited Loading

pagopa-github-bot commented Sep 30, 2020 • edited Loading

Affected stories

codecov-commenter commented Sep 30, 2020 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AleDore left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BurnedMarshal Oct 5, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gunzip commented Oct 6, 2020

balanza commented Oct 6, 2020

balanza commented Sep 30, 2020 •

edited

Loading

pagopa-github-bot commented Sep 30, 2020 •

edited

Loading

codecov-commenter commented Sep 30, 2020 •

edited

Loading

BurnedMarshal Oct 5, 2020 •

edited

Loading