From d028a4c60566a9d5b18aa87f6d0da6acb558b061 Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 11:24:54 -0500 Subject: [PATCH 1/8] draft Signed-off-by: Asra Ali --- STYLE.md | 103 +++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 74 insertions(+), 29 deletions(-) diff --git a/STYLE.md b/STYLE.md index 3feafbcc10a0..5beaea0e2fbd 100644 --- a/STYLE.md +++ b/STYLE.md @@ -106,23 +106,31 @@ A few general notes on our error handling philosophy: * All error code returns should be checked. * At a very high level, our philosophy is that errors should be handled gracefully when caused by: - - Untrusted network traffic OR + - Untrusted network traffic (from downstream, upstream, or extensions like filters) - Raised by the Envoy process environment and are *likely* to happen + - Third party dependency return codes * Examples of likely environnmental errors include any type of network error, disk IO error, bad - data returned by an API call, bad data read from runtime files, etc. Errors in the Envoy - environment that are *unlikely* to happen after process initialization, should lead to process - death, under the assumption that the additional burden of defensive coding and testing is not an - effective use of time for an error that should not happen given proper system setup. Examples of - these types of errors include not being able to open the shared memory region, system calls that - should not fail assuming correct parameters (which should be validated via tests), etc. Examples - of system calls that should not fail when passed valid parameters include the kernel returning a - valid `sockaddr` after a successful call to `accept()`, `pthread_create()`, `pthread_join()`, etc. -* OOM events (both memory and FDs) are considered fatal crashing errors. An OOM error should never - silently be ignored and should crash the process either via the C++ allocation error exception, an - explicit `RELEASE_ASSERT` following a third party library call, or an obvious crash on a subsequent - line via null pointer dereference. This rule is again based on the philosophy that the engineering - costs of properly handling these cases are not worth it. Time is better spent designing proper system - controls that shed load if resource usage becomes too high, etc. + data returned by an API call, bad data read from runtime files, etc. This includes configuration + updates after startup. +* Third party dependency return codes should be checked and gracefully handled. Examples include + HTTP/2 or JSON parsers. Some return codes may be handled by continuing, for example, in case of an + out of process RPC failure. +* Errors in the Envoy environment that are *unlikely* to happen after process initialization, should + lead to process death, under the assumption that the additional burden of defensive coding and + testing is not an effective use of time for an error that should not happen given proper system + setup. Examples of these types of errors include not being able to open the shared memory region, + system calls that should not fail assuming correct parameters (which should be validated via + tests), etc. Examples of system calls that should not fail when passed valid parameters include + the kernel returning a valid `sockaddr` after a successful call to `accept()`, `pthread_create()`, + `pthread_join()`, etc. However, system calls that require permissions may cause likely errors in + some deployments and need graceful error handling. +* OOM events (both memory and FDs) or ENOMON errors are considered fatal crashing errors. An OOM + error should never silently be ignored and should crash the process either via the C++ allocation + error exception, an explicit `RELEASE_ASSERT` following a third party library call, or an obvious + crash on a subsequent line via null pointer dereference. This rule is again based on the + philosophy that the engineering costs of properly handling these cases are not worth it. Time is + better spent designing proper system controls that shed load if resource usage becomes too high, + etc. * The "less is more" error handling philosophy described in the previous two points is primarily based on the fact that restarts are designed to be fast, reliable and cheap. * Although we strongly recommend that any type of startup error leads to a fatal error, since this @@ -148,20 +156,57 @@ A few general notes on our error handling philosophy: continue seems ridiculous because *"this should never happen"*, it's a very good indication that the appropriate behavior is to terminate the process and not handle the error. When in doubt, please discuss. -* Per above it's acceptable to turn failures into crash semantics - via `RELEASE_ASSERT(condition)` or `PANIC(message)` if there is no other sensible behavior, e.g. - in OOM (memory/FD) scenarios. Only `RELEASE_ASSERT(condition)` should be used to validate - conditions that might be imposed by the external environment. `ASSERT(condition)` should be used - to document (and check in debug-only builds) program invariants. Use `ASSERT` liberally, but do - not use it for things that will crash in an obvious way in a subsequent line. E.g., do not do - `ASSERT(foo != nullptr); foo->doSomething();`. Note that there is a gray line between external - environment failures and program invariant violations. For example, memory corruption due to a - security issue (a bug, deliberate buffer overflow etc.) might manifest as a violation of program - invariants or as a detectable condition in the external environment (e.g. some library returning a - highly unexpected error code or buffer contents). Unfortunately no rule can cleanly cover when to - use `RELEASE_ASSERT` vs. `ASSERT`. In general we view `ASSERT` as the common case and - `RELEASE_ASSERT` as the uncommon case, but experience and judgment may dictate a particular approach - depending on the situation. + +# Macro Usage + +The following macros are available: + * `RELEASE_ASSERT`: fatal check. + * `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in + debug-only builds) program invariants. + * `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These + should be used where it may be useful to detect if an efficient condition is violated in + production (and check in debug-only builds). + +Sub-macros alias the macros above and can be used to annotate specific situations: + * `ASSERT_INTERNAL` (by default compiles to `ASSERT` but can be built to behave like + `ENVOY_BUG`). Used for provable, internal class invariants. No error handling is needed. + * `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need + detectability on probable or changing invariants. + +* Per above it's acceptable to turn failures into crash semantics via `RELEASE_ASSERT(condition)` or + `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. +* Only `RELEASE_ASSERT(condition)` should be used to validate conditions that might be imposed by + the external environment. +* Annotate macro usage with comments on belief (probable or provable condition). +* Use `ASSERT` and `ENVOY_BUG` liberally, but do not use it for things that will crash in an obvious + way in a subsequent line. E.g., do not do `ASSERT(foo != nullptr); foo->doSomething();`. +* `ENVOY_BUG`s provide detectability and more confidence than an `ASSERT`. They are useful for + non-trivial conditions, those with complex control flow, and rapidly changing protocols. Testing + should be added to ensure that Envoy can continue to operate even if an `ENVOY_BUG` condition is + violated. +* `ASSERT`s should be understandable to a reader. Add comments if not. They should be robust to + future changes. +* Note that there is a gray line between external environment failures and program invariant + violations. For example, memory corruption due to a security issue (a bug, deliberate buffer + overflow etc.) might manifest as a violation of program invariants or as a detectable condition in + the external environment (e.g. some library returning a highly unexpected error code or buffer + contents). Unfortunately no rule can cleanly cover when to use `RELEASE_ASSERT` vs. `ASSERT`. In + general we view `ASSERT` as the common case and `RELEASE_ASSERT` as the uncommon case, but + experience and judgment may dictate a particular approach depending on the situation. + +Below is a guideline for macro usage. On the left side are invariants and the right side is for +error conditions that can be triggered. `ENVOY_BUG` represents a middle ground that can be used for +uncertain conditions that need detectability. `ENVOY_BUG`s can also be added for errors if they need +detection. + +| ASSERT/RELEASE_ASSERT | ENVOY_BUG | Error handling and Testing | +| --- | --- | --- | +| Low level invariants in data structures | | | +| Simple, provable internal class invariants (denote ASSERT_INVARIANT | Complex, uncertain internal class invariants (e.g. need detectability if violated) | | +| Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. | +| | Conditions in alpha or changing extensions that need detectability. (ENVOY_BUG_ALPHA) | | +| Unlikely environment errors after process initialization that would otherwise crash | | Likely environment errors, e.g. return codes from untrusted extensions, dependencies or system calls, network error, bad data read, permission based errors, etc. | +| Fatal crashing events. e.g. OOMs, deadlocks, no process recovery possible | | | # Hermetic and deterministic tests From 9e022717f62a3392e7495c376e445560528e8c7c Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 11:30:05 -0500 Subject: [PATCH 2/8] fix markdown format Signed-off-by: Asra Ali --- STYLE.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/STYLE.md b/STYLE.md index 5beaea0e2fbd..84dbcd7b00a5 100644 --- a/STYLE.md +++ b/STYLE.md @@ -160,6 +160,7 @@ A few general notes on our error handling philosophy: # Macro Usage The following macros are available: + * `RELEASE_ASSERT`: fatal check. * `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in debug-only builds) program invariants. @@ -168,8 +169,7 @@ The following macros are available: production (and check in debug-only builds). Sub-macros alias the macros above and can be used to annotate specific situations: - * `ASSERT_INTERNAL` (by default compiles to `ASSERT` but can be built to behave like - `ENVOY_BUG`). Used for provable, internal class invariants. No error handling is needed. + * `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need detectability on probable or changing invariants. @@ -199,14 +199,14 @@ error conditions that can be triggered. `ENVOY_BUG` represents a middle ground t uncertain conditions that need detectability. `ENVOY_BUG`s can also be added for errors if they need detection. -| ASSERT/RELEASE_ASSERT | ENVOY_BUG | Error handling and Testing | -| --- | --- | --- | -| Low level invariants in data structures | | | -| Simple, provable internal class invariants (denote ASSERT_INVARIANT | Complex, uncertain internal class invariants (e.g. need detectability if violated) | | -| Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. | -| | Conditions in alpha or changing extensions that need detectability. (ENVOY_BUG_ALPHA) | | -| Unlikely environment errors after process initialization that would otherwise crash | | Likely environment errors, e.g. return codes from untrusted extensions, dependencies or system calls, network error, bad data read, permission based errors, etc. | -| Fatal crashing events. e.g. OOMs, deadlocks, no process recovery possible | | | +| `ASSERT`/`RELEASE_ASSERT` | `ENVOY_BUG` | Error handling and Testing | +| --- | --- | --- | +| Low level invariants in data structures | | | +| Simple, provable internal class invariants | Complex, uncertain internal class invariants (e.g. need detectability if violated) | | +| Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. | +| | Conditions in alpha or changing extensions that need detectability. (`ENVOY_BUG_ALPHA`) | | +| Unlikely environment errors after process initialization that would otherwise crash | | Likely environment errors, e.g. return codes from untrusted extensions, dependencies or system calls, network error, bad data read, permission based errors, etc. | +| Fatal crashing events. e.g. OOMs, deadlocks, no process recovery possible | | | # Hermetic and deterministic tests From 7cc91cced6a36f3ce89c95fd8c2b489a56269d34 Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 11:38:08 -0500 Subject: [PATCH 3/8] markdown style Signed-off-by: Asra Ali --- STYLE.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/STYLE.md b/STYLE.md index 84dbcd7b00a5..f78332bd5df4 100644 --- a/STYLE.md +++ b/STYLE.md @@ -160,17 +160,15 @@ A few general notes on our error handling philosophy: # Macro Usage The following macros are available: - - * `RELEASE_ASSERT`: fatal check. - * `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in + - `RELEASE_ASSERT`: fatal check. + - `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in debug-only builds) program invariants. - * `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These + - `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These should be used where it may be useful to detect if an efficient condition is violated in production (and check in debug-only builds). Sub-macros alias the macros above and can be used to annotate specific situations: - - * `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need + - `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need detectability on probable or changing invariants. * Per above it's acceptable to turn failures into crash semantics via `RELEASE_ASSERT(condition)` or From 884f1d41251ba7f3c9485b925e1ac13c5007545d Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 11:47:28 -0500 Subject: [PATCH 4/8] fix indent Signed-off-by: Asra Ali --- STYLE.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/STYLE.md b/STYLE.md index f78332bd5df4..7e0e90d9cc18 100644 --- a/STYLE.md +++ b/STYLE.md @@ -159,17 +159,17 @@ A few general notes on our error handling philosophy: # Macro Usage -The following macros are available: - - `RELEASE_ASSERT`: fatal check. - - `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in - debug-only builds) program invariants. - - `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These - should be used where it may be useful to detect if an efficient condition is violated in - production (and check in debug-only builds). - -Sub-macros alias the macros above and can be used to annotate specific situations: - - `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need - detectability on probable or changing invariants. +* The following macros are available: + - `RELEASE_ASSERT`: fatal check. + - `ASSERT`: fatal check in debug-only builds. These should be used to document (and check in + debug-only builds) program invariants. + - `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These + should be used where it may be useful to detect if an efficient condition is violated in + production (and check in debug-only builds). + +* Sub-macros alias the macros above and can be used to annotate specific situations: + - `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need + detectability on probable or changing invariants. * Per above it's acceptable to turn failures into crash semantics via `RELEASE_ASSERT(condition)` or `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. From 1be29159d5c721017cc5e1027a6c4bd7d99fe133 Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 11:49:39 -0500 Subject: [PATCH 5/8] fix wording Signed-off-by: Asra Ali --- STYLE.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/STYLE.md b/STYLE.md index 7e0e90d9cc18..5f1b151d40c2 100644 --- a/STYLE.md +++ b/STYLE.md @@ -192,10 +192,10 @@ A few general notes on our error handling philosophy: general we view `ASSERT` as the common case and `RELEASE_ASSERT` as the uncommon case, but experience and judgment may dictate a particular approach depending on the situation. -Below is a guideline for macro usage. On the left side are invariants and the right side is for -error conditions that can be triggered. `ENVOY_BUG` represents a middle ground that can be used for -uncertain conditions that need detectability. `ENVOY_BUG`s can also be added for errors if they need -detection. +Below is a guideline for macro usage. The left side of the table has invariants and the right side +has error conditions that can be triggered and should be gracefully handled. `ENVOY_BUG` represents +a middle ground that can be used for uncertain conditions that need detectability. `ENVOY_BUG`s can +also be added for errors if they warrant detection. | `ASSERT`/`RELEASE_ASSERT` | `ENVOY_BUG` | Error handling and Testing | | --- | --- | --- | From c51bfcad802ecb2e407b5575f0cdd665bb6ec1fd Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 6 Jan 2021 13:06:17 -0500 Subject: [PATCH 6/8] fix format Signed-off-by: Asra Ali --- STYLE.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/STYLE.md b/STYLE.md index 5f1b151d40c2..ab7e3469bf96 100644 --- a/STYLE.md +++ b/STYLE.md @@ -172,7 +172,7 @@ A few general notes on our error handling philosophy: detectability on probable or changing invariants. * Per above it's acceptable to turn failures into crash semantics via `RELEASE_ASSERT(condition)` or - `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. + `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. * Only `RELEASE_ASSERT(condition)` should be used to validate conditions that might be imposed by the external environment. * Annotate macro usage with comments on belief (probable or provable condition). @@ -201,7 +201,7 @@ also be added for errors if they warrant detection. | --- | --- | --- | | Low level invariants in data structures | | | | Simple, provable internal class invariants | Complex, uncertain internal class invariants (e.g. need detectability if violated) | | -| Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. | +| Provable (pre/post)-conditions | Complicated but likely (pre-/post-) conditions that are low-risk (Envoy can continue safely) | Triggerable or uncertain conditions, may be based on untrusted data plane traffic or an extensions’ contract. | | | Conditions in alpha or changing extensions that need detectability. (`ENVOY_BUG_ALPHA`) | | | Unlikely environment errors after process initialization that would otherwise crash | | Likely environment errors, e.g. return codes from untrusted extensions, dependencies or system calls, network error, bad data read, permission based errors, etc. | | Fatal crashing events. e.g. OOMs, deadlocks, no process recovery possible | | | From 94568b028f99d30c9131056cadd16c1d9c79d37a Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Fri, 8 Jan 2021 08:43:01 -0500 Subject: [PATCH 7/8] update Signed-off-by: Asra Ali --- STYLE.md | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/STYLE.md b/STYLE.md index ab7e3469bf96..c6c634e43129 100644 --- a/STYLE.md +++ b/STYLE.md @@ -110,8 +110,8 @@ A few general notes on our error handling philosophy: - Raised by the Envoy process environment and are *likely* to happen - Third party dependency return codes * Examples of likely environnmental errors include any type of network error, disk IO error, bad - data returned by an API call, bad data read from runtime files, etc. This includes configuration - updates after startup. + data returned by an API call, bad data read from runtime files, etc. This includes loading + configuration at runtime. * Third party dependency return codes should be checked and gracefully handled. Examples include HTTP/2 or JSON parsers. Some return codes may be handled by continuing, for example, in case of an out of process RPC failure. @@ -131,6 +131,9 @@ A few general notes on our error handling philosophy: philosophy that the engineering costs of properly handling these cases are not worth it. Time is better spent designing proper system controls that shed load if resource usage becomes too high, etc. +* Testing should cover any serious cases that may result in infinite loops, crashes, or serious + errors. Non-trivial invariants are also encouraged to have testing. Internal, localized invariants + may not need testing. * The "less is more" error handling philosophy described in the previous two points is primarily based on the fact that restarts are designed to be fast, reliable and cheap. * Although we strongly recommend that any type of startup error leads to a fatal error, since this @@ -165,24 +168,29 @@ A few general notes on our error handling philosophy: debug-only builds) program invariants. - `ENVOY_BUG`: logs and increments a stat in release mode, fatal check in debug builds. These should be used where it may be useful to detect if an efficient condition is violated in - production (and check in debug-only builds). + production (and fatal check in debug-only builds). * Sub-macros alias the macros above and can be used to annotate specific situations: - `ENVOY_BUG_ALPHA` (alias `ENVOY_BUG`): Used for alpha or rapidly changing protocols that need - detectability on probable or changing invariants. + detectability on probable conditions or invariants. * Per above it's acceptable to turn failures into crash semantics via `RELEASE_ASSERT(condition)` or - `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. -* Only `RELEASE_ASSERT(condition)` should be used to validate conditions that might be imposed by - the external environment. -* Annotate macro usage with comments on belief (probable or provable condition). -* Use `ASSERT` and `ENVOY_BUG` liberally, but do not use it for things that will crash in an obvious + `PANIC(message)` if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. +* Do not `ASSERT` on conditions imposed by the external environment. Either add error handling + (potentially with an `ENVOY_BUG` for detectability) or `RELEASE_ASSERT` if the condition indicates + that the process is unrecoverable. +* Use `ASSERT` and `ENVOY_BUG` liberally, but do not use them for things that will crash in an obvious way in a subsequent line. E.g., do not do `ASSERT(foo != nullptr); foo->doSomething();`. +* Use `ASSERT`s for true invariants and well-defined conditions that are useful for tests, + debug-only checks and documentation. They may be `ENVOY_BUG`s if performance allows, see point + below. * `ENVOY_BUG`s provide detectability and more confidence than an `ASSERT`. They are useful for non-trivial conditions, those with complex control flow, and rapidly changing protocols. Testing should be added to ensure that Envoy can continue to operate even if an `ENVOY_BUG` condition is violated. -* `ASSERT`s should be understandable to a reader. Add comments if not. They should be robust to +* Annotate conditions with comments on belief or reasoning, for example `Condition is guaranteed by + caller foo` or `Condition is likely to hold after processing through external library foo`. +* Macro usage should be understandable to a reader. Add comments if not. They should be robust to future changes. * Note that there is a gray line between external environment failures and program invariant violations. For example, memory corruption due to a security issue (a bug, deliberate buffer @@ -190,7 +198,14 @@ A few general notes on our error handling philosophy: the external environment (e.g. some library returning a highly unexpected error code or buffer contents). Unfortunately no rule can cleanly cover when to use `RELEASE_ASSERT` vs. `ASSERT`. In general we view `ASSERT` as the common case and `RELEASE_ASSERT` as the uncommon case, but - experience and judgment may dictate a particular approach depending on the situation. + experience and judgment may dictate a particular approach depending on the situation. The risk of + process death from `RELEASE_ASSERT` should be justified with the severity and possibility of the + condition to avoid unintentional crashes. You may use the following guide: + - If a violation is high risk (will cause a crash in subsequent data processing or indicates a + failure state beyond recovery), use `RELEASE_ASSERT`. + - If a violation is medium or low risk (Envoy can continue safely) and is not expensive, + consider `ENVOY_BUG`. + - Otherwise (if a condition is expensive or test-only), use `ASSERT`. Below is a guideline for macro usage. The left side of the table has invariants and the right side has error conditions that can be triggered and should be gracefully handled. `ENVOY_BUG` represents From 8d8e10d7b2d538b7c7887ed95a66bb8a7cf95ea6 Mon Sep 17 00:00:00 2001 From: Asra Ali Date: Wed, 20 Jan 2021 10:18:17 -0500 Subject: [PATCH 8/8] address comments Signed-off-by: Asra Ali --- STYLE.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/STYLE.md b/STYLE.md index c6c634e43129..7156614b1d48 100644 --- a/STYLE.md +++ b/STYLE.md @@ -115,6 +115,9 @@ A few general notes on our error handling philosophy: * Third party dependency return codes should be checked and gracefully handled. Examples include HTTP/2 or JSON parsers. Some return codes may be handled by continuing, for example, in case of an out of process RPC failure. +* Testing should cover any serious cases that may result in infinite loops, crashes, or serious + errors. Non-trivial invariants are also encouraged to have testing. Internal, localized invariants + may not need testing. * Errors in the Envoy environment that are *unlikely* to happen after process initialization, should lead to process death, under the assumption that the additional burden of defensive coding and testing is not an effective use of time for an error that should not happen given proper system @@ -124,17 +127,14 @@ A few general notes on our error handling philosophy: the kernel returning a valid `sockaddr` after a successful call to `accept()`, `pthread_create()`, `pthread_join()`, etc. However, system calls that require permissions may cause likely errors in some deployments and need graceful error handling. -* OOM events (both memory and FDs) or ENOMON errors are considered fatal crashing errors. An OOM +* OOM events (both memory and FDs) or ENOMEM errors are considered fatal crashing errors. An OOM error should never silently be ignored and should crash the process either via the C++ allocation error exception, an explicit `RELEASE_ASSERT` following a third party library call, or an obvious crash on a subsequent line via null pointer dereference. This rule is again based on the philosophy that the engineering costs of properly handling these cases are not worth it. Time is better spent designing proper system controls that shed load if resource usage becomes too high, etc. -* Testing should cover any serious cases that may result in infinite loops, crashes, or serious - errors. Non-trivial invariants are also encouraged to have testing. Internal, localized invariants - may not need testing. -* The "less is more" error handling philosophy described in the previous two points is primarily +* The "less is more" error handling philosophy described in the previous points is primarily based on the fact that restarts are designed to be fast, reliable and cheap. * Although we strongly recommend that any type of startup error leads to a fatal error, since this is almost always a result of faulty configuration which should be caught during a canary process, @@ -201,11 +201,11 @@ A few general notes on our error handling philosophy: experience and judgment may dictate a particular approach depending on the situation. The risk of process death from `RELEASE_ASSERT` should be justified with the severity and possibility of the condition to avoid unintentional crashes. You may use the following guide: - - If a violation is high risk (will cause a crash in subsequent data processing or indicates a - failure state beyond recovery), use `RELEASE_ASSERT`. - - If a violation is medium or low risk (Envoy can continue safely) and is not expensive, - consider `ENVOY_BUG`. - - Otherwise (if a condition is expensive or test-only), use `ASSERT`. + * If a violation is high risk (will cause a crash in subsequent data processing or indicates a + failure state beyond recovery), use `RELEASE_ASSERT`. + * If a violation is medium or low risk (Envoy can continue safely) and is not expensive, + consider `ENVOY_BUG`. + * Otherwise (if a condition is expensive or test-only), use `ASSERT`. Below is a guideline for macro usage. The left side of the table has invariants and the right side has error conditions that can be triggered and should be gracefully handled. `ENVOY_BUG` represents