diff --git a/data-processing-lib/doc/overview.md b/data-processing-lib/doc/overview.md
index ba300a72c..c95eca01b 100644
--- a/data-processing-lib/doc/overview.md
+++ b/data-processing-lib/doc/overview.md
@@ -42,6 +42,7 @@ the `DataAccess` instance (see below) according to the CLI parameters.
 To learn more consider the following:
 * [Transforms](transforms.md)
+* [Transform Exceptions](transform-exceptions.md)
 * [Transform Runtimes](transform-runtimes.md)
 * [Transform Examples](transform-tutorial-examples.md)
 * [Testing Transforms](transform-testing.md)
diff --git a/data-processing-lib/doc/ray-runtime.md b/data-processing-lib/doc/ray-runtime.md
index 7818a9827..b826eca66 100644
--- a/data-processing-lib/doc/ray-runtime.md
+++ b/data-processing-lib/doc/ray-runtime.md
@@ -124,22 +124,6 @@ The `computed_execution_stats()` provides an opportunity to augment the statisti
 collected and aggregated by the TransformStatistics actor. It is called by the RayOrchestrator after all files have been processed.
-## Exceptions
-A transform may find that it needs to signal error conditions.
-For example, if a referenced model could not be loaded or
-a given input data (e.g., pyarrow Table) does not have the expected format (.e.g, columns).
-In general, it should identify such conditions by raising an exception.
-With this in mind, there are two types of exceptions:
-
-1. Those that would not allow any data to be processed (e.g. model loading problem).
-2. Those that would not allow a specific datum to be processed (e.g. missing column).
-
-In the first situation the transform should throw an exception from the initializer, which
-will cause the Ray framework to terminate processing of all data.
-In the second situation (identified in the `transform()` or `flush()` methods), the transform
-should throw an exception from the associated method.
-This will cause only the error-causing datume to be ignored and not written out,
-but allow continued processing of tables by the transform.
-In both cases, the framework will log the exception as an error.
+
diff --git a/data-processing-lib/doc/transform-exceptions.md b/data-processing-lib/doc/transform-exceptions.md
new file mode 100644
index 000000000..de039487e
--- /dev/null
+++ b/data-processing-lib/doc/transform-exceptions.md
@@ -0,0 +1,17 @@
+# Exceptions
+A transform may find that it needs to signal error conditions.
+For example, a referenced model could not be loaded, or
+a given input datum (e.g., pyarrow Table) does not have the expected format (e.g., columns).
+In general, it should identify such conditions by raising an exception.
+With this in mind, there are two types of exceptions:
+
+1. Those that would not allow any data to be processed (e.g. model loading problem).
+2. Those that would not allow a specific datum to be processed (e.g. missing column).
+
+In the first situation the transform should throw an exception from the initializer, which
+will cause the runtime to terminate processing of all data.
+In the second situation (identified in the `transform()` or `flush()` methods), the transform
+should throw an exception from the associated method.
+This will cause only the error-causing datum to be ignored and not written out,
+but allow continued processing of tables by the transform.
+In both cases, the runtime will log the exception as an error.
\ No newline at end of file