The TensorFlow training pipeline can be found in `training/pipeline.py`. The main training component, `train_tensorflow_model`, contains the implementation of a TensorFlow Keras model. This component can then be wrapped in a custom kfp ContainerOp from `google-cloud-pipeline-components`, which submits a Vertex Training job with added flexibility for `machine_type`, `replica_count`, and `accelerator_type`, among other machine configurations.
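As a rough sketch (not the repo's exact code), this wrapping might look like the snippet below; the `create_custom_training_job_from_component` utility and both import paths are assumptions that depend on the installed version of `google-cloud-pipeline-components`, and the machine configuration values are placeholders:

```python
# A minimal sketch, assuming google-cloud-pipeline-components is installed;
# the exact import path and utility name can vary between library versions.
from google_cloud_pipeline_components.v1.custom_job import (
    create_custom_training_job_from_component,
)

# `train_tensorflow_model` is the KFP training component described above;
# the import path here is a placeholder.
from components import train_tensorflow_model

custom_train_job_op = create_custom_training_job_from_component(
    train_tensorflow_model,
    machine_type="n1-standard-8",
    replica_count=1,
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
```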
The input data is split into three parts in BigQuery and stored in Google Cloud Storage:
- 80% of the input data is used for model training
- 10% of the input data is used for model validation
- 10% of the input data is used for model testing/evaluation
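For illustration only (the real split is performed by the pipeline's BigQuery step), a deterministic 80/10/10 split is often expressed with a hashing function such as `FARM_FINGERPRINT`; the table and column names below are placeholders:

```python
# Illustrative only: a deterministic 80/10/10 split assignment of the kind
# typically written in BigQuery. All table and column names are placeholders.
SPLIT_QUERY = """
SELECT
  *,
  CASE
    WHEN MOD(ABS(FARM_FINGERPRINT(CAST(unique_key AS STRING))), 10) < 8 THEN 'TRAIN'
    WHEN MOD(ABS(FARM_FINGERPRINT(CAST(unique_key AS STRING))), 10) = 8 THEN 'VALIDATE'
    ELSE 'TEST'
  END AS split_set
FROM `my-project.my_dataset.source_table`
"""
```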
The architecture of the example TensorFlow Keras model is shown below:
- Input layer: there is one input node for each of the 7 features used in the example:
  - `dayofweek`
  - `hourofday`
  - `trip_distance`
  - `trip_miles`
  - `trip_seconds`
  - `payment_type`
  - `company`
- Pre-processing layers
  - Categorical encoding for the categorical features is done using TensorFlow's `StringLookup` layer. New/unknown values are handled using this layer's default parameters (https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup).
    - The feature `payment_type` is one-hot encoded. New/unknown categories are assigned a one-hot encoded array with zeroes everywhere.
    - The feature `company` is ordinal encoded. New/unknown categories are assigned to zero.
  - Normalization for the numerical features (`dayofweek`, `hourofday`, `trip_distance`, `trip_miles`, `trip_seconds`)
- Dense layers
  - One `Dense` layer with 64 units whose activation function is ReLU.
  - One `Dense` layer with 32 units whose activation function is ReLU.
- Output layer
  - One `Dense` layer with 1 unit where no activation is applied (because the example is a regression problem).
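Put together, a minimal sketch of this architecture (not the repo's exact implementation) could look like the following; the `build_model` helper and its pandas DataFrame argument are assumptions used here to adapt the preprocessing layers:

```python
# A minimal sketch of the architecture described above, assuming a pandas
# DataFrame of the training split is available to adapt the preprocessing layers.
import tensorflow as tf
from tensorflow.keras import layers

NUMERICAL_FEATURES = ["dayofweek", "hourofday", "trip_distance", "trip_miles", "trip_seconds"]

def build_model(train_df):
    inputs, encoded = {}, []

    # Numerical features: normalized using statistics adapted from the training data.
    for name in NUMERICAL_FEATURES:
        inputs[name] = tf.keras.Input(shape=(1,), name=name)
        norm = layers.Normalization(axis=None)
        norm.adapt(train_df[name].to_numpy())
        encoded.append(norm(inputs[name]))

    # payment_type: one-hot encoded via StringLookup (default OOV handling).
    inputs["payment_type"] = tf.keras.Input(shape=(1,), name="payment_type", dtype=tf.string)
    payment_lookup = layers.StringLookup(output_mode="one_hot")
    payment_lookup.adapt(train_df["payment_type"].to_numpy())
    encoded.append(payment_lookup(inputs["payment_type"]))

    # company: ordinal (integer) encoded; unknown values map to index 0 by default.
    inputs["company"] = tf.keras.Input(shape=(1,), name="company", dtype=tf.string)
    company_lookup = layers.StringLookup(output_mode="int")
    company_lookup.adapt(train_df["company"].to_numpy())
    encoded.append(tf.cast(company_lookup(inputs["company"]), tf.float32))

    # Dense layers and regression output (no activation on the final layer).
    x = layers.Concatenate()(encoded)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(1)(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```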
You can specify different hyperparameters through the `model_params` argument of `train_tensorflow_model`, including:
- Batch size
- No. of epochs to check for early stopping
- Learning rate
- Number of hidden units and type of activation function in each layer
- Loss function
- Optimization method
- Evaluation metrics
- Whether you want early stopping
For a comprehensive list of options for the above hyperparameters, see the docstring in `train.py`.
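As an illustration, a `model_params` dictionary might look like the sketch below; the key names here are assumptions, and the authoritative list is the `train.py` docstring mentioned above:

```python
# Illustrative only: these key names are assumptions, not the component's
# actual interface; see the train.py docstring for the supported options.
model_params = {
    "batch_size": 128,
    "epochs": 30,
    "early_stopping": True,
    "early_stopping_epochs": 5,   # no. of epochs to check for early stopping
    "learning_rate": 0.001,
    "hidden_units": [(64, "relu"), (32, "relu")],
    "loss_fn": "MeanSquaredError",
    "optimizer": "Adam",
    "metrics": ["RootMeanSquaredError"],
}
```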
A number of different model artifacts/objects are created by the training of the TensorFlow model. With these files, you can load the model into a new script (without any of the original training code) and run it or resume training from exactly where you left off. For more information, see this.
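For example, the saved artifacts can be reloaded in a new script along the lines of the following sketch (the artifact path is a placeholder):

```python
# Sketch of reloading the saved model in a new script; the artifact path is a
# placeholder for wherever the training job wrote its outputs.
import tensorflow as tf

model = tf.keras.models.load_model("gs://<bucket>/<model-dir>")

# The restored model can be used for inference or further training, e.g.
# model.predict(...) or model.fit(...), without the original training code.
```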
Once the model is trained, it is used to get challenger predictions for evaluation purposes. By default, the pipeline uses the component `predict_tensorflow_model`, which expects a single CSV file containing the test data. However, if you are working with larger test data, it is more efficient to replace it with a prebuilt component provided by Google, `ModelBatchPredictOp`, to avoid crashes caused by insufficient memory.
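A sketch of calling the prebuilt component inside the pipeline function is shown below; the import path and parameter set may differ between versions of `google-cloud-pipeline-components`, and the project, bucket, and upstream-task references are placeholders:

```python
# A minimal sketch, assuming google-cloud-pipeline-components; this snippet
# belongs inside the pipeline definition, and the import path / parameters
# may vary between library versions.
from google_cloud_pipeline_components.v1.batch_predict_job import ModelBatchPredictOp

batch_predict_task = ModelBatchPredictOp(
    project="my-project",                          # placeholder
    location="europe-west2",                       # placeholder
    job_display_name="challenger-predictions",
    model=challenger_model,                        # Vertex AI Model artifact from an upstream task (placeholder)
    gcs_source_uris=[test_data_jsonl_uri],         # the JSONL test data (placeholder)
    instances_format="jsonl",
    gcs_destination_output_uri_prefix="gs://<bucket>/predictions",  # placeholder
    machine_type="n1-standard-4",
)
```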
In deep learning, it is common to use GPUs, which utilise a large number of simple cores allowing parallel computing through thousands of threads at a time, to train complicated neural networks fed by massive datasets. For other optimisation tasks, it is often better to use CPUs.
There is a variable, `distribute_strategy`, in the TensorFlow training pipeline that allows you to set up a distribution strategy. You have three options:

| Value | Description |
|---|---|
| `single` | This strategy uses a GPU if a GPU device of the requested kind is available; otherwise, it falls back to the CPU. |
| `mirror` | This strategy is typically used for training on one machine with multiple GPUs. |
| `multi` | This strategy implements synchronous distributed training across multiple machines, each with potentially multiple GPUs. |
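The following sketch (an assumed helper, not the repo's exact code) shows how these three values typically map onto `tf.distribute` strategies:

```python
# An assumed helper showing how the three distribute_strategy values
# typically map onto tf.distribute strategies.
import tensorflow as tf

def get_distribution_strategy(distribute_strategy: str) -> tf.distribute.Strategy:
    if distribute_strategy == "mirror":
        # One machine, multiple GPUs, synchronous replication.
        return tf.distribute.MirroredStrategy()
    if distribute_strategy == "multi":
        # Synchronous training across multiple workers, each with potentially multiple GPUs.
        return tf.distribute.MultiWorkerMirroredStrategy()
    # "single": use a GPU if one is available, otherwise fall back to the CPU.
    device = "/gpu:0" if tf.config.list_physical_devices("GPU") else "/cpu:0"
    return tf.distribute.OneDeviceStrategy(device)

# The model would then be built and compiled inside the strategy's scope:
# with get_distribution_strategy("mirror").scope():
#     model = build_model(...)
```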
The TensorFlow prediction pipeline can be found in `prediction/pipeline.py`.
The rationale for exporting the data twice (once as a CSV file and once as a JSONL file) is that the CSV file is passed to the `generate_statistics` component (which uses the function `tfdv.generate_statistics_from_csv`), while the JSONL file is used when calling the `ModelBatchPredictOp` component for batch prediction.
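For reference, the statistics-generation step implied by the CSV export can be sketched as follows; the file path is a placeholder:

```python
# Sketch of the statistics-generation step implied by the CSV export; the file
# path is a placeholder.
import tensorflow_data_validation as tfdv

stats = tfdv.generate_statistics_from_csv("gs://<bucket>/test_data.csv")
schema = tfdv.infer_schema(stats)  # the statistics can also drive schema or skew checks
```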