Automate Model Deployment

31 / 57 Deployment
This practice was ranked as medium.
This practice helps to increase the team's agility.


Intent

Increase the ability to deploy models on demand, which increases availability and scalability.

Motivation

Deploying and orchestrating the different components of an application by hand is tedious and error-prone. Automating the packaging and delivery of models avoids these manual interventions and the errors they introduce.

Applicability

Automatic model deployment should be implemented in any production-level ML application.

Description

Automated deployment involves automatically packaging the model together with its dependencies and 'shipping' it to a production server, instead of manually connecting to a server and performing the deployment.
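The packaging step described above can be sketched as a small script that a pipeline job might call. This is a minimal illustration using only the Python standard library, not a prescribed implementation: the function names (`sha256_of`, `package_model`), the manifest fields, and the archive naming scheme are all assumptions made for the example.

```python
import hashlib
import json
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_model(model_path: Path, requirements_path: Path,
                  version: str, out_dir: Path) -> Path:
    """Bundle a model file, its dependency list, and a manifest into one archive.

    The manifest layout is hypothetical; adapt it to whatever the target
    serving environment expects.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "version": version,
        "model_file": model_path.name,
        "model_sha256": sha256_of(model_path),
        "requirements_file": requirements_path.name,
    }
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))

    # Store every file at the archive root so the layout is predictable.
    archive_path = out_dir / f"model-{version}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in (model_path, requirements_path, manifest_path):
            tar.add(item, arcname=item.name)
    return archive_path
```

Recording a checksum in the manifest lets the receiving server verify that the model it unpacks is the one that was packaged. The 'shipping' step itself (upload, image build, rollout) depends on the chosen tooling and is left out here.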

Automated model deployment brings several advantages. First, it saves time and increases reliability, because it reduces the opportunities for human error. Second, it improves availability and scalability, because the process can be repeated on demand for as many instances as needed, without manual intervention. This means one can spin up many instances of the model when many users need access to a service, and decrease the number of instances when demand drops.
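To make the scaling argument concrete, the following sketch computes how many model instances to run for a given load. It is a deliberately simple rule, not the algorithm of any particular orchestrator: the function name, the per-instance capacity figure, and the instance bounds are assumptions chosen for illustration.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return the instance count needed to serve the current load.

    capacity_per_instance is the (assumed, measured elsewhere) number of
    requests per second that one model instance can handle.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp so the service never scales to zero or past the allowed budget.
    return max(min_instances, min(max_instances, needed))
```

With an assumed capacity of 100 requests per second per instance, a load of 450 requests per second yields 5 instances, and the count falls back to the minimum once demand drops. Only an automated deployment process makes it practical to act on such a number without manual work.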

In order to facilitate continuous deployment:

  • use virtualization and orchestration abstractions, e.g. Docker, Kubeflow,
  • use CD tools, e.g. GitLab CI/CD, Shipyard, Travis CI.

Export Models in Portable, Standard Formats

Tying a deployed model to a specific framework version or runtime creates fragility: a library update can break serving, and migrating infrastructure becomes difficult. Export trained models to a framework-independent format as part of the deployment pipeline:

  • ONNX for interoperability across frameworks (PyTorch, TensorFlow, scikit-learn) and runtimes (ONNX Runtime, TensorRT, OpenVINO),
  • TorchScript for self-contained PyTorch models that do not require Python at inference time,
  • TensorFlow SavedModel for TensorFlow/Keras models targeting TF Serving or TFLite.

Include format validation and an inference parity test (the exported model's output equals the original model's output within a numerical tolerance) as a gate in the packaging step.
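The parity gate can be sketched as follows. To stay independent of any framework, the example treats both models as plain callables that return a list of floats; in practice these would wrap the original model and the exported one (for instance an ONNX Runtime session). The names `outputs_match`, `parity_gate`, and `ParityError`, as well as the default tolerances, are assumptions for illustration.

```python
import math
from typing import Callable, Iterable, Sequence

Predict = Callable[[Sequence[float]], Sequence[float]]


class ParityError(Exception):
    """Raised when the exported model disagrees with the original."""


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> bool:
    """Compare two output vectors element-wise within a tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
               for r, c in zip(reference, candidate))


def parity_gate(original_predict: Predict, exported_predict: Predict,
                sample_inputs: Iterable[Sequence[float]],
                rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> None:
    """Fail the packaging step if any sample produces diverging outputs."""
    for i, x in enumerate(sample_inputs):
        expected = original_predict(x)
        actual = exported_predict(x)
        if not outputs_match(expected, actual, rel_tol, abs_tol):
            raise ParityError(f"parity check failed on sample {i}")
```

A pipeline job would call `parity_gate` on a fixed set of representative inputs and stop the deployment if it raises. Suitable tolerances depend on the model and the target runtime, so the defaults here should be treated as placeholders.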



