Kubeflow Inference Service

Launch a Kubeflow Inference Service and demonstrate Knative scaling.

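In this demo you will create an inference server from the Deploy UI, run a load test to watch it scale, add and promote a canary, and finally delete the model.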

From the Deploy UI, create a server:

[Screenshot: iris-create]

Use the model url:

gs://kfserving-samples/models/sklearn/iris
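
For orientation, the resource behind such a deployment might look roughly like the sketch below. It assumes the KFServing `serving.kubeflow.org/v1alpha2` InferenceService schema; the manifest the Deploy UI actually generates may differ, and the name `iris` is a hypothetical placeholder.

```python
import json


def inference_service(name, framework, storage_uri):
    """Build a v1alpha2-style InferenceService manifest as a plain dict.

    Assumption: the KFServing v1alpha2 layout (spec.default.predictor);
    the Deploy UI may produce a different resource.
    """
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha2",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "default": {
                "predictor": {framework: {"storageUri": storage_uri}}
            }
        },
    }


# "iris" is a hypothetical service name; the storage URI is the one given above.
manifest = inference_service(
    "iris", "sklearn", "gs://kfserving-samples/models/sklearn/iris"
)
print(json.dumps(manifest, indent=2))
```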

Run a Load Test

When the deployment is ready, click on it and select “Start a Load Test”.

[Screenshot: iris-load]

Set the duration to 60 seconds and the number of connections to 10.

Use the request.json file in this folder:

{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}
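
To show what a single request with this payload could look like outside the UI, here is a minimal sketch that builds a POST against the KFServing V1 prediction path (`/v1/models/<name>:predict`). The host `http://iris.example.com` and the model name `iris` are hypothetical; the real ingress URL, path prefix, and any authentication depend on your cluster.

```python
import json
import urllib.request

# Same payload as request.json: two iris samples with four features each.
PAYLOAD = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}


def build_predict_request(base_url, model_name, payload):
    """Build (but do not send) a V1-protocol predict request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and model name, used only for illustration.
req = build_predict_request("http://iris.example.com", "iris", PAYLOAD)
print(req.get_method(), req.full_url)

# To send it against a reachable endpoint:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```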

You should see pod counts scale up and then down, something like:

[Screenshot: iris-load2]
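
The scaling is driven by concurrent requests. The sketch below is not the Deploy load tester, only a rough illustration of the same idea: a fixed number of connections, each sending requests in a loop until the duration elapses. `run_load` and `fake_send` are hypothetical names; replace `fake_send` with a function that performs one real prediction request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, connections=10, duration_s=60.0):
    """Call send() repeatedly from `connections` threads for duration_s seconds.

    Returns a tuple (succeeded, failed) of request counts.
    """
    deadline = time.monotonic() + duration_s

    def worker():
        ok = failed = 0
        while time.monotonic() < deadline:
            try:
                send()
                ok += 1
            except Exception:
                failed += 1
        return ok, failed

    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(worker) for _ in range(connections)]
        results = [f.result() for f in futures]
    return sum(r[0] for r in results), sum(r[1] for r in results)


def fake_send():
    """Stand-in for one HTTP round trip (hypothetical)."""
    time.sleep(0.01)


# Short run for demonstration; the demo itself uses 10 connections for 60 seconds.
ok, failed = run_load(fake_send, connections=10, duration_s=0.2)
print(f"succeeded={ok} failed={failed}")
```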

Now follow the “Add Canary” wizard and add an XGBoost canary:

[Screenshot: canary]

Use the XGBoost Iris model whose saved Booster is stored at:

gs://kfserving-samples/models/xgboost/iris
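
As a rough picture of what adding a canary means at the resource level, the sketch below builds a manifest with both predictors. It assumes the KFServing v1alpha2 schema (`spec.canary` and `spec.canaryTrafficPercent`); the wizard may generate something different, and the name `iris` and the 10% split are hypothetical values.

```python
import json


def with_canary(name, default_predictor, canary_predictor, canary_percent):
    """Build a v1alpha2-style InferenceService dict with a canary predictor.

    canary_percent is the share of traffic (0-100) routed to the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "serving.kubeflow.org/v1alpha2",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "default": {"predictor": default_predictor},
            "canary": {"predictor": canary_predictor},
            "canaryTrafficPercent": canary_percent,
        },
    }


canary_manifest = with_canary(
    "iris",  # hypothetical service name
    {"sklearn": {"storageUri": "gs://kfserving-samples/models/sklearn/iris"}},
    {"xgboost": {"storageUri": "gs://kfserving-samples/models/xgboost/iris"}},
    canary_percent=10,  # hypothetical split
)
print(json.dumps(canary_manifest, indent=2))
```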

Once the canary is running, you can rerun the load test and see traffic split between both models.

[Screenshot: canary-load]

To promote the canary, press the “Promote Canary” button.

[Screenshot: promote]
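
Conceptually, promotion makes the canary the main predictor. The sketch below illustrates that idea on a v1alpha2-style spec dict; it is an assumption about the effect, not a description of what the button does internally, and `promote_canary` is a hypothetical helper.

```python
import copy


def promote_canary(spec):
    """Return a new spec in which the canary replaces the default predictor.

    Assumes a v1alpha2-style spec with "default", "canary", and
    "canaryTrafficPercent" keys. The input spec is left unmodified.
    """
    if "canary" not in spec:
        raise ValueError("spec has no canary to promote")
    promoted = copy.deepcopy(spec)
    promoted["default"] = promoted.pop("canary")
    promoted.pop("canaryTrafficPercent", None)
    return promoted


before = {
    "default": {"predictor": {"sklearn": {
        "storageUri": "gs://kfserving-samples/models/sklearn/iris"}}},
    "canary": {"predictor": {"xgboost": {
        "storageUri": "gs://kfserving-samples/models/xgboost/iris"}}},
    "canaryTrafficPercent": 10,  # hypothetical split
}
after = promote_canary(before)
print(after)
```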

Finally, you can delete the model.

[Screenshot: delete]
