Using Kubeflow for Orchestrating ML Workflows

How many of you have built the famous "facial recognition model" on your local machine and felt proud? Now imagine building software in which that same model recognizes the faces of thousands of students for automatic attendance registration at a college or university.

The point is that the ML code you write on your local machine is just a small part of what it takes to create enterprise-grade software that can actually solve real-world problems.

Here is a glimpse of what an enterprise production solution has to deal with.

Challenges enterprises face in deploying ML solutions

  • Data Collection

  • Deploying and Reproducing the model in production

  • Model Monitoring

  • Keeping the model relevant by adapting to changing business scenarios

  • Communicating and interpreting model output for various stakeholders

Kubeflow allows you to make the deployment of machine learning workflows on Kubernetes simple, composable, portable, and scalable.

How to set up Kubeflow locally in Windows Subsystem for Linux

The official Kubeflow website does not currently have clear documentation on how to set it up on Windows, so I will cover it in this blog post.

To install Kubeflow locally, you need a few tools.

We’ll assume you have knowledge of Docker, as it is basically a prerequisite for working with Kubernetes. If you are unfamiliar with Docker, check out this tutorial to get you up to speed, then come back here.

Since Kubeflow runs wherever K8s runs, we can deploy a K8s cluster locally and run Kubeflow on it.
Additionally, you’ll need the following tools:

  • kubectl, which is a command-line tool to manage your K8s cluster

  • kustomize to configure applications using YAML
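
Before going further, it can help to confirm which of these tools are already on your PATH. The snippet below is only a sketch: missing_tools is a hypothetical helper name, and the tool list simply mirrors the prerequisites mentioned in this post.

```shell
# Hypothetical helper: print the name of every tool that is NOT on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the prerequisites discussed in this post.
missing=$(missing_tools docker minikube kubectl kustomize)
if [ -n "$missing" ]; then
  echo "Still to install:"
  echo "$missing"
else
  echo "All prerequisites found."
fi
```

Anything it prints under "Still to install" is covered in the sections that follow.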

Minikube Installation

Minikube will help set up a cluster locally on your machine. In terms of terminology, think of your computer as a single node responsible for housing the pods. These pods are where your application containers run. The pods are managed by a deployment, which describes the desired state of your Kubernetes application.

The first command downloads the binary, and the second installs it to /usr/local/bin.

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
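
Installing the binary does not start a cluster by itself. Here is a minimal sketch of starting one, assuming the Docker driver; start_cluster is a hypothetical wrapper, and the 4 CPU / 8g memory defaults are my own guesses rather than official Kubeflow requirements, so tune them to your machine.

```shell
# Hypothetical wrapper around `minikube start`.
# $1 = number of CPUs (assumed default: 4)
# $2 = memory, e.g. 8g (assumed default: 8g)
start_cluster() {
  cpus=${1:-4}
  memory=${2:-8g}
  minikube start --driver=docker --cpus="$cpus" --memory="$memory"
}

# Example (uncomment to actually start the cluster):
# start_cluster 6 12g
```

Kubeflow brings up dozens of pods, so if they stay in Pending later on, giving the cluster more CPU and memory is one of the first things to try.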

What is kubectl?

kubectl is a versatile command-line utility designed to manage Kubernetes clusters efficiently. Often likened to a "Swiss Army knife" for its multipurpose functionality, it stands as an essential instrument in the administration of cluster resources.

sudo snap install kubectl --classic

What is Kustomize?

Kubernetes operations are predominantly governed through an extensive array of YAML files. To streamline customization, kustomize offers a powerful solution. This tool enables the modification of raw, template-free YAML files without altering the original documents, ensuring they remain intact and operational as is.
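
To make that concrete, here is a small sketch of a base plus an overlay. Everything in it (the demo Deployment, the nginx image, the directory names) is made up for illustration and has nothing to do with the Kubeflow manifests. The overlay raises the replica count while the base file stays exactly as written.

```shell
# Build a throwaway base/overlay layout in a temp directory (illustrative names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/base" "$demo_dir/overlay"

# The original, template-free YAML. It is never edited.
cat > "$demo_dir/base/deployment.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: nginx
EOF

cat > "$demo_dir/base/kustomization.yaml" <<'EOF'
resources:
  - deployment.yaml
EOF

# The overlay points at the base and overrides the replica count.
cat > "$demo_dir/overlay/kustomization.yaml" <<'EOF'
resources:
  - ../base
replicas:
  - name: demo
    count: 3
EOF

# Render the customized YAML if kustomize is installed.
if command -v kustomize >/dev/null 2>&1; then
  kustomize build "$demo_dir/overlay"
else
  echo "kustomize not found; skipping build"
fi
```

The rendered output should show the customized Deployment, while base/deployment.yaml on disk still says replicas: 1.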

Finally!!

Now that you have all the prerequisite software and packages installed, it is time to install Kubeflow.

Follow these steps:

  1. Clone the manifests repo from the Kubeflow team:
git clone https://github.com/kubeflow/manifests.git
  2. Change to the repo directory:
cd manifests
  3. Build and apply the YAML files for all Kubeflow components:
while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
  4. Wait for everything to settle out.
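
The retry loop above runs forever if something is permanently broken. Retrying is needed because the first apply typically fails while some resources wait for their custom resource definitions to be registered, but you may prefer a bounded version. The sketch below is one way to do that; retry and apply_kubeflow are hypothetical helper names, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Same pipeline as in the step above, wrapped so retry can call it.
apply_kubeflow() {
  kustomize build example | awk '!/well-defined/' | kubectl apply -f -
}

# Example (run from inside the manifests directory):
# retry 30 10 apply_kubeflow
```

If it gives up, the last kubectl error on screen is usually a better starting point for debugging than an endless stream of retries.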

You can check to see if everything has settled out by running:

kubectl get pods -A

This will list all pods across all namespaces.

NAMESPACE         NAME                                     READY  STATUS    RESTARTS      AGE
auth              dex-7ff46847-sqxzj                       1/1    Running   0             10h
cert-manager      cert-manager-7fb78674d7-nllnn            1/1    Running   0             10h
cert-manager      cert-manager-cainjector-5dfc946d84-m6f7  1/1    Running   0             10h
cert-manager      cert-manager-webhook-8744b7588-cvzzm     1/1    Running   0             10h
istio-system      authservice-0                            1/1    Running   0             10h
istio-system      cluster-local-gateway-675bb7b74-49x27    1/1    Running   0             10h
istio-system      istio-ingressgateway-c7fdd4bf6-z68qt     1/1    Running   0             10h
istio-system      istiod-6995577d4-7h6zv                   1/1    Running   0             10h
knative-eventing  eventing-controller-86647cbc5b-62tl4     1/1    Running   0             10h
knative-eventing  eventing-webhook-6f48bb5f4c-c5ljb        1/1    Running   0             10h
knative-serving   activator-855b695596-zrfrr               2/2    Running   0             10h
knative-serving   autoscaler-7cbddfc9f7-gjckn              2/2    Running   0             10h
knative-serving   controller-6657c556fd-q728z              2/2    Running   0             10h
knative-serving   domain-mapping-544987775c-bffh5          2/2    Running   0             10h
knative-serving   domainmapping-webhook-6b48bdc856-bmllz   2/2    Running   0             10h
knative-serving   net-istio-controller-6fbdbd9959-bmglm    2/2    Running   0             10h
knative-serving   net-istio-webhook-7d4879cd7f-xwsl5       2/2    Running   0             10h
knative-serving   webhook-665c977469-rw6v6                 2/2    Running   0             10h
kube-system       coredns-787d4945fb-mgpsr                 1/1    Running   1 (10h ago)   10h
kube-system       etcd-minikube                            1/1    Running   2 (52s ago)   10h
kube-system       kube-apiserver-minikube                  1/1    Running   1 (10h ago)   10h
kube-system       kube-controller-manager-minikube         1/1    Running   2 (8h ago)    10h
kube-system       kube-proxy-l4tvb                         1/1    Running   1 (10h ago)   10h
kube-system       kube-scheduler-minikube                  1/1    Running   1 (10h ago)   10h
kube-system       nvidia-device-plugin-daemonset-cd6h8     1/1    Running   0             10h
kube-system       storage-provisioner                      1/1    Running   2 (10h ago)   10h
kubeflow          admission-webhook-deployment-6d48f6f745  1/1    Running   53 (10h ago)  10h
kubeflow          cache-server-6b44c46d47-lvcqr            2/2    Running   0             10h
kubeflow          centraldashboard-f966d7897-ltjhn         2/2    Running   0             10h
kubeflow          jupyter-web-app-deployment-795dcd4c9b-r  2/2    Running   0             10h
kubeflow          katib-controller-746969dc99-2fz29        1/1    Running   53 (10h ago)  10h
kubeflow          katib-db-manager-5ddbffd67-w429n         1/1    Running   0             10h
kubeflow          katib-mysql-66c8cdff4f-mrhz9             1/1    Running   0             10h
kubeflow          katib-ui-58b54d465f-kxmv2                2/2    Running   1 (10h ago)   10h
kubeflow          kserve-controller-manager-96b896c66-84v  2/2    Running   0             10h
kubeflow          kserve-models-web-app-9fbcd79f5-xksvx    2/2    Running   0             10h
kubeflow          kubeflow-pipelines-profile-controller-6  1/1    Running   0             10h
kubeflow          metacontroller-0                         1/1    Running   0             10h
kubeflow          metadata-envoy-deployment-7b49bdb748-tn  1/1    Running   0             10h
kubeflow          metadata-grpc-deployment-6d744c66bb-fkt  2/2    Running   3 (10h ago)   10h
kubeflow          metadata-writer-5bfdbf79b7-b5trj         2/2    Running   0             10h
kubeflow          minio-549846c488-x7jj6                   2/2    Running   0             10h
kubeflow          ml-pipeline-86d69497fc-mvtb9             2/2    Running   53 (10h ago)  10h
kubeflow          ml-pipeline-persistenceagent-5789446f9c  2/2    Running   0             10h
kubeflow          ml-pipeline-scheduledworkflow-fb9fbd76b  2/2    Running   0             10h
kubeflow          ml-pipeline-ui-74fcbdddd9-sm7dd          2/2    Running   0             10h
kubeflow          ml-pipeline-viewer-crd-bdf696cb9-97tks   2/2    Running   1 (10h ago)   10h
kubeflow          ml-pipeline-visualizationserver-845d745  2/2    Running   0             10h
kubeflow          mysql-5f968h4688-dlgv4                   2/2    Running   0             10h
kubeflow          notebook-controller-deployment-576df594  2/2    Running   2 (10h ago)   10h
kubeflow          profiles-deployment-7bc6469cdd-r5vzw     3/3    Running   53 (10h ago)  10h
kubeflow          tensorboard-controller-deployment-84954  3/3    Running   1 (10h ago)   10h
kubeflow          tensorboards-web-app-deployment-74bc589  2/2    Running   0             10h
kubeflow          training-operator-7c5456c65-fsqdr        1/1    Running   0             10h
kubeflow          volumes-web-app-deployment-86dddc89d4-8  2/2    Running   0             10h
kubeflow          workflow-controller-56cc57796-gjtd9      2/2    Running   1 (10h ago)   10h
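
Scanning a list this long by eye gets tedious. As a rough sketch, you can count the pods that are not fully ready yet; count_not_ready is a hypothetical helper that assumes the column layout shown above (READY in the third column, STATUS in the fourth).

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin
# and print how many pods are not fully ready. Pods whose STATUS is
# "Completed" (finished jobs) are not counted as a problem.
count_not_ready() {
  awk '{
    split($3, ready, "/")   # READY column, e.g. "1/2"
    if ($4 != "Completed" && (ready[1] != ready[2] || $4 != "Running")) n++
  } END { print n + 0 }'
}

# Example:
# kubectl get pods -A --no-headers | count_not_ready
```

When it prints 0, everything has settled.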

How do I set up the Kubeflow dashboard?

The dashboard is accessed via HTTP requests routed through the istio-ingressgateway service in the istio-system namespace. To forward the port, you use kubectl:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

This tells kubectl to listen on port 8080 on your machine and forward traffic to port 80 of the service. You can then reach the dashboard at http://localhost:8080.
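
If port 8080 is already taken on your machine, only the first number of the LOCAL:REMOTE pair needs to change. A small sketch, where forward_dashboard is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: forward a local port (default 8080) to port 80
# of the istio-ingressgateway service.
forward_dashboard() {
  local_port=${1:-8080}
  echo "Dashboard will be at http://localhost:${local_port}" >&2
  kubectl port-forward svc/istio-ingressgateway -n istio-system "${local_port}:80"
}

# Example (runs until you press Ctrl+C):
# forward_dashboard 9090
```

The command stays in the foreground, so keep that terminal open while you use the dashboard.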

The default password is 12341234; the default username is listed in the README of the Kubeflow manifests repo.

How do I stop minikube?

You can stop everything you’re running by stopping minikube:

minikube stop

If you want to delete your Kubeflow cluster, run:

minikube delete

Congratulations!! You've done it.

You will probably run into errors while setting this up; when you do, refer to the Kubeflow docs.