Version: release-0.9

Manage Xinference with KubeBlocks

Xorbits Inference (Xinference) is an open-source platform to streamline the operation and integration of a wide array of AI models. With Xinference, you’re empowered to run inference using any open-source LLMs, embedding models, and multimodal models either in the cloud or on your premises, and create robust AI-driven applications.

This tutorial illustrates how to create and manage a Xinference cluster by kubectl or a YAML file. You can find the YAML examples and guides in the GitHub repository.

Before you start

Install KubeBlocks.
View all the database types and versions available for creating a cluster.
Make sure the xinference cluster definition is installed. If the cluster definition is not available, refer to this doc to enable it first.
```
kubectl get clusterdefinition xinference
>
NAME           TOPOLOGIES   SERVICEREFS   STATUS      AGE
xinference                                Available   30m
```
View all available versions for creating a cluster.
```
kubectl get clusterversions -l clusterdefinition.kubeblocks.io/name=xinference
```
To keep things isolated, create a separate namespace called demo throughout this tutorial.
```
kubectl create namespace demo
```

Create a cluster

KubeBlocks implements a Cluster CRD to define a cluster. Here is an example of creating a Xinference cluster.

cat <<EOF | kubectl apply -f -
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster
  namespace: demo
spec:
  clusterDefinitionRef: xinference
  clusterVersionRef: xinference-0.11.0-cpu
  terminationPolicy: Delete
  affinity:
    podAntiAffinity: Preferred
    topologyKeys:
    - kubernetes.io/hostname
  tolerations:
    - key: kb-data
      operator: Equal
      value: 'true'
      effect: NoSchedule
  componentSpecs:
  - name: xinference
    componentDefRef: xinference
    serviceAccountName: kb-xinference-cluster
    replicas: 1
    resources:
      limits:
        cpu: '1'
        memory: 1Gi
      requests:
        cpu: '1'
        memory: 1Gi
EOF

Field	Definition
`spec.clusterDefinitionRef`	It specifies the name of the ClusterDefinition for creating a specific type of cluster.
`spec.clusterVersionRef`	It is the name of the cluster version CRD that defines the cluster version.
`spec.terminationPolicy`	It is the policy of cluster termination. The default value is `Delete`. Valid values are `DoNotTerminate`, `Halt`, `Delete`, `WipeOut`. - `DoNotTerminate` blocks deletion operation. - `Halt` deletes workload resources such as statefulset and deployment workloads but keep PVCs. - `Delete` is based on Halt and deletes PVCs. - `WipeOut` is based on Delete and wipe out all volume snapshots and snapshot data from a backup storage location.
`spec.affinity`	It defines a set of node affinity scheduling rules for the cluster's Pods. This field helps control the placement of Pods on nodes within the cluster.
`spec.affinity.podAntiAffinity`	It specifies the anti-affinity level of Pods within a component. It determines how pods should spread across nodes to improve availability and performance.
`spec.affinity.topologyKeys`	It represents the key of node labels used to define the topology domain for Pod anti-affinity and Pod spread constraints.
`spec.tolerations`	It is an array that specifies tolerations attached to the cluster's Pods, allowing them to be scheduled onto nodes with matching taints.
`spec.componentSpecs`	It is the list of components that define the cluster components. This field allows customized configuration of each component within a cluster.
`spec.componentSpecs.componentDefRef`	It is the name of the component definition that is defined in the cluster definition and you can get the component definition names with `kubectl get clusterdefinition xinference -o json \\| jq '.spec.componentDefs[].name'`.
`spec.componentSpecs.name`	It specifies the name of the component.
`spec.componentSpecs.replicas`	It specifies the number of replicas of the component.
`spec.componentSpecs.resources`	It specifies the resource requirements of the component.

KubeBlocks operator watches for the Cluster CRD and creates the cluster and all dependent resources. You can get all the resources created by the cluster with kubectl get all,secret,rolebinding,serviceaccount -l app.kubernetes.io/instance=mycluster -n demo.

kubectl get all,secret,rolebinding,serviceaccount -l app.kubernetes.io/instance=mycluster -n demo

Run the following command to see the created Xinference cluster object:

kubectl get cluster mycluster -n demo -o yaml

Scale vertically

You can vertically scale a cluster by changing resource requirements and limits (CPU and storage). For example, you can change the resource class from 1C2G to 2C4G by performing vertical scaling.

Before you start

Check whether the cluster status is Running. Otherwise, the following operations may fail.

kubectl get cluster mycluster -n demo
>
NAME        CLUSTER-DEFINITION     VERSION                 TERMINATION-POLICY     STATUS    AGE
mycluster   xinference             xinference-0.11.0-cpu   Delete                 Running   47m

Steps

There are two ways to apply vertical scaling.

OpsRequest
Edit cluster YAML file

Apply an OpsRequest to the specified cluster. Configure the parameters according to your needs.

kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: mycluster-vertical-scaling
  namespace: demo
spec:
  clusterName: mycluster
  type: VerticalScaling
  verticalScaling:
  - componentName: xinference
    requests:
      memory: "2Gi"
      cpu: "2"
    limits:
      memory: "4Gi"
      cpu: "2"
EOF

Check the operation status to validate the vertical scaling.

kubectl get ops -n demo
>
NAMESPACE   NAME                         TYPE              CLUSTER     STATUS    PROGRESS   AGE
demo        mycluster-vertical-scaling   VerticalScaling   mycluster   Succeed   3/3        6m

If an error occurs, you can troubleshoot with kubectl describe ops -n demo command to view the events of this operation.

Check whether the corresponding resources change.
```
kubectl describe cluster mycluster -n demo
```

Change the configuration of spec.componentSpecs.resources in the YAML file. spec.componentSpecs.resources controls the requirement and limit of resources and changing them triggers a vertical scaling.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster
  namespace: demo
spec:
  clusterDefinitionRef: xinference
  clusterVersionRef: xinference-0.11.0-cpu
  componentSpecs:
  - name: xinference
    componentDefRef: xinference
    replicas: 1
    resources: # Change the values of resources.
      requests:
        memory: "2Gi"
        cpu: "2"
      limits:
        memory: "4Gi"
        cpu: "2"

Check whether the corresponding resources change.
```
kubectl describe cluster mycluster -n demo
```

Stop/Start a cluster

You can stop/start a cluster to save computing resources. When a cluster is stopped, the computing resources of this cluster are released, which means the pods of Kubernetes are released, but the storage resources are reserved. Start this cluster again if you want to restore the cluster resources from the original storage by snapshots.

Stop a cluster

OpsRequest
Edit cluster YAML file

Run the command below to stop a cluster.

kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: mycluster-stop
  namespace: demo
spec:
  clusterName: mycluster
  type: Stop
EOF

Configure replicas as 0 to delete pods.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
    name: mycluster
    namespace: demo
spec:
  clusterDefinitionRef: xinference
  clusterVersionRef: xinference-0.11.0-cpu
  terminationPolicy: Delete
  componentSpecs:
  - name: xinference
    componentDefRef: xinference
    disableExporter: true  
    replicas: 0

Start a cluster

OpsRequest
Edit cluster YAML file

Run the command below to start a cluster.

kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: mycluster-start
  namespace: demo
spec:
  clusterName: mycluster
  type: Start
EOF 

Change replicas back to the original amount to start this cluster again.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
    name: mycluster
    namespace: demo
spec:
  clusterDefinitionRef: xinference
  clusterVersionRef: xinference-0.11.0-cpu
  terminationPolicy: Delete
  componentSpecs:
  - name: xinference
    componentDefRef: xinference
    disableExporter: true  
    replicas: 1

Restart

Restart a cluster.

kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: mycluster-restart
  namespace: demo
spec:
  clusterName: mycluster
  type: Restart 
  restart:
  - componentName: xinference
EOF

Check the pod and operation status to validate the restarting.
```
kubectl get pod -n demo

kubectl get ops mycluster-restart -n demo
```
During the restarting process, there are two status types for pods.
- STATUS=Terminating: it means the cluster restart is in progress.
- STATUS=Running: it means the cluster has been restarted.

Manage Xinference with KubeBlocks

Before you start​

Create a cluster​

Scale vertically​

Before you start​

Steps​

Stop/Start a cluster​

Stop a cluster​

Start a cluster​

Restart​

Before you start

Create a cluster

Scale vertically

Before you start

Steps

Stop/Start a cluster

Stop a cluster

Start a cluster

Restart