Instant Clusters
This document explains how to create an Instant Cluster and how to start training on it with Kubernetes.
Create your Instant Cluster
- Create a Cluster
1. Click on the Cluster size, for example 8xH100
2. Enter a cluster name
3. Choose a cluster type
4. Select a Region
5. Select the required duration for your cluster
6. Create and name your shared volume. The minimum size is 1TB
7. Optional: Select your NVIDIA driver and CUDA versions
8. Click on Proceed
- Check Status of your Cluster
- Increase your cluster size: click the menu icon in the cluster's row, click Edit Cluster, click "Number of GPUs", select the desired number, and click Update
Start training with Kubernetes
- Prerequisites: install kubectl in your environment. For example, on macOS, follow https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/
- Get the cluster kubeconfig
To schedule Kubernetes jobs on your cluster, download the kubeconfig file (the kubectl context) from the Instant Clusters UI page and copy it to your local machine, for example to:
~/.kube/config_k8s_bytecompute_instant
Then either export it for your shell session:
export KUBECONFIG=$HOME/.kube/config_k8s_bytecompute_instant
or pass it explicitly on each command:
kubectl --kubeconfig=$HOME/.kube/config_k8s_bytecompute_instant get nodes
Note: It's possible to name the file as the default "config". If you do, make sure to back up your current config file first.
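If you do replace the default config, the backup step can be scripted. The following is a minimal sketch rather than an official tool: the function name `install_default_kubeconfig`, the optional second argument, and the timestamped `.bak` suffix are assumptions made for illustration.

```shell
# Hypothetical helper: install a downloaded kubeconfig as the default
# "config" file, keeping a timestamped backup of any existing one.
#   $1 = path to the downloaded kubeconfig
#   $2 = kube directory (optional, defaults to $HOME/.kube)
install_default_kubeconfig() {
  local src="$1"
  local kube_dir="${2:-$HOME/.kube}"
  local dest="$kube_dir/config"

  mkdir -p "$kube_dir"
  # Back up the current config before overwriting it.
  if [ -f "$dest" ]; then
    cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)"
  fi
  cp "$src" "$dest"
  # kubeconfig files contain credentials, so restrict access to the owner.
  chmod 600 "$dest"
}

# Example, using the file name from this guide:
#   install_default_kubeconfig "$HOME/.kube/config_k8s_bytecompute_instant"
```

Keeping the cluster config in its own file and pointing KUBECONFIG at it avoids the overwrite entirely, so this helper is only useful if you prefer the default location.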
- Verify you can connect to your K8s cluster
kubectl get nodes
NAME STATUS ROLES AGE VERSION
5fa43eae-01.cloud.bytecompute.ai Ready <none> 21h v1.31.4+k3s1
5fa43eae-02.cloud.bytecompute.ai Ready <none> 21h v1.31.4+k3s1
5fa43eae-hn1.cloud.bytecompute.ai Ready control-plane,etcd,master 22h v1.31.4+k3s1
5fa43eae-hn2.cloud.bytecompute.ai Ready control-plane,etcd,master 8h v1.31.4+k3s1
5fa43eae-hn3.cloud.bytecompute.ai Ready control-plane,etcd,master 22h v1.31.4+k3s1
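To check node health in a script instead of reading the table by eye, the STATUS column can be filtered. This is a small sketch under the assumption that the output has the column layout shown above; the function name `not_ready_nodes` is hypothetical.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Note: a cordoned node reports "Ready,SchedulingDisabled" and is
# therefore listed as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage against a live cluster (requires kubectl and a valid kubeconfig):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reported Ready.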
- How to deploy a pod from a Docker image
i. Create a manifest YAML file (pvc.yaml in this example) for the storage to mount in your container; its contents are shown below
ii. Apply the manifest:
kubectl apply -f pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: shared-rdma
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Ti
  storageClassName: scratch-storage-gpu
iii. Create a manifest YAML file (manifest.yaml in this example) that references your Docker image and mounts the volumes created above. The example below is a general-purpose Ubuntu test pod with a shell that lets you, for example, inspect files on the data volume.
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test-pod
      image: [registry/]repository/ubuntu[:tag]
      command: ["/bin/sh", "-c"]
      args: ["sleep infinity"]
      volumeMounts:
        - name: shared-pvc
          mountPath: /<path-for-shared>
        - name: local-pvc
          mountPath: /<path-for-local>
  volumes:
    - name: shared-pvc
      persistentVolumeClaim:
        claimName: shared-pvc
    - name: local-pvc
      persistentVolumeClaim:
        claimName: local-pvc
---- Real manifest ----
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu
      image: debian:stable-slim
      command: ["/bin/sh", "-c", "sleep infinity"]
      volumeMounts:
        - name: shared-pvc
          mountPath: /mnt/shared
        - name: local-pvc
          mountPath: /mnt/local
  volumes:
    - name: shared-pvc
      persistentVolumeClaim:
        claimName: shared-pvc
    - name: local-pvc
      persistentVolumeClaim:
        claimName: local-pvc
b. Create the pod by running kubectl apply -f manifest.yaml
c. Get a shell into the pod by running kubectl exec -it test-pod -- bash
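Running `kubectl exec` immediately after `kubectl apply` can fail because the pod may still be pulling its image or waiting for its volumes. A possible way to combine steps b and c is to wait for the Ready condition first. The sketch below assumes the pod name `test-pod` from the manifest above; the function name `open_pod_shell` and the 120-second timeout are illustrative choices, not values from the product documentation.

```shell
# Hypothetical helper: block until the pod reports Ready, then open a shell.
#   $1 = pod name (optional, defaults to test-pod)
open_pod_shell() {
  local pod="${1:-test-pod}"
  # `kubectl wait` returns non-zero if the timeout expires first,
  # in which case the exec is skipped.
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s &&
    kubectl exec -it "$pod" -- bash
}

# Usage (requires kubectl and a valid kubeconfig):
#   kubectl apply -f manifest.yaml
#   open_pod_shell test-pod
```

If the wait times out, `kubectl describe pod test-pod` usually shows whether the image pull or a volume claim is the cause.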
- How to access the Kubernetes Dashboard
You can access the Kubernetes Dashboard by clicking your cluster's name and then the k8s dashboard URL. You will be prompted to enter a password, which is the admin-user token and can be obtained as follows:
kubectl --kubeconfig=$HOME/.kube/config get secret admin-user-token -n kubernetes-dashboard -o jsonpath='{.data.token}' | base64 -d
Authenticate with Bytecompute Cloud via Google SSO
You can authenticate with Bytecompute Cloud using Google Single Sign-On (SSO) via the tcloud CLI. Run the following command:
% tcloud sso login
Your verification code is: ABC-DEFG-HIJ
Opening browser to https://www.google.com/device
Waiting for device authorization...
Open https://www.google.com/device in your browser and enter the verification code to complete authentication. Note: You must be part of an approved Google Workspace organization to authenticate.
Create a cluster
Note: Cluster creation requires a valid payment method to be set up in your account.
You can add a payment method at https://api.bytecompute.ai/settings/billing.
tcloud cluster create my-cluster \
  --num-gpus 8 \
  --reservation-duration 1 \
  --instance-type H100-SXM \
  --region us-central-8 \
  --shared-volume-name my-volume \
  --size-tib 1
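When creating clusters repeatedly, it can help to parameterize the command and preview it before running. The wrapper below is only a sketch: `create_cluster` and the `RUN` variable are hypothetical names, and it uses only the flags and example values that appear in the command above, whose exact semantics (for example the unit of `--reservation-duration`) are not specified here.

```shell
# Hypothetical wrapper: builds the create command from three parameters and
# prints it. It only executes tcloud when RUN=1 is set in the environment.
#   $1 = cluster name, $2 = number of GPUs, $3 = shared volume name
create_cluster() {
  local name="$1" gpus="$2" volume="$3"
  # Reuse the positional parameters to hold the full command line.
  set -- tcloud cluster create "$name" \
    --num-gpus "$gpus" \
    --reservation-duration 1 \
    --instance-type H100-SXM \
    --region us-central-8 \
    --shared-volume-name "$volume" \
    --size-tib 1
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Preview only:
#   create_cluster my-cluster 8 my-volume
# Actually create the cluster (requires tcloud and a prior `tcloud sso login`):
#   RUN=1 create_cluster my-cluster 8 my-volume
```

Printing by default is a deliberate choice, since cluster creation is billed and a preview makes mistakes in the arguments easy to catch.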
Deleting a cluster
tcloud cluster delete <CLUSTER_UUID>
