- Content for testing Accelerator Profiles features in RHODS/ODH
- Requirements
- Pick the right target namespace
- Applying the fixes (temporary)
- Ensure the Accelerator Profile CRD exists
- Creating some sample accelerator profiles
- Creating some dummy custom images with the profile annotations
- Adding a couple NVIDIA examples too
- Adding an OpenVino runtime that prefers Intel FLEX GPUs
- Confirming that Habana HPUs are truly being used
- notes for later:
- uninstall live build
This repository is a draft of the sort of content you'd need in order to test/experiment with Accelerator Profiles.
Notes:
- for now, the
identifiers
defined in the YAML files are probably not all accurate, and will need to be adjusted. If you know what the right values are, open an issue. - this does not include the taints that you should apply to your nodes in order to finely target nodes.
Ensure your cluster does not already have RHODS running in it. These instructions will setup a pre-release cut of RHODS and assume the cluster has neither RHODS nor ODH in it.
Ensure you are cluster-admin and have a working oc
command.
-
The command below will kick off the RHODS live build install.
-
Allow 5-10 minutes for it to complete
GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc apply -f ${GH_ROOT}/rhods-1.33-livebuild.yaml
The rest of these instructions will walk you through setting up extra resources in your cluster, and this has to be done in a specific namespace.
-
You should set the
NS
environment variable to eitheropendatahub
orredhat-ods-applications
, depending on which cluster you are working in, and where RHODS is running. -
If you used the above-mentioned RHODS live build, it's likely to be
redhat-ods-applications
. -
If you're not sure, you can execute:
if oc get namespace opendatahub >/dev/null 2>&1 && oc get namespace redhat-ods-applications >/dev/null 2>&1; then echo "Error: Both 'opendatahub' and 'redhat-ods-applications' namespaces exist." elif ! oc get namespace opendatahub >/dev/null 2>&1 && ! oc get namespace redhat-ods-applications >/dev/null 2>&1; then echo "Error: Neither 'opendatahub' nor 'redhat-ods-applications' namespaces exist." elif oc get namespace opendatahub >/dev/null 2>&1; then NS="opendatahub" elif oc get namespace redhat-ods-applications >/dev/null 2>&1; then NS="redhat-ods-applications" else NS="" echo "Error: not sure what happened" fi # Print the selected namespace echo "Selected namespace: $NS"
From this point on, make sure the NS
environment variable has the right value.
-
apply this to get the CRD
oc apply -f https://raw.githubusercontent.com/opendatahub-io/odh-dashboard/f/accelerator-support/manifests/crd/acceleratorprofiles.opendatahub.io.crd.yaml
-
so far, seems to be required still:
-
Apply this:
## do not apply this. Confirm it works for a regular (non cluster-admin) user GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc -n ${NS} apply -f ${GH_ROOT}/fix_acc_profiles.yaml
-
execute:
oc get crd | grep -i acceleratorprofile
-
should return
bash-4.4 ~ $ oc get crd | grep -i acceleratorprofile acceleratorprofiles.dashboard.opendatahub.io 2023-08-25T18:44:23Z bash-4.4 ~ $
-
if you don't see the line above, probably not worth continuing
-
Apply the YAML directly from the
main
branch of the github repo:GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc -n ${NS} apply -f ${GH_ROOT}/acc_profiles.yaml
-
If you need to delete them:
# oc -n ${NS} delete acceleratorprofile <actual_names>
-
Apply the YAML directly from the
main
branch of the github repo:GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc -n ${NS} apply -f ${GH_ROOT}/custom_workbench_images.yaml
-
Apply the YAML directly from the
main
branch of the github repo:GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc -n ${NS} apply -f ${GH_ROOT}/nvidia_examples.yaml
-
Apply the YAML directly from the
main
branch of the github repo:GH_ROOT="https://raw.githubusercontent.com/rh-aiservices-bu/accelerator-profiles-testing/main/manifests" oc -n ${NS} apply -f ${GH_ROOT}/serving_runtime.yaml
- not sure here. maybe these repos can help
look into NVIDIA GPU simulator: https://issues.redhat.com/browse/NVIDIA-50
This will remove RHODS from the cluster.
draft instructions (probably slightly incomplete):
## acc profiles deletion
oc -n redhat-ods-applications delete acceleratorprofiles --all
# rhods uninstall
oc create configmap delete-self-managed-odh -n redhat-ods-operator
oc label configmap/delete-self-managed-odh api.openshift.com/addon-managed-odh-delete=true -n redhat-ods-operator
PROJECT_NAME=redhat-ods-applications
while oc get project $PROJECT_NAME &> /dev/null; do
echo "$PROJECT_NAME still exists"
sleep 1
done
echo "$PROJECT_NAME no longer exists"
oc delete namespace redhat-ods-operator