Current situation:
Currently, openpai-runtime is tightly coupled with PAI and Framework Controller.
We have only split the code; some logic is still mixed. To use the runtime, you still need both PAI and Framework Controller.
We need to decouple the runtime from these components to allow an independent release cycle and more efficient development.
Current Problem:
- Adding a feature requires changes across many repos.
- The runtime cannot be used by other projects; a third-party user who wants to customize the runtime has to modify the runtime code.
- The runtime only works with PAI and Framework Controller.
Methods:
Treat all PAI-related logic as runtime-plugin. The openpai-runtime repo then keeps only the main logic; PAI-related code is treated as a PAI-specific runtime plugin and maintained in the PAI repo.
To implement this, we introduce two concepts: init-plugin and runtime-plugin.
init-plugin is maintained by developers and is used to generate the executable code that runs in runtime-plugin. End users do not need to know anything about init-plugin.
runtime-plugin is used by end users. An end user uses this plugin to run commands before/after the actual user commands.
Workflow for openpai-runtime:
- Start the init container, read the init-plugin spec, and run the init plugins sequentially.
- Read the runtime-plugin spec and generate the runtime executable file.
- Start the user container, run the runtime executable, and start the user commands.
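The second workflow step (turning a runtime-plugin spec into an executable) could be pictured roughly as below. This is only a sketch under assumptions: the document does not say how plugins hand their commands to the runtime, so the `preCommands` / `postCommands` keys and the `build_runtime_script` helper are hypothetical names chosen for illustration.

```python
def build_runtime_script(plugin_outputs, user_commands):
    """Assemble a shell script: plugin pre-commands, user commands, plugin post-commands.

    `plugin_outputs` is a list of dicts with optional (hypothetical) keys
    "preCommands" and "postCommands", one dict per runtime plugin.
    """
    pre, post = [], []
    for output in plugin_outputs:
        pre.extend(output.get("preCommands", []))
        post.extend(output.get("postCommands", []))
    lines = ["#!/bin/bash"] + pre + list(user_commands) + post
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_runtime_script(
        [{"preCommands": ["echo pre"], "postCommands": ["echo post"]}],
        ["ls -al"],
    )
    print(script, end="")
```

Keeping the generation step as a pure function from spec to script text makes it possible to check the generated executable without starting any container.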
Implementation
Init plugin config spec
init-plugin runs in the init container. The working directory for init plugins is the init.d folder. These plugins perform preparation steps such as rendering the user commands. Here is a sample spec for init-plugin. The plugins run sequentially:
initPlugins:
  - name: frameworkBarrier
    command: 'frameworkBarrier framework.json'
  - name: frameworkParser
    command: 'python frameworkParser.py framework.json'
  - name: imageChecker
    command: 'python imageChecker.py'
  - name: portChecker
    command: 'python portChecker.py portListFile'
  - name: userCommandRender
    command: 'python command_render.py'
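A minimal sketch of how such a spec might be executed, assuming it has already been parsed into a list of `name`/`command` entries. The `run_init_plugins` helper is hypothetical, and stopping at the first failing plugin is an assumption; the document only states that plugins run sequentially.

```python
import shlex
import subprocess


def run_init_plugins(init_plugins, workdir="."):
    """Run each plugin command in order inside `workdir`.

    Assumption: a non-zero exit code aborts the remaining plugins.
    Returns the names of the plugins that completed.
    """
    completed = []
    for plugin in init_plugins:
        # Split the command string the way a POSIX shell would.
        argv = shlex.split(plugin["command"])
        result = subprocess.run(argv, cwd=workdir)
        if result.returncode != 0:
            raise RuntimeError(
                f"init plugin {plugin['name']} exited with {result.returncode}"
            )
        completed.append(plugin["name"])
    return completed
```

In the init container, `workdir` would presumably be the init.d folder, so that relative script names such as `frameworkParser.py` resolve.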
This spec can be passed to the runtime through the INIT_CONFIG env variable, or as a file named init_plugins.yaml under PAI_CONFIG_DIR. The runtime first tries to parse INIT_CONFIG. If this variable is empty, it tries to read init_plugins.yaml. If init_plugins.yaml is absent, the default config init_plugins_default.yaml is used.
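The lookup order described above can be sketched as follows. This is an illustration, not the runtime's actual code: `load_init_config_text` is a hypothetical helper, and it assumes that init_plugins_default.yaml lives in the same directory as init_plugins.yaml, which the document does not specify.

```python
import os
from pathlib import Path


def load_init_config_text(env=None, config_dir=None):
    """Return the raw init-plugin config text, following the documented order:
    INIT_CONFIG env, then init_plugins.yaml, then init_plugins_default.yaml."""
    env = os.environ if env is None else env
    inline = env.get("INIT_CONFIG", "")
    if inline.strip():
        return inline
    # Assumption: both YAML files are looked up under PAI_CONFIG_DIR.
    directory = Path(config_dir or env.get("PAI_CONFIG_DIR", "."))
    for name in ("init_plugins.yaml", "init_plugins_default.yaml"):
        candidate = directory / name
        if candidate.is_file():
            return candidate.read_text()
    raise FileNotFoundError(f"no init-plugin config found in {directory}")
```

Returning the raw text keeps the sketch free of a YAML dependency; the caller would parse it with whatever YAML library the runtime uses.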
Assumption about init-plugin
We believe the init-plugin config rarely changes, and each cluster has only one configured init-plugin config. We therefore prefer to put the init-plugin config into the Docker image or a k8s ConfigMap.
Runtime plugin & secret & exitSpec & env
The runtime spec is passed through the RUNTIME_CONFIG env variable, or as a file at PAI_CONFIG_DIR/runtime_plugin.yaml.
The spec for runtime plugin:
commands: ["ls -al"]
runtimePlugin:
  - plugin: ssh
    parameters:
      jobssh: true
  - plugin: teamwise_storage
    parameters:
      storageConfigNames:
        - confignfs
The secret file should be stored at ${PAI_CONFIG_DIR}/secret.yaml and the exit spec should be stored at ${PAI_CONFIG_DIR}/runtime-exit-spec.yaml.
For environment variables that should be passed to the user container, put them into ${PAI_RUNTIME_DIR}/env.
Development & Usage
Customize runtime
In the init container, the runtime runs the scripts under the /usr/local/pai/init.d folder. To customize your init container, put your scripts under the /usr/local/pai/init.d folder.
To build the init container:
FROM openpairuntime/openpai-runtime:latest
COPY src/* /usr/local/pai/init.d/
If an init-plugin writes output to a file, it is the developer's responsibility to make sure the file is at the correct path and does not overwrite anything. It is also the developer's responsibility to maintain the customized config file and make sure it works.
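One way a custom init plugin could guard against overwriting an existing file is to create its output exclusively. The sketch below is only a suggestion; `write_plugin_output` is a hypothetical helper, not part of the runtime.

```python
import os


def write_plugin_output(path, content):
    """Create `path` and write `content`; fail instead of overwriting.

    Opening with mode "x" raises FileExistsError if the file already exists.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "x") as handle:
        handle.write(content)
```

A plugin that really does need to replace a file written by an earlier plugin would have to opt in explicitly, which makes accidental clobbering easier to spot.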
Use openPAI runtime
apiVersion: v1
kind: Pod
metadata:
  name: job
  namespace: default
spec:
  initContainers:
    - name: init
      image: openpairuntime/openpai-runtime:latest
      env:
        # Literal block scalars (|-) keep the line breaks of the embedded YAML;
        # a folded scalar (>-) would join the top-level keys onto one line.
        - name: RUNTIME_CONFIG
          value: |-
            commands: ["ls -al", "echo hi"]
            runtimePlugin:
              - plugin: ssh
                parameters:
                  jobssh: true
              - plugin: teamwise_storage
                parameters:
                  storageConfigNames:
                    - confignfs
        - name: INIT_CONFIG
          value: |-
            initPlugins:
              - name: frameworkBarrier
                command: pai/frameworkBarrier framework.json
              - name: frameworkParser
                command: frameworkParser.py framework.json
      volumeMounts:
        - name: pai-vol
          mountPath: '/usr/local/pai'
        - name: 'job-secrets'
          mountPath: '/usr/local/pai/config/secrets.yaml'
        - name: 'job-exit-spec'
          mountPath: '/usr/local/pai/config/runtime-exit-spec.yaml'
  containers:
    - name: app
      image: ubuntu:latest
      resources:
        limits:
          memory: "200Mi"
        requests:
          memory: "100Mi"
      command: ['/usr/local/pai/runtime']
      volumeMounts:
        - name: pai-vol
          mountPath: '/usr/local/pai'
        - name: 'job-secrets'
          mountPath: '/usr/local/pai/config/secrets.yaml'
        - name: 'job-exit-spec'
          mountPath: '/usr/local/pai/config/runtime-exit-spec.yaml'
  volumes:
    - name: pai-vol
      emptyDir: {}
    - name: 'job-secrets'
      secret:
        secretName: 'job-secrets'
    - name: 'job-exit-spec'
      configMap:
        name: runtime-exit-spec-configuration
Result
After this change, the runtime repo will keep only the common plugins: imageChecker, userCommandRender, portConflictChecker, envGenerator. Each plugin will have a clear interface, and developers can reuse these plugins.
PAI-related plugins, such as frameworkBarrier, frameworkParser, ..., will move to the PAI repo.
Interfaces:
ENV: INIT_CONFIG, RUNTIME_CONFIG
File: PAI_CONFIG_DIR/init_plugins.yaml, ${PAI_CONFIG_DIR}/secret.yaml, ${PAI_CONFIG_DIR}/runtime-exit-spec.yaml, PAI_CONFIG_DIR/runtime_plugin.yaml
Pro:
- A new runtime requirement can be implemented as an init-plugin or a runtime-plugin instead. There is no need to change the runtime code if the feature is PAI-specific.
- The runtime can be reused by other projects.
Con:
- It is a new interface, so there is much work to do. Passing complex data/config through env variables is not friendly for end users.
- With the new config, the job spec may be larger than before. (Another plugin could provide the task spec, for example by calling an API to get the task spec and putting it at some path.)
TBD
- How to customize the image build. Allowing users to customize the init container requires copying files into the Docker image. We should provide a pattern for building a new runtime.