Comments (1)
Here's the template as generated by ks:
ks show gke -c cnn
---
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  name: cnn
  namespace: default
spec:
  replicaSpecs:
  - replicas: 1
    template:
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        resources:
          limits:
            nvidia.com/gpu: 1
        restartPolicy: OnFailure
    tfReplicaType: MASTER
  - replicas: 1
    template:
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        resources:
          limits:
            nvidia.com/gpu: 1
        restartPolicy: OnFailure
    tfReplicaType: WORKER
  - replicas: 1
    template:
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        restartPolicy: OnFailure
    tfReplicaType: PS
  tfImage: gcr.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
It looks like resources aren't being specified at the right level; i.e., they are being specified for the pod and not the container.
As a result, they don't end up in the actual TfJob spec:
apiVersion: tensorflow.org/v1alpha1
kind: TfJob
metadata:
  clusterName: ""
  creationTimestamp: 2017-12-20T05:48:12Z
  generation: 0
  name: cnn
  namespace: default
  resourceVersion: "5359495"
  selfLink: /apis/tensorflow.org/v1alpha1/namespaces/default/tfjobs/cnn
  uid: 5d6cfebc-e549-11e7-b842-42010af0014d
spec:
  RuntimeId: vtcm
  replicaSpecs:
  - IsDefaultPS: false
    replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          resources: {}
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        restartPolicy: OnFailure
    tfPort: 2222
    tfReplicaType: MASTER
  - IsDefaultPS: false
    replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          resources: {}
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        restartPolicy: OnFailure
    tfPort: 2222
    tfReplicaType: WORKER
  - IsDefaultPS: false
    replicas: 1
    template:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - args:
          - python
          - tf_cnn_benchmarks.py
          - --batch_size=32
          - --model=resnet50
          - --variable_update=parameter_server
          - --flush_stdout=true
          - --num_gpus=1
          image: gcr.io/kubeflow/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
          name: tensorflow
          resources: {}
          workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        restartPolicy: OnFailure
    tfPort: 2222
    tfReplicaType: PS
  tensorboard: null
  tfImage: tensorflow/tensorflow:1.3.0
status:
  conditions: null
  controlPaused: false
  phase: Running
  reason: ""
  replicaStatuses:
  - ReplicasStates:
      Running: 1
    state: Running
    tf_replica_type: MASTER
  - ReplicasStates:
      Failed: 1
    state: Failed
    tf_replica_type: WORKER
  - ReplicasStates:
      Running: 1
    state: Running
    tf_replica_type: PS
  state: ""
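For comparison, here is a rough sketch of where the GPU limit would probably need to sit for it to survive into the stored spec. It assumes the standard Kubernetes pod template layout, in which `resources` is a field of each container rather than of the pod `spec`. Only the MASTER replica is shown, with values copied from the generated template above; this illustrates the placement and is not the actual fix to the ksonnet prototype.

```yaml
# Sketch only: one replicaSpec with `resources` moved from the pod spec
# into the container entry (assumes the standard Kubernetes container schema).
replicaSpecs:
- replicas: 1
  template:
    spec:
      containers:
      - args:
        - python
        - tf_cnn_benchmarks.py
        - --batch_size=32
        - --model=resnet50
        - --variable_update=parameter_server
        - --flush_stdout=true
        - --num_gpus=1
        image: gcr.io/kubeflow/tf-benchmarks-gpu:v20171202-bdab599-dirty-284af3
        name: tensorflow
        # Container-level field, a sibling of image/name/workingDir.
        resources:
          limits:
            nvidia.com/gpu: 1
        workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
      restartPolicy: OnFailure
  tfReplicaType: MASTER
```

With the limit at this level, one would expect the stored TfJob to show the `nvidia.com/gpu` limit under the container instead of `resources: {}`.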