richstokes / k8s-folding-at-home


105.0 9.0 30.0 42 KB

⛑ Run folding@home on your Kubernetes cluster

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 100.00%

folding kubernetes coronavirus research folding-at-home covid kubernetes-deployment


k8s-folding-at-home's Issues

Add DaemonSet as second deployment option

To fully utilize a cluster without having to scale a Deployment manually, I suggest adding a DaemonSet as a deployment option. This would run FaH on every node in the cluster.

~~I'm currently setting up my Raspberry Pi experimental cluster and will implement and test this addition.~~
Edit: I just found out that FaH will not run on ARM, so at the moment I have no cluster available for implementing and testing this.

Ideas so far:

- selection of nodes via nodeSelector
- sane default resource limits so the cluster is not over-utilized
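
The two ideas above could be combined in a manifest along these lines. This is a minimal sketch, not a manifest from this repository: the image name, the `folding: "enabled"` node label, and the request and limit values are all placeholders chosen for illustration.

```yaml
# Sketch only: image, node label, and resource values are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fah
  labels:
    app: fah
spec:
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      # Run only on nodes carrying this (hypothetical) label.
      nodeSelector:
        folding: "enabled"
      containers:
        - name: fah
          image: example.com/fah-client:latest  # placeholder image name
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            # Limits cap what each pod can take from its node.
            limits:
              cpu: "1"
              memory: 512Mi
```

With a manifest like this, a node would opt in via `kubectl label node <node-name> folding=enabled`, and one pod would be scheduled per labeled node.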

Any chance of an arm version

Your image looks awesome, and I'm really keen to try it on my cluster, but I only have Raspberry Pis so far. Any chance you'd be able to release an ARM (or specifically arm64) image?

GPU slot failing to start

Background: I have a cluster with a few nodes with GPUs and capacity, so I wanted to use that extra capacity for this project.
First, I tried running the provided image as a DaemonSet on my GPU-enabled nodes, but that failed because FAH couldn't find the CUDA libraries and therefore did not detect the GPU. I solved that by building a new image based on nvidia/cuda:10.0-runtime-ubuntu18.04, which let FAH find the CUDA libraries and detect the GPU.

Now though I'm seeing the following error:
ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually

Based on some googling, this is either a problem with my drivers, or with the config in FAH. Has anyone else seen this error? It seems like it can be solved from the FAH control interface, but I'm not sure how to expose that.
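
For readers trying to reproduce this setup, the DaemonSet described above might look roughly like the following. It is a sketch under several assumptions: the NVIDIA device plugin is installed (it provides the `nvidia.com/gpu` resource), the `accelerator: nvidia` node label is hypothetical, and the image name is a placeholder for a custom image built from nvidia/cuda:10.0-runtime-ubuntu18.04. It shows the scheduling side only and is not a fix for the OpenCL error.

```yaml
# Sketch only: node label and image name are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fah-gpu
spec:
  selector:
    matchLabels:
      app: fah-gpu
  template:
    metadata:
      labels:
        app: fah-gpu
    spec:
      # Hypothetical label marking the GPU nodes.
      nodeSelector:
        accelerator: nvidia
      containers:
        - name: fah
          # Placeholder for an image built FROM nvidia/cuda:10.0-runtime-ubuntu18.04
          image: example.com/fah-client-cuda:latest
          resources:
            limits:
              # Requires the NVIDIA device plugin on the cluster.
              nvidia.com/gpu: 1
```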

Minimum CPU?

Hi! I'd love to eval this for a large org. Do you know what the minimum viable CPU is? Thanks!

StatefulSets & Persistent Volume Claims

I have found that Persistent Volume Claim mounts are helpful for FAH on K8s: they allow the container to checkpoint its work (the default is every 15 minutes) and recover the state of an existing assignment if the container is terminated.

To do this I have implemented the StatefulSet pattern, so that each container gets its own unique PVC (with a Deployment, all replicas would share the same PVC). I am mounting the PVC at mountPath: /var/lib/fahclient. I don't know how much storage FAH expects or requires, but I can look into the average utilization on my cluster.

Happy to share my YAML as an example, though I have customized it a bit for my environment on Google Cloud / Google Kubernetes Engine.
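
The pattern described above can be sketched as follows. This is not the poster's YAML: only the mount path /var/lib/fahclient comes from the issue. The image name, replica count, the `fah` Service name, and the 1Gi storage request are assumptions for illustration.

```yaml
# Sketch only: image, replicas, serviceName, and storage size are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: fah
spec:
  # Names a headless Service, which is assumed to exist separately.
  serviceName: fah
  replicas: 3
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      containers:
        - name: fah
          image: example.com/fah-client:latest  # placeholder image name
          volumeMounts:
            - name: fah-data
              mountPath: /var/lib/fahclient  # path given in the issue
  # One PVC is created per replica (fah-data-fah-0, fah-data-fah-1, ...),
  # so checkpoints survive a pod restart.
  volumeClaimTemplates:
    - metadata:
        name: fah-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi  # assumed size; actual FAH usage is not stated
```

Because each replica keeps a stable name and its own claim, a rescheduled pod remounts the same volume and can pick up from its last checkpoint.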

Resource limit memory too low

Hi,

First of all, thanks for creating this project; I've used it to quickly donate some of my spare CPU power. However, I wanted to let you know that the 256Mi memory limit in your configuration files is too low and that pods will get OOMKilled.

Ex:

NAME        CPU(cores)   MEMORY(bytes)
fah-5j7bg   1001m        48Mi
fah-plvwd   1001m        308Mi
fah-wh5tg   999m         110Mi

So far I haven't seen it exceed 512Mi though, maybe that's a better limit?

Thanks for your work!
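
The suggested change could look like this in a container spec. It is a fragment, not the repository's actual configuration: the 512Mi limit is the value proposed above, while the container name, image, requests, and CPU limit are assumptions.

```yaml
# Fragment of a pod template; everything except the 512Mi limit is assumed.
containers:
  - name: fah
    image: example.com/fah-client:latest  # placeholder image name
    resources:
      requests:
        cpu: "1"
        memory: 256Mi
      limits:
        cpu: "1"
        # Raised from 256Mi; observed usage above peaked at 308Mi.
        memory: 512Mi
```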

