richstokes / k8s-folding-at-home
⛑ Run folding@home on your Kubernetes cluster
License: BSD 3-Clause "New" or "Revised" License
To fully utilize a cluster without manually scaling a deployment, I suggest adding the option to deploy via a DaemonSet. This would run FaH on every node in the cluster.
I'm currently setting up my experimental Raspberry Pi cluster and will implement and test this addition.
Edit: I just figured out that FaH will not run on ARM, so at the moment I have no cluster available for implementing and testing this.
Ideas so far:
nodeSelector
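A minimal sketch of the DaemonSet idea with a nodeSelector, not a tested manifest. The image name and the `app: fah` label are placeholders rather than values taken from this repository. The selector uses the well-known `kubernetes.io/arch` node label so that ARM nodes are skipped.

```yaml
# Sketch only: image name and labels are hypothetical placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fah
  labels:
    app: fah
spec:
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      # Schedule one pod per x86-64 node; ARM nodes are excluded.
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
        - name: fah
          image: example/fah-client:latest  # hypothetical image name
```

A custom node label could be used instead of the architecture label to opt individual nodes in or out.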
Your image looks awesome and I'm really keen to try it on my cluster, but I only have Raspberry Pis so far. Any chance you'd be able to release an ARM (or specifically arm64) image?
Background: I have a cluster with a few nodes with GPUs and capacity, so I wanted to use that extra capacity for this project.
First, I tried running the provided image as a DaemonSet on my GPU-enabled nodes, but that failed because FAH couldn't find the CUDA libraries and thus did not detect the GPU. I solved that by building a new container based on the nvidia/cuda:10.0-runtime-ubuntu18.04 image, which enabled FAH to find the CUDA libraries and detect the GPU.
Now, though, I'm seeing the following error:
ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
Based on some googling, this is a problem either with my drivers or with the FAH config. Has anyone else seen this error? It seems it can be solved from the FAH control interface, but I'm not sure how to expose that.
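One possible way to reach the control interface is a Service in front of the FAH pods. This is a rough sketch under assumptions: it presumes the client's web control listens on port 7396 (the usual default for the v7 client) and that the pods carry a hypothetical `app: fah` label. The client may also restrict web access to localhost by default, in which case its own configuration would need to allow remote addresses as well.

```yaml
# Sketch only: the selector label and port are assumptions, not values from this repo.
apiVersion: v1
kind: Service
metadata:
  name: fah-web
spec:
  type: ClusterIP
  selector:
    app: fah          # hypothetical pod label
  ports:
    - name: web
      port: 7396      # assumed FAH web control port
      targetPort: 7396
```

With several pods behind one Service, requests land on an arbitrary pod, so forwarding a port to one specific pod may be the more practical option when debugging a single GPU slot.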
Hi! I'd love to eval this for a large org. Do you know what the minimum viable CPU is? Thanks!
I have found that Persistent Volume Claim mounts are helpful for FAH on K8s, as they allow the container to checkpoint work (the default is every 15m) and recover the state of an existing assignment if the container is terminated.
To do this I have implemented the StatefulSet deployment pattern so that each container has its own unique PVC (in a Deployment they would all share the same PVC). I am mounting the PVC at mountPath: /var/lib/fahclient. I don't know how much storage FAH expects or requires, but I can look into the average utilization on my cluster.
Happy to share my YAML as an example, although I have forked it a bit for my environment on Google Cloud/Google Kubernetes Engine.
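The StatefulSet pattern described above might look roughly like the following. This is a sketch, not the poster's actual YAML: the image name, labels, replica count, and the 1Gi storage request are all assumptions, and only the mount path /var/lib/fahclient comes from the post.

```yaml
# Sketch only: names, replica count, and storage size are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: fah
spec:
  serviceName: fah   # name of the governing Service required by StatefulSets
  replicas: 2
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      containers:
        - name: fah
          image: example/fah-client:latest  # hypothetical image name
          volumeMounts:
            - name: fahclient
              mountPath: /var/lib/fahclient  # path given in the post
  # Each replica gets its own PVC created from this template.
  volumeClaimTemplates:
    - metadata:
        name: fahclient
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi  # assumed size; actual FAH usage is not stated
```

Because the claims are created per replica (fahclient-fah-0, fahclient-fah-1, and so on), a restarted pod reattaches to the same volume and can resume from its last checkpoint.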
Hi,
First of all, thanks for creating this project; I've used it to quickly donate some of my spare CPU power. However, I wanted to let you know that the 256Mi memory limit in your configuration files is too low and that pods will get OOMKilled.
Ex:
NAME        CPU(cores)   MEMORY(bytes)
fah-5j7bg   1001m        48Mi
fah-plvwd   1001m        308Mi
fah-wh5tg   999m         110Mi
So far I haven't seen it exceed 512Mi, though, so maybe that's a better limit?
Thanks for your work!
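For reference, the suggested change would amount to something like the container-level fragment below. The 512Mi limit is the value proposed in the post; the 256Mi request is an assumption added for illustration.

```yaml
# Container-level fragment; goes under spec.template.spec.containers[].
resources:
  requests:
    memory: 256Mi   # assumed request, not from the post
  limits:
    memory: 512Mi   # limit suggested in the post
```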