
k8s-folding-at-home's People

Contributors

davidsouther, jsloyer, kaovilai, kbruner, richstokes, saipathuri, skandix, yhaenggi


k8s-folding-at-home's Issues

Add DaemonSet as second deployment option

To fully utilize a cluster without having to scale a Deployment manually, I suggest adding the option of deploying via a DaemonSet. This would run FaH on every node in the cluster.

I'm currently setting up my experimental Raspberry Pi cluster and will implement and test this addition.
Edit: I just found out that FaH will not run on ARM, so at the moment I have no cluster available for implementing and testing this.

Ideas so far:

  • selection of nodes via nodeSelector
  • sane default resource limits to not over-utilize a cluster
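
A minimal sketch of what the two ideas above could look like in a DaemonSet manifest. This is not the project's actual configuration: the image name, the `fah: "enabled"` node label, and the resource values are placeholders chosen for illustration and would need to be adapted.

```yaml
# Sketch only: names, labels, image, and resource values are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fah
spec:
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      # Run only on nodes carrying this (hypothetical) label.
      nodeSelector:
        fah: "enabled"
      containers:
        - name: fah
          image: example/fah-client:latest  # placeholder image name
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            # Limits keep FaH from starving other workloads on the node.
            limits:
              cpu: "1"
              memory: 512Mi
```

With a selector like this, a node would opt in via `kubectl label node <node-name> fah=enabled`; nodes without the label are skipped.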

Any chance of an ARM version?

Your image looks awesome. I'm really keen to try this on my cluster, but I only have Raspberry Pis so far. Any chance you'd be able to release an ARM (or arm64 specifically) image?

GPU slot failing to start

Background: I have a cluster with a few GPU nodes that have spare capacity, so I wanted to use that extra capacity for this project.
First, I tried running the provided image as a DaemonSet on my GPU-enabled nodes, but that failed because FAH couldn't find the CUDA libraries and thus did not detect the GPU. I solved that by building a new container based on the nvidia/cuda:10.0-runtime-ubuntu18.04 image, which enabled FAH to find the CUDA libraries and detect the GPU.

Now, though, I'm seeing the following error:
ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually

Based on some googling, this is either a problem with my drivers, or with the config in FAH. Has anyone else seen this error? It seems like it can be solved from the FAH control interface, but I'm not sure how to expose that.

Minimum CPU?

Hi! I'd love to eval this for a large org. Do you know what the minimum viable CPU is? Thanks!

StatefulSets & Persistent Volume Claims

I have found that Persistent Volume Claim mounts are helpful for FAH on K8s, as they allow the container to checkpoint work (the default is every 15m) and recover state for an existing assignment if the container is terminated.

To do this I have implemented the StatefulSet deployment pattern so that each container has its own unique PVC (in a Deployment they would all share the same PVC). I am mounting the PVC at mountPath: /var/lib/fahclient. I don't know how much storage FAH expects or requires, but I can look into the average utilization on my cluster.

Happy to share my YAML as an example, but I have forked a bit for my environment on Google Cloud/Google Kubernetes Engine.
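
For illustration, a minimal sketch of the StatefulSet pattern described above, not the poster's actual YAML. Only the mount path /var/lib/fahclient comes from the post; the image name, resource names, replica count, and the 1Gi storage request are assumptions.

```yaml
# Sketch only: image, names, replica count, and storage size are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: fah
spec:
  serviceName: fah  # name of a governing headless Service (assumed to exist)
  replicas: 2
  selector:
    matchLabels:
      app: fah
  template:
    metadata:
      labels:
        app: fah
    spec:
      containers:
        - name: fah
          image: example/fah-client:latest  # placeholder image name
          volumeMounts:
            - name: fah-data
              mountPath: /var/lib/fahclient  # path given in the post
  # One PVC is created per replica (fah-data-fah-0, fah-data-fah-1, ...),
  # so each pod keeps its own checkpoints across restarts.
  volumeClaimTemplates:
    - metadata:
        name: fah-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi  # assumed size; actual FAH usage is not stated
```

The `volumeClaimTemplates` section is what gives each replica its own claim, in contrast to a Deployment whose pods all reference the same named PVC.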

Resource limit memory too low

Hi,

First of all, thanks for creating this project. I've used it to quickly donate some of my spare CPU power. However, I just wanted to let you know that the 256Mi memory limit in your configuration files is too low and that pods will get OOMKilled.

Ex:

NAME        CPU(cores)   MEMORY(bytes)
fah-5j7bg   1001m        48Mi
fah-plvwd   1001m        308Mi
fah-wh5tg   999m         110Mi

So far I haven't seen it exceed 512Mi though, maybe that's a better limit?
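
As a sketch, the suggested change would amount to raising the memory limit in the container's `resources` section. Only the 256Mi and 512Mi figures come from this report; the surrounding structure is the standard Kubernetes container spec, and whether a request is set alongside the limit is an assumption.

```yaml
# Fragment of a container spec; sketch only.
resources:
  requests:
    memory: 256Mi  # assumed request, shown for illustration
  limits:
    memory: 512Mi  # raised from 256Mi, per the usage observed above
```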

Thanks for your work!
