Comments (31)
I have implemented solution 2 for GKE. You can choose between LVM or RAID. It assumes that you want to combine all the available disks together (which is not necessarily a correct assumption for how everyone does K8s): https://github.com/pingcap/tidb-operator/blob/master/manifests/gke/local-ssd-provision/local-ssd-provision.yaml
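For anyone skimming, the RAID path of a setup like that presumably boils down to something along these lines (a minimal sketch, not the actual manifest script; the device glob and mount point are illustrative assumptions, see the linked YAML for the real logic):

```bash
#!/usr/bin/env bash
# Minimal sketch of the RAID-0 branch only; device glob and mount point are assumptions.
set -euo pipefail

devices=(/dev/disk/by-id/google-local-ssd-*)   # GKE exposes local SSDs under this path
if [ ! -e /dev/md0 ]; then
  mdadm --create /dev/md0 --level=0 --raid-devices="${#devices[@]}" "${devices[@]}"
  mkfs.ext4 -F /dev/md0
fi
mkdir -p /mnt/disks/raid0
mountpoint -q /mnt/disks/raid0 || mount /dev/md0 /mnt/disks/raid0   # discovery dir for the provisioner
```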
@nerddelphi I believe namePattern could be used to support matching SCSI disks, although I think there will be challenges distinguishing a SCSI local SSD from a SCSI PD if you match on /dev/sd*. I believe the same issue exists for NVMe as well.
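One possible way to disambiguate, assuming GKE keeps its by-id symlinks: local SSDs are also exposed as /dev/disk/by-id/google-local-ssd-*, which PDs are not, so resolving those links yields the underlying /dev/sdX names.

```bash
# Sketch: list only the /dev/sdX devices that are GKE local SSDs (not PDs),
# relying on the google-local-ssd-* by-id symlinks (an assumption here).
for link in /dev/disk/by-id/google-local-ssd-*; do
  readlink -f "$link"   # prints e.g. /dev/sdb
done
```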
@andyzhangx regarding extending the provisioner, can we make the setup action scriptable, similar to what we do for cleaning block devices? That would make the solution customizable for any configuration.
Hi @nerddelphi,
Your manual operation is correct, but unfortunately there is no automatic solution right now. I'm thinking about writing a cloud controller to automate this.
A CRD is a more flexible, Kubernetes-native way to configure this, so it seems like a good idea to have an operator do these tasks (option 3). I had a discussion with @gnufied; we could add RAID support in local-storage-operator. What do you think?
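Purely as an illustration of option 3 (this CRD does not exist; every name below is hypothetical), the user-facing config might look something like:

```yaml
# Hypothetical CR, for illustration only -- not an existing API.
apiVersion: local.storage.example.com/v1alpha1
kind: LocalDiskSetup
metadata:
  name: raid-local-ssds
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: default-pool
  raid0:
    deviceNamePattern: "nvme*"
    arrayName: md0
    fsType: ext4
```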
Since this PR (#187) has already added the namePattern parameter, what about this lightweight design: add a new parameter raid in storageClassMap. E.g. with the example config below, provisioner discovery will:
- in the discovery loop, check whether /dev/md0 exists; if it exists, skip; if not:
  - discover all /dev/nvme* devices (with a basic capacity check), format those devices, and assemble them into a RAID array as /dev/md0
  - create a new PV with /dev/md0 as Filesystem volumeMode

If raid is empty, don't set up RAID, which stays compatible with the default config.
So on every agent node, the provisioner would at most create one new PV with local.path /dev/md0 as Filesystem volumeMode:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: default
data:
  storageClassMap: |
    fast-disks:
      hostDir: /dev
      mountDir: /dev
      blockCleanerCommand:
        - "/scripts/shred.sh"
        - "2"
      volumeMode: Filesystem
      fsType: ext4
      namePattern: "nvme*"
      raid: "md0"
```
We published an example of a DaemonSet that can RAID the disks on GKE. If you're not on GKE, the example could be adapted to other environments: https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/getting-started.md#option-2-gke
cc @gnufied who has also been working on an operator
In the past, we've wanted to keep a clear separation between environment-specific prep and the general PV lifecycle management, but I can see value in providing some optional helpers if it's beneficial to many users (and I have seen many requests for supporting RAID setup). I would still like to keep it separate from the actual provisioner process so that we don't complicate the logic there (and also potentially require installing mdadm in the container image for everyone). So either option 2) or 3) sounds good to me.
I think the biggest question to figure out is how the disk names will be passed in. List every disk? Pattern match? Nodes can have different numbers/names of disks.
I'd prefer the second option.
> I think the biggest question to figure out is how the disk names will be passed in. List every disk? Pattern match? Nodes can have different numbers/names of disks.
Yes, if we want to support local volume prep in various environments, the configuration must be flexible.
This is my proposal, what do you think?
class "local" {
dir = "/mnt/raid-local"
# mode defaults to "filesystem"
# mode = "filesytem"
}
class "local-device" {
dir = "/mnt/raid-local-device"
mode = "block"
}
#
# For all gke-demo-default-pool-* nodes, we combine all local SSDs into one
# raid0 disk and format/mount it into "local" class directory.
#
node "gke-demo-default-pool-*" {
raid0 md0 {
class = "local"
disks = ["/dev/disk/by-id/google-local-ssd-*"]
}
}
#
# For all gke-demo-another-pool-* nodes, we combine two local SSDs into one
# raid0 disk and link the disk to "local-device" class directory.
#
node "gke-demo-another-pool-*" {
raid0 md0 {
class = "local-device"
disks = ["/dev/disk/by-id/google-local-ssd-0", "/dev/disks/by-id/google-local-ssd-1"]
}
raid0 md1 {
class = "local-device"
disks = ["/dev/disk/by-id/google-local-ssd-2", "/dev/disks/by-id/google-local-ssd-3"]
}
raid0 md2 {
class = "local-device"
disks = ["/dev/disk/by-id/google-local-ssd-4", "/dev/disks/by-id/google-local-ssd-5"]
}
}
The configuration language is HCL, which is used by Terraform.
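For clarity, the "local-device" (block mode) stanza above would imply roughly the following on a matching node (a sketch of the intended behavior, not an implementation):

```bash
# Sketch of the block-mode case: the array is symlinked, not mounted, into the
# class directory so the provisioner would expose it as a Block-mode PV.
mdadm --create /dev/md0 --level=0 --raid-devices=2 \
  /dev/disk/by-id/google-local-ssd-0 /dev/disk/by-id/google-local-ssd-1
mkdir -p /mnt/raid-local-device
ln -s /dev/md0 /mnt/raid-local-device/md0
```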
We have been working on a local-storage operator that uses the following API to allow users to specify the disks that can be used by local-storage-provisioner - https://github.com/openshift/local-storage-operator/blob/master/pkg/apis/local/v1alpha1/types.go#L54 (example: https://github.com/openshift/local-storage-operator/blob/master/examples/olm/create-cr.yaml )
@cofyc An earlier version of the API we proposed for local-storage-operator allowed specifying wildcards and regexps, but we quickly realized that we may have to allow users to specify an exclusion mechanism (like "don't use this disk, but use others that match this regex"). It might be worth starting small, keeping the surface area of the API small, gathering user feedback, and then iterating on the design. If we allow wildcards/regexes from v1, it will be hard to roll back on them.
I agree with @msau42 that separating disk preparation from general PV lifecycle management is a good idea, and since kubelet itself is capable of formatting disks, this provisioner does not need to do that (at least for non-RAID volumes).
Another, simpler solution is to annotate the node to tell the provisioner (or a sidecar of it) to combine the disks before mounting them (Filesystem mode) or before symlinking the combined disk into the discovery directory.
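For example (the annotation key and value format here are hypothetical, just to illustrate the idea):

```bash
# Hypothetical annotation telling a per-node agent which disks to combine
# before discovery; this key does not exist today.
kubectl annotate node gke-demo-default-pool-abc123 \
  local-volume.example.com/raid0='md0=/dev/disk/by-id/google-local-ssd-*'
```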
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Awesome! We can consider adding the script here in some addons folder if you think that would be beneficial.
To be more generally useful, you would probably want to do the disk combining based on some node-pool labeling scheme or other metadata available at startup.
This solution also causes a failure when the node restarts, due to brittleness in GKE startup scripts; this has been reported in multiple places. When I reported it to GKE support, they told me that unmounting disks is not supported at this time and that they don't care to make this situation more transparent in their documentation.
@gregwebs this is awesome, I have been looking for something like this for a very long time! It would be absolutely awesome to have this as a ready-to-use component rather than a large code copy/paste. A few notes:
- GCP will soon (hopefully) introduce NVMe local SSDs -- gcloud alpha already supports the --local-ssd-volumes parameter. They will be listed as /dev/nvme* rather than ssd*. Also, it seems they can be created without being formatted, with format=block.
- For some reason mdadm kept raising a 141 exit code, despite seemingly completing successfully. Have you had that issue?
- Could that code be packaged into a published Docker Hub image? I already created nyurik/kuberaid (it uses a very simple script to force-format), but yours is far better and more thorough.

Thank you for your awesome work on this!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
@nyurik sorry I missed your message. GCP improvements here are still in the alpha phase.
I haven't seen errors from mdadm.
We just updated the script for an incompatibility with newer GKE image versions.
You are welcome to take the script for your docker image.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/lifecycle frozen
/remove-lifecycle frozen
@msau42 @gregwebs @nyurik @cofyc @gnufied @schallert
Hi there.
I'm excited about using local SSDs in GKE and building a RAID-0 volume from these disks. However, even with the DaemonSet from the local-static-provisioner Helm chart, an initContainer, and the RAID script from @gregwebs, I hit a critical issue when simulating disruption scenarios.
If I use a StatefulSet with a PVC, like this:
```yaml
volumeClaimTemplates:
  - metadata:
      name: local-vol
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 700Gi
```
and the node is recreated (after a node-pool upgrade, for example), the StatefulSet pod gets stuck in Pending state and I have to delete its PVC and that pod manually (the PV is auto-deleted after the PVC deletion, since the PV's disk no longer exists). Then the new pod is scheduled on the new, upgraded node and the new PVC (pointing to a new PV) is created as well.
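For reference, the manual recovery I described looks roughly like this (the StatefulSet name "mysts" is just a placeholder; the PVC follows the volume-sts-ordinal naming pattern from the template above):

```bash
# Manual recovery after the node is recreated (placeholder names).
kubectl delete pvc local-vol-mysts-0   # the bound PV is removed once the PVC is gone
kubectl delete pod mysts-0             # the pod reschedules and a fresh PVC/PV pair is created
```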
Are you facing that issue? If so, how do you deal with it? If not, what do you suggest?
Thank you.
@andyzhangx Excellent job! Is there any way to do this with the SCSI interface, since GKE only supports NVMe in alpha clusters (beta and GA are SCSI)?
Thank you!
> @andyzhangx Excellent job! Is there any way to do this with the SCSI interface, since GKE only supports NVMe in alpha clusters (beta and GA are SCSI)?
> Thank you!
I am not aware of that. Do you have a link about SCSI interface support? @nerddelphi
@andyzhangx I'm using local SSDs on my GKE nodes and I can confirm that only the SCSI interface is available on GKE beta/GA clusters (there are only /dev/sdX disks on the node, pointing to local SSDs).
NVMe is available in alpha -> https://cloud.google.com/sdk/gcloud/reference/alpha/container/node-pools/create#--local-ssd-volumes
@nyurik said the same in this thread as well -> #65 (comment)
Perhaps your code could check whether the local SSDs are NVMe or SCSI (or ask the user).
Thanks!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotte
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen
/remove-lifecycle rotten
/lifecycle frozen
@msau42: Reopened this issue.
In response to this:
> /reopen
> /remove-lifecycle rotten
> /lifecycle frozen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind feature
Any movement on this? This would be ideal for a solution I am working on.