

YiannisGkoufas avatar YiannisGkoufas commented on June 3, 2024

That's an interesting issue. There are certainly a few ways we could approach it.
I was thinking the following: you could prevent users from working with Datasets altogether. It's straightforward to modify the RBAC, but let me know if you run into problems with that. Users would still be allowed to use the PVCs created from the Datasets.
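To make this concrete, here is a hedged sketch of such an RBAC restriction: users may read Datasets and work with PVCs, but not create or modify Datasets. The API group `com.ie.ibm.hpsys` is assumed to be Datashim's CRD group, and the namespace/role names are illustrative; check the installed CRDs with `kubectl get crd` before using anything like this.

```yaml
# Hypothetical sketch, not the project's shipped RBAC.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dataset-readonly     # illustrative name
  namespace: user-ns         # illustrative namespace
rules:
  # Users can see Datasets but not create/update/delete them.
  - apiGroups: ["com.ie.ibm.hpsys"]   # assumed Datashim CRD group
    resources: ["datasets"]
    verbs: ["get", "list", "watch"]
  # Users can still use the PVCs created from the Datasets.
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete"]
```

A matching RoleBinding would then attach this Role to the user group in question.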

from datashim.

viktoriaas avatar viktoriaas commented on June 3, 2024

That could be an option; however, it gets too static. We aim for dynamic provisioning because, with the number of PVCs created on demand, it wouldn't be sustainable to work like this. We have some workflows which create PVCs on the fly, and right now we are using a deprecated chart, because with dlf we would have to add Dataset creation everywhere (unrealistic), but we really want to use this framework.

The best option would be to provide dynamic provisioning. Next best would be to allow users to deploy Datasets (so it is at least somewhat flexible: they don't have to ask for every piece of storage) but keep the export path hidden from users. The directory would then always be created under that path according to the Dataset name, and a PVC with the same name provided.

I think the option of mounting an already existing directory would cease to exist, as the problem of mounting someone else's directory would remain. But I don't think that's a big loss: if you want certain files prepared in advance, use an initContainer to copy/create them in the Deployment or Pod.

How do you feel about that? Have you thought about dynamic provisioning, or are there other problems standing in the way?


YiannisGkoufas avatar YiannisGkoufas commented on June 3, 2024

Hi @viktoriaas, thanks a lot for the details on your use case. It definitely makes sense; I am just trying to model it in a way that keeps the conventions of DLF valid.
Would something like this work:

  • We introduce a new CRD, DatasetBase, which will have the same specs as Dataset plus a few more, for instance "allowOverride"
  • We add functionality so that a Dataset can inherit its specs from a DatasetBase
  • The admin creates a DatasetBase with any root they want
  • The users are able to create Datasets extending the DatasetBase the admins specify
    That way, if the admin has set "allowOverride": false, the users won't be able to mount any path they want on NFS.
    Bear in mind that this requires a bit of work on our side.
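The proposal above could be sketched roughly as follows. This is purely hypothetical: DatasetBase does not exist, and every field name here (including `allowOverride` and `extends`) is illustrative, as is the server address.

```yaml
# Hypothetical admin-created base (proposed CRD, not implemented):
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: DatasetBase
metadata:
  name: nfs-base
spec:
  local:
    type: "NFS"
    server: "nfs.example.com"     # illustrative address
    share: "/exports/datasets"    # root chosen by the admin
  allowOverride: false            # users cannot change the path
---
# Hypothetical user-created Dataset extending the base:
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: my-dataset
spec:
  extends: nfs-base               # illustrative inheritance field
```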

We really appreciate that you have embraced the framework so much and are coming up with all these ideas and contributions!


viktoriaas avatar viktoriaas commented on June 3, 2024

@YiannisGkoufas
The steps you described sound great. Of course I don't expect you to have them done in a second 😄 We will wait, and until then use our current solution.
Let me know when you need something or would like to test!


Just a side note: have you thought about supporting dynamic provisioning, or is this the final solution?


YiannisGkoufas avatar YiannisGkoufas commented on June 3, 2024

The current functionality is already considered dynamic provisioning, in the sense that we create persistent volume claims on the fly. Now, if you prefer to work with StorageClasses instead of DatasetBase/Dataset, you can directly use the CSI NFS provisioner we bundle with DLF as well: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/example/README.md


viktoriaas avatar viktoriaas commented on June 3, 2024

I assume that if I create an NFS StorageClass (let's call it csi-nfs), then it is enough to specify the StorageClass in the PVC, and the volume and everything else will be created.
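That is indeed the standard Kubernetes dynamic-provisioning flow: the claim names a StorageClass, and the provisioner creates the PersistentVolume. A minimal claim might look like this (the claim name and size are examples):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim             # example name
spec:
  storageClassName: csi-nfs  # the NFS StorageClass discussed here
  accessModes:
    - ReadWriteMany          # NFS typically supports shared access
  resources:
    requests:
      storage: 10Gi
```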

Now, I have problems understanding all the steps that have to be done. I tried to deploy csi-driver-nfs according to the link you've provided (sorry, I hadn't seen it before, because there is no direct link from the main dlf repo :) ) but wasn't successful at all. In the end, even Dataset creation stopped working. However, I would like to know the steps I have to take to enable deployment only by defining the StorageClass in the PVC.

  1. The first step is to set up an NFS server on the Kubernetes cluster. But what if I have an existing server and share path? It is described here that a new service will be created

which exposes the NFS server endpoint nfs-server.default.svc.cluster.local and the share path /.

But I have a different IP and share path. Anyway, I used the provided command to deploy the nfs-service.

  2. Then I should install the nfs-csi driver via the provided link; that's okay, installation was successful.
kube-system     csi-nfs-controller-7fb595656-mbm6p               3/3     Running        0          35m
kube-system     csi-nfs-controller-7fb595656-pnxdt               3/3     Running        0          35m
  3. Then I should deploy https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/deploy/example/storageclass-nfs.yaml to have a StorageClass, but this file again features that weird path and export. Here I've changed the server and share to our server IP and share path.
csi-nfs (default)   nfs.csi.k8s.io                         Retain          Immediate           false                  3h10m
  4. I create the PVC aaand nothing happens. It stays in Pending forever.
galaxy-ns     galaxy-galaxy-pvc               Pending                                                                        csi-nfs        5m8s
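For step 3, the adjusted StorageClass would look roughly like this. The `server` and `share` parameters and the mount options are taken from the controller logs in this comment; `reclaimPolicy` and `volumeBindingMode` match the `kubectl get storageclass` output above. This is a sketch, not a verbatim copy of the deployed file.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-nfs
provisioner: nfs.csi.k8s.io
parameters:
  server: 147.251.6.50        # existing NFS server IP
  share: /gpfs/vol1/nfs/wes   # existing share path
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1
```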

logs from nfs-controller:

E0127 15:58:50.450860       1 utils.go:89] GRPC error: rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=4.1 147.251.6.50:/gpfs/vol1/nfs/wes /tmp/pvc-6cced430-5acf-4644-91a7-b279559e3386
Output: mount.nfs: Operation not permitted
I0127 15:58:52.410865       1 utils.go:84] GRPC call: /csi.v1.Controller/CreateVolume
I0127 15:58:52.410979       1 utils.go:85] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-ee3cfb7f-14ec-4a81-9689-a5583842ad9a","parameters":{"server":"147.251.6.50","share":"/gpfs/vol1/nfs/wes/"},"volume_capabilities":[{"AccessType":{"Mount":{"mount_flags": ["hard","nfsvers=4.1"]}},"access_mode":{"mode":5}}]}
I0127 15:58:52.418528       1 controllerserver.go:249] internally mounting 147.251.6.50:/gpfs/vol1/nfs/wes at /tmp/pvc-ee3cfb7f-14ec-4a81-9689-a5583842ad9a
I0127 15:58:52.418640       1 nodeserver.go:77] NodePublishVolume: volumeID(147.251.6.50/gpfs/vol1/nfs/wes/pvc-ee3cfb7f-14ec-4a81-9689-a5583842ad9a) source(147.251.6.50:/gpfs/vol1/nfs/wes) targetPath(/tmp/pvc-ee3cfb7f-14ec-4a81-9689-a5583842ad9a) mountflags([hard nfsvers=4.1])
I0127 15:58:52.418728       1 mount_linux.go:146] Mounting cmd (mount) with arguments (-t nfs -o hard,nfsvers=4.1 147.251.6.50:/gpfs/vol1/nfs/wes /tmp/pvc-ee3cfb7f-14ec-4a81-9689-a5583842ad9a)
E0127 15:58:52.645433       1 mount_linux.go:150] Mount failed: exit status 32

so I lowered nfsvers to 3. Logs then:

I0127 16:06:46.754177       1 mount_linux.go:146] Mounting cmd (mount) with arguments (-t nfs -o hard,nfsvers=3 147.251.6.50:/gpfs/vol1/nfs/wes /tmp/pvc-887aec5b-8e43-4665-967a-e4f8d0248a1c)
E0127 16:06:49.454905       1 mount_linux.go:150] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=3 147.251.6.50:/gpfs/vol1/nfs/wes /tmp/pvc-887aec5b-8e43-4665-967a-e4f8d0248a1c
Output: 
E0127 16:06:49.455350       1 utils.go:89] GRPC error: rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t nfs -o hard,nfsvers=3 147.251.6.50:/gpfs/vol1/nfs/wes /tmp/pvc-887aec5b-8e43-4665-967a-e4f8d0248a1c
Output: 

How can I fix this? It would be very nice to have this working.

EDIT
I'm keeping everything above in case someone runs into the same issue, but we found out that the nfs-controller is failing because of memory: mount.nfs is being OOM-killed. dmesg output:

[ 4288.088143] Memory cgroup out of memory: Killed process 153388 (mount.nfs) total-vm:156416kB, anon-rss:70416kB, file-rss:4200kB, shmem-rss:0kB, UID:0

In this file we have increased the limits on this line
and this one 10 times (just added a 0). Everything works as expected. I think this solution is even better than Dataset + DatasetBase, because no one can mount anything else. Do you still want to work on Dataset? I think this is perfect.
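The exact file and lines referenced above are links that were not preserved in this mirror, so the following is only an illustration of the kind of change described: raising the memory limit on the csi-nfs controller containers tenfold so that mount.nfs is not killed by the memory cgroup. The numbers are placeholders, not the driver's actual defaults.

```yaml
# Illustrative fragment for a csi-nfs controller container spec;
# the real file and original values are not reproduced here.
resources:
  limits:
    memory: 1000Mi   # ten times the shipped limit ("just added a 0")
  requests:
    memory: 20Mi     # placeholder value
```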


srikumar003 avatar srikumar003 commented on June 3, 2024

Closing this issue as it is not in our present planning.

