Comments (14)
@trajano, when will this PR be merged into the master branch for a release?
@btheu: I suggested that I fork and publish a fixed version of the plugin, and @trajano advised that this is currently the best way to get the fix released.
I'll comment here when the release from my own fork is ready. Thanks, @trajano.
Yes! I'll submit a PR that uses tini, as that solved it for me.
Update: I'll try to get my port published as soon as I can; some other priorities and the consequences of the global situation got in the way until now. FYI, for my own purposes I already published a Docker image containing the tini patch, but there's no published source code for it nor any CI to build it, both of which I want to set up as a proper fork. It's also only the Gluster volume plugin; I'm not sure if I can or should take over the other backend volume plugins, as I don't use them, don't know anything about them, and have no setup to test them. The plugin that I am using is here: https://hub.docker.com/r/ximoneighteen/glusterfs-volume-plugin. Apologies for the delay.
FYI, I see the same issue with mwmahlberg/glusterfs-volume-plugin.
I tried to reproduce with sapk/docker-volume-gluster but hit sapk/docker-volume-gluster#34.
FYI, I'm using glusterfs-server 3.13.2-1ubuntu1 on Ubuntu 18.04, on both LXC/LXD and DigitalOcean. In both cases I'm using a replicated volume (created with force) on the default filesystem, nothing special.
Since I can reproduce this creation of defunct glusterfs processes with another Docker volume plugin, I wonder: is this a GlusterFS issue, or are both plugins misusing GlusterFS in the same way? I'll check my logs.
Update: The only log that changes is /var/log/glusterfs/bricks/data-glusterfs.log. Here are three invocations of docker run with a GlusterFS volume mount, console output showing the increasing defunct process count, and the corresponding GlusterFS log entries:
root@xxx:~# date ; docker run -it --rm -v gfs:/data alpine ls -la / >/dev/null; ps ax | fgrep defunct | wc -l
Sat Mar 7 10:21:05 UTC 2020
8
root@xxx:~# date ; docker run -it --rm -v gfs:/data alpine ls -la / >/dev/null; ps ax | fgrep defunct | wc -l
Sat Mar 7 10:21:15 UTC 2020
9
root@xxx:~# date ; docker run -it --rm -v gfs:/data alpine ls -la / >/dev/null; ps ax | fgrep defunct | wc -l
Sat Mar 7 10:21:18 UTC 2020
10
root@xxx:~# tail -F -n 0 /var/log/glusterfs/bricks/data-glusterfs.log
[2020-03-07 10:21:06.328983] I [addr.c:55:compare_addr_and_update] 0-/data/glusterfs: allowed = "*", received addr = "10.34.5.139"
[2020-03-07 10:21:06.329140] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 62d0249b-3e5e-4aba-b8e7-eb21dc05389a
[2020-03-07 10:21:06.329210] I [MSGID: 115029] [server-handshake.c:802:server_setvolume] 0-gfs-server: accepted client from CTX_ID:7553118d-7f66-4fdd-83a8-baca9a88e632-GRAPH_ID:0-PID:511-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0 (version: 5.9)
[2020-03-07 10:21:12.030149] I [MSGID: 115036] [server.c:535:server_rpc_notify] 0-gfs-server: disconnecting connection from CTX_ID:7553118d-7f66-4fdd-83a8-baca9a88e632-GRAPH_ID:0-PID:511-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
[2020-03-07 10:21:12.033190] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-gfs-server: Shutting down connection CTX_ID:7553118d-7f66-4fdd-83a8-baca9a88e632-GRAPH_ID:0-PID:511-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
[2020-03-07 10:21:16.340762] I [addr.c:55:compare_addr_and_update] 0-/data/glusterfs: allowed = "*", received addr = "10.34.5.139"
[2020-03-07 10:21:16.340923] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 62d0249b-3e5e-4aba-b8e7-eb21dc05389a
[2020-03-07 10:21:16.341030] I [MSGID: 115029] [server-handshake.c:802:server_setvolume] 0-gfs-server: accepted client from CTX_ID:6feaa896-35e5-4d11-afef-e5924867cfa8-GRAPH_ID:0-PID:584-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0 (version: 5.9)
[2020-03-07 10:21:17.393601] I [MSGID: 115036] [server.c:535:server_rpc_notify] 0-gfs-server: disconnecting connection from CTX_ID:6feaa896-35e5-4d11-afef-e5924867cfa8-GRAPH_ID:0-PID:584-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
[2020-03-07 10:21:17.396156] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-gfs-server: Shutting down connection CTX_ID:6feaa896-35e5-4d11-afef-e5924867cfa8-GRAPH_ID:0-PID:584-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
[2020-03-07 10:21:19.344230] I [addr.c:55:compare_addr_and_update] 0-/data/glusterfs: allowed = "*", received addr = "10.34.5.139"
[2020-03-07 10:21:19.344426] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 62d0249b-3e5e-4aba-b8e7-eb21dc05389a
[2020-03-07 10:21:19.344497] I [MSGID: 115029] [server-handshake.c:802:server_setvolume] 0-gfs-server: accepted client from CTX_ID:b368aeb3-91ce-49cc-b83b-6fe9c7488491-GRAPH_ID:0-PID:657-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0 (version: 5.9)
[2020-03-07 10:21:20.488265] I [MSGID: 115036] [server.c:535:server_rpc_notify] 0-gfs-server: disconnecting connection from CTX_ID:b368aeb3-91ce-49cc-b83b-6fe9c7488491-GRAPH_ID:0-PID:657-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
[2020-03-07 10:21:20.490676] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-gfs-server: Shutting down connection CTX_ID:b368aeb3-91ce-49cc-b83b-6fe9c7488491-GRAPH_ID:0-PID:657-HOST:xxx-PC_NAME:gfs-client-0-RECON_NO:-0
And some info about my test setup:
root@xxx:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
root@xxx:~# docker --version
Docker version 19.03.7, build 7141c199a2
root@xxx:~# docker plugin list
ID NAME DESCRIPTION ENABLED
41c9503b7039 glusterfs:v2.0.3 GlusterFS plugin for Docker true
root@xxx:~# docker volume ls
DRIVER VOLUME NAME
glusterfs:v2.0.3 gfs
root@xxx:~# gluster pool list
UUID Hostname State
c01bcd91-1dc1-4ff1-805e-55c8bde3598b xxx2 Connected
316e953e-4a35-45b8-903b-8765d9614736 localhost Connected
root@xxx:~# gluster volume list
gfs
root@xxx:~# gluster volume info gfs
Volume Name: gfs
Type: Replicate
Volume ID: 1bf162ec-24f3-4a74-997e-ea259a980daf
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: xxx:/data/glusterfs
Brick2: xxx2:/data/glusterfs
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@xxx:~# gluster volume status gfs
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick xxx:/data/glusterfs 49152 0 Y 1364
Brick xxx2:/data/glusterfs 49152 0 Y 1093
Self-heal Daemon on localhost N/A N/A Y 1386
Self-heal Daemon on xxx2 N/A N/A Y 1115
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks
root@xxx:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2020-03-07 09:20:35 UTC; 1h 9min ago
Process: 271 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 274 (glusterd)
Tasks: 37 (limit: 18765)
CGroup: /system.slice/glusterd.service
├─ 274 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
├─1364 /usr/sbin/glusterfsd -s xxx --volfile-id gfs.xxx.data-glusterfs -p /var/run/gluster/vols/gfs/xxx-data-glusterfs.pid -S /var/run/gluster/e90dacd1d86048a591a862a6a322236d.socket --brick-name /data/glusterfs -l /var/lo
└─1386 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/b65dcfd2d8e1bbb51917e76651267c48.socket --xlator-opti
Mar 07 09:20:32 xxx systemd[1]: Starting GlusterFS, a clustered file-system server...
Mar 07 09:20:35 xxx systemd[1]: Started GlusterFS, a clustered file-system server.
With mwmahlberg/glusterfs-volume-plugin I get exactly the same logs, except that the (version: 5.9) part is instead (version: 4.0.2).
Note: Mounting and unmounting via an /etc/fstab entry and writing to the mount point does not cause the defunct process count to increase.
Note: I can also reproduce the issue with just a single server and a gluster volume created with gluster volume create gfs xxx:/data/glusterfs force.
The Linux mount command produced this in /var/log/glusterfs/bricks/data-glusterfs.log:
[2020-03-07 11:09:06.041412] I [addr.c:55:compare_addr_and_update] 0-/data/glusterfs: allowed = "*", received addr = "10.34.5.178"
[2020-03-07 11:09:06.041614] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 93295202-1f41-4b95-9f0a-3db00bc9eb83
[2020-03-07 11:09:06.041684] I [MSGID: 115029] [server-handshake.c:802:server_setvolume] 0-gfs-server: accepted client from xxx-2188-2020/03/07-11:09:06:9274-gfs-client-0-0-0 (version: 3.13.2)
And unmount caused this:
[2020-03-07 11:09:11.680512] I [MSGID: 115036] [server.c:535:server_rpc_notify] 0-gfs-server: disconnecting connection from xxx-2188-2020/03/07-11:09:06:9274-gfs-client-0-0-0
[2020-03-07 11:09:11.682759] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-gfs-server: Shutting down connection xxx-2188-2020/03/07-11:09:06:9274-gfs-client-0-0-0
These seem to match the GlusterFS log lines produced by the trajano Docker volume plugin.
So, as far as I can work out, this plugin executes a command to perform the mount, which looks like glusterfs -s <servername> --volfile-id=<glusterfs_vol_name> </mount/point>, where the mount point is derived from a DefaultDockerRootDirectory. Unmount invokes the Go function syscall.Unmount, which calls Syscall(SYS_UMOUNT2). The Go docs say "Deprecated: this package is locked down. Callers should use the corresponding package in the golang.org/x/sys repository instead"; that repository has an Unmount function which also calls Syscall(SYS_UMOUNT2).
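To make that concrete, here's a minimal Go sketch of that mixed exec-then-syscall model as I understand it (the function names and error handling are my own placeholders, not the plugin's actual code):

package mountsketch

import (
	"fmt"
	"os/exec"
	"syscall"
)

// mountGluster launches the glusterfs client as a child process, roughly
// what the plugin's mount path appears to do. glusterfs daemonizes by
// default, so the long-lived client may not be the child we waited on here.
func mountGluster(server, volume, mountpoint string) error {
	cmd := exec.Command("glusterfs", "-s", server, "--volfile-id="+volume, mountpoint)
	out, err := cmd.CombinedOutput() // waits only for the immediate child
	if err != nil {
		return fmt.Errorf("glusterfs mount failed: %s: %w", out, err)
	}
	return nil
}

// unmountGluster detaches the filesystem via umount2(2), mirroring the
// syscall.Unmount path described above. Nothing here waits on, kills, or
// otherwise cleans up the daemonized glusterfs process.
func unmountGluster(mountpoint string) error {
	return syscall.Unmount(mountpoint, 0)
}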
If I execute such a glusterfs command on my host myself, first making sure that the target mount point directory exists, then indeed I get a successful GlusterFS mount and a background glusterfs process. If I kill that process, I am not left with a defunct process, and the mount disappears.
A defunct process implies a parent that hasn't, or won't, clean it up. I don't know much about the Docker process model; I assume the docker run command just sends a request to the Docker daemon, so the daemon is the parent of the trajano plugin's Go process and therefore, indirectly, of the launched glusterfs process. Perhaps the process should be launched in some other way than this plugin's current use of Go exec?
Maybe the issue is that, since glusterfs --help reports a -N / --no-daemon option, the implication is that glusterfs daemonizes itself by default, and so perhaps severs the connection to its parent. And maybe the plugin should use a consistent mount/unmount approach, i.e. exec then kill, or syscall mount then syscall unmount, rather than the mixed model of exec then syscall unmount. Maybe the syscall unmount pulls the mount out from under the daemonized glusterfs process, causing it to die without being reaped?
However, I see that mwmahlberg/glusterfs-volume-plugin uses a different approach of executing mount.glusterfs and executing the unmount command, yet I still had defunct processes with that plugin as well... maybe there too the mount produces a daemonized glusterfs process and the unmount doesn't explicitly clean that process up. Maybe this is all just about how the glusterfs daemon handles shutdown? I would be interested to see if a hack to kill the process "works" as it did for me locally using the kill shell command.
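For comparison, a rough Go sketch of that symmetric exec-based model (again my own placeholder names, not the mwmahlberg plugin's actual code):

package helpersketch

import (
	"fmt"
	"os/exec"
)

// mountViaHelper shells out to the mount.glusterfs helper, which is the
// approach the mwmahlberg plugin reportedly takes.
func mountViaHelper(server, volume, mountpoint string) error {
	out, err := exec.Command("mount.glusterfs", server+":/"+volume, mountpoint).CombinedOutput()
	if err != nil {
		return fmt.Errorf("mount.glusterfs failed: %s: %w", out, err)
	}
	return nil
}

// unmountViaHelper shells out to umount, so mount and unmount at least use
// a symmetric exec model. Note this still leaves any daemonized glusterfs
// process for someone else to reap, matching what I observed.
func unmountViaHelper(mountpoint string) error {
	out, err := exec.Command("umount", mountpoint).CombinedOutput()
	if err != nil {
		return fmt.Errorf("umount failed: %s: %w", out, err)
	}
	return nil
}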
Passing --no-daemon to the trajano glusterfs plugin unsurprisingly causes container creation to block in the Creating state. FYI, I used this command:
docker volume create -d glusterfs:v2.0.3 -o glusteropts="-s xxx --volfile-id=gfs --no-daemon" gfs
I'm attempting to modify the plugin to store the PID and then kill it instead of calling unmount. Will let you know how it goes. Thanks to build.sh, this looks like it could be very easy!
Hmm, I was wrong: killing the process also makes it defunct.
So, on my LXD VM at least, the process hierarchy is as follows (according to pstree -p -c):
systemd -> containerd -> containerd-shim -> glusterfs-volume-plugin -> glusterfs
And it's this glusterfs process that becomes defunct. Maybe the issue is that the glusterfs process doesn't have the same process group as the parent glusterfs-volume-plugin instance?
Using pstree -p -g I can see the process group IDs:
containerd-shim(3370,3370)-+-glusterfs-volum(3387,3387)-+-glusterfs(3954,3954)
So Wikipedia, source of all truth, says this about defunct processes:
This occurs for the child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped".
So the glusterfs-volume-plugin doesn't call wait; I'll try that.
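Here's a minimal Go sketch of the kill-then-wait change I want to try. killAndReap is a name I made up, and it assumes the defunct process is a direct child of the plugin, which the pstree output above suggests it is:

package reapsketch

import "syscall"

// killAndReap terminates a recorded glusterfs child PID and then waits on
// it so the kernel can remove its process-table entry. Without the Wait4
// call the child lingers as <defunct>, which matches what I saw when the
// plugin killed the process without waiting.
func killAndReap(pid int) error {
	if err := syscall.Kill(pid, syscall.SIGTERM); err != nil {
		return err
	}
	var status syscall.WaitStatus
	_, err := syscall.Wait4(pid, &status, 0, nil)
	return err
}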
The plugin uses CombinedOutput(), as mentioned in the ticket here about defunct processes caused by Go exec; however, the help given in that ticket doesn't apply here, as we cannot, or do not want to, stop glusterfs from becoming a daemon.
This looks relevant: gluster/glusterfs#626
Note: running docker plugin disable on the trajano/glusterfs-docker-volume plugin causes the defunct processes to be reaped, as if it holds a reference to them or is somehow blocking their reaping. I wonder if the defunct process is NOT the glusterfs process we ran but a child of that process, and we need to read its exit code or release it in some way?
Maybe the problem is as simple as this: the plugin is not a single-process Docker container but launches other processes without acting as a true PID 1, so nothing ever reaps them. In that case the fix would be to use tini, s6, or any of the other Docker mini-init implementations.
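To illustrate, a minimal Go sketch of the zombie-reaping loop that such an init shim provides when running as PID 1 (my own simplification of what tini and s6 do, not the plugin's code):

package main

import (
	"os"
	"os/signal"
	"syscall"
)

// reapChildren loops on SIGCHLD and reaps every exited child, including
// orphans re-parented to PID 1. This is the core service that tini or s6
// would provide if used as the container's init process.
func reapChildren() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			var status syscall.WaitStatus
			// -1 waits for any child; WNOHANG makes the call non-blocking.
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
		}
	}
}

func main() {
	go reapChildren()
	// ... launch and supervise the real plugin process here ...
	select {} // placeholder to keep the sketch's PID 1 alive
}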
@ximon18 Your PR successfully fixed this issue in my tests under the same conditions.
@trajano, when will this PR be merged into the master branch for a release?
Best regards,