xcat2 / goconserver
Microservice based console server to help log and redirect the terminal content for multiple session hosts.
License: Eclipse Public License 1.0
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# for node in foo{000..999}
> do
> congo create $node driver=ssh ondemand=false \
> --params user=foo,host=10.3.1.9,port=22,password=something
> congo logging $node on
> done
I created 1,000 node definitions in congo. With 1,000 active ssh sessions with console logging enabled, the /var file system filled up quickly.
# df -h /var/log/goconserver
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-var 9.8G 9.8G 0 100% /var
# du -sh /var/log/goconserver/nodes
9.7G /var/log/goconserver/nodes
After that, connecting to any of the nodes with congo console fails.
# congo console foo997
The connection is disconnected
Session is teminated unexpectedly, retrying....
The connection is disconnected
Session is teminated unexpectedly, retrying....
The connection is disconnected
Session is teminated unexpectedly, retrying....
What actually happened
It seems that goconserver refuses to work when it can no longer write the console logs, because the /var file system is full.
# ps ax
PID TTY STAT TIME COMMAND
<<<...omit...>>>
16290 ? Ssl 394:43 /usr/local/bin/goconserver
<<<...omit...>>>
# lsof -p 16290 | wc -l
446
# lsof -p 16290 | wc -l
396
# lsof -p 16290 | wc -l
399
# lsof -p 16290 | wc -l
501
# lsof -p 16290 | wc -l
580
# lsof -p 16290 | wc -l
490
# lsof -p 16290 | wc -l
393
What is expected
congo console should keep working.
This issue is against goconserver version 0.3.1. Although goconserver introduced the broadcast node feature in pull request #45, it does not support xCAT hierarchy mode.
# goconserver --version
Version: 0.3.1, BuildTime: 2018-08-20T21:15:20-0400 Commit: 1a3762c6de1dab60f5f2dedac659c532c1eda76f
In my test environment, c910f03c01p09 acts as the xCAT management node, c910f03c01p10 acts as the xCAT service node, and c910f03c01p19 acts as the compute node.
# lsdef c910f03c01p10 -i setupconserver
Object name: c910f03c01p10
setupconserver=2
# lsdef c910f03c01p19 -i conserver,servicenode,xcatmaster
Object name: c910f03c01p19
conserver=c910f03c01p10
servicenode=c910f03c01p10
xcatmaster=c910f03c01p10
With the configuration above, running rcons c910f03c01p10,c910f03c01p19 fails. See the details below.
# rcons c910f03c01p10,c910f03c01p19
Could not connect to c910f03c01p10, error: Node not exist
Connection to c910f03c01p10 closed.
# rcons c910f03c01p19,c910f03c01p10
Could not connect to c910f03c01p10, error: Node not exist
Connection to c910f03c01p10 closed.
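For hierarchy support, a broadcast request would presumably need to be split per console server before connecting, rather than asking one goconserver for every node. A minimal sketch, assuming a map from node name to its conserver attribute (as shown by lsdef above) and treating nodes without a conserver attribute as served by the management node; groupByConserver is a hypothetical helper, not part of rcons or goconserver.

```go
package main

import (
	"fmt"
	"sort"
)

// groupByConserver splits a noderange by the goconserver instance that owns
// each node. Nodes without a conserver attribute fall back to mn, the
// management node.
func groupByConserver(nodes []string, conserver map[string]string, mn string) map[string][]string {
	groups := make(map[string][]string)
	for _, n := range nodes {
		srv := conserver[n]
		if srv == "" {
			srv = mn
		}
		groups[srv] = append(groups[srv], n)
	}
	return groups
}

func main() {
	conserver := map[string]string{"c910f03c01p19": "c910f03c01p10"}
	groups := groupByConserver([]string{"c910f03c01p10", "c910f03c01p19"}, conserver, "c910f03c01p09")

	servers := make([]string, 0, len(groups))
	for s := range groups {
		servers = append(servers, s)
	}
	sort.Strings(servers) // deterministic output order
	for _, s := range servers {
		fmt.Printf("%s: %v\n", s, groups[s])
	}
}
```

The client would then open one broadcast session per group and merge the output.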
I need goconserver to listen on multiple IPs (the local IP, the HA IP, and 127.0.0.1), but not on all IPs present on the system.
It appears that the host: parameter in the configuration file accepts only one IP address.
I would like to be able to specify multiple listener IPs for goconserver.
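In Go, listening on several specific addresses just means opening one listener per address and running an accept loop on each with a shared handler. A minimal sketch, assuming a hypothetical list-valued setting in place of the single host: value; listenAll and serveAll are illustrative names, not existing goconserver options or functions.

```go
package main

import (
	"fmt"
	"net"
)

// listenAll opens one TCP listener per address. The hosts slice stands in
// for a hypothetical multi-valued config setting. On error, listeners that
// were already opened are closed.
func listenAll(hosts []string, port string) ([]net.Listener, error) {
	var ls []net.Listener
	for _, h := range hosts {
		l, err := net.Listen("tcp", net.JoinHostPort(h, port))
		if err != nil {
			for _, prev := range ls {
				prev.Close()
			}
			return nil, fmt.Errorf("listen on %s: %w", h, err)
		}
		ls = append(ls, l)
	}
	return ls, nil
}

// serveAll runs an accept loop per listener, all sharing one handler.
func serveAll(ls []net.Listener, handle func(net.Conn)) {
	for _, l := range ls {
		go func(l net.Listener) {
			for {
				c, err := l.Accept()
				if err != nil {
					return // listener closed
				}
				go handle(c)
			}
		}(l)
	}
}

func main() {
	// A real deployment might pass the local IP, the HA IP and 127.0.0.1
	// with port "12430"; port "0" lets the OS pick a free port for the demo.
	ls, err := listenAll([]string{"127.0.0.1"}, "0")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer func() {
		for _, l := range ls {
			l.Close()
		}
	}()

	accepted := make(chan net.Addr, 1)
	serveAll(ls, func(c net.Conn) {
		accepted <- c.LocalAddr()
		c.Close()
	})
	c, err := net.Dial("tcp", ls[0].Addr().String())
	if err != nil {
		fmt.Println(err)
		return
	}
	c.Close()
	fmt.Println("accepted a connection on", <-accepted)
}
```

If the HA IP is a floating address that is not yet configured on the host, net.Listen fails for it; on Linux, the net.ipv4.ip_nonlocal_bind sysctl is one common workaround.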
When building goconserver on Ubuntu 18, the following warnings are issued:
npm WARN deprecated [email protected]: Deprecated. Please use https://github.com/webpack-contrib/mini-css-extract-plugin
npm WARN deprecated [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated [email protected]: gulp-util is deprecated - replace it, following the guidelines at https://medium.com/gulpjs/gulp-util-ca3b1f9f9ac5
npm WARN deprecated [email protected]: https://github.com/lydell/resolve-url#deprecated
npm WARN deprecated [email protected]: Please see https://github.com/lydell/urix#deprecated
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: please upgrade to graceful-fs 4 for compatibility with current and future versions of Node.js
npm WARN deprecated [email protected]: This module relies on Node.js's internals and will break at some point. Do not use it, and update to [email protected].
npm WARN deprecated [email protected]: request has been deprecated, see https://github.com/request/request/issues/3142
WARN engine [email protected]: wanted: {"node":"<8.10.0"} (current: {"node":"8.10.0","npm":"3.5.2"})
WARN engine [email protected]: wanted: {"node":"^8.16.0 || ^10.6.0 || >=11.0.0"} (current: {"node":"8.10.0","npm":"3.5.2"})
npm WARN deprecated [email protected]: Chokidar 2 will break on node v14+. Upgrade to chokidar 3 with 15x less dependencies.
npm WARN prefer global [email protected] should be installed with -g
I'm using 0.2.0 with xCAT. My console section looks like this:
console:
  datadir: /var/lib/goconserver/ # the data file to save the hosts
  port: 12430 # the port for console
  log_timestamp: true # log the timestamp at the beginning of line
  reconnect_interval: 10 # retry interval in second if console could not be connected
  logger: # multiple logger targets could be specified
    file: # file logger, valid fields: name,logdir. Accept array in yaml format
      - name: default # the identity name customized by user
        logdir: /var/log/consoles # default log directory of xcat
    tcp: # valid fields: name, host, port, timeout, ssl_key_file, ssl_cert_file, ssl_ca_cert_file, ssl_insecure
      - name: rsyslog
        host: 127.0.0.1
        port: 5140
It seems that goconserver writes the next timestamp before any output has been generated. If I tail a console log, the last line is a timestamp with no newline at the end:
[[email protected] consoles]# tail -n2 h41n01.log
[2018-01-29 16:40:10] [30595.853585] Policy zone: DMA
[2018-01-29 16:40:10] [[email protected] consoles]#
This test shows that the timestamps are incorrect:
on h41n01:
[root@h41n01 ~]# echo > /dev/console;date > /dev/console; date > /dev/console;sleep 10; date > /dev/console;echo > /dev/console
Checking the log:
[[email protected] consoles]# tail -n6 h41n01.log
[2018-01-29 16:40:10]
[2018-01-29 16:47:01] Mon Jan 29 16:47:01 EST 2018
[2018-01-29 16:47:01] Mon Jan 29 16:47:01 EST 2018
[2018-01-29 16:47:01] Mon Jan 29 16:47:11 EST 2018
[2018-01-29 16:47:11]
[2018-01-29 16:47:11] [[email protected] consoles]#
The third date should have a timestamp of 16:47:11, not 16:47:01.
Dear GoConServer experts,
Since I could not find a configuration option for the escape key, will it eventually become configurable? I suspect ours is not the only environment where "Ctrl-E C ." does not work well.
Thanks!
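A configurable escape sequence could be handled on the client with a small state machine that holds back a partial sequence instead of forwarding it, so neither the escape byte nor c reaches the remote side unless the sequence turns out not to be a command. A minimal sketch, assuming the sequence is always escape byte, then c, then a command byte; escapeFilter and its esc and cmds fields are hypothetical, not existing goconserver options.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeFilter recognizes esc, 'c', cmd in the keyboard stream. esc would
// come from configuration (Ctrl-E is 0x05); cmds lists recognized command
// bytes, e.g. "." to disconnect and "?" for help.
type escapeFilter struct {
	esc   byte
	cmds  string
	state int // 0: normal, 1: esc held, 2: esc and 'c' held
}

// Feed handles one input byte. It returns the bytes to forward to the remote
// side (possibly a released, incomplete prefix) and, when a full sequence
// is recognized, the command byte; cmd is 0 otherwise.
func (f *escapeFilter) Feed(b byte) (out []byte, cmd byte) {
	switch f.state {
	case 1:
		if b == 'c' {
			f.state = 2
			return nil, 0
		}
		f.state = 0
		return []byte{f.esc, b}, 0
	case 2:
		f.state = 0
		if strings.IndexByte(f.cmds, b) >= 0 {
			return nil, b
		}
		return []byte{f.esc, 'c', b}, 0
	}
	if b == f.esc {
		f.state = 1
		return nil, 0
	}
	return []byte{b}, 0
}

// run feeds a whole string and stops at the first recognized command.
func run(f *escapeFilter, input string) (sent string, cmd byte) {
	for i := 0; i < len(input); i++ {
		out, c := f.Feed(input[i])
		sent += string(out)
		if c != 0 {
			return sent, c
		}
	}
	return sent, 0
}

func main() {
	sent, cmd := run(&escapeFilter{esc: 0x05, cmds: ".?"}, "ls\r\x05c.")
	fmt.Printf("sent %q, command %q\n", sent, cmd)

	// With Ctrl-] (0x1d) configured instead, Ctrl-E c . passes through.
	sent, cmd = run(&escapeFilter{esc: 0x1d, cmds: ".?"}, "\x05c.")
	fmt.Printf("sent %q, command %q\n", sent, cmd)
}
```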
If the console session is not established, the list users command may fail with a nil pointer dereference:
{"file":"github.com/xcat2/goconserver/api/node/command.go (51)","level":"debug","msg":"Receive GET request /command/user/bulknode2 map[node:bulknode2] from 127.0.0.1:52125.","time":"2018-03-26T16:57:45+08:00"}
2018/03/26 16:57:45 http: panic serving 127.0.0.1:52125: runtime error: invalid memory address or nil pointer dereference
goroutine 211 [running]:
net/http.(*conn).serve.func1(0xc42039c640)
/usr/local/go/src/net/http/server.go:1697 +0xd0
panic(0x15dbaa0, 0x19f1410)
/usr/local/go/src/runtime/panic.go:491 +0x283
github.com/xcat2/goconserver/console.(*Console).ListSessionUser(0x0, 0xc4201bc330, 0xc4202d4612, 0x9)
/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/console/console.go:246 +0x49
github.com/xcat2/goconserver/console.(*NodeManager).ListUser(0xc42005aa00, 0xc4202d4612, 0x9, 0x4, 0xc42018b288, 0xc420021020, 0x55)
/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/console/server.go:939 +0x494
github.com/xcat2/goconserver/api.(*CommandApi).listUser(0xc4201b8960, 0x19b2300, 0xc420448460, 0xc42017b800)
/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/api/command.go:52 +0x224
github.com/xcat2/goconserver/api.(*CommandApi).(github.com/xcat2/goconserver/api.listUser)-fm(0x19b2300, 0xc420448460, 0xc42017b800)
/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/api/command.go:20 +0x48
net/http.HandlerFunc.ServeHTTP(0xc4201ba710, 0x19b2300, 0xc420448460, 0xc42017b800)
/usr/local/go/src/net/http/server.go:1918 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc42006de60, 0x19b2300, 0xc420448460, 0xc42017b800)
/Users/longcheng/Project/golang/src/github.com/gorilla/mux/mux.go:133 +0xed
net/http.serverHandler.ServeHTTP(0xc420184dd0, 0x19b2300, 0xc420448460, 0xc42017b600)
/usr/local/go/src/net/http/server.go:2619 +0xb4
net/http.(*conn).serve(0xc42039c640, 0x19b2c00, 0xc4201ef140)
/usr/local/go/src/net/http/server.go:1801 +0x71d
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2720 +0x288
This issue was brought to our attention by an xCAT user whose third-party tool, which scans the console logs written by goconserver, did not detect a log entry until the next entry was written.
This causes a problem because, due to the placement of the newline, an entry from a given day is not caught until the next entry is written to the log. If no further entry is logged, it could be several days before the monitoring tool catches the issue.
We do see references to the newline being added before the timestamp in the code.
goconserver/console/pipeline/logger.go
Line 209 in 9cff69d
Should this be opened against xcat-core, or here?
Trying to open a console on mid05tor12cn15:
[root@briggs01 ~]# rcons mid05tor12cn15
Could not find node mid05tor12cn15
Maybe we could do better, for example:
Could not find node mid05tor12cn15, did you run 'makegocons mid05tor12cn15'?
After running makegocons, it works:
[root@briggs01 ~]# makegocons mid05tor12cn15
mid05tor12cn15: Created
[root@briggs01 ~]# rcons mid05tor12cn15
[Enter `^Ec?' for help]
goconserver(2017-12-08 14:45:22): Hello 172.10.253.27:34166, welcome to the session of mid05tor12cn15
Red Hat Enterprise Linux Server 7.4 Beta (Maipo)
Kernel 4.11.0-39.el7a.ppc64le on an ppc64le
mid05tor12cn15 login:
I have the TCP logger hooked up to rsyslog. It seems that newlines are stripped from the output, but carriage returns are not. I would expect both to be stripped.
{"type": "console","message":"[30595.853585] Policy zone: DMA\r","node":"h41n01","date":"2018-01-29 16:40:10.62790"}
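Trimming both line terminators before the record is encoded would give clean messages. A minimal sketch, assuming the TCP logger builds one JSON object per console line with the fields shown above; the record type and encodeLine are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// record mirrors the fields of the rsyslog message shown above.
type record struct {
	Type    string `json:"type"`
	Message string `json:"message"`
	Node    string `json:"node"`
	Date    string `json:"date"`
}

// encodeLine removes all trailing carriage returns and newlines from a
// console line before encoding it.
func encodeLine(node, date, line string) ([]byte, error) {
	return json.Marshal(record{
		Type:    "console",
		Message: strings.TrimRight(line, "\r\n"),
		Node:    node,
		Date:    date,
	})
}

func main() {
	b, err := encodeLine("h41n01", "2018-01-29 16:40:10.62790", "[30595.853585] Policy zone: DMA\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```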
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# congo create c910f03c17 driver=cmd ondemand=false --params cmd="ipmitool -I lanplus -H 50.3.17.1 -U '' -P PASSW0RD sol activate"
Created
# congo console c910f03c17
goconserver(2017-11-08 03:41:59): Hello 127.0.0.1:57410, welcome to the session of c910f03c18
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:09): Hello 127.0.0.1:57412, welcome to the session of c910f03c18
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:19): Hello 127.0.0.1:57414, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:29): Hello 127.0.0.1:57416, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:39): Hello 127.0.0.1:57418, welcome to the session of c910f03c17
Error: Unable to establish IPMI v2 / RMCP+ session
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:49): Hello 127.0.0.1:57420, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:59): Hello 127.0.0.1:57422, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:43:09): Hello 127.0.0.1:57424, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:43:19): Hello 127.0.0.1:57426, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
The connection is disconnected
What actually happened
The intention is to run the command ipmitool -I lanplus -H 50.3.17.1 -U '' -P PASSW0RD sol activate under the cmd driver of congo. Note that the ipmitool command has an empty string as one of its command-line arguments. In bash, such an empty string can be passed with single quotes, but with congo there is no way to do this.
Currently, there is no way to pass a space character inside a command-line argument; the space character is always treated as a separator.
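The difference between whitespace splitting and letting bash parse the command can be seen in a few lines of Go. A sketch, assuming naiveArgs only approximates how the cmd driver splits its params (the source says only that spaces are always separators); shellCommand is a hypothetical helper implementing the bash -c 'exec ...' idea.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// naiveArgs splits on whitespace only: quotes stay as literal characters
// and an argument such as 'a b' is broken in two.
func naiveArgs(cmdline string) []string { return strings.Fields(cmdline) }

// shellCommand lets bash parse cmdline, so quoting behaves as it does at a
// shell prompt. "exec" replaces bash with the target program, so the console
// session talks to that program directly rather than to a wrapper shell.
func shellCommand(cmdline string) *exec.Cmd {
	return exec.Command("bash", "-c", "exec "+cmdline)
}

func main() {
	fmt.Printf("%q\n", naiveArgs(`ipmitool -I lanplus -H 50.3.17.1 -U '' -P PASSW0RD sol activate`))

	out, err := shellCommand(`printf '[%s]' '' 'a b'`).Output()
	if err != nil {
		fmt.Println("bash failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

With naiveArgs, the -U value arrives as the two literal characters '' instead of an empty string; through bash, printf receives an empty argument and a single argument containing a space.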
What is expected
Run the command via bash -c 'exec ...', so that bash does the command-line parsing.
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-02T06:57:35-0400 Commit: 7e9278e88eb4f3035707b41f0a717fede7291498
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
Create a node definition in congo, and then connect to the newly created node. By the way, the node c910f03c01p09 is just an ordinary Linux node running Red Hat 7.4.
# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-07 23:47:16): Hello 127.0.0.1:57278, welcome to the session of c910f03c01p09
[root@c910f03c01p09 ~]# showkey -a
Press any keys - Ctrl-D will terminate this program
<<<-- PRESS `Ctrl-E', `c', AND `.' HERE
^E 5 0005 0x05
c 99 0143 0x63
What actually happened
It seems that the escape sequence characters Ctrl-E and c were sent to the remote side through the ssh session.
What is expected
Here is the expected behavior; this way, the escape sequence is not sent to the remote side.
1. When Ctrl-E is pressed by the user, hold it in the buffer and do not send it to the remote side.
2. If the next input character is not c, send Ctrl-E and this input character to the remote side.
3. If the next input character is c, hold it in the buffer as well.
4. If the character after that is not ., send Ctrl-E, c, and this input character to the remote side.
5. If it is ., clear the buffer and disconnect the session.
When the console is not created, this is what we see:
[root@csm03 xcat]# rcons f5n12
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to f5n12\n","node":"f5n12","time":"2018-08-03T21:47:50-04:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to f5n12\n","node":"f5n12","time":"2018-08-03T21:48:00-04:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
Disconnected
Then after creating the console:
[root@csm03 xcat]# makegocons f5n12
f5n12: Created
[root@csm03 xcat]# rcons f5n12
[Enter `^Ec?' for help]
goconserver(2018-08-03T21:48:12-04:00): Hello 192.168.10.1:47132, welcome to the session of f5n12
Can we improve it?
Is it possible to implement a standard log rotation mechanism using signal handling?
A good example of such a mechanism is nginx, which does log rotation via SIGUSR1: when the nginx binary receives this signal, it closes and reopens its log file descriptors. This is used in tandem with logrotate by creating a configuration for nginx in /etc/logrotate.d that contains a postrotate script, kill -USR1 $NGINX_PID. logrotate rotates the logs, then signals nginx to close and reopen them, so that nginx has the correct log files open for writing.
Many systems use milliseconds (Elasticsearch), microseconds, or nanoseconds, but five fractional digits is very uncommon. Consider changing this to 3 or 6 digits. Better yet, allow the format to be customized in the config file.
Hi!
I have the strangest issue: rcons and wcons don't display anything anymore for some nodes (not all), whereas using ipmitool to access the Serial-over-LAN console works perfectly.
Things used to work perfectly, but now I can't see some of those consoles anymore using goconserver.
For instance, for node sh03-sn07:
# makegocons -q sh03-sn07
NODE SERVER STATE
sh03-sn07 sh02-hn01.SUNet connected
The goconserver service is running on sh02-hn01.SUNet:
# systemctl status goconserver.service
● goconserver.service - goconserver console daemon
Loaded: loaded (/usr/lib/systemd/system/goconserver.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/goconserver.service.d
└─override.conf
Active: active (running) since Wed 2022-11-30 16:46:25 PST; 14min ago
Docs: https://github.com/xcat2/goconserver
Process: 39625 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 39627 (goconserver)
CGroup: /system.slice/goconserver.service
├─14798 perl /opt/xcat/share/xcat/cons/ipmi sh03-sn07
├─39627 /usr/bin/goconserver
└─39671 perl /opt/xcat/share/xcat/cons/kvm sh-vm-sl-test02
Nov 30 16:46:25 sh02-hn01.SUNet systemd[1]: Started goconserver console daemon.
And using ipmitool to access the serial console works well (to rule out a node BIOS/IPMI configuration problem):
# ipmitool -I lanplus -U $IPMI_USER -P $IPMI_PASSWORD -H sh03-sn07.infra sol activate
[SOL Session operational. Use ~? for help]
CentOS Linux 7 (Core)
Kernel 3.10.0-1160.80.1.el7.x86_64 on an x86_64
sh03-sn07 login:
~. [terminated ipmitool]
But rcons just displays a blank screen after initiating the connection:
# rcons sh03-sn07
[Enter `^Ec?' for help]
goconserver(2022-11-30T17:08:04-08:00): Hello 10.18.0.1:38424, welcome to the session of sh03-sn07
[Disconnected]
Shared connection to sh-hn01 closed.
As indicated above, the console state is connected, so things seem to work.
With "debug" logging activated in /etc/goconserver/server.conf, the following messages are logged during the rcons session:
{"file":"github.com/xcat2/goconserver/console/server.go (289)","level":"debug","msg":"New client connection received.","time":"2022-11-30T17:10:25-08:00"}
{"file":"github.com/xcat2/goconserver/console/proto.go (153)","level":"debug","msg":"Receive connection from client: {\"action\":1,\"node\":\"sh03-sn07\"}","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (332)","level":"info","msg":"Register client connection successfully.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (43)","level":"debug","msg":"Accept connection from client","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (138)","level":"debug","msg":"Create new connection to write message to client.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (101)","level":"debug","msg":"Create new connection to read message from client.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (203)","level":"warning","msg":"Could not receive message from remote. Error:%!(EXTRA string=read /dev/ptmx: input/output error)","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (177)","level":"debug","msg":"readTarget goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (196)","level":"info","msg":"Start console again due to the ondemand setting.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (286)","level":"debug","msg":"Close console session.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (168)","level":"info","msg":"Failed to send message to client. Error:tls: use of closed connection","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (116)","level":"warning","msg":"Failed to receive message head from client. Error:read tcp 10.18.0.1:12430-\u003e10.18.0.1:39008: use of closed network connection.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (103)","level":"debug","msg":"writeTarget goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (140)","level":"debug","msg":"writeClient goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (142)","level":"debug","msg":"Restart console session.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/plugins/cmd.go (64)","level":"debug","msg":"Could not get tty size, use 80,80 as default","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (240)","level":"debug","msg":"Start console session.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (175)","level":"debug","msg":"Read target session has been initialized.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
which doesn't seem to show any obvious problem.
Other (identical) nodes still work normally:
# rcons sh03-sn01
[Enter `^Ec?' for help]
goconserver(2022-11-30T17:12:37-08:00): Hello 10.18.0.1:39220, welcome to the session of sh03-sn01
CentOS Linux 7 (Core)
Kernel 3.10.0-1160.80.1.el7.x86_64 on an x86_64
sh03-sn01 login:
Is there any way to debug this further and identify the issue?
Thanks!
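Two details in the debug log may help narrow this down. The %!(EXTRA string=...) text is what Go's fmt package produces when a format string has no verb for an argument, so the warning message itself is malformed. The underlying error, read /dev/ptmx: input/output error, is what Linux typically returns when reading a pty master after the process on the other side has exited or closed the terminal, which would be consistent with the ipmi console script (presumably the one under /opt/xcat/share/xcat/cons) ending shortly after the client connects. The sketch reproduces both; brokenMessage, fixedMessage, and sessionEnded are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"syscall"
)

// readErr is the kind of error os.File.Read reports for the failure in the log.
var readErr error = &os.PathError{Op: "read", Path: "/dev/ptmx", Err: syscall.EIO}

// brokenFormat has no verb for its argument, like the warning in the log.
var brokenFormat = "Could not receive message from remote. Error:"

func brokenMessage(err error) string { return fmt.Sprintf(brokenFormat, err.Error()) }

func fixedMessage(err error) string {
	return fmt.Sprintf("Could not receive message from remote. Error: %v", err)
}

// sessionEnded reports whether a pty read error means the console process
// went away (EIO on Linux, EOF on some other systems).
func sessionEnded(err error) bool {
	return errors.Is(err, syscall.EIO) || errors.Is(err, io.EOF)
}

func main() {
	fmt.Println(brokenMessage(readErr))
	fmt.Println(fixedMessage(readErr))
	fmt.Println("console process ended:", sessionEnded(readErr))
}
```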
Without much documentation, I am giving goconserver a try and seeing what I can figure out from a usability perspective. I will edit or add to this issue as I find more.
Is there any reason why we chose congo instead of gocons as the command name?
The message shown when conserver or goconserver is running could be better
When conserver is running and I try to execute makegocons, the following message comes out:
[root@stratton01 ~]# makegocons
Error: conserver is started, please stop it at first.
Suggest something like:
[root@stratton01 ~]# makegocons
Error: conserver is running, did you mean 'makeconservercf'? If not, stop conserver and retry.
The reverse:
[root@fs2vm110 ~]# makeconservercf f6u17
Error: goconserver is started, please stop it at first.
Suggest something like:
[root@fs2vm110 ~]# makeconservercf f6u17
Error: goconserver is running, did you mean 'makegocons'? If not, stop goconserver and retry.
This may help those who type the wrong command out of muscle memory. It would also help those who are unaware that the admin switched to goconserver and try to run makeconservercf on a console before running rcons.
congo list doesn't seem to work:
[root@fs2vm112 ~]# congo list
Could not list resources, Get http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
[root@fs2vm112 ~]#
Other congo commands have a similar problem:
[root@fs2vm111 gurevich]# congo create testnode driver=ssh ondemand=false --params user=root,host=10.6.7.254,port=22,password=xxxxx
Post http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
[root@fs2vm111 gurevich]# congo create testnode driver=cmd ondemand=false --params cmd="ssh -l root -p 22 10.6.7.254"
Post http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
[root@fs2vm111 gurevich]#
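The bytes in these errors, \x15\x03\x01..., look like the start of a TLS record (content type 21 is an alert, version 3.1 is TLS 1.0), which suggests the API port is serving TLS while congo is sending plain HTTP to http://127.0.0.1:12429. A minimal, hypothetical diagnostic helper a client could use to turn this into a clearer message; tlsRecordKind is not part of congo.

```go
package main

import "fmt"

// TLS record content types as defined in the TLS specifications.
var tlsContentTypes = map[byte]string{
	20: "change_cipher_spec",
	21: "alert",
	22: "handshake",
	23: "application_data",
}

// tlsRecordKind reports whether b starts like a TLS record header (a known
// content type followed by major version 3) and names the record type.
func tlsRecordKind(b []byte) (string, bool) {
	if len(b) < 3 || b[1] != 3 {
		return "", false
	}
	kind, ok := tlsContentTypes[b[0]]
	return kind, ok
}

func main() {
	resp := []byte("\x15\x03\x01\x00\x02\x02")
	if kind, ok := tlsRecordKind(resp); ok {
		fmt.Printf("server replied with a TLS %s record; try an https:// URL\n", kind)
	}
}
```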
Why do we create a new server.conf file (https://github.com/xcat2/xcat-core/blob/master/xCAT-server/lib/perl/xCAT/Goconserver.pm#L524)? Wouldn't it be better to sed the values instead of creating a static file? If goconserver ships a newer version of server.conf, it will be reverted to the version hard-coded in xcat-core.
It looks like #38 added some support for clustering, but there isn't much documentation. What features are currently supported, and what is deferred to future implementation?
For example:
If I setup 4 goconservers and add 40 nodes, could I expect each server to (approximately) handle 10 nodes each? If one of the goconservers were to fail, would its ~10 nodes be evenly distributed among the other 3? When it recovers, would that server take its ~10 nodes back over?
Is there any plan to integrate cluster-mode/etcd into xCAT?
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# congo create c910f03c01p09 driver=ssh ondemand=false --params user=root,host=10.3.1.9,port=22,password=a_password
Created
# congo create c910f03c01p09 driver=ssh ondemand=false --params user=root,host=10.3.1.9,port=22,password=a_password
Error: unexpected response status code
What actually happened
Creating the same node definition in congo twice causes an unexpected error.
What is expected
Enroll the node
[root@c910f05c01bc02k74 ~]# chdef kvmguest1 consoleondemand=1
1 object definitions have been created or modified.
[root@c910f05c01bc02k74 ~]# makegocons kvmguest1
kvmguest1: Created
[root@c910f05c01bc02k74 ~]# makegocons -q
NODE SERVER STATE
kvmguest1 c910f05c01bc02k74 enroll
Connect to the node
[root@c910f05c01bc02k74 ~]# rcons kvmguest1
[Enter `^Ec?' for help]
goconserver(2018-03-18T23:26:08-04:00): Hello 10.5.102.74:36146, welcome to the session of kvmguest1
kvmguest1 login:
Ubuntu 16.04.1 LTS kvmguest1 ttyS0
Then disconnect
kvmguest1 login: [Disconnected]
Check the status
[root@c910f05c01bc02k74 ~]# makegocons -q
NODE SERVER STATE
kvmguest1 c910f05c01bc02k74 error
The console state should be available.
When trying to delete a console, I was not sure whether the flag was -d or -D, so I tried -D first. This is the output I saw:
[root@stratton01 ~]# makegocons -D c910f3zz01
In preprocess_request, request is $VAR1 = {
'_xcat_clientfqdn' => [
'localhost'
],
'arg' => [
'-D'
],
'_xcat_authname' => [
'root'
],
'_xcat_clientport' => [
45572
],
'node' => [
'c910f3zz01'
],
'username' => [
'root'
],
'noderange' => [
'c910f3zz01'
],
'_xcatdest' => '10.6.29.1',
'cwd' => [
'/root'
],
'_allnodes' => [
0
],
'clienttype' => [
'cli'
],
'_xcat_clienthost' => [
'localhost'
],
'command' => [
'makegocons'
],
'_xcatpreprocessed' => [
1
]
};
c910f3zz01: Created
So it seems that some verbose (debug?) data is printed, and the console is reported as "Created".
The correct flag is -d:
[root@stratton01 ~]# makegocons -d c910f3zz01
c910f3zz01: Deleted
Env:
root@c910f05c01bc02k70:~/goconserver/build# uname -a
Linux c910f05c01bc02k70 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Output:
root@c910f05c01bc02k70:~/goconserver# cd build && VERSION=0.2.1 ../dirty-debuild goconserver_linux_amd64.tar.gz
-rw-r--r-- 1 root root 9422625 Feb 26 04:36 goconserver-repack-amd64.tar.gz
drwxr-xr-x root/root 0 2018-02-26 04:36 etc/
drwxr-xr-x root/root 0 2018-02-26 04:36 etc/profile.d/
-rw-r--r-- root/root 318 2018-02-26 04:36 etc/profile.d/congo.sh
drwx------ root/root 0 2018-02-26 04:36 etc/goconserver/
-rw-r--r-- root/root 3060 2018-02-26 04:36 etc/goconserver/server.conf
drwxr-xr-x root/root 0 2018-02-26 04:36 usr/
drwxr-xr-x root/root 0 2018-02-26 04:36 usr/bin/
-rwxr-xr-x root/root 15515876 2018-02-26 04:36 usr/bin/congo
-rwxr-xr-x root/root 16034720 2018-02-26 04:36 usr/bin/goconserver
drwxr-xr-x root/root 0 2018-02-26 04:36 usr/lib/
drwxr-xr-x root/root 0 2018-02-26 04:36 usr/lib/systemd/
drwxr-xr-x root/root 0 2018-02-26 04:36 usr/lib/systemd/system/
-rw-r--r-- root/root 301 2018-02-26 04:36 usr/lib/systemd/system/goconserver.service
drwxr-xr-x root/root 0 2018-02-26 04:36 var/
drwxr-xr-x root/root 0 2018-02-26 04:36 var/log/
drwx------ root/root 0 2018-02-26 04:36 var/log/goconserver/
drwx------ root/root 0 2018-02-26 04:36 var/log/goconserver/nodes/
drwxr-xr-x root/root 0 2018-02-26 04:36 var/lib/
drwx------ root/root 0 2018-02-26 04:36 var/lib/goconserver/
Directories goconserver-0.2.1 and goconserver-0.2.1.orig prepared.
dh_testdir
dh_testdir
dh_testroot
dh_prep
dh_installdirs
dh_installdocs
dh_installchangelogs
find . -maxdepth 1 -mindepth 1 -not -name debian -print0 | \
xargs -0 -r -i cp -a {} debian/
dh_compress
dh_makeshlibs
dh_installdeb
dh_shlibdeps
dh_gencontrol
dh_md5sums
dh_builddeb
Checking for the deb package returns no result:
root@c910f05c01bc02k70:~/goconserver# find ./ -name "*.deb"
root@c910f05c01bc02k70:~/goconserver#
@neo954 commented on Tue May 21 2019
Here are the steps to recreate the bug.
Run rcons against a compute node, and then press Ctrl-E C ?. After that, run the tftp client, tftp. The tftp client prints its command-line prompt repeatedly and endlessly.
root@f6u13k13:~# tftp
tftp>
tftp>
root@f6u13k13:~# rcons f6u13k15
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:12:05-04:00): Hello 10.6.13.13:33358, welcome to the session of f6u13k15
[Disconnected]
root@f6u13k13:~# tftp
tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp>
... <<< omit thousands of lines >>> ...
Additional information
The xCAT management node runs Ubuntu 18.04.2 on a ppc64el node. It has xCAT 2.15-snap201905170621 installed. The xCAT compute node f6u13k15 is a regular KVM guest.
root@f6u13k13:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@f6u13k13:~# go-xcat check
Operating system: linux
Architecture: ppc64le
Linux Distribution: ubuntu
Version: 18.04
go-xcat Version: 1.0.38
Reading repositories ...... done
xCAT Core Packages
==================
Package Name Installed In Repository
------------ --------- -------------
perl-xcat 2.15-snap201905170621 2.15-snap201905170621
xcat 2.15-snap201905170621 2.15-snap201905170621
xcat-buildkit 2.15-snap201905170621 2.15-snap201905170621
xcat-client 2.15-snap201905170621 2.15-snap201905170621
xcat-confluent (not installed) 2.15-snap201905170621
xcat-genesis-scripts-amd64 2.15-snap201905170621 2.15-snap201905170621
xcat-genesis-scripts-ppc64 2.15-snap201905170621 2.15-snap201905170621
xcat-probe 2.15-snap201905170621 2.15-snap201905170621
xcat-server 2.15-snap201905170621 2.15-snap201905170621
xcat-test 2.15-snap201905170621 2.15-snap201905170621
xcat-vlan (not installed) 2.15-snap201905170621
xcatsn (not installed) 2.15-snap201905170621
xCAT Dependency Packages
========================
Package Name Installed In Repository
------------ --------- -------------
elilo-xcat 3.14-4 3.14-4
goconserver 0.3.2-snap201811080419 0.3.2-snap201811080419
grub2-xcat 2.02-0.76.el7.1.snap2019051602 2.02-0.76.el7.1.snap2019051602
ipmitool-xcat 1.8.18 1.8.18
syslinux-xcat 3.86-2 3.86-2
xcat-genesis-base-amd64 2.14.5-snap201811190037 2.14.5-snap201811190037
xcat-genesis-base-ppc64 2.14.5-snap201811160710 2.14.5-snap201811160710
xnba-undi 1.0.3-7 1.0.3-7
root@f6u13k13:~# lsdef f6u13k15 -z
# <xCAT data object stanza file>
f6u13k15:
objtype=node
addkcmdline=console=tty0 console=hvc0,115200
arch=ppc64el
cons=kvm
consoleenabled=1
currchain=boot
currstate=netboot ubuntu18.04.2-ppc64el-compute
groups=all
ip=10.6.13.15
mac=42:11:0a:06:0d:0f|42:02:0a:06:0d:0f!*NOIP*|42:a2:0a:06:0d:0f!*NOIP*
mgt=kvm
monserver=f6u13k13
netboot=grub2
nfsserver=f6u13k13
os=ubuntu18.04.2
profile=compute
provmethod=ubuntu18.04.2-ppc64el-netboot-compute
serialport=0
serialspeed=115200
status=powering-on
statustime=05-21-2019 03:44:56
tftpserver=f6u13k13
updatestatus=failed
updatestatustime=05-20-2019 14:18:04
vmcpus=2
vmhost=f6u13
vmmemory=4096
vmnicnicmodel=virtio-net-pci
vmnics=br0,private_br0,private_br1
vmstorage=phy:/dev/mapper/vdiskvg00-vdisk00n15
xcatmaster=f6u13k13
@neo954 commented on Tue May 21 2019
I tried to capture the tty state with stty -a before and after running rcons, but the two outputs looked exactly the same.
root@f6u13k13:~# stty -a >stty.out.good
root@f6u13k13:~# rcons f6u13k15
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:31:18-04:00): Hello 10.6.13.13:33358, welcome to the session of f6u13k15
[Disconnected]
root@f6u13k13:~# stty -a >stty.out.1
root@f6u13k13:~# diff -u stty.out.good stty.out.1
root@f6u13k13:~# echo $?
0
root@f6u13k13:~# cat stty.out.good
speed 38400 baud; rows 62; columns 135; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke -flusho -extproc
root@f6u13k13:~# cat stty.out.1
speed 38400 baud; rows 62; columns 135; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke -flusho -extproc
@neo954 commented on Tue May 21 2019
I tried running strace tftp while the tty was in the broken state. It seems the read() system calls of the tftp client process failed continuously, with errno set to EAGAIN.
... <<< omit thousands of lines >>> ...
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
read(0, 0x397cdea1410, 1024) = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> ) = 6
... <<< omit thousands of lines >>> ...
@neo954 commented on Tue May 21 2019
Enclosed please find the strace outputs.
strace.tftp.out.good.txt
strace.tftp.out.problem.txt
@neo954 commented on Tue May 21 2019
This problem can be recreated with cat.
root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:59:08-04:00): Hello 10.6.13.13:38404, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# cat
cat: -: Resource temporarily unavailable
@neo954 commented on Tue May 21 2019
root@f6u13k13:~# strace cat
execve("/bin/cat", ["cat"], 0x7fffc7e70f20 /* 28 vars */) = 0
brk(NULL) = 0xd9e3c350000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=28770, ...}) = 0
mmap(NULL, 28770, PROT_READ, MAP_PRIVATE, 3, 0) = 0x794af1900000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/powerpc64le-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\20G\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2181704, ...}) = 0
mmap(NULL, 2250384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x794af16d0000
mmap(0x794af18e0000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x200000) = 0x794af18e0000
close(3) = 0
mprotect(0x794af18e0000, 65536, PROT_READ) = 0
mprotect(0xd9e01270000, 65536, PROT_READ) = 0
mprotect(0x794af1970000, 65536, PROT_READ) = 0
munmap(0x794af1900000, 28770) = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1991136, ...}) = 0
mmap(NULL, 1991136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x794af14e0000
close(3) = 0
brk(NULL) = 0xd9e3c350000
brk(0xd9e3c380000) = 0xd9e3c380000
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
fadvise64(0, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x794af14a0000
read(0, 0x794af14b0000, 131072) = -1 EAGAIN (Resource temporarily unavailable)
write(2, "cat: ", 5cat: ) = 5
write(2, "-", 1-) = 1
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2995
read(3, "", 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": Resource temporarily unavailab"..., 34: Resource temporarily unavailable) = 34
write(2, "\n", 1
) = 1
munmap(0x794af14a0000, 262144) = 0
close(0) = 0
close(1) = 0
close(2) = 0
exit_group(1) = ?
+++ exited with 1 +++
@neo954 commented on Tue May 21 2019
Okay, here is the problem.
root@f6u13k13:~# cat /proc/self/fdinfo/0
pos: 0
flags: 02
mnt_id: 26
root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:21:47-04:00): Hello 10.6.13.13:41330, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# cat /proc/self/fdinfo/0
pos: 0
flags: 024002
mnt_id: 26
@neo954 commented on Tue May 21 2019
It seems the problem affected all three file descriptors 0, 1, and 2.
root@f6u13k13:~# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos: 0
flags: 02
mnt_id: 26
==> /proc/self/fdinfo/1 <==
pos: 0
flags: 02
mnt_id: 26
==> /proc/self/fdinfo/2 <==
pos: 0
flags: 02
mnt_id: 26
root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:23:16-04:00): Hello 10.6.13.13:41500, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos: 0
flags: 024002
mnt_id: 26
==> /proc/self/fdinfo/1 <==
pos: 0
flags: 024002
mnt_id: 26
==> /proc/self/fdinfo/2 <==
pos: 0
flags: 024002
mnt_id: 26
@neo954 commented on Tue May 21 2019
See http://man7.org/linux/man-pages/man5/proc.5.html for details of the fdinfo subdirectory.
flags
This is an octal number that displays the file access mode and file status flags (see open(2)). If the close-on-exec file descriptor flag is set, then flags will also include the value O_CLOEXEC.
Before Linux 3.1, this field incorrectly displayed the setting of O_CLOEXEC at the time the file was opened, rather than the current setting of the close-on-exec flag.
@neo954 commented on Tue May 21 2019
In the file /usr/src/linux-headers-4.15.0-47/include/uapi/asm-generic/fcntl.h:
#define O_RDWR 00000002
#ifndef O_NONBLOCK
#define O_NONBLOCK 00004000
#endif
#ifndef FASYNC
#define FASYNC 00020000 /* fcntl, for BSD compatibility */
#endif
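To make the fdinfo values above easier to read, here is a small Go sketch (not part of goconserver) that decodes the octal flags field. It uses the asm-generic constants quoted above, on the assumption that this ppc64le kernel does not override them. FASYNC is the same bit that Go exposes as syscall.O_ASYNC. The sketch names only the bits that matter to this issue and prints any other bits as leftover octal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Values from include/uapi/asm-generic/fcntl.h, as quoted above.
const (
	accMode   = 00000003 // mask for the access-mode bits
	oRDONLY   = 00000000
	oWRONLY   = 00000001
	oRDWR     = 00000002
	oNONBLOCK = 00004000
	fASYNC    = 00020000
)

// decodeFlags turns the octal "flags:" value from /proc/<pid>/fdinfo/<fd>
// into a readable list of flag names.
func decodeFlags(field string) (string, error) {
	v, err := strconv.ParseUint(strings.TrimSpace(field), 8, 32)
	if err != nil {
		return "", err
	}
	var names []string
	switch v & accMode {
	case oRDONLY:
		names = append(names, "O_RDONLY")
	case oWRONLY:
		names = append(names, "O_WRONLY")
	case oRDWR:
		names = append(names, "O_RDWR")
	}
	if v&oNONBLOCK != 0 {
		names = append(names, "O_NONBLOCK")
	}
	if v&fASYNC != 0 {
		names = append(names, "FASYNC")
	}
	// Report any bits this sketch does not know about as raw octal.
	if rest := v &^ (accMode | oNONBLOCK | fASYNC); rest != 0 {
		names = append(names, fmt.Sprintf("0%o", rest))
	}
	return strings.Join(names, "|"), nil
}

func main() {
	// Values observed above: before rcons, after rcons (Ubuntu), after rcons (RHEL 8).
	for _, f := range []string{"02", "024002", "020002"} {
		s, err := decodeFlags(f)
		if err != nil {
			fmt.Println(f, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", f, s)
	}
}
```

Decoded this way, the Ubuntu host is left with O_NONBLOCK and FASYNC set after rcons exits, and the RHEL 8 host is left with FASYNC set. O_NONBLOCK on its own explains the EAGAIN results in the cat and tftp traces: read() on a non-blocking tty with no pending input fails with EAGAIN instead of waiting.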
@neo954 commented on Tue May 21 2019
root@f6u13k13:~# uname -a
Linux f6u13k13 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:40:40 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
@neo954 commented on Tue May 21 2019
This is what happened on a RHEL 8 testing environment.
[root@c910f03c01p19 ~]# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos: 0
flags: 02
mnt_id: 25
==> /proc/self/fdinfo/1 <==
pos: 0
flags: 02
mnt_id: 25
==> /proc/self/fdinfo/2 <==
pos: 0
flags: 02
mnt_id: 25
[root@c910f03c01p19 ~]# rcons c910f03c01p10
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:47:27-04:00): Hello 10.3.1.19:59050, welcome to the session of c910f03c01p10
done
[Disconnected]
[root@c910f03c01p19 ~]# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos: 0
flags: 020002
mnt_id: 25
==> /proc/self/fdinfo/1 <==
pos: 0
flags: 020002
mnt_id: 25
==> /proc/self/fdinfo/2 <==
pos: 0
flags: 020002
mnt_id: 25
[root@c910f03c01p19 ~]# uname -a
Linux c910f03c01p19 4.18.0-80.el8.ppc64le #1 SMP Wed Mar 13 11:26:21 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
@neo954 commented on Tue May 21 2019
In the latest goconserver source, v0.3.2:
$ grep -n -r NONBLOCK .
./console/client.go:132: err := common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)
./console/client.go:173: err := common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)
./console/cli.go:320: err = common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)
@neo954 commented on Wed May 22 2019
@chenglch, Do you have any idea about this issue? :-/
It looks like goconserver is not updated automatically when following our instructions for updating the xCAT product.
Currently we document the following command to update xCAT (using yum):
yum update "*xCAT*" "*xcat*"
This will not pull in any goconserver updates...
How do we think we should handle this?
I see the following messages in the system journal:
Feb 08 10:09:58 mgmt1.peak.olcf.ornl.gov systemd[1]: [/usr/lib/systemd/system/goconserver.service:8] Unknown lvalue 'killMode' in section 'Service'
Feb 08 10:09:58 mgmt1.peak.olcf.ornl.gov systemd[1]: [/usr/lib/systemd/system/goconserver.service:9] Unknown lvalue 'After' in section 'Service'
The After parameter should go in the [Unit] section, not [Service].
killMode needs to be KillMode.
We are also seeing shutdown hangs that appear to be caused by goconserver writing to an NFS share. Still looking into the cause...
This is a feature request.
Currently, there is no way to rotate the console logs. goconserver needs to provide a method that lets the end user rotate the console logs; otherwise, the logs will eventually fill up the /var file system.
Either providing a mechanism that works with logrotate or doing the log rotation inside goconserver would be fine.
To work with logrotate, do the following ...
SIGHUP or SIGUSR1.
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
Create a node definition in congo, and then connect to the newly created node. BTW, the node c910f03c01p09 is just another common Linux node, which runs Red Hat 7.4.
# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-08 00:16:57): Hello 127.0.0.1:57304, welcome to the session of c910f03c01p09
Run nano through the console session.
Resize the terminal window, and the whole screen display gets messed up.
What actually happened
The SIGWINCH signal was not handled properly.
What is expected
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# for node in foo{000..999}
> do
> congo create ${node} driver=ssh ondemand=false \
> --params user=foo,host=10.3.1.9,port=22,password=something
> ln -s /dev/null /var/log/goconserver/nodes/${node}.log
> congo logging ${node} on
> done
I created 1,000 node definitions in congo. On the ssh server side, a bash script runs after the user logs in and generates output continuously. In addition, the console log files, /var/log/goconserver/nodes/foo{000..999}.log, are all symbolically linked to /dev/null in order to avoid filling up the /var file system.
# ps auwx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
<<<...omit...>>>
root 18953 42.3 0.4 514368 93696 ? Ssl 05:02 0:07 /usr/local/bin/goconserver
<<<...omit...>>>
There are 1,000 active ssh connections.
# lsof -p 18953 | grep TCP | grep ssh | grep ESTABLISHED | wc -l
1000
Then I left this test environment running for a handful of days.
What actually happened
After around five days...
See the TIME column of the ps output. It shows that goconserver has accumulated about 1,532 minutes and 41 seconds of CPU time.
# ps auwx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
<<<...omit...>>>
root 18953 20.1 0.6 781120 129664 ? Ssl Nov08 1532:41 /usr/local/bin/goconserver
<<<...omit...>>>
I noticed some of the ssh connections were disconnected.
# lsof -p 18953 | grep TCP | grep ssh | grep ESTABLISHED | wc -l
574
What is expected
The ssh client has an option, TCPKeepAlive, which can be set to yes. I quote the relevant part of man ssh_config below. Maybe something similar can be implemented in goconserver.
TCPKeepAlive
Specifies whether the system should send TCP keepalive messages
to the other side. If they are sent, death of the connection or
crash of one of the machines will be properly noticed. However,
this means that connections will die if the route is down temporarily,
and some people find it annoying.
The default is yes (to send TCP keepalive messages), and the
client will notice if the network goes down or the remote host
dies. This is important in scripts, and many users want it too.
To disable TCP keepalive messages, the value should be set to no.
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# congo list
c910f03c01p09 (host: 127.0.0.1)
c910f03c17 (host: 127.0.0.1)
# congo delete .
Deleted
# echo $?
0
# congo list
c910f03c01p09 (host: 127.0.0.1)
c910f03c17 (host: 127.0.0.1)
What actually happened
The command congo delete . actually deletes nothing, but it gets a positive response and the exit code is zero.
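For comparison, here is a hypothetical Go sketch of a delete operation that reports every name it could not match. A CLI built on it could print that error and exit non-zero. The sketch treats node names literally. Whether "." should be rejected or treated as a pattern is a design question the report leaves open, and none of these names come from goconserver's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// deleteNodes (hypothetical) removes the named nodes from store and returns
// an error listing every name that did not match an existing node.
func deleteNodes(store map[string]bool, names []string) error {
	var missing []string
	for _, n := range names {
		if !store[n] {
			missing = append(missing, n)
			continue
		}
		delete(store, n)
	}
	if len(missing) > 0 {
		sort.Strings(missing)
		return fmt.Errorf("no such node(s): %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	store := map[string]bool{"c910f03c01p09": true, "c910f03c17": true}
	if err := deleteNodes(store, []string{"."}); err != nil {
		// A real CLI would print this to stderr and exit with a non-zero status.
		fmt.Println("Error:", err)
	}
	fmt.Println("remaining nodes:", len(store))
}
```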
What is expected
Can this be marked %config(noreplace)?
In v0.3.1, we have included the fix #46, but the fix only takes effect after goconserver is restarted.
We need to add a prerm script to clean up the goconserver service, and a postinst script to restart goconserver after it is updated.
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-02T06:57:35-0400 Commit: 7e9278e88eb4f3035707b41f0a717fede7291498
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
Create a node definition in congo, and then connect to the newly created node. BTW, the node c910f03c01p09 is just another common Linux node, which runs Red Hat 7.4.
# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
While the console session is running, on the node c910f03c01p09, kill the sshd process that serves the connection with a KILL signal.
[root@c910f03c01p09 ~]# ps axf
PID TTY STAT TIME COMMAND
<<<...omit...>>>
18489 ? Ss 0:00 /usr/sbin/sshd -D
18964 ? Ss 0:00 \_ sshd: root@pts/0
18966 pts/0 Ss+ 0:00 \_ -bash
<<<...omit...>>>
[root@c910f03c01p09 ~]# kill -KILL 18964
Then, back in the congo session, the session got EOF in both directions and was dropped.
# congo console c910f03c01p09
goconserver(2017-11-07 03:20:19): Hello 127.0.0.1:57208, welcome to the session of c910f03c01p09
[root@c910f03c01p09 ~]# date
Tue Nov 7 03:20:26 EST 2017
[root@c910f03c01p09 ~]# EOF
EOF
What actually happened
From the end user's point of view, the congo session was dropped when a network problem occurred or the remote server ran into a problem.
What is expected
goconserver should try to reconnect the ssh session behind the scenes and maintain the congo session.
Some sort of cleanup is required to remove consoles that no longer exist.
In discovery, the discovered node-XXX definitions are automatically removed from the xCAT database, but a user could have run makegocons and created consoles for these....
[root@briggs01 log]# makegocons -d node-8335-gtc-785262a
Error: Invalid nodes and/or groups in noderange: node-8335-gtc-785262a
[root@briggs01 log]# nodels | grep node-8335-gtc-785262a
[root@briggs01 log]#
However, the console configuration still exists, so I am able to open an rcons session:
[root@briggs01 log]# rcons node-8335-gtc-785262a
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to node-8335-gtc-785262a\n","node":"node-8335-gtc-785262a","time":"2018-02-27T16:11:45-05:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
How do we get rid of these in gocons?
The good thing is that no logs are written here, so the log does not keep growing....
We had a similar thing in the makeconservercf --cleanup function...
I can't connect to the remote console using rcons.
The settings are as follows:
management node : mn01
client node : cn01,cn02
[root@ccm51 ~]# lsdef cn01
Object name: cn01a
arch=x86_64
bmc=cn01a
consoleenabled=1
currchain=boot
currstate=boot
groups=aaa,bbb
interface=eth0
ip=xx.xx.xx.xx
mac=aa:aa:aa:aa:aa:aa
mgt=ipmi
netboot=xnba
os=centos7.7
postbootscripts=otherpkgs
power=ipmi
profile=compute
provmethod=centos7.7-x86_64-install-compute
serialflow=hard
serialport=0
serialspeed=19200
status=powering-off
statustime=10-06-2020 09:59:13
[root@mn01 ~] # makegocons cn01
cn01: Created
[root@mn01 ~] # systemctl restart goconserver.service
[root@mn01 ~] # systemctl status goconserver.service
● goconserver.service - goconserver console daemon
Loaded: loaded (/usr/lib/systemd/system/goconserver.service; enabled; vendor>
Active: active (running) since day yyyy-mm-dd hh:mm:ss ; 5s ago
[root@mn01 ~] # rcons cn01
[Enter `^Ec?' for help]
goconserver(yyyy-mm-dd hh:mm:ss) Hello: xx.xx.xx.xx:59230, welcome to the session of cn01
↑
It doesn't work here, and I have no choice but to terminate the connection (Ctrl+e c .).
How can I log in to cn01?
With the same settings on cn02, I was able to connect to the remote console of cn02 after I reinstalled the OS.
[root@mn01 ~] # rcons cn02
[Enter `^Ec?' for help]
goconserver(yyyy-mm-dd hh:mm:ss) Hello: xx.xx.xx.xx:37360, welcome to the session of cn02
CentOS Linux 7 (Core)
Kernel x.xx.x-xxxx.el7.x86_64 on an x86_64
cn02 login: root
Password:
Last login: ~~~
[root@cn02 ~] #
If the settings are correct, do I need to restart the node or reinstall the OS to connect to the remote console using rcons?
I would like to connect to this without using these methods if possible. Is there any other way?
Thank you.
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
Create a node definition in congo, and then connect to the newly created node. BTW, the node c910f03c01p09 is just another common Linux node, which runs Red Hat 7.4.
# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-08 00:16:57): Hello 127.0.0.1:57304, welcome to the session of c910f03c01p09
# <<<--- RUN `kill -STOP 3963' FROM ANOTHER TERMINAL
[1]+ Stopped congo console c910f03c01p09
#
# jobs
[1]+ Stopped congo console c910f03c01p09
# fg
congo console c910f03c01p09
unknown signal received: continued
What actually happened
There is no way to use bash job control against congo at this time, so I tried sending it a SIGSTOP signal from another terminal session.
While the console session congo console c910f03c01p09 was running, sending it a SIGSTOP signal caused it to stop. Then, moving it back to the foreground with the bash job control command fg caused congo to crash.
It seems the SIGCONT signal handler works improperly. The default SIGCONT signal handler would do this work properly.
What is expected
SIGSTOP signal and SIGCONT signal.
Ctrl-E, c, Ctrl-Z would be better.
I'm not sure if this is caused by the instability of the AMI BMC with regards to sol activate
or if we have multiple sessions established for this node.
But I saw this behavior today:
[root@stratton01 c910env]# rcons fs3
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:03:37-05:00): Hello 10.6.29.1:48462, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational. Use ~? for help]
Info: SOL payload already de-activated
SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:03:52-05:00): Hello 10.6.29.1:48538, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational. Use ~? for help]
Red Hat Enterprise Linux Server 7.4 (Maipo)
Kernel 3.10.0-693.el7.ppc64le on an ppc64le
fs3 login: rooSOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:04:06-05:00): Hello 10.6.29.1:48568, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational. Use ~? for help]
Info: SOL payload already de-activated
SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:04:20-05:00): Hello 10.6.29.1:48594, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational. Use ~? for help]
Red Hat Enterprise Linux Server 7.4 (Maipo)
Kernel 3.10.0-693.el7.ppc64le on an ppc64le
fs3 login: Error sending SOL data: FAIL
SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
Disconnected
This is on Firestone EUH boxes.
It appears that goconserver currently only supports writing to flat files. It would be nice if I could specify an alternate output plugin to write structured data directly to rsyslog or logstash.
Is goconserver licensed under the EPL like xCAT? Could a licensing statement be added to the codebase?
This bug is against goconserver
Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840
running on a ppc64le Red Hat 7.4 Linux environment.
The recreation steps
# for node in foo{000..999}
> do
> congo create $node driver=ssh ondemand=false \
> --params user=foo,host=10.3.1.9,port=22,password=something
> congo logging $node on
> done
# congo list
foo932 (host: 127.0.0.1)
foo948 (host: 127.0.0.1)
foo888 (host: 127.0.0.1)
<<...omit...>>>
foo081 (host: 127.0.0.1)
# congo list
foo941 (host: 127.0.0.1)
foo619 (host: 127.0.0.1)
foo398 (host: 127.0.0.1)
<<...omit...>>>
foo939 (host: 127.0.0.1)
What actually happened
I created 1,000 node definitions in congo. Each time I ran congo list, it listed all the nodes in a completely different random order.
What is expected