goconserver's People

Contributors

besawn, chenglch, cxhong, gurevichmark, mattaezell, robin2008

goconserver's Issues

After the /var file system filled up, `congo console` stops working

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# for node in foo{000..999}
> do
>     congo create $node driver=ssh ondemand=false \
>         --params user=foo,host=10.3.1.9,port=22,password=something
>     congo logging $node on
> done

I created 1,000 node definitions in congo. With 1,000 active ssh sessions and console logging enabled, the /var file system filled up quickly.

# df -h /var/log/goconserver
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/vg0-var  9.8G  9.8G     0 100% /var
# du -sh /var/log/goconserver/nodes
9.7G	/var/log/goconserver/nodes

After that, connecting to any of the nodes with congo console fails.

# congo console foo997
The connection is disconnected
Session is teminated unexpectedly, retrying....
The connection is disconnected
Session is teminated unexpectedly, retrying....
The connection is disconnected
Session is teminated unexpectedly, retrying....

What actually happened

It seems goconserver refuses to work once it can no longer write the console log because the /var file system is full.

# ps ax
  PID TTY      STAT   TIME COMMAND
<<<...omit...>>>
16290 ?        Ssl  394:43 /usr/local/bin/goconserver
<<<...omit...>>>
# lsof -p 16290 | wc -l
446
# lsof -p 16290 | wc -l
396
# lsof -p 16290 | wc -l
399
# lsof -p 16290 | wc -l
501
# lsof -p 16290 | wc -l
580
# lsof -p 16290 | wc -l
490
# lsof -p 16290 | wc -l
393

What is expected

  • Some kind of error tolerance is expected. Even if console logging is failing, please make congo console work.
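One way to get the requested tolerance is to keep log failures away from the console data path. The following is a minimal sketch, not goconserver's actual logging code: `tolerantWriter`, `writerFunc`, and `errDiskFull` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"io"
	"sync/atomic"
)

// errDiskFull simulates the error a full /var would produce (illustrative only).
var errDiskFull = errors.New("no space left on device")

// writerFunc adapts a function to io.Writer so a failing sink is easy to fake.
type writerFunc func(p []byte) (int, error)

func (f writerFunc) Write(p []byte) (int, error) { return f(p) }

// tolerantWriter forwards writes to a log sink but never reports the sink's
// failure to the caller, so a full disk cannot break the console stream.
type tolerantWriter struct {
	sink    io.Writer
	dropped int64 // number of writes the sink rejected
}

// Write always reports success; rejected writes are only counted.
func (w *tolerantWriter) Write(p []byte) (int, error) {
	if _, err := w.sink.Write(p); err != nil {
		atomic.AddInt64(&w.dropped, 1)
	}
	return len(p), nil
}

// Dropped returns how many writes were lost, for monitoring or a warning log.
func (w *tolerantWriter) Dropped() int64 { return atomic.LoadInt64(&w.dropped) }
```

With a wrapper like this, the console session keeps flowing while the drop counter tells the operator that log data is being lost.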

goconserver broadcast mode failed to support xCAT hierarchy mode

This issue is against goconserver version 0.3.1. Although goconserver introduced broadcast mode in pull request #45, it fails to support xCAT hierarchy mode.

# goconserver --version
Version: 0.3.1, BuildTime: 2018-08-20T21:15:20-0400 Commit: 1a3762c6de1dab60f5f2dedac659c532c1eda76f

In my test environment, c910f03c01p09 acts as the xCAT management node, c910f03c01p10 acts as the xCAT service node, and c910f03c01p19 acts as the compute node.

# lsdef c910f03c01p10 -i setupconserver
Object name: c910f03c01p10
    setupconserver=2
# lsdef c910f03c01p19 -i conserver,servicenode,xcatmaster
Object name: c910f03c01p19
    conserver=c910f03c01p10
    servicenode=c910f03c01p10
    xcatmaster=c910f03c01p10

With the configuration above, running rcons c910f03c01p10,c910f03c01p19 fails. See details below.

# rcons c910f03c01p10,c910f03c01p19
Could not connect to c910f03c01p10, error: Node not exist
Connection to c910f03c01p10 closed.
# rcons c910f03c01p19,c910f03c01p10
Could not connect to c910f03c01p10, error: Node not exist
Connection to c910f03c01p10 closed.

Ability to specify multiple IPs for 'host:' parameter in config

I'm having an issue where I need the goconserver to listen on multiple IPs (local IP, HA IP and 127.0.0.1), but not listen on all IPs present on the system.

It appears that the host: parameter in the configuration file only accepts one IP address.

I would like to be able to specify multiple listener IPs for goconserver.
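A minimal sketch of how a multi-address `host:` value could be expanded into listener addresses. The comma-separated syntax and the `listenAddrs` helper are assumptions made for illustration; goconserver's configuration parser is not shown in this issue.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// listenAddrs turns a comma-separated list of IP addresses (a hypothetical
// config syntax) into host:port strings suitable for net.Listen.
func listenAddrs(hosts string, port int) ([]string, error) {
	var addrs []string
	for _, h := range strings.Split(hosts, ",") {
		h = strings.TrimSpace(h)
		if h == "" {
			continue // tolerate trailing commas and stray spaces
		}
		if net.ParseIP(h) == nil {
			return nil, fmt.Errorf("invalid IP address %q", h)
		}
		addrs = append(addrs, net.JoinHostPort(h, strconv.Itoa(port)))
	}
	if len(addrs) == 0 {
		return nil, fmt.Errorf("no listen address given")
	}
	return addrs, nil
}
```

Each returned address would then get its own `net.Listen("tcp", addr)` call, so the daemon binds only to the listed IPs rather than to all interfaces.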

Warnings when building goconserver on Ubuntu 18

When building goconserver on Ubuntu 18, the following warnings are issued:

npm WARN deprecated [email protected]: Deprecated. Please use https://github.com/webpack-contrib/mini-css-extract-plugin
npm WARN deprecated [email protected]: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated [email protected]: gulp-util is deprecated - replace it, following the guidelines at https://medium.com/gulpjs/gulp-util-ca3b1f9f9ac5
npm WARN deprecated [email protected]: https://github.com/lydell/resolve-url#deprecated
npm WARN deprecated [email protected]: Please see https://github.com/lydell/urix#deprecated
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: please upgrade to graceful-fs 4 for compatibility with current and future versions of Node.js
npm WARN deprecated [email protected]: This module relies on Node.js's internals and will break at some point. Do not use it, and update to [email protected].
npm WARN deprecated [email protected]: request has been deprecated, see https://github.com/request/request/issues/3142
WARN engine [email protected]: wanted: {"node":"<8.10.0"} (current: {"node":"8.10.0","npm":"3.5.2"})
WARN engine [email protected]: wanted: {"node":"^8.16.0 || ^10.6.0 || >=11.0.0"} (current: {"node":"8.10.0","npm":"3.5.2"})
npm WARN deprecated [email protected]: Chokidar 2 will break on node v14+. Upgrade to chokidar 3 with 15x less dependencies.
npm WARN prefer global [email protected] should be installed with -g

File output timestamps are output before their content

I'm using 0.2.0 with xCAT. My console section looks like:

console:
  datadir: /var/lib/goconserver/                       # the data file to save the hosts
  port: 12430                                  # the port for console
  log_timestamp: true                                  # log the timestamp at the beginning of line
  reconnect_interval: 10                               # retry interval in second if console could not be connected
  logger:                                              # multiple logger targets could be specified
    file:                                              # file logger, valid fields: name,logdir. Accept array in yaml format
       - name: default                                 # the identity name customized by user
         logdir: /var/log/consoles                   # default log directory of xcat
    tcp:                                               # valied fields: name, host, port, timeout, ssl_key_file, ssl_cert_file, ssl_ca_cert_file, ssl_insecure
       - name: rsyslog
         host: 127.0.0.1
         port: 5140

It seems that goconserver will output the next timestamp before any output is generated. If I tail a console, I have a timestamp with no newline at the end:

[[email protected] consoles]# tail -n2 h41n01.log
[2018-01-29 16:40:10] [30595.853585] Policy zone: DMA
[2018-01-29 16:40:10] [[email protected] consoles]#

This test shows that the timestamps are incorrect. On h41n01:

[root@h41n01 ~]# echo > /dev/console;date > /dev/console; date > /dev/console;sleep 10; date > /dev/console;echo > /dev/console

Checking the log:

[[email protected] consoles]# tail -n6 h41n01.log
[2018-01-29 16:40:10] 
[2018-01-29 16:47:01] Mon Jan 29 16:47:01 EST 2018
[2018-01-29 16:47:01] Mon Jan 29 16:47:01 EST 2018
[2018-01-29 16:47:01] Mon Jan 29 16:47:11 EST 2018
[2018-01-29 16:47:11] 
[2018-01-29 16:47:11] [[email protected] consoles]#

The third date line should carry the timestamp 16:47:11, not 16:47:01.
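The behavior described here is consistent with the timestamp being written as soon as the previous line ends. A minimal sketch of the alternative follows: take the timestamp only when the first byte of a new line arrives. `lineStamper` and `wallClock` are hypothetical names, and the time layout is chosen to match the log excerpts above, not taken from goconserver.

```go
package main

import (
	"bytes"
	"time"
)

// wallClock formats the current time like the log excerpts in this issue.
func wallClock() string { return time.Now().Format("2006-01-02 15:04:05") }

// lineStamper prefixes each line with a timestamp taken when the first byte
// of that line arrives, rather than when the previous line ended.
type lineStamper struct {
	clock       func() string // timestamp source; wallClock in production
	atLineStart bool          // true when the next byte starts a new line
}

func newLineStamper(clock func() string) *lineStamper {
	return &lineStamper{clock: clock, atLineStart: true}
}

// stamp returns p with "[timestamp] " inserted before the first byte of
// every line. The newline stays at the end of the line it terminates.
func (s *lineStamper) stamp(p []byte) []byte {
	var buf bytes.Buffer
	for _, b := range p {
		if s.atLineStart {
			buf.WriteString("[" + s.clock() + "] ")
			s.atLineStart = false
		}
		buf.WriteByte(b)
		if b == '\n' {
			s.atLineStart = true
		}
	}
	return buf.Bytes()
}
```

Because the stamp is deferred, the log file always ends with a complete, newline-terminated line, and a line that arrives ten seconds later gets the later timestamp.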

escape key configurable

Dear GoConServer experts,

Since I could not find a configuration option for the "escape key": will that eventually become a configuration option? I suppose ours is not the only environment that does not work well with "Ctrl-E C ."

Thanks!

If the console session is not established, list users command may fail with nil error

If the console session is not established, the list users command may panic with a nil pointer dereference:

{"file":"github.com/xcat2/goconserver/api/node/command.go (51)","level":"debug","msg":"Receive GET request /command/user/bulknode2 map[node:bulknode2] from 127.0.0.1:52125.","time":"2018-03-26T16:57:45+08:00"}
2018/03/26 16:57:45 http: panic serving 127.0.0.1:52125: runtime error: invalid memory address or nil pointer dereference
goroutine 211 [running]:
net/http.(*conn).serve.func1(0xc42039c640)
       	/usr/local/go/src/net/http/server.go:1697 +0xd0
panic(0x15dbaa0, 0x19f1410)
       	/usr/local/go/src/runtime/panic.go:491 +0x283
github.com/xcat2/goconserver/console.(*Console).ListSessionUser(0x0, 0xc4201bc330, 0xc4202d4612, 0x9)
       	/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/console/console.go:246 +0x49
github.com/xcat2/goconserver/console.(*NodeManager).ListUser(0xc42005aa00, 0xc4202d4612, 0x9, 0x4, 0xc42018b288, 0xc420021020, 0x55)
       	/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/console/server.go:939 +0x494
github.com/xcat2/goconserver/api.(*CommandApi).listUser(0xc4201b8960, 0x19b2300, 0xc420448460, 0xc42017b800)
       	/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/api/command.go:52 +0x224
github.com/xcat2/goconserver/api.(*CommandApi).(github.com/xcat2/goconserver/api.listUser)-fm(0x19b2300, 0xc420448460, 0xc42017b800)
       	/Users/longcheng/Project/golang/src/github.com/xcat2/goconserver/api/command.go:20 +0x48
net/http.HandlerFunc.ServeHTTP(0xc4201ba710, 0x19b2300, 0xc420448460, 0xc42017b800)
       	/usr/local/go/src/net/http/server.go:1918 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc42006de60, 0x19b2300, 0xc420448460, 0xc42017b800)
       	/Users/longcheng/Project/golang/src/github.com/gorilla/mux/mux.go:133 +0xed
net/http.serverHandler.ServeHTTP(0xc420184dd0, 0x19b2300, 0xc420448460, 0xc42017b600)
       	/usr/local/go/src/net/http/server.go:2619 +0xb4
net/http.(*conn).serve(0xc42039c640, 0x19b2c00, 0xc4201ef140)
       	/usr/local/go/src/net/http/server.go:1801 +0x71d
created by net/http.(*Server).Serve
       	/usr/local/go/src/net/http/server.go:2720 +0x288
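The trace shows `ListSessionUser` being called with a receiver of `0x0`, that is, on a nil `*Console`. A minimal sketch of a guard follows. The `Console` type here is a simplified, hypothetical stand-in, not the real goconserver struct.

```go
package main

// Console is a hypothetical, simplified stand-in for the real console type.
type Console struct {
	users []string
}

// ListSessionUser returns the connected users. Go allows calling a method on
// a nil pointer receiver, so the guard turns the panic into an empty result.
func (c *Console) ListSessionUser() []string {
	if c == nil {
		return []string{}
	}
	// Return a copy so callers cannot modify the internal slice.
	return append([]string{}, c.users...)
}
```

The same effect could be achieved by checking for a nil console in the caller before invoking the method; which place is more appropriate depends on code that is not shown in this issue.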

goconserver missing `\n` at the end of the logged messages, the newline added in the front of the timestamp

This issue was brought to our attention by an xCAT user: a third-party tool scanning the console logs written by goconserver did not detect a log entry until the next log entry was written.

This causes a problem because, due to the placement of the newline, an entry from a given day is not caught until the next entry is written to the log. If no further entry is logged, it could be multiple days before the monitoring tool catches the issue.

We do see the newline being added before the timestamp in the code:

buf.WriteString("\n[" + time.Now().Format(common.RFC3339_SECOND) + "] ")

When console is not created, the error message doesn't help user

Should this be opened on xcat-core? or here?

Trying to open console on mid05tor12cn15.

[root@briggs01 ~]# rcons mid05tor12cn15


Could not find node mid05tor12cn15

Maybe we could do better....

Could not find node mid05tor12cn15, did you run 'makegocons mid05tor12cn15'?

After running, it is OK

[root@briggs01 ~]# makegocons mid05tor12cn15
mid05tor12cn15: Created
[root@briggs01 ~]# rcons mid05tor12cn15
[Enter `^Ec?' for help]
goconserver(2017-12-08 14:45:22): Hello 172.10.253.27:34166, welcome to the session of mid05tor12cn15

Red Hat Enterprise Linux Server 7.4 Beta (Maipo)
Kernel 4.11.0-39.el7a.ppc64le on an ppc64le

mid05tor12cn15 login:

TCP output contains carriage returns at the end of message

I have TCP hooked up to rsyslog. It seems that newlines are not included in the output, but carriage returns are. I would expect both to be stripped.

{"type": "console","message":"[30595.853585] Policy zone: DMA\r","node":"h41n01","date":"2018-01-29 16:40:10.62790"}
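A minimal sketch of stripping both line terminators before the message is placed in the JSON payload. `trimEOL` is a hypothetical helper name; where goconserver builds the TCP message is not shown here.

```go
package main

import "strings"

// trimEOL removes any trailing carriage returns and newlines from a console
// line so that the JSON "message" field carries only the text itself.
func trimEOL(line string) string {
	return strings.TrimRight(line, "\r\n")
}
```

`strings.TrimRight` treats its second argument as a set of characters, so `"\r\n"`, `"\n"`, and a lone `"\r"` are all removed.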

Need better command line parsing for the `cmd` driver of `congo`

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# congo create c910f03c17 driver=cmd ondemand=false --params cmd="ipmitool -I lanplus -H 50.3.17.1 -U '' -P PASSW0RD sol activate"
Created
# congo console c910f03c17
goconserver(2017-11-08 03:41:59): Hello 127.0.0.1:57410, welcome to the session of c910f03c18
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:09): Hello 127.0.0.1:57412, welcome to the session of c910f03c18
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:19): Hello 127.0.0.1:57414, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:29): Hello 127.0.0.1:57416, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:39): Hello 127.0.0.1:57418, welcome to the session of c910f03c17
Error: Unable to establish IPMI v2 / RMCP+ session
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:49): Hello 127.0.0.1:57420, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:42:59): Hello 127.0.0.1:57422, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:43:09): Hello 127.0.0.1:57424, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
goconserver(2017-11-08 03:43:19): Hello 127.0.0.1:57426, welcome to the session of c910f03c17
Could not receive message, error: EOF
Could not receive message, error: EOF
Session is teminated unexpectedly, retrying....
The connection is disconnected

What actually happened

The intention is to run the command ipmitool -I lanplus -H 50.3.17.1 -U '' -P PASSW0RD sol activate under the cmd driver of congo. Note that one of the ipmitool command line arguments is an empty string. In bash, such an empty string can be passed with single quotes, but congo offers no way to do this.

More generally, there is currently no way to pass a space character inside a command line argument, because the space character is always treated as the argument separator.

What is expected

  • Escape character support in command line parsing, or support for single and double quotes.
  • Alternatively, use bash -c 'exec run a command' to run the command, so that bash does the command line parsing.
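The first option can be sketched as a small shell-like splitter. This is an illustrative simplification, not goconserver code and not a full POSIX shell grammar: `splitArgs` is a hypothetical name, and a backslash inside double quotes escapes any following character.

```go
package main

import (
	"errors"
	"strings"
)

// splitArgs splits a command line into arguments, honouring single quotes,
// double quotes and backslash escapes. A quoted empty string ('' or "")
// yields an empty argument instead of being dropped.
func splitArgs(line string) ([]string, error) {
	var args []string
	var cur strings.Builder
	inArg := false   // true once the current argument has started
	var quote rune   // active quote character, or 0 when unquoted
	escaped := false // true when the previous rune was a backslash

	for _, r := range line {
		switch {
		case escaped:
			cur.WriteRune(r)
			escaped = false
		case quote != 0:
			if r == quote {
				quote = 0
			} else if r == '\\' && quote == '"' {
				escaped = true
			} else {
				cur.WriteRune(r)
			}
		case r == '\\':
			escaped = true
			inArg = true
		case r == '\'' || r == '"':
			quote = r
			inArg = true
		case r == ' ' || r == '\t':
			if inArg {
				args = append(args, cur.String())
				cur.Reset()
				inArg = false
			}
		default:
			cur.WriteRune(r)
			inArg = true
		}
	}
	if quote != 0 || escaped {
		return nil, errors.New("unterminated quote or escape")
	}
	if inArg {
		args = append(args, cur.String())
	}
	return args, nil
}
```

Applied to the ipmitool command from this issue, the `''` after `-U` becomes an empty argument, which is what a plain split on spaces cannot express.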

Escape sequence is sent to the remote side, which should be avoided

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-02T06:57:35-0400 Commit: 7e9278e88eb4f3035707b41f0a717fede7291498 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

Create a node definition in congo, and then connect to the newly created node. The node, c910f03c01p09, is just another ordinary Linux node running Red Hat 7.4.

# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-07 23:47:16): Hello 127.0.0.1:57278, welcome to the session of c910f03c01p09

[root@c910f03c01p09 ~]# showkey -a

Press any keys - Ctrl-D will terminate this program
                                <<<-- PRESS `Ctrl-E', `c', AND `.' HERE 
^E 	  5 0005 0x05
c 	 99 0143 0x63

What actually happened

It seems the escape sequence Ctrl-E, c was sent to the remote side through the ssh session.

What is expected

Here is the expected behavior, in which the escape sequence is never sent to the remote side.

  • When the user presses Ctrl-E, hold it in the buffer and do not send it to the remote side.
  • If the next input character is not c, send Ctrl-E and this input character to the remote side.
  • Otherwise, if the input character is c, hold it in the buffer as well.
  • If the next input character is not ., send Ctrl-E, c and this input character to the remote side.
  • Otherwise, if the input character is ., clear the buffer and disconnect the session.
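The steps above can be sketched as a small state machine. This is an illustration under assumptions, not goconserver's client code: `escapeFilter` is a hypothetical type, and keeping the sequence in a slice is one way the escape key could also be made configurable. It differs from the list in one detail: a Ctrl-E that interrupts a partial sequence is held as the possible start of a new sequence instead of being forwarded at once.

```go
package main

// escapeFilter holds back a partial escape sequence so it is not forwarded
// to the remote side until it is known not to match.
type escapeFilter struct {
	seq     []byte // escape sequence, e.g. {0x05, 'c', '.'}; must be non-empty
	matched int    // number of sequence bytes currently held back
}

// feed consumes one input byte. It returns the bytes that should be
// forwarded to the remote side and whether the session should disconnect.
func (f *escapeFilter) feed(b byte) (forward []byte, disconnect bool) {
	if b == f.seq[f.matched] {
		f.matched++
		if f.matched == len(f.seq) {
			f.matched = 0 // full sequence seen: drop it and disconnect
			return nil, true
		}
		return nil, false // keep holding
	}
	// Mismatch: release the held bytes first.
	forward = append(forward, f.seq[:f.matched]...)
	f.matched = 0
	if b == f.seq[0] {
		f.matched = 1 // current byte may start a new sequence; hold it
		return forward, false
	}
	return append(forward, b), false
}
```

Feeding Ctrl-E, `c`, `x` releases all three bytes on the third call, while Ctrl-E, `c`, `.` releases nothing and signals a disconnect.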

When console is not created, the error message is not very clear..

When console is not created, this is what we see..

[root@csm03 xcat]# rcons f5n12
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to f5n12\n","node":"f5n12","time":"2018-08-03T21:47:50-04:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to f5n12\n","node":"f5n12","time":"2018-08-03T21:48:00-04:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
Disconnected

Then after creating the console:

[root@csm03 xcat]# makegocons f5n12
f5n12: Created
[root@csm03 xcat]# rcons f5n12
[Enter `^Ec?' for help]
goconserver(2018-08-03T21:48:12-04:00): Hello 192.168.10.1:47132, welcome to the session of f5n12

Can we improve it?

console log rotation

Is it possible to implement a standard log rotation mechanism using signal handling?

An example of a good mechanism for log rotation is nginx, which does log rotation via SIGUSR1: when the nginx binary receives this signal, it closes and reopens the log file descriptors. This is used in tandem with logrotate by creating a configuration for nginx in /etc/logrotate.d which contains a postrotate script kill -USR1 $NGINX_PID. logrotate rotates the logs, then signals nginx to close and reopen logs so that nginx has the correct logs open for writing.
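A minimal sketch of the nginx-style mechanism in Go, assuming a Unix-like system. `reopenableLog`, `openLog`, and `reopenOnSignal` are hypothetical names; goconserver's real file logger is not shown in this issue.

```go
package main

import (
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// reopenableLog is a log file that can be closed and reopened in place.
type reopenableLog struct {
	mu   sync.Mutex
	path string
	f    *os.File
}

// openLog opens (or creates) the log file at path in append mode.
func openLog(path string) (*reopenableLog, error) {
	l := &reopenableLog{path: path}
	if err := l.Reopen(); err != nil {
		return nil, err
	}
	return l, nil
}

// Reopen closes the current descriptor, if any, and opens path again.
// After logrotate has renamed the old file, this creates a fresh one.
func (l *reopenableLog) Reopen() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.f != nil {
		l.f.Close()
		l.f = nil
	}
	f, err := os.OpenFile(l.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0600)
	if err != nil {
		return err
	}
	l.f = f
	return nil
}

// Write appends p to the currently open file.
func (l *reopenableLog) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.f == nil {
		return 0, os.ErrClosed
	}
	return l.f.Write(p)
}

// reopenOnSignal reopens every log whenever the process receives SIGUSR1.
func reopenOnSignal(logs ...*reopenableLog) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			for _, l := range logs {
				_ = l.Reopen() // a real daemon would report this error
			}
		}
	}()
}
```

A logrotate `postrotate` script would then send `kill -USR1` to the daemon's PID, mirroring the nginx setup described above.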

Blank console with wcons/rcons, but IPMI Serial-on-LAN works

Hi!

I have the strangest issue where rcons and wcons don't display anything anymore for some nodes (not all), whereas using ipmitool to access the serial-on-lan console works perfectly.

Things used to work perfectly, but now, I can't see some of those consoles anymore using goconserver.

For instance, for node sh03-sn07:

# makegocons -q sh03-sn07

NODE                             SERVER                           STATE
sh03-sn07                        sh02-hn01.SUNet                  connected

The goconserver service is running on sh02-hn01.SUNet

# systemctl status goconserver.service
● goconserver.service - goconserver console daemon
   Loaded: loaded (/usr/lib/systemd/system/goconserver.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/goconserver.service.d
           └─override.conf
   Active: active (running) since Wed 2022-11-30 16:46:25 PST; 14min ago
     Docs: https://github.com/xcat2/goconserver
  Process: 39625 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 39627 (goconserver)
   CGroup: /system.slice/goconserver.service
           ├─14798 perl /opt/xcat/share/xcat/cons/ipmi sh03-sn07
           ├─39627 /usr/bin/goconserver
           └─39671 perl /opt/xcat/share/xcat/cons/kvm sh-vm-sl-test02

Nov 30 16:46:25 sh02-hn01.SUNet systemd[1]: Started goconserver console daemon.

And using ipmitool to access the serial console works well (to rule out a node BIOS/IPMI configuration problem):

# ipmitool -I lanplus -U $IPMI_USER -P $IPMI_PASSWORD -H sh03-sn07.infra sol activate
[SOL Session operational.  Use ~? for help]

CentOS Linux 7 (Core)
Kernel 3.10.0-1160.80.1.el7.x86_64 on an x86_64

sh03-sn07 login:
~. [terminated ipmitool]

But rcons just displays a blank screen after initiating the connection:

# rcons sh03-sn07

[Enter `^Ec?' for help]
goconserver(2022-11-30T17:08:04-08:00): Hello 10.18.0.1:38424, welcome to the session of sh03-sn07









[Disconnected]
Shared connection to sh-hn01 closed.

As indicated above, the console state is connected so things seem to work.

After activating "debug" logging in /etc/goconserver/server.conf, the following messages are logged during the rcons session:

{"file":"github.com/xcat2/goconserver/console/server.go (289)","level":"debug","msg":"New client connection received.","time":"2022-11-30T17:10:25-08:00"}
{"file":"github.com/xcat2/goconserver/console/proto.go (153)","level":"debug","msg":"Receive connection from client: {\"action\":1,\"node\":\"sh03-sn07\"}","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (332)","level":"info","msg":"Register client connection successfully.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (43)","level":"debug","msg":"Accept connection from client","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (138)","level":"debug","msg":"Create new connection to write message to client.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (101)","level":"debug","msg":"Create new connection to read message from client.","node":"sh03-sn07","time":"2022-11-30T17:10:26-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (203)","level":"warning","msg":"Could not receive message from remote. Error:%!(EXTRA string=read /dev/ptmx: input/output error)","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (177)","level":"debug","msg":"readTarget goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (196)","level":"info","msg":"Start console again due to the ondemand setting.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (286)","level":"debug","msg":"Close console session.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (168)","level":"info","msg":"Failed to send message to client. Error:tls: use of closed connection","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (116)","level":"warning","msg":"Failed to receive message head from client. Error:read tcp 10.18.0.1:12430-\u003e10.18.0.1:39008: use of closed network connection.","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (103)","level":"debug","msg":"writeTarget goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (140)","level":"debug","msg":"writeClient goroutine quit","node":"sh03-sn07","time":"2022-11-30T17:10:31-08:00"}
{"file":"github.com/xcat2/goconserver/console/server.go (142)","level":"debug","msg":"Restart console session.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/plugins/cmd.go (64)","level":"debug","msg":"Could not get tty size, use 80,80 as default","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (240)","level":"debug","msg":"Start console session.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}
{"file":"github.com/xcat2/goconserver/console/console.go (175)","level":"debug","msg":"Read target session has been initialized.","node":"sh03-sn07","time":"2022-11-30T17:10:41-08:00"}

which doesn't seem to show any obvious problem.
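One detail in the debug output is worth decoding. The `%!(EXTRA string=...)` fragment in the warning is what Go's `fmt` package prints when a format function receives more arguments than the format string has verbs, so the underlying error text is `read /dev/ptmx: input/output error`. The sketch below reproduces the rendering. The format strings are modelled on the log message, and how goconserver actually builds it is an assumption.

```go
package main

import "fmt"

// Format strings modelled on the warning in the debug log (assumed, not
// copied from the goconserver source).
var (
	brokenFormat = "Could not receive message from remote. Error:"
	fixedFormat  = "Could not receive message from remote. Error: %s"
)

// brokenMessage passes an argument to a format string without a verb, which
// makes fmt append "%!(EXTRA type=value)" to the output.
func brokenMessage(detail string) string {
	return fmt.Sprintf(brokenFormat, detail)
}

// fixedMessage uses a %s verb so the error text is rendered normally.
func fixedMessage(detail string) string {
	return fmt.Sprintf(fixedFormat, detail)
}
```

Read this way, the warning says that reading from the pseudo-terminal of the console command failed with an I/O error about five seconds after the client connected, which is a concrete starting point for further debugging.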

Other (identical) nodes still work normally:

# rcons sh03-sn01

[Enter `^Ec?' for help]
goconserver(2022-11-30T17:12:37-08:00): Hello 10.18.0.1:39220, welcome to the session of sh03-sn01

CentOS Linux 7 (Core)
Kernel 3.10.0-1160.80.1.el7.x86_64 on an x86_64

sh03-sn01 login:

Would there be any way to debug this further, and identify the issue?
Thanks!

Usability improvements for goconserver as we prepare to package with xCAT 2.13.11

Without much documentation, I am giving goconserver a try and seeing what I can figure out from a usability perspective. I will edit and add to this issue as I find more.

  1. Any reason why we choose congo instead of gocons as the command name?

  2. Message when conserver or goconserver is running can be better

    When conserver is running, and trying to execute makegocons, the following message comes out:

    [root@stratton01 ~]# makegocons 
    Error: conserver is started, please stop it at first.
    

    Suggest something like:

    [root@stratton01 ~]# makegocons
    Error: conserver is running, did you mean 'makeconservercf'? If not, stop conserver and retry.
    

    The reverse:

    [root@fs2vm110 ~]# makeconservercf f6u17
    Error: goconserver is started, please stop it at first.
    

    Suggest something like:

    [root@fs2vm110 ~]# makeconservercf f6u17
    Error: goconserver is running, did you mean `makegocons'? If not, stop goconserver and retry.
    

    This may help those who type the wrong command out of muscle memory. It would also help those who are unaware that the admin switched to goconserver and who try to run makeconservercf on a console before running rcons.

  3. congo list doesn't seem to work

    [root@fs2vm112 ~]# congo list
    Could not list resources, Get http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
    [root@fs2vm112 ~]#
    
  4. Other congo commands have a similar problem

    [root@fs2vm111 gurevich]# congo create testnode driver=ssh ondemand=false --params user=root,host=10.6.7.254,port=22,password=xxxxx
    Post http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
    
    [root@fs2vm111 gurevich]# congo create testnode driver=cmd ondemand=false --params cmd="ssh -l root -p 22 10.6.7.254"
    Post http://127.0.0.1:12429/nodes: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02\x16"
    [root@fs2vm111 gurevich]#
    
  5. Why do we create a new server.conf file (https://github.com/xcat2/xcat-core/blob/master/xCAT-server/lib/perl/xCAT/Goconserver.pm#L524)? Wouldn't it be better to sed the values instead of creating a static file? If goconserver ships a newer version of the server.conf file, it will be reverted to the version hard-coded in xcat-core.
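The byte string in items 3 and 4 begins with `\x15\x03\x01`, which matches the header of a TLS alert record. That suggests the client spoke plain HTTP to a port that expects TLS. This is an inference from the bytes, not something the issue confirms. A minimal sketch of a check that could turn the confusing message into a clearer hint follows. `looksLikeTLSRecord` is a hypothetical helper.

```go
package main

// looksLikeTLSRecord reports whether b starts like a TLS record header:
// a content type of 20..23 (change_cipher_spec, alert, handshake,
// application_data) followed by protocol major version 3.
func looksLikeTLSRecord(b []byte) bool {
	if len(b) < 3 {
		return false
	}
	return b[0] >= 0x14 && b[0] <= 0x17 && b[1] == 0x03
}
```

A client that sees such bytes where it expected an HTTP status line could report that the server appears to require HTTPS instead of printing "malformed HTTP response".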

Status of goconserver clustering with etcd

It looks like #38 added some support for clustering, but there isn't much documentation. What features are currently supported, and what is deferred to future implementation?

For example:
If I set up 4 goconservers and add 40 nodes, could I expect each server to handle approximately 10 nodes? If one of the goconservers were to fail, would its ~10 nodes be evenly distributed among the other 3? When it recovers, would that server take its ~10 nodes back over?

Is there any plan to integrate cluster-mode/etcd into xCAT?

Creating the same node definition in `congo` twice causes an unexpected error

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# congo create c910f03c01p09 driver=ssh ondemand=false --params user=root,host=10.3.1.9,port=22,password=a_password
Created
# congo create c910f03c01p09 driver=ssh ondemand=false --params user=root,host=10.3.1.9,port=22,password=a_password

Error: unexpected response status code

What actually happened

Creating the same node definition in congo twice causes an unexpected error.

What is expected

  • An understandable error message is expected.

console connection state is 'error' if consoleondemand is true

Enroll the node

[root@c910f05c01bc02k74 ~]# chdef kvmguest1 consoleondemand=1
1 object definitions have been created or modified.
[root@c910f05c01bc02k74 ~]# makegocons kvmguest1
kvmguest1: Created
[root@c910f05c01bc02k74 ~]# makegocons -q

NODE                             SERVER                           STATE
kvmguest1                        c910f05c01bc02k74                enroll

Connect to the node

[root@c910f05c01bc02k74 ~]# rcons kvmguest1
[Enter `^Ec?' for help]
goconserver(2018-03-18T23:26:08-04:00): Hello 10.5.102.74:36146, welcome to the session of kvmguest1

kvmguest1 login:
Ubuntu 16.04.1 LTS kvmguest1 ttyS0

Then disconnect

kvmguest1 login: [Disconnected]

Check the status

[root@c910f05c01bc02k74 ~]# makegocons -q
NODE                             SERVER                           STATE
kvmguest1                        c910f05c01bc02k74                error

The console state should be available.

Option parsing should be more strict and only allow supported options and block all others

When trying to delete a console, I was not sure whether the option was -d or -D, so I tried -D first. The output I saw was:

[root@stratton01 ~]# makegocons -D c910f3zz01
In preprocess_request, request is $VAR1 = {
          '_xcat_clientfqdn' => [
                                  'localhost'
                                ],
          'arg' => [
                     '-D'
                   ],
          '_xcat_authname' => [
                                'root'
                              ],
          '_xcat_clientport' => [
                                  45572
                                ],
          'node' => [
                      'c910f3zz01'
                    ],
          'username' => [
                          'root'
                        ],
          'noderange' => [
                           'c910f3zz01'
                         ],
          '_xcatdest' => '10.6.29.1',
          'cwd' => [
                   '/root'
                 ],
          '_allnodes' => [
                           0
                         ],
          'clienttype' => [
                            'cli'
                          ],
          '_xcat_clienthost' => [
                                  'localhost'
                                ],
          'command' => [
                         'makegocons'
                       ],
          '_xcatpreprocessed' => [
                                   1
                                 ]
        };

c910f3zz01: Created

So it seems that some verbose (DEBUG?) data is printed, and the console is "Created" instead of the unknown option being rejected.

The correct option is -d:

[root@stratton01 ~]# makegocons -d c910f3zz01
c910f3zz01: Deleted
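For illustration, here is how strict option parsing looks with Go's standard `flag` package, which rejects any option that was not declared. makegocons itself belongs to xCAT and is not written in Go, so this is only a sketch of the requested behavior. `parseOptions` and `options` are hypothetical names, and only the `-d` and `-q` options seen in these issues are declared.

```go
package main

import (
	"flag"
	"io"
)

// options holds the parsed command line (hypothetical structure).
type options struct {
	del   bool     // -d: delete the console definition
	query bool     // -q: query console state
	nodes []string // remaining positional arguments
}

// parseOptions accepts only declared options and returns an error for
// anything else, such as -D.
func parseOptions(args []string) (*options, error) {
	fs := flag.NewFlagSet("makegocons", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how to print errors
	opts := &options{}
	fs.BoolVar(&opts.del, "d", false, "delete the console definition")
	fs.BoolVar(&opts.query, "q", false, "query console state")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	opts.nodes = fs.Args()
	return opts, nil
}
```

With this approach, `-D c910f3zz01` fails with a "flag provided but not defined" error instead of silently falling through to the create path.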

Failed to create ubuntu package on amd64 platform

Env:

root@c910f05c01bc02k70:~/goconserver/build# uname -a
Linux c910f05c01bc02k70 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Output:

root@c910f05c01bc02k70:~/goconserver# cd build && VERSION=0.2.1 ../dirty-debuild goconserver_linux_amd64.tar.gz
-rw-r--r-- 1 root root 9422625 Feb 26 04:36 goconserver-repack-amd64.tar.gz
drwxr-xr-x root/root         0 2018-02-26 04:36 etc/
drwxr-xr-x root/root         0 2018-02-26 04:36 etc/profile.d/
-rw-r--r-- root/root       318 2018-02-26 04:36 etc/profile.d/congo.sh
drwx------ root/root         0 2018-02-26 04:36 etc/goconserver/
-rw-r--r-- root/root      3060 2018-02-26 04:36 etc/goconserver/server.conf
drwxr-xr-x root/root         0 2018-02-26 04:36 usr/
drwxr-xr-x root/root         0 2018-02-26 04:36 usr/bin/
-rwxr-xr-x root/root  15515876 2018-02-26 04:36 usr/bin/congo
-rwxr-xr-x root/root  16034720 2018-02-26 04:36 usr/bin/goconserver
drwxr-xr-x root/root         0 2018-02-26 04:36 usr/lib/
drwxr-xr-x root/root         0 2018-02-26 04:36 usr/lib/systemd/
drwxr-xr-x root/root         0 2018-02-26 04:36 usr/lib/systemd/system/
-rw-r--r-- root/root       301 2018-02-26 04:36 usr/lib/systemd/system/goconserver.service
drwxr-xr-x root/root         0 2018-02-26 04:36 var/
drwxr-xr-x root/root         0 2018-02-26 04:36 var/log/
drwx------ root/root         0 2018-02-26 04:36 var/log/goconserver/
drwx------ root/root         0 2018-02-26 04:36 var/log/goconserver/nodes/
drwxr-xr-x root/root         0 2018-02-26 04:36 var/lib/
drwx------ root/root         0 2018-02-26 04:36 var/lib/goconserver/
Directories goconserver-0.2.1 and goconserver-0.2.1.orig prepared.
dh_testdir
dh_testdir
dh_testroot
dh_prep
dh_installdirs
dh_installdocs
dh_installchangelogs
find . -maxdepth 1 -mindepth 1 -not -name debian -print0 | \
       	xargs -0 -r -i cp -a {} debian/
dh_compress
dh_makeshlibs
dh_installdeb
dh_shlibdeps
dh_gencontrol
dh_md5sums
dh_builddeb

Searching for the deb package returns no result:

root@c910f05c01bc02k70:~/goconserver# find ./ -name "*.deb"
root@c910f05c01bc02k70:~/goconserver#

`rcons` makes stdin, stdout, and stderr O_ASYNC | O_NONBLOCK

@neo954 commented on Tue May 21 2019

Here are the bug recreation steps.

Run rcons against a compute node, then press Ctrl-E C ?. After that, run the tftp client, tftp. The tftp client prints its command-line prompt repeatedly and endlessly.

root@f6u13k13:~# tftp
tftp>
tftp> 
root@f6u13k13:~# rcons f6u13k15
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:12:05-04:00): Hello 10.6.13.13:33358, welcome to the session of f6u13k15
[Disconnected]
root@f6u13k13:~# tftp
tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp> tftp>
... <<< omit thousands of lines >>> ...

Additional information

The xCAT management node runs Ubuntu 18.04.2 on a ppc64el node. It has xCAT 2.15-snap201905170621 installed. The xCAT compute node f6u13k15 is a regular KVM guest.

root@f6u13k13:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@f6u13k13:~# go-xcat check
Operating system:   linux
Architecture:       ppc64le
Linux Distribution: ubuntu
Version:            18.04
go-xcat Version:    1.0.38


Reading repositories ...... done

xCAT Core Packages
==================

Package Name                Installed                      In Repository
------------                ---------                      -------------
perl-xcat                   2.15-snap201905170621          2.15-snap201905170621
xcat                        2.15-snap201905170621          2.15-snap201905170621
xcat-buildkit               2.15-snap201905170621          2.15-snap201905170621
xcat-client                 2.15-snap201905170621          2.15-snap201905170621
xcat-confluent              (not installed)                2.15-snap201905170621
xcat-genesis-scripts-amd64  2.15-snap201905170621          2.15-snap201905170621
xcat-genesis-scripts-ppc64  2.15-snap201905170621          2.15-snap201905170621
xcat-probe                  2.15-snap201905170621          2.15-snap201905170621
xcat-server                 2.15-snap201905170621          2.15-snap201905170621
xcat-test                   2.15-snap201905170621          2.15-snap201905170621
xcat-vlan                   (not installed)                2.15-snap201905170621
xcatsn                      (not installed)                2.15-snap201905170621

xCAT Dependency Packages
========================

Package Name                Installed                      In Repository
------------                ---------                      -------------
elilo-xcat                  3.14-4                         3.14-4
goconserver                 0.3.2-snap201811080419         0.3.2-snap201811080419
grub2-xcat                  2.02-0.76.el7.1.snap2019051602 2.02-0.76.el7.1.snap2019051602
ipmitool-xcat               1.8.18                         1.8.18
syslinux-xcat               3.86-2                         3.86-2
xcat-genesis-base-amd64     2.14.5-snap201811190037        2.14.5-snap201811190037
xcat-genesis-base-ppc64     2.14.5-snap201811160710        2.14.5-snap201811160710
xnba-undi                   1.0.3-7                        1.0.3-7
root@f6u13k13:~# lsdef f6u13k15 -z
# <xCAT data object stanza file>

f6u13k15:
    objtype=node
    addkcmdline=console=tty0 console=hvc0,115200
    arch=ppc64el
    cons=kvm
    consoleenabled=1
    currchain=boot
    currstate=netboot ubuntu18.04.2-ppc64el-compute
    groups=all
    ip=10.6.13.15
    mac=42:11:0a:06:0d:0f|42:02:0a:06:0d:0f!*NOIP*|42:a2:0a:06:0d:0f!*NOIP*
    mgt=kvm
    monserver=f6u13k13
    netboot=grub2
    nfsserver=f6u13k13
    os=ubuntu18.04.2
    profile=compute
    provmethod=ubuntu18.04.2-ppc64el-netboot-compute
    serialport=0
    serialspeed=115200
    status=powering-on
    statustime=05-21-2019 03:44:56
    tftpserver=f6u13k13
    updatestatus=failed
    updatestatustime=05-20-2019 14:18:04
    vmcpus=2
    vmhost=f6u13
    vmmemory=4096
    vmnicnicmodel=virtio-net-pci
    vmnics=br0,private_br0,private_br1
    vmstorage=phy:/dev/mapper/vdiskvg00-vdisk00n15
    xcatmaster=f6u13k13

@neo954 commented on Tue May 21 2019

I tried to capture the tty state with stty -a before and after running rcons, but the two outputs looked exactly the same.

root@f6u13k13:~# stty -a >stty.out.good
root@f6u13k13:~# rcons f6u13k15
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:31:18-04:00): Hello 10.6.13.13:33358, welcome to the session of f6u13k15
[Disconnected]
root@f6u13k13:~# stty -a >stty.out.1
root@f6u13k13:~# diff -u stty.out.good stty.out.1
root@f6u13k13:~# echo $?
0
root@f6u13k13:~# cat stty.out.good
speed 38400 baud; rows 62; columns 135; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke -flusho -extproc
root@f6u13k13:~# cat stty.out.1
speed 38400 baud; rows 62; columns 135; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke -flusho -extproc

@neo954 commented on Tue May 21 2019

I ran strace tftp while the tty was in the broken state. It seems the read() system calls of the tftp client process failed continuously, with errno set to EAGAIN.

... <<< omit thousands of lines >>> ...
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
read(0, 0x397cdea1410, 1024)            = -1 EAGAIN (Resource temporarily unavailable)
write(1, "tftp> ", 6tftp> )                   = 6
... <<< omit thousands of lines >>> ...

@neo954 commented on Tue May 21 2019

Enclosed please find the strace outputs.
strace.tftp.out.good.txt
strace.tftp.out.problem.txt


@neo954 commented on Tue May 21 2019

This problem can be recreated with cat.

root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T04:59:08-04:00): Hello 10.6.13.13:38404, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# cat
cat: -: Resource temporarily unavailable

@neo954 commented on Tue May 21 2019

root@f6u13k13:~# strace cat
execve("/bin/cat", ["cat"], 0x7fffc7e70f20 /* 28 vars */) = 0
brk(NULL)                               = 0xd9e3c350000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=28770, ...}) = 0
mmap(NULL, 28770, PROT_READ, MAP_PRIVATE, 3, 0) = 0x794af1900000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/powerpc64le-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\20G\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2181704, ...}) = 0
mmap(NULL, 2250384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x794af16d0000
mmap(0x794af18e0000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x200000) = 0x794af18e0000
close(3)                                = 0
mprotect(0x794af18e0000, 65536, PROT_READ) = 0
mprotect(0xd9e01270000, 65536, PROT_READ) = 0
mprotect(0x794af1970000, 65536, PROT_READ) = 0
munmap(0x794af1900000, 28770)           = 0
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1991136, ...}) = 0
mmap(NULL, 1991136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x794af14e0000
close(3)                                = 0
brk(NULL)                               = 0xd9e3c350000
brk(0xd9e3c380000)                      = 0xd9e3c380000
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
fadvise64(0, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x794af14a0000
read(0, 0x794af14b0000, 131072)         = -1 EAGAIN (Resource temporarily unavailable)
write(2, "cat: ", 5cat: )                    = 5
write(2, "-", 1-)                        = 1
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2995
read(3, "", 4096)                       = 0
close(3)                                = 0
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": Resource temporarily unavailab"..., 34: Resource temporarily unavailable) = 34
write(2, "\n", 1
)                       = 1
munmap(0x794af14a0000, 262144)          = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

@neo954 commented on Tue May 21 2019

Okay, here is the problem.

root@f6u13k13:~# cat /proc/self/fdinfo/0
pos:	0
flags:	02
mnt_id:	26
root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:21:47-04:00): Hello 10.6.13.13:41330, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# cat /proc/self/fdinfo/0
pos:	0
flags:	024002
mnt_id:	26

@neo954 commented on Tue May 21 2019

It seems the problem affected all three file descriptors 0, 1, and 2.

root@f6u13k13:~# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos:	0
flags:	02
mnt_id:	26

==> /proc/self/fdinfo/1 <==
pos:	0
flags:	02
mnt_id:	26

==> /proc/self/fdinfo/2 <==
pos:	0
flags:	02
mnt_id:	26
root@f6u13k13:~# rcons f6u13k14
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:23:16-04:00): Hello 10.6.13.13:41500, welcome to the session of f6u13k14
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Disconnected]
root@f6u13k13:~# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos:	0
flags:	024002
mnt_id:	26

==> /proc/self/fdinfo/1 <==
pos:	0
flags:	024002
mnt_id:	26

==> /proc/self/fdinfo/2 <==
pos:	0
flags:	024002
mnt_id:	26

@neo954 commented on Tue May 21 2019

See http://man7.org/linux/man-pages/man5/proc.5.html for details of the fdinfo subdirectory.

flags
This is an octal number that displays the file access mode and file status flags (see open(2)). If the close-on-exec file descriptor flag is set, then flags will also include the value O_CLOEXEC.
Before Linux 3.1, this field incorrectly displayed the setting of O_CLOEXEC at the time the file was opened, rather than the current setting of the close-on-exec flag.


@neo954 commented on Tue May 21 2019

In file /usr/src/linux-headers-4.15.0-47/include/uapi/asm-generic/fcntl.h

#define O_RDWR          00000002
#ifndef O_NONBLOCK
#define O_NONBLOCK      00004000
#endif
#ifndef FASYNC
#define FASYNC          00020000        /* fcntl, for BSD compatibility */
#endif

@neo954 commented on Tue May 21 2019

root@f6u13k13:~# uname -a
Linux f6u13k13 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:40:40 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

@neo954 commented on Tue May 21 2019

This is what happened in a RHEL 8 test environment.

[root@c910f03c01p19 ~]# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos:	0
flags:	02
mnt_id:	25

==> /proc/self/fdinfo/1 <==
pos:	0
flags:	02
mnt_id:	25

==> /proc/self/fdinfo/2 <==
pos:	0
flags:	02
mnt_id:	25
[root@c910f03c01p19 ~]# rcons c910f03c01p10
[Enter `^Ec?' for help]
goconserver(2019-05-21T05:47:27-04:00): Hello 10.3.1.19:59050, welcome to the session of c910f03c01p10
done
[Disconnected]
[root@c910f03c01p19 ~]# head -n 99 /proc/self/fdinfo/{0,1,2}
==> /proc/self/fdinfo/0 <==
pos:	0
flags:	020002
mnt_id:	25

==> /proc/self/fdinfo/1 <==
pos:	0
flags:	020002
mnt_id:	25

==> /proc/self/fdinfo/2 <==
pos:	0
flags:	020002
mnt_id:	25
[root@c910f03c01p19 ~]# uname -a
Linux c910f03c01p19 4.18.0-80.el8.ppc64le #1 SMP Wed Mar 13 11:26:21 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

@neo954 commented on Tue May 21 2019

In the latest goconserver source, v0.3.2.

$ grep -n -r NONBLOCK .
./console/client.go:132:	err := common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)
./console/client.go:173:	err := common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)
./console/cli.go:320:	err = common.Fcntl(in, syscall.F_SETFL, syscall.O_ASYNC|syscall.O_NONBLOCK)

@neo954 commented on Wed May 22 2019

@chenglch, Do you have any idea about this issue? :-/

Update of goconserver is not tracked with xCAT-core/dep changes

It looks like goconserver is not updated automatically when following our instructions for updating xCAT.

Currently we document the following to update xCAT (using yum):

yum update "*xCAT*" "*xcat*" 

This will not pull in any goconserver updates...

How do we feel we should handle this?

systemd unit file issues

I see the following messages in the system journal:

Feb 08 10:09:58 mgmt1.peak.olcf.ornl.gov systemd[1]: [/usr/lib/systemd/system/goconserver.service:8] Unknown lvalue 'killMode' in section 'Service'
Feb 08 10:09:58 mgmt1.peak.olcf.ornl.gov systemd[1]: [/usr/lib/systemd/system/goconserver.service:9] Unknown lvalue 'After' in section 'Service'

The After parameter should go in the [Unit] section, not [Service]
killMode needs to be KillMode

We are also seeing shutdown hangs that appear to be caused by goconserver writing to an NFS share. Still looking into the cause...

Consider log rotate

This is a feature request.

Currently, there is no way to rotate the console logs. goconserver needs to provide a method that lets the end user rotate the console logs. Otherwise, the logs will eventually fill up the /var file system.

Either providing a mechanism that works with logrotate or doing the log rotation inside goconserver would be fine.

To work with logrotate, do the following:

  • Register a signal handler for, say, SIGHUP or SIGUSR1.
  • When the corresponding signal is caught, close all the open log files, and then reopen them with the same file names.
    • Don't lose log entries in this step.
    • The main log file should be rotated as well.

The SIGWINCH signal

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

Create a node definition in congo, and then connect to the newly created node. The node, c910f03c01p09, is just an ordinary Linux node running Red Hat 7.4.

# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-08 00:16:57): Hello 127.0.0.1:57304, welcome to the session of c910f03c01p09

Run nano through the console session.

(screenshot: 2017-11-15 at 16:43:51)

Resize the terminal window, and the whole screen display gets messed up.

(screenshot: 2017-11-15 at 16:44:41)

What actually happened

The SIGWINCH signal was not handled properly.

What is expected

With 1,000 ssh connections, 426 lost after five days

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# for node in foo{000..999}
> do
>     congo create ${node} driver=ssh ondemand=false \
>         --params user=foo,host=10.3.1.9,port=22,password=something
>     ln -s /dev/null /var/log/goconserver/nodes/${node}.log
>     congo logging ${node} on
> done

I created 1,000 node definitions in congo. On the ssh server side, a bash script runs after the user logs in and generates output continuously. In addition, the console log files, /var/log/goconserver/nodes/foo{000..999}.log, are all symbolic links to /dev/null, in order to avoid filling up the /var file system.

# ps auwx
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
<<<...omit...>>>
root     18953 42.3  0.4 514368 93696 ?        Ssl  05:02   0:07 /usr/local/bin/goconserver
<<<...omit...>>>

There are 1,000 active ssh connections.

# lsof -p 18953 | grep TCP | grep ssh | grep ESTABLISHED | wc -l
1000

Then, I left this test environment running for a handful of days.

What actually happened

After around five days...

See the TIME column of the ps output. It shows that goconserver has accumulated about 1,532 minutes and 41 seconds of CPU time.

# ps auwx
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
<<<...omit...>>>
root     18953 20.1  0.6 781120 129664 ?       Ssl  Nov08 1532:41 /usr/local/bin/goconserver
<<<...omit...>>>

I noticed some of the ssh connections were disconnected.

# lsof -p 18953 | grep TCP | grep ssh | grep ESTABLISHED | wc -l
574

What is expected

  • The commonly used OpenSSH ssh client has a client-side configuration option, TCPKeepAlive, which can be set to yes. I quote the relevant part of man ssh_config below. Maybe something similar can be implemented in goconserver.
     TCPKeepAlive
             Specifies whether the system should send TCP keepalive messages
             to the other side.  If they are sent, death of the connection or
             crash of one of the machines will be properly noticed.  However,
             this means that connections will die if the route is down
             temporarily, and some people find it annoying.

             The default is yes (to send TCP keepalive messages), and the
             client will notice if the network goes down or the remote host
             dies.  This is important in scripts, and many users want it too.

             To disable TCP keepalive messages, the value should be set to no.

`congo delete .` always success

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# congo list
c910f03c01p09 (host: 127.0.0.1)
c910f03c17 (host: 127.0.0.1)
# congo delete .
Deleted
# echo $?
0
# congo list
c910f03c01p09 (host: 127.0.0.1)
c910f03c17 (host: 127.0.0.1)

What actually happened

The command congo delete . actually deletes nothing, but it gets a positive response, and the exit code is zero.

What is expected

  • A proper error message.
  • Non-zero exit code.

goconserver need to restart after upgraded

In v0.3.1, we have included the fix #46, but the fix only takes effect after goconserver is restarted.

We need to add a prerm script to clean up the goconserver service, and a postinst script to restart goconserver after it is updated.

Need RETRY in `congo` session, if something went wrong and disconnected the ssh session

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-02T06:57:35-0400 Commit: 7e9278e88eb4f3035707b41f0a717fede7291498 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

Create a node definition in congo, and then connect to the newly created node. The node, c910f03c01p09, is just an ordinary Linux node running Red Hat 7.4.

# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09

While the console session is running, on the node c910f03c01p09, kill the sshd process that serves the connection with a KILL signal.

[root@c910f03c01p09 ~]# ps axf
  PID TTY      STAT   TIME COMMAND
<<<...omit...>>>
18489 ?        Ss     0:00 /usr/sbin/sshd -D
18964 ?        Ss     0:00  \_ sshd: root@pts/0
18966 pts/0    Ss+    0:00      \_ -bash
<<<...omit...>>>
[root@c910f03c01p09 ~]# kill -KILL 18964

Then, back in the congo session, the session gets EOF in both directions and is dropped.

# congo console c910f03c01p09
goconserver(2017-11-07 03:20:19): Hello 127.0.0.1:57208, welcome to the session of c910f03c01p09

[root@c910f03c01p09 ~]# date
Tue Nov  7 03:20:26 EST 2017
[root@c910f03c01p09 ~]# EOF
                           EOF

What actually happened

From the end user's point of view, the congo session was dropped when a network problem occurred or the remote server ran into a problem.

What is expected

  • When a network problem occurs, or the remote server runs into a problem and disconnects the ssh session, goconserver should try to reconnect the ssh session behind the scenes and maintain the congo session.
  • If the network problem lasts for a long period of time, using binary exponential backoff might be better.

clean up consoles that no longer have xCAT node definitions

Some sort of cleanup is required to remove consoles that are not there anymore.

In discovery, the node-XXX discovered definitions are automatically removed from the xCAT database, but a user could have run makegocons and created consoles for these...

[root@briggs01 log]# makegocons -d node-8335-gtc-785262a
Error: Invalid nodes and/or groups in noderange: node-8335-gtc-785262a
[root@briggs01 log]# nodels | grep node-8335-gtc-785262a
[root@briggs01 log]#

However, the console configuration still exists so I am able to open an rcons session:

[root@briggs01 log]# rcons node-8335-gtc-785262a
{"file":"github.com/xcat2/goconserver/console/client.go (296)","level":"error","msg":"Fatal error: Could not connect to node-8335-gtc-785262a\n","node":"node-8335-gtc-785262a","time":"2018-02-27T16:11:45-05:00"}
The connection is disconnected
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....

How do we get rid of these in gocons?
The good thing is that no logs are written for them, so the log does not keep growing...

We had something similar in the makeconservercf --cleanup function...

can't connect to the remote console using rcons

I can't connect to the remote console using rcons.
The settings are as follows:

management node : mn01
client node : cn01,cn02

[root@ccm51 ~]# lsdef cn01
Object name: cn01a
arch=x86_64
bmc=cn01a
consoleenabled=1
currchain=boot
currstate=boot
groups=aaa,bbb
interface=eth0
ip=xx.xx.xx.xx
mac=aa:aa:aa:aa:aa:aa
mgt=ipmi
netboot=xnba
os=centos7.7
postbootscripts=otherpkgs
power=ipmi
profile=compute
provmethod=centos7.7-x86_64-install-compute
serialflow=hard
serialport=0
serialspeed=19200
status=powering-off
statustime=10-06-2020 09:59:13

[root@mn01 ~] # makegocons cn01
cn01: Created

[root@mn01 ~] # systemctl restart goconserver.service
[root@mn01 ~] # systemctl status goconserver.service
● goconserver.service - goconserver console daemon
Loaded: loaded (/usr/lib/systemd/system/goconserver.service; enabled; vendor>
Active: active (running) since day yyyy-mm-dd hh:mm:ss ; 5s ago

[root@mn01 ~] # rcons cn01
[Enter `^Ec?' for help]
goconserver(yyyy-mm-dd hh:mm:ss) Hello: xx.xx.xx.xx:59230, welcome to the session of cn01


It doesn't work at this point and I have no choice but to terminate the connection (Ctrl+e c .).
How can I log in to cn01?
cn02 has the same settings, and after I reinstalled its OS, I can connect to the remote console of cn02.

[root@mn01 ~] # rcons cn02
[Enter `^Ec?' for help]
goconserver(yyyy-mm-dd hh:mm:ss) Hello: xx.xx.xx.xx:37360, welcome to the session of cn02

CentOS Linux 7 (Core)
Kernel x.xx.x-xxxx.el7.x86_64 on an x86_64

cn02 login: root
Password:
Last login: ~~~
[root@cn02 ~] #

If the settings are correct, do I need to restart the node or reinstall the OS to connect to the remote console using rcons?
I would like to connect to this without using these methods if possible. Is there any other way?

Thank you.

Send SIGSTOP and then SIGCONT to `congo` cause it crash

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

Create a node definition in congo, and then connect to the newly created node. The node, c910f03c01p09, is just an ordinary Linux node running Red Hat 7.4.

# congo create c910f03c01p09 driver=ssh ondemand=false \
> --params user=root,host=10.3.1.9,port=22,password=a_password
# congo console c910f03c01p09
goconserver(2017-11-08 00:16:57): Hello 127.0.0.1:57304, welcome to the session of c910f03c01p09

#        <<<--- RUN `kill -STOP 3963' FROM ANOTHER TERMINAL
[1]+  Stopped                 congo console c910f03c01p09
#
# jobs
[1]+  Stopped                 congo console c910f03c01p09
# fg
congo console c910f03c01p09
unknown signal received: continued

What actually happened

There is no way to use bash job control against congo at this time. Thus, I tried to send it a SIGSTOP signal from another terminal session.

While the console session congo console c910f03c01p09 was running, sending it a SIGSTOP signal stopped it. Then, moving it back to the foreground with the bash job control command fg caused congo to crash.

It seems the signal handler for SIGCONT works improperly. The default signal handler for SIGCONT would do this properly.

What is expected

  • Proper behavior on the SIGSTOP and SIGCONT signals.
  • Implementing the escape sequence Ctrl-E, c, Ctrl-Z would be even better.

IPMI consoles disconnect issue with AMI BMC

I'm not sure if this is caused by the instability of the AMI BMC with regards to sol activate or if we have multiple sessions established for this node.

But I saw this behavior today:

[root@stratton01 c910env]# rcons fs3
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:03:37-05:00): Hello 10.6.29.1:48462, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational.  Use ~? for help]
Info: SOL payload already de-activated
SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:03:52-05:00): Hello 10.6.29.1:48538, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational.  Use ~? for help]

Red Hat Enterprise Linux Server 7.4 (Maipo)
Kernel 3.10.0-693.el7.ppc64le on an ppc64le

fs3 login: rooSOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:04:06-05:00): Hello 10.6.29.1:48568, welcome to the session of fs3
Acquiring startup lock...done
[SOL Session operational.  Use ~? for help]
Info: SOL payload already de-activated
SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
[Enter `^Ec?' for help]
goconserver(2018-03-09T11:04:20-05:00): Hello 10.6.29.1:48594, welcome to the session of fs3
Acquiring startup lock...done

[SOL Session operational.  Use ~? for help]

Red Hat Enterprise Linux Server 7.4 (Maipo)
Kernel 3.10.0-693.el7.ppc64le on an ppc64le

fs3 login: Error sending SOL data: FAIL
                                       SOL session closed by BMC
Error in SOL session
Could not receive message, error: EOF.
[Enter `^Ec.' to exit]
Session is teminated unexpectedly, retrying....
Disconnected

This is on Firestone EUH boxes.

Output plugins

It appears that goconserver currently only supports writing to flat files. It would be nice if I could specify an alternate output plugin, to write structured data directly to rsyslog or logstash.

goconserver license

Is goconserver licensed under the EPL like xCAT? Could a licensing statement be added to the codebase?

Nodes listed in random sequence with `congo list`

This bug is against goconserver Version: 0.1, BuildTime: 2017-11-07T23:00:03-0500 Commit: 906cd810f009dd1b0b22a75ea0c4efc1cbb2e840 running on a ppc64le Red Hat 7.4 Linux environment.

The recreation steps

# for node in foo{000..999}
> do
>     congo create $node driver=ssh ondemand=false \
>         --params user=foo,host=10.3.1.9,port=22,password=something
>     congo logging $node on
> done
# congo list
foo932 (host: 127.0.0.1)
foo948 (host: 127.0.0.1)
foo888 (host: 127.0.0.1)
<<<...omit...>>>
foo081 (host: 127.0.0.1)
# congo list
foo941 (host: 127.0.0.1)
foo619 (host: 127.0.0.1)
foo398 (host: 127.0.0.1)
<<<...omit...>>>
foo939 (host: 127.0.0.1)

What actually happened

I created 1,000 node definitions in congo. Each time I ran congo list, it listed the nodes in a completely different, random order.

What is expected

  • Some kind of sorting would be helpful.
