ceph / ceph-salt
Deploy Ceph clusters using cephadm
License: MIT License
/Containers/Images/ceph should not have a default value.
$ ceph mgr dump
...
"active_addrs": {
"addrvec": [
{
"type": "v2",
"addr": "127.0.0.1:6800",
"nonce": 1
},
{
"type": "v1",
"addr": "127.0.0.1:6801",
"nonce": 1
}
]
},
We should make sure that MGRs advertise proper addresses rather than the loopback address shown above.
ATM, ceph-common and /etc/ceph/ceph.client.admin.keyring are installed on all "mons".
Instead, we should have a dedicated "Admin" role, and only install ceph tools on minions that have that role.
Links
ceph/ceph#33793
Due to a recently introduced regression, ceph-salt stopped deploying OSDs.
The "Deploying OSD groups 1/1" step finishes immediately, and then sesdev create octopus ... --qa-test ... fails because the cluster has zero OSDs.
ceph-salt uses a command like the following to deploy OSDs:
echo '{\"testing_dg_admin\": {\"host_pattern\": \"admin*\", \"data_devices\": {\"all\": true}}}' | ceph orch osd create -i -
When I try this command manually, it fails:
admin:~ # echo '{\"testing_dg_admin\": {\"host_pattern\": \"admin*\", \"data_devices\": {\"all\": true}}}' | ceph orch osd create -i -
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1070, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 191, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 309, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 153, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 144, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 437, in _create_osd
dgs = DriveGroupSpecs(yaml.load(inbuf))
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 117, in __init__
self.build_drive_groups()
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 122, in build_drive_groups
(drive_group_spec, name=drive_group_name))
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 228, in from_json
"Feature <{}> is not supported".format(applied_filter))
ceph.deployment.drive_group.DriveGroupValidationError: Failed to validate Drive Group: Feature <\"host_pattern\"> is not supported
admin:~ # echo $?
22
When I change the single quotes around the JSON in the echo statement to double quotes, it succeeds. It also succeeds with the single quotes, provided I remove the backslashes from the JSON.
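The traceback shows the keys arriving with literal backslashes (<\"host_pattern\">), which suggests the escaping is applied once too often. As a minimal sketch, assuming the command string is assembled in Python, the JSON can be serialized with json.dumps and quoted for the shell with shlex.quote so that no manual backslash escaping is needed. The helper name build_osd_create_command is hypothetical and not taken from ceph-salt.

```python
import json
import shlex


def build_osd_create_command(drive_groups):
    """Return a shell pipeline that feeds drive_groups to `ceph orch osd create -i -`."""
    payload = json.dumps(drive_groups)
    # shlex.quote wraps the payload in single quotes; the JSON itself only
    # contains double quotes here, so it passes through the shell unchanged.
    return "echo {} | ceph orch osd create -i -".format(shlex.quote(payload))


drive_groups = {
    "testing_dg_admin": {"host_pattern": "admin*", "data_devices": {"all": True}}
}
command = build_osd_create_command(drive_groups)
print(command)

# Inside single quotes the shell keeps backslashes literally, so the
# receiving side gets text like this, which is not valid JSON:
broken = r'{\"testing_dg_admin\": {\"host_pattern\": \"admin*\"}}'
try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print("not valid JSON:", exc)
```

The printed command matches the form that was reported to work when typed manually (single quotes, no backslashes).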
After updating packages, ceph-bootstrap checks if a reboot is needed.
ATM this check is only implemented for SUSE (zypper ps), but other distributions should also be supported.
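One way to structure this is a lookup keyed by OS family. The sketch below is an assumption, not ceph-bootstrap code: only the SUSE entry comes from the text above, while the Debian marker file and the needs-restarting command are the usual mechanisms on those distributions and would need to be confirmed on the target systems. The family names follow Salt's os_family grain.

```python
import os

# Only the "Suse" entry is taken from the issue; the others are assumptions.
REBOOT_CHECKS = {
    "Suse": ("command", ["zypper", "ps"]),
    "Debian": ("file", "/var/run/reboot-required"),
    "RedHat": ("command", ["needs-restarting", "-r"]),
}


def reboot_check_for(os_family):
    """Return (kind, target) describing how to detect a pending reboot."""
    try:
        return REBOOT_CHECKS[os_family]
    except KeyError:
        raise NotImplementedError(
            "No reboot check known for OS family '{}'".format(os_family)
        )


def reboot_marker_present(path):
    """File-based check: the marker file exists while a reboot is pending."""
    return os.path.exists(path)
```

Raising NotImplementedError for unknown families makes an unsupported distribution visible instead of silently skipping the check.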
When I try to remove one minion that has roles I receive an error message, but the internal state was partially changed.
/Cluster/Minions> ls /Cluster
o- Cluster ..................................................................................................... [...]
o- Minions ............................................................................................ [Minions: 2]
| o- node1.ceph.com ..................................................................................... [no roles]
| o- node2.ceph.com .......................................................................................... [mgr]
o- Roles ..................................................................................... [Minions w/ roles: 1]
o- Mgr .............................................................................................. [Minions: 1]
| o- node2.ceph.com ............................................................................. [no other roles]
o- Mon .............................................................................................. [no minions]
/Cluster/Minions> rm node2*
Cannot remove host 'node2.ceph.com' because it has roles defined: {'mgr'}
/Cluster/Minions> rm node2*
No minions matched "node2*".
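The second rm shows that the minion was dropped from the list even though the first rm reported an error. A minimal sketch of the usual fix is to validate every matched minion before changing any state. MinionStore is a hypothetical in-memory stand-in, not the real ceph-bootstrap data model.

```python
class MinionStore:
    """Hypothetical in-memory stand-in for the cluster configuration."""

    def __init__(self):
        self.minions = set()
        self.roles = {}  # role name -> set of minion ids

    def roles_of(self, minion):
        return {role for role, members in self.roles.items() if minion in members}

    def remove(self, minions):
        # Phase 1: validate all candidates; nothing is modified if a check fails.
        for minion in minions:
            if minion not in self.minions:
                raise ValueError("Unknown minion '{}'".format(minion))
            roles = self.roles_of(minion)
            if roles:
                raise ValueError(
                    "Cannot remove host '{}' because it has roles defined: {}".format(
                        minion, sorted(roles)
                    )
                )
        # Phase 2: all checks passed, apply the change in one step.
        self.minions -= set(minions)
```

With this ordering, a failed rm leaves the state exactly as it was before the command.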
We should only set the "bootstrap_minion" value if a minion has both "mon" and "mgr" roles.
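A minimal sketch of that rule, assuming roles are available as a mapping from role name to minion ids (the function name and data shape are hypothetical):

```python
def pick_bootstrap_minion(roles):
    """Return a minion that holds both 'mon' and 'mgr', or None if there is none."""
    candidates = set(roles.get("mon", ())) & set(roles.get("mgr", ()))
    # min() makes the choice deterministic when several minions qualify.
    return min(candidates) if candidates else None
```

Returning None when no minion qualifies also gives the caller a clear signal to remove a stale value from the pillar.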
The ceph-bootstrap status command should perform some validations to guarantee that ceph-bootstrap can work correctly on the installed system, show the status of those checks, and, if something fails, suggest possible actions to users.
List of validations:
top.sls file
ceph-bootstrap init (Issue #8) that should take care of making sure
I have the following configuration, without roles:
admin:~ # ceph-bootstrap config ls
o- / ........................................................................................................... [...]
o- Cluster ................................................................................................... [...]
| o- Minions .......................................................................................... [Minions: 1]
| | o- node1.octopusipv6.com ............................................................................ [no roles]
| o- Roles ................................................................................... [Minions w/ roles: 0]
| o- Mgr ............................................................................................ [no minions]
| o- Mon ............................................................................................ [no minions]
o- Containers ................................................................................................ [...]
| o- Images .................................................................................................. [...]
| o- ceph .................................................................... [docker.io/ceph/daemon-base:latest]
o- Deployment ................................................................................................ [...]
| o- Bootstrap .......................................................................................... [disabled]
| o- Dashboard ............................................................................................... [...]
| | o- password ............................................................................... [randomly generated]
| | o- username ............................................................................................ [admin]
| o- Mgr ................................................................................................ [disabled]
| o- Mon ................................................................................................ [disabled]
| o- OSD ................................................................................................ [disabled]
o- Network ................................................................................................... [...]
| o- Address_Family .......................................................................................... [ip4]
o- SSH ........................................................................................... [no key pair set]
| o- Private_Key .............................................................................. [no private key set]
| o- Public_Key ................................................................................ [no public key set]
o- Storage ................................................................................................... [...]
| o- Drive_Groups .......................................................................................... [empty]
o- Time_Server ........................................................................................... [enabled]
o- External_Servers ...................................................................................... [empty]
o- Server_Hostname ....................................................................... [node1.octopusipv6.com]
When applying the ceph-salt state, I'm expecting ceph-bootstrap to configure the time server:
salt -G 'ceph-salt:member' state.apply ceph-salt
But I get the following error:
admin:~ # salt -G 'ceph-salt:member' state.apply ceph-salt
node1.octopusipv6.com:
Data failed to compile:
----------
Rendering SLS 'base:ceph-salt' failed: Jinja variable 'dict object' has no attribute 'bootstrap_mon'
ERROR: Minions returned with non-zero exit code
It should be possible to choose whether to use IPv4 or IPv6.
A new config entry should be created: /Network/Address_Family
with the following commands available:
set ip4
set ip6
Salt is very silent when running a Salt formula or any Salt state. The objective of this feature is to use the Salt event bus to report the execution progress of the steps performed by the ceph-salt formula.
The event tags should have the following structure: ceph-salt/<step_name>/[started, running, finished]
The payload of each event should depend on the <step_name> and the type of event [started, running, finished]. Each step should always trigger a started event when it starts, and a finished event when it finishes. The running event is for communicating progress information, for instance, when the step takes a long time to execute and it is possible to send some kind of progress percentage.
The finished event should include whether the operation was successful, and if not, it should describe the failure.
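The tag structure and the rule for finished events can be sketched as a small helper. The payload field names (success, error, progress) are assumptions for illustration; the proposal above only fixes the tag layout.

```python
EVENT_TYPES = ("started", "running", "finished")


def build_event(step_name, event_type, **data):
    """Return (tag, payload) for a ceph-salt progress event."""
    if event_type not in EVENT_TYPES:
        raise ValueError("Unknown event type '{}'".format(event_type))
    if event_type == "finished":
        # A finished event must say whether the step succeeded,
        # and describe the failure if it did not.
        if "success" not in data:
            raise ValueError("finished events require a 'success' field")
        if not data["success"] and not data.get("error"):
            raise ValueError("failed steps must describe the failure in 'error'")
    tag = "ceph-salt/{}/{}".format(step_name, event_type)
    return tag, data


print(build_event("time_sync", "started"))
print(build_event("time_sync", "running", progress=50))
print(build_event("time_sync", "finished", success=False, error="chronyd not running"))
```

Inside a state, the resulting tag and payload could then be handed to Salt's event.send execution function.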
"bootstrap_mon" is the cluster’s first manager
and monitor, so it should be renamed to "bootstrap_minion".
As podman is no longer a dependency of cephadm, we have to make sure it is installed on the minions.
Add an AppArmor option group to the config shell. This group should be able to configure all options required by the ceph-salt-formula:apparmor state.
Currently the apparmor state is not doing much. We should check what was being done in DeepSea (https://github.com/SUSE/DeepSea/tree/master/srv/salt/ceph/apparmor) and add those things to the ceph-salt formula.
Also, we need to check what cephadm is doing in this regard, and make sure that what the ceph-salt formula performs is compatible with cephadm.
Removing the last MON should remove the "bootstrap_mon" from Pillar.
How to reproduce:
admin:~ # ceph-bootstrap config /Cluster ls
o- Cluster ............................................................................. [...]
o- Minions .................................................................... [no minions]
o- Roles ............................................................. [Minions w/ roles: 0]
o- Mgr ...................................................................... [no minions]
o- Mon ...................................................................... [no minions]
admin:~ # salt 'node1.ceph.com' pillar.get ceph-salt
node1.ceph.com:
admin:~ # ceph-bootstrap config /Cluster/Minions add node1.ceph.com
1 minion added.
admin:~ # ceph-bootstrap config /Cluster/Roles/Mon add node1.ceph.com
1 minion added.
admin:~ # salt 'node1.ceph.com' pillar.get ceph-salt
node1.ceph.com:
----------
bootstrap_mon:
node1.ceph.com
minions:
----------
all:
- node1
mgr:
mon:
----------
node1:
10.20.39.201
admin:~ # ceph-bootstrap config /Cluster/Roles/Mon rm node1.ceph.com
1 minion removed.
admin:~ # salt 'node1.ceph.com' pillar.get ceph-salt
node1.ceph.com:
----------
bootstrap_mon:
node1.ceph.com
minions:
----------
all:
mgr:
mon:
----------
There is a lot of duplicate code to fetch data from the pillar, especially due to all the handling of non-existent keys in the pillar.
The objective is to create some jinja macros that can be used to reduce the amount of duplicate code.
See https://jinja.palletsprojects.com/en/2.10.x/templates/#macros
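The lookup logic such a macro would wrap can be illustrated in Python. This is a sketch of the idea only, not the proposed Jinja code; the helper name is hypothetical, and the colon-separated path mirrors the convention used by Salt's pillar.get.

```python
def pillar_get(pillar, path, default=None, delimiter=":"):
    """Walk a nested dict along `path`, returning `default` if any key is missing."""
    current = pillar
    for key in path.split(delimiter):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


pillar = {"ceph-salt": {"minions": {"all": ["node1"], "mon": {"node1": "10.20.39.201"}}}}
print(pillar_get(pillar, "ceph-salt:minions:all", []))
print(pillar_get(pillar, "ceph-salt:bootstrap_mon", "not set"))
```

Centralizing the missing-key handling in one place is what removes the repeated existence checks from the individual state files.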
admin:~ # ceph-bootstrap config
/> /Cluster/Roles/Mmg add node1*
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/configshell_fb/shell.py", line 811, in _execute_command
target = self._current_node.get_node(path)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1854, in get_node
return next_node.get_node(next_path)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1862, in get_node
return next_node.get_node(next_path)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1862, in get_node
return next_node.get_node(next_path)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1865, in get_node
return adjacent_node(path)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1827, in adjacent_node
return self.get_child(name)
File "/usr/lib/python3.6/site-packages/configshell_fb/node.py", line 1796, in get_child
% (self.path.rstrip('/'), name))
ValueError: No such path /Cluster/Roles/Mmg
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/ceph-bootstrap", line 11, in <module>
load_entry_point('ceph-bootstrap==15.0.2+1580743520.g1c1e49b', 'console_scripts', 'ceph-bootstrap')()
File "/usr/lib/python3.6/site-packages/ceph_bootstrap/__init__.py", line 55, in ceph_bootstrap_main
cli(prog_name='ceph-bootstrap')
File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/ceph_bootstrap/__init__.py", line 85, in config_shell
if not run_config_shell():
File "/usr/lib/python3.6/site-packages/ceph_bootstrap/config_shell.py", line 803, in run_config_shell
shell.run_interactive()
File "/usr/lib/python3.6/site-packages/configshell_fb/shell.py", line 905, in run_interactive
self._cli_loop()
File "/usr/lib/python3.6/site-packages/configshell_fb/shell.py", line 734, in _cli_loop
self.run_cmdline(cmdline)
File "/usr/lib/python3.6/site-packages/configshell_fb/shell.py", line 848, in run_cmdline
self._execute_command(path, command, pparams, kparams)
File "/usr/lib/python3.6/site-packages/configshell_fb/shell.py", line 813, in _execute_command
raise ExecutionError(str(msg))
configshell_fb.node.ExecutionError: No such path /Cluster/Roles/Mmg
admin:~ #
Looks like we should catch this exception and just print out the error.
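A minimal sketch of that handling follows. To keep it runnable without configshell_fb installed, ExecutionError here is a local stand-in for the library's exception of the same name, and run_safely and the fake executor are hypothetical.

```python
class ExecutionError(Exception):
    """Local stand-in for configshell_fb's ExecutionError."""


def run_safely(execute, cmdline, out=print):
    """Run one shell command; report ExecutionError as a message, not a traceback."""
    try:
        execute(cmdline)
        return True
    except ExecutionError as exc:
        out(str(exc))
        return False


def fake_execute(cmdline):
    # Simulates the shell rejecting an unknown path.
    raise ExecutionError("No such path /Cluster/Roles/Mmg")


run_safely(fake_execute, "/Cluster/Roles/Mmg add node1*")
```

Only the expected exception type is caught, so genuine programming errors still surface with a full traceback.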
If the user does not have root privileges, then ceph-bootstrap should return immediately.
See: https://github.com/SUSE/DeepSea/blob/master/cli/common.py#L48-L56
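A minimal sketch of such an early exit, assuming a Unix system where os.geteuid is available. The function names, the message and the exit code are illustrative and not copied from DeepSea.

```python
import os
import sys


def has_root_privileges(euid=None):
    """Return True when the effective user id is 0 (root)."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def main(euid=None):
    if not has_root_privileges(euid):
        print("Root privileges are required to run this tool", file=sys.stderr)
        return 1
    # ... continue with the normal command handling ...
    return 0
```

The euid parameter exists only so the check can be exercised without actually being root.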
ATM, ceph-bootstrap always uses the IP address from the fqdn_ip4 grain to configure Monitors.
This address can be used by default, but it should be possible to explicitly change it to a different value.
This can be done in the /Deployment group, e.g.:
...
o- Deployment ............................................................... [...]
| ...
| o- Mon ............................................................... [enabled]
| | o- node1.ceph.com .......................................... [192.169.100.201]
...
When implementing this we should remove the CephNodeFqdnResolvesToLoopback validation when adding a minion, and only validate IPs before the deployment.
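A pre-deployment check of this kind could be sketched with the standard ipaddress module. The function name and error wording are hypothetical; the sample address is the one from the listing above.

```python
import ipaddress


def validate_mon_address(minion, address):
    """Reject malformed and loopback addresses; return the parsed address."""
    ip = ipaddress.ip_address(address)  # raises ValueError if malformed
    if ip.is_loopback:
        raise ValueError(
            "Monitor address for '{}' is a loopback address: {}".format(minion, address)
        )
    return ip


print(validate_mon_address("node1.ceph.com", "192.169.100.201"))
```

Because ip_address accepts both families, the same check works for IPv4 and IPv6 monitor addresses.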
ATM the "bootstrap_minion" value is automatically set by "ceph-bootstrap config", but this value is not visible in the UI.
Please implement a way to dump the configuration database in JSON. For example, a --format json switch for ceph-bootstrap config ls. Thanks!
In case someone manually starts cephadm on a minion for debugging purposes, there is a chance that the user will not specify an image.
If we could specify the default image for cephadm in a config file, we would no longer need to worry about users accidentally pulling the default image.
Changing the default image doesn't have an impact on the image used by calls from the ceph-mgr, as the mgr always specifies the image when calling cephadm.
Thoughts?
scalability-master:~ # ceph-bootstrap deploy
Traceback (most recent call last):
File "/usr/bin/ceph-bootstrap", line 11, in <module>
load_entry_point('ceph-bootstrap==15.1.0+1581935293.g7a3134c', 'console_scripts', 'ceph-bootstrap')()
File "/usr/lib/python3.6/site-packages/ceph_bootstrap/__init__.py", line 55, in ceph_bootstrap_main
cli(prog_name='ceph-bootstrap')
File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 676, in main
_verify_python3_env()
File "/usr/lib/python3.6/site-packages/click/_unicodefun.py", line 118, in _verify_python3_env
'for mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Consult http://click.pocoo.org/python3/ for mitigation steps.
This system supports the C.UTF-8 locale which is recommended.
You might be able to resolve your issue by exporting the
following environment variables:
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
Click discovered that you exported a UTF-8 locale
but the locale system could not pick up from it because
it does not exist. The exported locale is "de_DE.UTF-8" but it
is not supported
scalability-master:~ # ceph-bootstrap deploy --non-interactive
Checking if ceph-salt formula is available...
Syncing minions with the master...
Checking existing deployment...
Cluster is already deployed, please apply the deployment to a single minion at a time: "ceph-bootstrap deploy <minion_id>"
But the cluster actually doesn't exist:
scalability-master:~ # salt '*' cmd.run 'cephadm'
scalability-monitor-2.openstack.local:
/bin/sh: cephadm: command not found
scalability-osd-5.openstack.local:
/bin/sh: cephadm: command not found
scalability-osd-1.openstack.local:
/bin/sh: cephadm: command not found
scalability-osd-4.openstack.local:
/bin/sh: cephadm: command not found
scalability-monitor-1.openstack.local:
/bin/sh: cephadm: command not found
scalability-monitor-3.openstack.local:
/bin/sh: cephadm: command not found
scalability-osd-3.openstack.local:
/bin/sh: cephadm: command not found
scalability-osd-2.openstack.local:
/bin/sh: cephadm: command not found
scalability-master.openstack.local:
/bin/sh: cephadm: command not found
ERROR: Minions returned with non-zero exit code
scalability-master:~ # salt '*' cmd.run 'ceph'
scalability-monitor-2.openstack.local:
/bin/sh: ceph: command not found
scalability-osd-5.openstack.local:
/bin/sh: ceph: command not found
scalability-osd-1.openstack.local:
/bin/sh: ceph: command not found
scalability-osd-4.openstack.local:
/bin/sh: ceph: command not found
scalability-monitor-3.openstack.local:
/bin/sh: ceph: command not found
scalability-osd-2.openstack.local:
/bin/sh: ceph: command not found
scalability-monitor-1.openstack.local:
/bin/sh: ceph: command not found
scalability-osd-3.openstack.local:
/bin/sh: ceph: command not found
scalability-master.openstack.local:
/bin/sh: ceph: command not found
ERROR: Minions returned with non-zero exit code
config:
scalability-master:~ # ceph-bootstrap config
/> ls
o- / ......................................................................................................................... [...]
o- Cluster ................................................................................................................. [...]
| o- Minions ........................................................................................................ [Minions: 9]
| | o- scalability-master.openstack.local ............................................................................. [no roles]
| | o- scalability-monitor-1.openstack.local .......................................................................... [mgr, mon]
| | o- scalability-monitor-2.openstack.local .......................................................................... [no roles]
| | o- scalability-monitor-3.openstack.local .......................................................................... [no roles]
| | o- scalability-osd-1.openstack.local .............................................................................. [no roles]
| | o- scalability-osd-2.openstack.local .............................................................................. [no roles]
| | o- scalability-osd-3.openstack.local .............................................................................. [no roles]
| | o- scalability-osd-4.openstack.local .............................................................................. [no roles]
| | o- scalability-osd-5.openstack.local .............................................................................. [no roles]
| o- Roles ........................................ [Bootstrap minion: scalability-monitor-1.openstack.local, Minions w/ roles: 1]
| o- Mgr .......................................................................................................... [Minions: 1]
| | o- scalability-monitor-1.openstack.local ................................................................ [other roles: mon]
| o- Mon .......................................................................................................... [Minions: 1]
| o- scalability-monitor-1.openstack.local ................................................................ [other roles: mgr]
o- Containers .............................................................................................................. [...]
| o- Images ................................................................................................................ [...]
| o- ceph ........................ [registry.suse.de/suse/sle-15-sp2/update/products/ses7/milestones/containers/ses/7/ceph/ceph]
o- Deployment .............................................................................................................. [...]
| o- Bootstrap ......................................................................................................... [enabled]
| o- Dashboard ............................................................................................................. [...]
| | o- password ............................................................................................. [randomly generated]
| | o- username .......................................................................................................... [admin]
| o- Mgr .............................................................................................................. [disabled]
| o- Mon .............................................................................................................. [disabled]
| o- OSD .............................................................................................................. [disabled]
o- SSH ............................................................................................................ [Key Pair set]
| o- Private_Key ............................................................... [00:34:a1:d6:44:a6:4a:35:f1:38:88:47:cc:bc:04:42]
| o- Public_Key ................................................................ [00:34:a1:d6:44:a6:4a:35:f1:38:88:47:cc:bc:04:42]
o- Storage ................................................................................................................. [...]
| o- Drive_Groups ........................................................................................................ [empty]
o- System_Update ........................................................................................................... [...]
| o- Packages .......................................................................................................... [enabled]
| o- Reboot ............................................................................................................ [enabled]
o- Time_Server ......................................................................................................... [enabled]
o- External_Servers ........................................................................................................ [1]
| o- ntp.suse.cz ......................................................................................................... [...]
o- Server_Hostname ........................................................................ [scalability-master.openstack.local]
/> exit
After cephadm bootstrap succeeds, it prints some valuable information:
INFO:cephadm:Ceph Dashboard is now available at:
URL: https://ubuntu1804:8443/
User: admin
Password: wgcrj2t3ka
INFO:cephadm:You can access the Ceph CLI with:
sudo ./cephadm shell --fsid 146d0150-4e66-11ea-a110-5254005c2d4e -c ceph.conf -k ceph.client.admin.keyring
INFO:cephadm:Bootstrap complete.
We should forward the dashboard password information to users and provide a convenience wrapper for cephadm shell.
Due to a recently introduced regression, ceph-salt stopped deploying OSDs, but ceph-salt-formula ignores this and reports that OSD groups were created successfully.
The "Deploying OSD groups 1/1" step finishes immediately (with success), yet sesdev create octopus ... --qa-test ... fails because the cluster has zero OSDs.
ceph-salt uses a command like the following to deploy OSDs:
echo '{\"testing_dg_admin\": {\"host_pattern\": \"admin*\", \"data_devices\": {\"all\": true}}}' | ceph orch osd create -i -
When I try this command manually, it fails:
admin:~ # echo '{\"testing_dg_admin\": {\"host_pattern\": \"admin*\", \"data_devices\": {\"all\": true}}}' | ceph orch osd create -i -
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1070, in _handle_command
return self.handle_command(inbuf, cmd)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 191, in handle_command
return dispatch[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph/mgr/mgr_module.py", line 309, in call
return self.func(mgr, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 153, in <lambda>
wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 144, in wrapper
return func(*args, **kwargs)
File "/usr/share/ceph/mgr/orchestrator/module.py", line 437, in _create_osd
dgs = DriveGroupSpecs(yaml.load(inbuf))
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 117, in __init__
self.build_drive_groups()
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 122, in build_drive_groups
(drive_group_spec, name=drive_group_name))
File "/usr/lib/python3.6/site-packages/ceph/deployment/drive_group.py", line 228, in from_json
"Feature <{}> is not supported".format(applied_filter))
ceph.deployment.drive_group.DriveGroupValidationError: Failed to validate Drive Group: Feature <\"host_pattern\"> is not supported
admin:~ # echo $?
22
However, ceph-salt-formula ignores this error.
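The command exits with status 22, so the failure is detectable. As a minimal sketch in Python (the formula itself is written as Salt states, so this only illustrates the check), a wrapper can raise whenever the exit status is non-zero. The run_checked helper is hypothetical, and the demo uses the Python interpreter as a stand-in for the ceph command.

```python
import subprocess
import sys


def run_checked(argv, stdin_text=None):
    """Run a command and raise if it exits with a non-zero status."""
    result = subprocess.run(
        argv,
        input=stdin_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "Command {} failed with exit code {}: {}".format(
                argv, result.returncode, result.stderr.strip()
            )
        )
    return result.stdout


# Stand-in for a failing `ceph orch osd create -i -` call (exit status 22).
failing = [sys.executable, "-c", "import sys; sys.exit(22)"]
try:
    run_checked(failing, stdin_text="{}")
except RuntimeError as exc:
    print(exc)
```

Propagating the error this way would make the "Deploying OSD groups" step fail instead of reporting success.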
Before running ceph-bootstrap we need to make sure that the Salt pillar is correctly configured to find the ceph-salt.sls file created by ceph-bootstrap. This should be done automatically by the ceph-bootstrap init command.
Currently these are the steps we run manually before running ceph-bootstrap config:
cat <<EOF > /srv/pillar/top.sls
base:
  '*':
    - ceph-salt
EOF
touch /srv/pillar/ceph-salt.sls
chown -R salt:salt /srv/pillar
salt \* saltutil.pillar_refresh
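The file-related part of these steps could be automated roughly as follows. This is a sketch under assumptions: init_pillar is a hypothetical name, the pillar directory is passed in rather than hard-coded, and the chown and pillar refresh steps are left out because they need root and a running Salt master.

```python
import os

TOP_SLS = "base:\n  '*':\n    - ceph-salt\n"


def init_pillar(pillar_dir):
    """Create top.sls and an empty ceph-salt.sls inside pillar_dir."""
    os.makedirs(pillar_dir, exist_ok=True)
    top_path = os.path.join(pillar_dir, "top.sls")
    # Like the heredoc above, this overwrites any existing top.sls.
    with open(top_path, "w") as top_file:
        top_file.write(TOP_SLS)
    sls_path = os.path.join(pillar_dir, "ceph-salt.sls")
    # Equivalent of `touch`: create the file if missing, keep existing content.
    with open(sls_path, "a"):
        pass
    return top_path, sls_path
```

A real implementation would probably merge into an existing top.sls rather than overwrite it, since other pillar data may already be configured there.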
Since ceph/ceph#33131, ceph orchestrator was renamed to ceph orch, so ceph-bootstrap must be adapted accordingly.
Nowadays, Ceph has a MON store for cluster configuration, but certain options still have to be set in ceph.conf before the cluster is bootstrapped.
One example is osd crush chooseleaf type = 0. If this is not provided on the cephadm bootstrap command line via the -c option, the initial CRUSH map created by cephadm bootstrap will have the failure domain set to "host" and there is no easy way to change that.
It's possible that there are other options like this one, which must be set via cephadm bootstrap -c in order to properly take effect.
Therefore, I am proposing that ceph-salt provide a mechanism for setting these options.
UPDATE: I found another situation where this is needed (and I think it's quite likely that there are more):
If someone needs to run cephadm bootstrap with MGR debugging turned up, the only way to do this is via cephadm bootstrap -c.
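One possible shape for such a mechanism is to render the user-supplied options into a small ceph.conf and pass its path to cephadm bootstrap -c. The sketch below uses Python's configparser and assumes the options belong in the [global] section; the function name is hypothetical.

```python
import configparser
import io


def render_bootstrap_conf(options):
    """Render a minimal ceph.conf with the given options in [global]."""
    parser = configparser.ConfigParser()
    parser["global"] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


conf = render_bootstrap_conf({"osd crush chooseleaf type": "0"})
print(conf)
```

The rendered text would be written to a file on the bootstrap minion before cephadm bootstrap runs.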
When deploying the following configuration, without external time server:
o- Time_Server ................................................................. [enabled]
o- External_Servers ............................................................ [empty]
o- Server_Hostname ................................................. [node1.octopus.com]
I'm getting the following error:
----------
ID: /etc/chrony.conf
Function: file.managed
Result: False
Comment: Unable to manage file: Jinja variable 'dict object' has no attribute 'external_time_servers'
Started: 10:37:14.223639
Duration: 47.736 ms
Changes:
----------
To fix this, we should check if any external time server was configured before setting it up.
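The guard can be illustrated in Python, although the real fix belongs in the Jinja template for /etc/chrony.conf. The key external_time_servers is taken from the error message; the function name and the "server ... iburst" line format are assumptions.

```python
def chrony_server_lines(time_server_pillar):
    """Return chrony server directives, or an empty list if none are configured."""
    # .get() tolerates a missing key, and `or []` also covers an explicit None.
    servers = time_server_pillar.get("external_time_servers") or []
    return ["server {} iburst".format(server) for server in servers]


print(chrony_server_lines({}))
print(chrony_server_lines({"external_time_servers": ["ntp.suse.cz"]}))
```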
When running ceph-bootstrap deploy we need to make sure that the ceph-salt-formula state files are already loaded by the salt-master, otherwise the deployment will fail.
We can check if the ceph-salt formula is loaded by running the following command: salt \* state.sls_exists ceph-salt
We should also always sync any state, module or pillar changes before starting the deployment using the following command: salt \* saltutil.sync_all
Make use of the cephadm pull command to pull images before executing cephadm bootstrap.
imaster:~ # ceph-bootstrap config "/Cluster/Minions ls"
o- Minions ............................................................... [Minions: 9]
o- imaster.ceph ....................................................... [no roles]
o- imonitor1.ceph ..................................................... [mon, mgr]
o- imonitor2.ceph ..................................................... [mon, mgr]
o- imonitor3.ceph ..................................................... [mon, mgr]
o- iosd-node1.ceph .................................................... [no roles]
o- iosd-node2.ceph .................................................... [no roles]
o- iosd-node3.ceph .................................................... [no roles]
o- iosd-node4.ceph .................................................... [no roles]
o- iosd-node5.ceph .................................................... [no roles]
OK
imonitor1:~ # ceph orchestrator host ls
HOST LABELS
imonitor3
imonitor2
imonitor1
By default, roles should be disabled, so the following entries should not be visible:
/Cluster/Roles
/Deployment
/Storage
The following new entry should be available:
/Cluster/Bootstrap_Minion
Setting the /Cluster/Bootstrap_Minion will:
bootstrap_mon value
bootstrap_mon
mon and mgr roles to the bootstrap_mon
When running in "advanced mode", all entries should be visible except /Cluster/Bootstrap_Minion.
Adding a role to a minion should set the bootstrap_mon value to a minion that has both mgr and mon roles.
$ sesdev ssh octopus_test1
Warning: Permanently added '192.168.121.4' (ECDSA) to the list of known hosts.
Have a lot of fun...
admin:~ # ceph -s
cluster:
id: be09d766-42d5-11ea-bbb8-52540088717e
health: HEALTH_WARN
3 stray host(s) with 10 service(s) not managed by cephadm
services:
mon: 3 daemons, quorum node1.octopus_test1.com,node2,node3 (age 23m)
mgr: hwboak(active, since 23m), standbys: icdsnv, ytvgxh
osd: 6 osds: 6 up (since 22m), 6 in (since 22m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 6.0 GiB used, 42 GiB / 48 GiB avail
pgs:
admin:~ # ceph health detail
HEALTH_WARN 3 stray host(s) with 10 service(s) not managed by cephadm
[WRN] CEPHADM_STRAY_HOST: 3 stray host(s) with 10 service(s) not managed by cephadm
stray host node1.octopus_test1.com has 4 stray daemons: ['mgr.hwboak', 'mon.node1.octopus_test1.com', 'osd.0', 'osd.1']
stray host node2.octopus_test1.com has 3 stray daemons: ['mgr.icdsnv', 'osd.2', 'osd.3']
stray host node3.octopus_test1.com has 3 stray daemons: ['mgr.ytvgxh', 'osd.4', 'osd.5']
admin:~ # ceph versions
{
"mon": {
"ceph version 15.0.0-9865-gd2c6620fea (d2c6620fea8e44e5b0bc24a0effaa6347315be7e) octopus (dev)": 3
},
"mgr": {
"ceph version 15.0.0-9865-gd2c6620fea (d2c6620fea8e44e5b0bc24a0effaa6347315be7e) octopus (dev)": 3
},
"osd": {
"ceph version 15.0.0-9865-gd2c6620fea (d2c6620fea8e44e5b0bc24a0effaa6347315be7e) octopus (dev)": 6
},
"mds": {},
"overall": {
"ceph version 15.0.0-9865-gd2c6620fea (d2c6620fea8e44e5b0bc24a0effaa6347315be7e) octopus (dev)": 12
}
}
admin:~ # ceph --version
ceph version 15.0.0-9865-gd2c6620fea (d2c6620fea8e44e5b0bc24a0effaa6347315be7e) octopus (dev)
UPDATE: The workaround is to explicitly add the hosts - e.g.:
# ceph orchestrator host add admin.octopus_test1.com
Added host 'admin.octopus_test1.com'
ceph-bootstrap config displaying Dashboard password:
# ceph-bootstrap config /Deployment/Dashboard/password set admin
# ceph-bootstrap config /Deployment/Dashboard ls
o- Dashboard ........................................ [...]
o- password ..................................... [admin]
o- username ..................................... [admin]
Add a System_Update option group to the config shell. This group should be able to configure all options required by ceph-salt-formula:software state.
Currently, the only thing the software state file does is run the pkg.upgrade module function, which upgrades all packages on the system to the latest available version.
We should allow finer-grained configuration, so that kernel upgrades, or upgrades of any other packages, can be enabled or disabled.
Also, we should find a solution to the problem of package upgrades that require a reboot afterwards.
Solution proposal:
Have a flag that enables/disables rebooting the machine when any package requires a reboot. If the flag is enabled, the state should send a message on the salt-event bus stating that the minion will reboot, then finish the state execution and issue a reboot.
Since the ceph-salt formula should be idempotent, the user, after being notified that the node is rebooting, can run the formula again on all minions.
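As a rough sketch of the proposed flag, the decision could look like the function below. The function name and the action labels are hypothetical, and the behaviour when the flag is disabled is not specified in the proposal, so the sketch simply finishes the state in that case.

```python
from typing import List


def plan_post_upgrade(reboot_enabled: bool, reboot_required: bool) -> List[str]:
    """Return the ordered actions to take after the package upgrade step.

    Action labels are illustrative placeholders, not real salt state names.
    """
    if reboot_enabled and reboot_required:
        # Notify listeners on the event bus first, then end the state run,
        # and only then reboot, as described in the proposal.
        return ["send_reboot_event", "finish_state", "reboot"]
    # Assumption: with the flag disabled (or no reboot needed) the state
    # just finishes; the proposal does not say what else should happen.
    return ["finish_state"]


if __name__ == "__main__":
    print(plan_post_upgrade(reboot_enabled=True, reboot_required=True))
    print(plan_post_upgrade(reboot_enabled=False, reboot_required=True))
```

Because the formula is meant to be idempotent, re-running it after the reboot would reach this step again with no reboot pending and simply finish.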
After the initial deployment, we should:
ceph orchestrator inventory?), e.g.:
o- / ....................................................................................... [...]
o- Cluster ............................................................................... [...]
| o- Minions ...................................................................... [Minions: 4]
| | o- node1.octopus.com ............................................. [Managed by orchestrator]
| | ...
/Cluster/Roles, and "Bootstrap minion" should not be displayed:
o- / ....................................................................................... [...]
o- Cluster ............................................................................... [...]
| o- Minions ...................................................................... [Minions: 5]
| | o- node1.octopus.com ............................................. [Managed by orchestrator]
| | ...
| | o- node5.octopus.com ............................................................ [mgr, mon]
| o- Roles ............................................................... [Minions w/ roles: 1]
| o- Mgr ........................................................................ [Minions: 1]
| | o- node5.octopus.com .................................................. [other roles: mon]
| o- Mon ........................................................................ [Minions: 2]
| o- node5.octopus.com .................................................. [other roles: mgr]
/Cluster/Deployment/Bootstrap
/Cluster/Deployment/Dashboard
/SSH
After the initial deployment, the following operations should be "disabled":
/Cluster/Roles
/Deployment
/Storage
/SSH
ATM, default values are duplicated in "configshell code" and "ceph-salt-formulas".
It would be great to have a way to share constants/default values between "configshell code" and "ceph-salt-formulas" (e.g., the default dashboard username).
Very odd. . .
sesdev command line:
sesdev create octopus --ceph-salt-repo https://github.com/smithfarm/ceph-salt.git --ceph-salt-branch wip-fix-broken-osd-deploy --ceph-container-image="registry.opensuse.org/filesystems/ceph/master/upstream/images/ceph/ceph" --no-deploy-mons --no-deploy-mgrs --no-deploy-osds octopus_test1
Results in:
admin: | o- Mgr .......................................................................................................... [Minions: 3]
admin: | | o- node1.octopus_test1.com .............................................................................. [other roles: mon]
admin: | | o- node2.octopus_test1.com .............................................................................. [other roles: mon]
admin: | | o- node3.octopus_test1.com .............................................................................. [other roles: mon]
...
admin: o- Deployment .............................................................................................................. [...]
admin: | o- Bootstrap ......................................................................................................... [enabled]
admin: | o- Dashboard ............................................................................................................. [...]
admin: | | o- password ............................................................................................................ [***]
admin: | | o- username .......................................................................................................... [admin]
admin: | o- Mgr ............................................................................................................... [enabled]
admin: | o- Mon .............................................................................................................. [disabled]
admin: | o- OSD .............................................................................................................. [disabled]
and:
admin: [2020-02-25 16:49:11.286782] [node1.octopus_te] Finished with failures
admin: Failure in minion: node1.octopus_test1.com
admin: __id__: deploy remaining mgrs
admin: __run_num__: 55
admin: __sls__: ceph-salt.ceph-mgr
admin: changes:
admin: pid: 14979
admin: retcode: 22
admin: stderr: "Error EINVAL: Traceback (most recent call last):\n File \"/usr/share/ceph/mgr/mgr_module.py\"\
admin: , line 1070, in _handle_command\n return self.handle_command(inbuf, cmd)\n\
admin: \ File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 191, in handle_command\n\
admin: \ return dispatch[cmd['prefix']].call(self, cmd, inbuf)\n File \"/usr/share/ceph/mgr/mgr_module.py\"\
admin: , line 309, in call\n return self.func(mgr, **kwargs)\n File \"/usr/share/ceph/mgr/orchestrator/_interface.py\"\
admin: , line 153, in <lambda>\n wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args,\
admin: \ **l_kwargs)\n File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line\
admin: \ 144, in wrapper\n return func(*args, **kwargs)\n File \"/usr/share/ceph/mgr/orchestrator/module.py\"\
admin: , line 668, in _apply_mgr\n completion = self.apply_mgr(spec)\n File \"/usr/share/ceph/mgr/orchestrator/_interface.py\"\
admin: , line 1694, in inner\n completion = self._oremote(method_name, args, kwargs)\n\
admin: \ File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1764, in _oremote\n\
admin: \ return mgr.remote(o, meth, *args, **kwargs)\n File \"/usr/share/ceph/mgr/mgr_module.py\"\
admin: , line 1432, in remote\n args, kwargs)\nRuntimeError: Remote method threw exception:\
admin: \ Traceback (most recent call last):\n File \"/usr/share/ceph/mgr/cephadm/module.py\"\
admin: , line 2155, in apply_mgr\n len(spec.placement.hosts), num_new_mgrs))\nRuntimeError:\
admin: \ Error: 2 hosts provided, expected 3"
admin: stdout: ''
admin: comment: 'Command "ceph orch apply mgr 3 node2 node3
admin:
admin: " run'
admin: duration: 403.004
admin: name: 'ceph orch apply mgr 3 node2 node3
admin:
admin: '
admin: result: false
admin: start_time: '17:49:10.863246'
admin: state: 'cmd_|-deploy remaining mgrs_|-ceph orch apply mgr 3 node2 node3
admin:
admin: _|-run'
admin: Finished execution of ceph-salt formula
admin: Summary: Total=4 Succeeded=3 Failed=1
Command '['vagrant', 'up']' failed: ret=1 stderr:
Currently, whenever ceph-bootstrap runs a salt command through the salt python API, the API prints messages directly to the terminal in some situations. Example:
No minions matched the target. No command was sent, no jid was assigned.
No minions matched the target. No command was sent, no jid was assigned.
No minions matched the target. No command was sent, no jid was assigned.
This is a problem in the salt python API, which should send those messages to the logger instead of directly to stdout. Since we can't change that part here, ceph-bootstrap should wrap the salt python API calls in a way that captures the stdout output and prevents it from propagating to the user's terminal.
We have an example of how that can be achieved here: https://github.com/SUSE/DeepSea/blob/master/cli/common.py#L19-L45
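A minimal sketch of such a wrapper, using only the Python standard library, is shown below. It is not the DeepSea code linked above; call_quietly and noisy_salt_call are hypothetical names, and noisy_salt_call merely stands in for a salt python API call that prints to stdout.

```python
import contextlib
import io
import logging

logger = logging.getLogger(__name__)


def call_quietly(func, *args, **kwargs):
    """Call func, diverting anything it prints to stdout into the logger."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            return func(*args, **kwargs)
    finally:
        # Forward the captured lines to the log even if func raised.
        for line in buffer.getvalue().splitlines():
            if line.strip():
                logger.info("captured stdout: %s", line)


def noisy_salt_call():
    # Stand-in (assumption) for a salt python API call that prints to stdout.
    print("No minions matched the target. No command was sent, no jid was assigned.")
    return {}


if __name__ == "__main__":
    logging.basicConfig(filename="ceph-bootstrap-example.log", level=logging.INFO)
    result = call_quietly(noisy_salt_call)
    print("result:", result)
```

Note that contextlib.redirect_stdout only intercepts writes made through Python's sys.stdout; output written directly to the file descriptor by a subprocess would need a different approach.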
In issue #42, a minion is tagged with the bootstrap role only when it has both MON and MGR roles. If this is a requirement, should we document it (maybe in sesdev or the downstream docs) or add a check before deploying?
I tested sesdev with the following command:
sesdev create octopus --roles="[admin], [mon], [mgr], [storage]" dev
The deployment failed with:
admin: ++ ceph-bootstrap config ls
admin: o- / ......................................................................................................................... [...]
admin: o- Cluster ................................................................................................................. [...]
admin: | o- Minions ........................................................................................................ [Minions: 4]
admin: | | o- admin.dev.com .................................................................................................. [no roles]
admin: | | o- node1.dev.com ....................................................................................................... [mon]
admin: | | o- node2.dev.com ....................................................................................................... [mgr]
admin: | | o- node3.dev.com .................................................................................................. [no roles]
admin: | o- Roles ......................................................................... [Bootstrap minion: None, Minions w/ roles: 2]
admin: | o- Mgr .......................................................................................................... [Minions: 1]
admin: | | o- node2.dev.com .......................................................................................... [no other roles]
admin: | o- Mon .......................................................................................................... [Minions: 1]
admin: | o- node1.dev.com .......................................................................................... [no other roles]
admin: o- Containers .............................................................................................................. [...]
admin: | o- Images ................................................................................................................ [...]
admin: | o- ceph ..................................................................... [docker.io/ceph/daemon-base:latest-master-devel]
admin: o- Deployment .............................................................................................................. [...]
admin: | o- Bootstrap ......................................................................................................... [enabled]
admin: | o- Dashboard ............................................................................................................. [...]
admin: | | o- password ............................................................................................................ [***]
admin: | | o- username .......................................................................................................... [admin]
admin: | o- Mgr ............................................................................................................... [enabled]
admin: | o- Mon ............................................................................................................... [enabled]
admin: | o- OSD ............................................................................................................... [enabled]
admin: o- SSH ............................................................................................................ [Key Pair set]
admin: | o- Private_Key ............................................................... [a4:b9:b1:3e:e7:a2:ba:fe:c6:d6:e5:82:e6:99:d3:24]
admin: | o- Public_Key ................................................................ [a4:b9:b1:3e:e7:a2:ba:fe:c6:d6:e5:82:e6:99:d3:24]
admin: o- Storage ................................................................................................................. [...]
admin: | o- Drive_Groups ............................................................................................................ [1]
admin: | o- {"testing_dg_node3": {"host_pattern": "node3*", "data_devices": {"all": true}}} ..................................... [...]
admin: o- System_Update ........................................................................................................... [...]
admin: | o- Packages .......................................................................................................... [enabled]
admin: | o- Reboot ............................................................................................................ [enabled]
admin: o- Time_Server ......................................................................................................... [enabled]
admin: o- External_Servers ........................................................................................................ [1]
admin: | o- 0.pt.pool.ntp.org ................................................................................................... [...]
admin: o- Server_Hostname ............................................................................................. [admin.dev.com]
admin: ++ zypper lr -upEP
admin: # | Alias | Name | Enabled | GPG Check | Refresh | Priority | URI
admin: ---+---------------------+-----------------------------+---------+-----------+---------+----------+--------------------------------------------------------------------------------------------------
admin: 1 | octopus-repo1 | octopus-repo1 | Yes | (r ) Yes | No | 98 | https://download.opensuse.org/repositories/filesystems:/ceph:/master:/upstream/openSUSE_Leap_15.2
admin: 6 | repo-non-oss | Non-OSS Repository | Yes | (r ) Yes | No | 99 | http://download.opensuse.org/distribution/leap/15.2/repo/non-oss/
admin: 7 | repo-oss | Main Repository | Yes | (r ) Yes | No | 99 | http://download.opensuse.org/distribution/leap/15.2/repo/oss/
admin: 10 | repo-update | Main Update Repository | Yes | (r ) Yes | No | 99 | http://download.opensuse.org/update/leap/15.2/oss/
admin: 11 | repo-update-non-oss | Update Repository (Non-Oss) | Yes | (r ) Yes | No | 99 | http://download.opensuse.org/update/leap/15.2/non-oss/
admin: ++ zypper info cephadm
admin: ++ grep -E '(^Repo|^Version)'
admin: Repository : octopus-repo1
admin: Version : 15.1.0-lp152.833.1
admin: ++ ceph-bootstrap --version
admin: ceph-bootstrap 15.1.0+1581935293.g7a3134c
admin: ++ stdbuf -o0 ceph-bootstrap -ldebug deploy --non-interactive
admin: Checking if ceph-salt formula is available...
admin: salt-master will be restarted to load ceph-salt formula
admin: Could not find ceph-salt formula. Please check if ceph-salt-formula package is installed
Command '['vagrant', 'up']' failed: ret=1 stderr:
==> admin: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
The error message doesn't make sense because the formula was installed correctly. After some digging, the minion log file provides some insight into the cause:
2020-02-19 08:41:27,500 [salt.minion :1491][INFO ][7109] User sudo_vagrant Executing command state.sls_exists with jid 20200219084127497627
2020-02-19 08:41:27,551 [salt.minion :1618][INFO ][11686] Starting a new job 20200219084127497627 with PID 11686
2020-02-19 08:41:27,701 [salt.state :967 ][INFO ][11686] Loading fresh modules for state activity
2020-02-19 08:41:27,764 [salt.utils.templates:180 ][ERROR ][11686] Rendering exception occurred
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/utils/templates.py", line 394, in render_jinja_tmpl
output = template.render(**decoded_context)
File "/usr/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "<template>", line 13, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'bootstrap_minion'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/utils/templates.py", line 169, in render_tmpl
output = render_str(tmplstr, context, tmplpath)
File "/usr/lib/python3.6/site-packages/salt/utils/templates.py", line 404, in render_jinja_tmpl
buf=tmplstr)
salt.exceptions.SaltRenderError: Jinja variable 'dict object' has no attribute 'bootstrap_minion'
2020-02-19 08:41:27,764 [salt.state :3516][CRITICAL][11686] Rendering SLS 'base:ceph-salt' failed: Jinja variable 'dict object' has no attribute 'bootstrap_minion'
2020-02-19 08:41:27,764 [salt.minion :1946][INFO ][11686] Returning information for job: 20200219084127497627
It should be possible to install Node-Exporter from ceph-bootstrap.
Currently, after using ceph-bootstrap to configure all the options required by ceph-salt-formula, we run ceph-salt-formula by issuing the following salt command:
salt -G 'ceph-salt:member' state.apply ceph-salt
The above command is completely silent until the minions start responding after running the whole formula. The objective of the ceph-bootstrap deploy command is to run the ceph-salt formula while giving the user real-time feedback on execution progress.
The idea for the implementation is to listen on the salt-event bus for events generated by the ceph-salt formula, and show the current status on the terminal.
Related issue in ceph-salt-formula: SUSE/ceph-salt-formula#2
We can reuse the code in DeepSea CLI (https://github.com/SUSE/DeepSea/blob/master/cli/salt_event.py) to listen on the salt-event bus using the Listener pattern.
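The Listener pattern mentioned above could be sketched roughly as follows. This is not the DeepSea code and does not use the salt API: events are fed from a plain list, and the event shape, the "ceph-salt/" tag prefix, and all class names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical event shape: {"tag": ..., "minion": ..., "step": ...}
Event = Dict[str, str]


class EventListener:
    """Base class for objects that want to be told about formula events."""

    def handle(self, event: Event) -> None:
        raise NotImplementedError


class ProgressPrinter(EventListener):
    """Prints one status line per event and remembers what it printed."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def handle(self, event: Event) -> None:
        line = "[{}] {}".format(event["minion"], event["step"])
        self.lines.append(line)
        print(line)


class EventDispatcher:
    """Forwards events whose tag starts with a prefix to all listeners."""

    def __init__(self, tag_prefix: str) -> None:
        self.tag_prefix = tag_prefix
        self.listeners: List[EventListener] = []

    def add_listener(self, listener: EventListener) -> None:
        self.listeners.append(listener)

    def run(self, events: Iterable[Event]) -> None:
        # In a real implementation the iterable would be fed by the
        # salt-event bus; here it is any iterable of dicts.
        for event in events:
            if event["tag"].startswith(self.tag_prefix):
                for listener in self.listeners:
                    listener.handle(event)


if __name__ == "__main__":
    dispatcher = EventDispatcher("ceph-salt/")  # assumed tag prefix
    dispatcher.add_listener(ProgressPrinter())
    dispatcher.run([
        {"tag": "ceph-salt/stage/begin", "minion": "node1", "step": "Installing packages"},
        {"tag": "salt/job/123/ret", "minion": "node1", "step": "unrelated"},
        {"tag": "ceph-salt/stage/end", "minion": "node1", "step": "Packages installed"},
    ])
```

Keeping the dispatcher separate from the listeners means the terminal renderer can be swapped (for example, for a non-interactive logger) without touching the code that reads the bus.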
It should be possible to install and configure Prometheus and Grafana from ceph-bootstrap.
Useful links: