
cstar


cstar is an Apache Cassandra cluster orchestration tool for the command line.


Why not simply use Ansible or Fabric?

Ansible does not have the primitives required to run things in a topology-aware fashion. One could split the C* cluster into groups that can be safely executed in parallel and run one group at a time. But unless the job takes almost exactly the same amount of time on every host, such a solution would run at a significantly lower rate of parallelism, not to mention it would be kludgy enough to be unpleasant to work with.

Unfortunately, Fabric is not thread safe, so the same kind of limitations apply. Fabric allows one to run a job in parallel on many machines, but with similar restrictions to those of Ansible groups. It's possible to use Fabric and Celery together to do what is needed, but it's a very complicated solution.

Requirements

All involved machines are assumed to be UNIX-like systems such as OS X or Linux. The machine running cstar must have Python 3; the Cassandra hosts must have a Bourne-style shell.

Installing

You need Python 3 and an up-to-date version of pip (9.0.1).

# pip3 install cstar

It's also possible to install straight from the repo. This installs the latest version, which may not yet be pushed to PyPI:

# pip install git+https://github.com/spotify/cstar.git

Code of conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.

CLI

cstar is run through the cstar command, like so:

# cstar COMMAND [HOST-SPEC] [PARAMETERS]

The HOST-SPEC specifies which nodes to run the script on. There are three ways to specify the spec:

  1. The --seed-host switch tells cstar to connect to a specific host and fetch the full ring topology from there, and then run the script on all nodes in the cluster. --seed-host can be specified multiple times, and multiple hosts can be specified as a comma-separated list in order to run a script across multiple clusters.
  2. The --host switch specifies an exact list of hosts to use. --host can be specified multiple times, and multiple hosts can be specified as a comma-separated list.
  3. The --host-file switch points to a file name containing a newline separated list of hosts. This can be used together with process substitution, e.g. --host-file <(dig -t srv ...)
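As an illustration (the hostnames below are made up), a host file is just a newline-separated list, and process substitution lets any command that emits such a list stand in for the file:

```shell
# A host file is simply one hostname per line (names here are hypothetical).
printf 'cass-node-01\ncass-node-02\ncass-node-03\n' > /tmp/hosts.txt

# These would be equivalent ways to target the same nodes (not executed here):
#   cstar COMMAND --host-file=/tmp/hosts.txt
#   cstar COMMAND --host cass-node-01,cass-node-02,cass-node-03

# With process substitution (bash), any command emitting hostnames can act
# as the host file:
wc -l < <(grep -v '^#' /tmp/hosts.txt)    # -> 3
```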

The command is the name of a script located in either /usr/lib/cstar/commands or ~/.cstar/commands. This script will be uploaded to all nodes in the cluster and executed. File suffixes are stripped. The requirements of the script are described below. Cstar comes pre-packaged with one script file called run, which takes a single parameter, --command - see the examples below.

Some additional switches to control cstar:

  • One can override the parallelism specified in a script by setting the switches --cluster-parallel, --dc-parallel and --strategy.

There are two special case invocations:

  • One can skip the script name and instead use the continue command to resume a previously halted job.

  • One can skip the script name and instead use the cleanup-jobs command. See Cleaning up old jobs.

Further switches control authentication and execution:

  • If you need to access the remote cluster with a specific username, add --ssh-username=remote_username to your cstar command line. A private key file can also be specified using --ssh-identity-file=my_key_file.pem.

  • To use plain text authentication, add --ssh-password=my_password to the command line.

  • In order to run the command on a single node first and then stop execution so you can verify everything worked as expected, add --stop-after=1 to your command line. cstar will stop after the first node has executed the command and print the appropriate resume command to continue the execution when ready: cstar continue <JOB_ID>

A script file can specify additional parameters.

Command syntax

In order to run a command, it is first uploaded to the relevant host, and then executed from there.

Commands can be written in any scripting language in which the hash symbol starts a line comment, e.g. shell-script, python, perl or ruby.

The first line must be a valid shebang. After that, commented lines containing key value pairs may be used to override how the script is parallelised as well as providing additional parameters for the script, e.g. # C* dc-parallel: true
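As a sketch of the convention (this is not cstar's actual parser), the key-value headers can be pulled out of a script with sed:

```shell
# A toy command script using the header convention described above.
cat > /tmp/demo-command.sh <<'EOF'
#!/usr/bin/env bash
# C* dc-parallel: true
# C* strategy: all
echo hello
EOF

# Extract the key/value pairs from the '# C*' comment lines.
sed -n 's/^# C\* \([a-z-]*\): *\(.*\)$/\1=\2/p' /tmp/demo-command.sh
# -> dc-parallel=true
# -> strategy=all
```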

The possible keys are:

  • cluster-parallel, can the script be run on multiple clusters in parallel. Default value is true.

  • dc-parallel, can the script be run on multiple data centers in the same cluster in parallel. Default value is false.

  • strategy, how many nodes within one data center the script can run on simultaneously. Default is topology. Can be one of:

      • one, only one node per data center
      • topology, inspect topology and run on as many nodes as the topology allows
      • all, can be run on all nodes at once

  • description, specifies a description for the script used in the help message.

  • argument, specifies an additional input parameter for the script, as well as a help text and an optional default value.

Job output

Cstar automatically saves the job status to file during operation.

Standard output, standard error and exit status of each command run against a Cassandra host are saved locally on the machine where cstar is running. They are available under the user's home directory in .cstar/jobs/JOB_ID/HOSTNAME

How jobs are run

When a new cstar job is created, it is assigned an id (a UUID).

Cstar stores intermediate job output in the directory ~/.cstar/remote_jobs/<JOB_ID>. This directory contains files with the stdout, stderr and PID of the script, and once it finishes, it will also contain a file with the exit status of the script.

Once the job finishes, these files will be moved over to the original host and put in the directory ~/.cstar/jobs/<JOB_ID>/<REMOTE_HOST_NAME>.

Cstar jobs are run with nohup, which means that even if the ssh connection is severed, the job will proceed. In order to kill a cstar script invocation on a specific host, you will need to ssh to the host and kill the process.
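The kill pattern looks roughly like this; the pid file name and location below are assumptions for illustration, so check the actual file names under ~/.cstar/remote_jobs/<JOB_ID> on the remote host:

```shell
# Start a long-running job detached with nohup so it survives a dropped ssh
# session, and record its PID (the file name here is illustrative, not cstar's).
nohup sleep 300 >/dev/null 2>&1 &
echo $! > /tmp/demo-job.pid

# Later, to kill the invocation on that host:
kill "$(cat /tmp/demo-job.pid)"
```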

If a job is halted half-way, either by pressing ^C or by using the --stop-after parameter, it can be restarted using cstar continue <JOB_ID>. If the script was finished or already running when cstar shut down, it will not be rerun.

Cleaning up old jobs

Even on successful completion, the output of a cstar job is not deleted. This means it's easy to check what the output of a script was after it completed. The downside of this is that you can get a lot of data lying around in ~/.cstar/jobs. In order to clean things up, you can use cstar cleanup-jobs. By default it will remove all jobs older than one week. You can override the maximum age of a job before it's deleted by using the --max-job-age parameter.

Examples

# cstar run --command='service cassandra restart' --seed-host some-host

Explanation: Run the shell command service cassandra restart on each node of a cluster. If necessary, add sudo to the command.

# cstar puppet-upgrade-cassandra --seed-host some-host --puppet-branch=cass-2.2-upgrade

Explanation: Run the command puppet-upgrade-cassandra on a cluster. The puppet-upgrade-cassandra command expects a parameter, the puppet branch to run in order to perform the Cassandra upgrade. See the puppet-upgrade-cassandra example below.

# cstar puppet-upgrade-cassandra --help

Explanation: Show help for the puppet-upgrade-cassandra command. This includes documentation for any additional command-specific switches for the puppet-upgrade-cassandra command.

# cstar continue 90642c11-4714-44c4-a13a-94b86f09e3bb

Explanation: Resume the previously created job with job id 90642c11-4714-44c4-a13a-94b86f09e3bb. The job id is the first line of output printed when a job runs.

Example script file

This is an example script file that would be saved to ~/.cstar/commands/puppet-upgrade-cassandra.sh. It upgrades a Cassandra cluster by running puppet on a different branch, then restarting each node, then upgrading the sstables.

#!/usr/bin/env bash
# C* cluster-parallel: true
# C* dc-parallel: true
# C* strategy: topology
# C* description: Upgrade one or more clusters by switching to a different puppet branch
# C* argument: {"option":"--snapshot-name", "name":"SNAPSHOT_NAME", "description":"Name of pre-upgrade snapshot", "default":"preupgrade"}
# C* argument: {"option":"--puppet-branch", "name":"PUPPET_BRANCH", "description":"Name of puppet branch to switch to", "required":true}

nodetool snapshot -t "$SNAPSHOT_NAME"
sudo puppet --branch "$PUPPET_BRANCH"
sudo service cassandra restart
nodetool upgradesstables

cstar's People

Contributors

adejanovski, arodrime, bj0rnen, eedgar, emmmile, gizem969, ivanmp91, kant, liljencrantz, michaelsembwever, nicholaspeshek, protocol7, rjablonovsky, rzvoncek, smarsching, yakir-taboola, yarin78


cstar's Issues

cstar set max-concurrency default based on ulimits

From @Yarin78 on July 12, 2018 12:29

To prevent issues with running out of file handles, we should set a default value for max-concurrency based on ulimits etc.

Also print some informational messages when overriding those sane defaults, giving hints on how to change the ulimit.

Cstar fails when trying to parse data coming from a Cassandra cluster hosted on an AWS Rack

When running Cstar, attempting to get node information from the tool causes an issue if the underlying nodetool output has unexpected formatting. The culprit is this line:

if len(words) == 8 and re.match(_ip_re, words[0]) and re.match(_status_re, words[2]) and re.match(_state_re, words[3]) and re.match(_token_re, words[7]):

An example of the output that can come from Cassandra which triggers this issue:


==========
Address         Rack        Status State   Load            Owns                Token
                                                                               
10.***.***.***  aws-us-east-1aUp     Normal  244.32 GB       12.53%            foobar
10.***.***.***  aws-us-east-1bUp     Normal  266.45 GB       12.50%            foobar
10.***.***.***  aws-us-east-1cUp     Normal  276.46 GB       12.47%            foobar

The parsing fails because it only counts seven words in the row, not eight. It would be nice if this had some flexibility to handle this edge case.
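The mismatch is easy to reproduce with a word count: once the rack name runs into the status column, the row has seven whitespace-separated fields instead of the eight the parser expects (the values below are stand-ins for the masked output above):

```shell
# A well-formed row vs. one where the rack name is fused with the status.
good='10.0.0.1  aws-us-east-1a  Up  Normal  244.32 GB  12.53%  foobar'
bad='10.0.0.1  aws-us-east-1aUp  Normal  244.32 GB  12.53%  foobar'

echo "$good" | wc -w    # -> 8, satisfies the len(words) == 8 check
echo "$bad"  | wc -w    # -> 7, so the row is silently skipped
```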

local variable 'has_error' referenced before assignment

Hello,
when I run cstar with this command:
cstar run --command "nodetool status" --host 192.168.1.1 --ssh-username amirio --ssh-password foobar

I receive this error:
Generating endpoint mapping
Traceback (most recent call last):
File "/usr/local/bin/cstar", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.5/dist-packages/cstar/cstarcli.py", line 131, in main
namespace.func(namespace)
File "/usr/local/lib/python3.5/dist-packages/cstar/args.py", line 98, in <lambda>
command_parser.set_defaults(func=lambda args: execute_command(args), command=command)
File "/usr/local/lib/python3.5/dist-packages/cstar/cstarcli.py", line 115, in execute_command
ssh_identity_file = args.ssh_identity_file)
File "/usr/local/lib/python3.5/dist-packages/cstar/job.py", line 215, in setup
endpoint_mapping = self.get_endpoint_mapping(current_topology)
File "/usr/local/lib/python3.5/dist-packages/cstar/job.py", line 155, in get_endpoint_mapping
if not has_error:
UnboundLocalError: local variable 'has_error' referenced before assignment

"cstar.exceptions.HostIsDown: ('Could not find any working host while fetching endpoint mapping. Tried the following hosts:

Hi,
I applied #19 (comment) to fix @Amirioelmos's bug report in issue #19, but we get this error:
"cstar.exceptions.HostIsDown: ('Could not find any working host while fetching endpoint mapping. Tried the following hosts:','192.168.21.25')
Before that exception, we see the following message:
Command nodetool describering system_schema failed with status 2 on host 192.168.21.25

Bump paramiko to 2.X

Hi,
Right now cstar pins paramiko to version 2.7.1, which is causing issues for us because of this bug: paramiko/paramiko#1723
The issue was fixed in 2.7.2, and in general cstar also works fine on the latest 2.9.2.
Changing the paramiko requirement from ==2.7.1 to ~=2.7 would allow upgrading paramiko to any 2.X version.

Fails to get topology on locally authenticated nodes

It appears that cstar can't get the topology of the ring when using local JMX authentication.

Command nodetool describecluster failed with status 1 on host 10.52.31.4
Command nodetool ring failed with status 1 on host 10.52.31.4
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/cstar/cstarcli.py", line 116, in execute_command
ssh_lib=args.ssh_lib)
File "/usr/local/lib/python3.7/site-packages/cstar/job.py", line 203, in setup
current_topology = current_topology | self.get_cluster_topology((seed,))
File "/usr/local/lib/python3.7/site-packages/cstar/job.py", line 104, in get_cluster_topology
", ".join(tried_hosts))
cstar.exceptions.HostIsDown: ('Could not find any working host while fetching topology. Is Cassandra actually running? Tried the following hosts:', '10.52.31.4')
Error: ('Could not find any working host while fetching topology. Is Cassandra actually running? Tried the following hosts:', '10.52.31.4')

`sudo` commands won't work with ssh2-python

Using ssh2-python as the ssh lib for cstar will break all commands using sudo and output the following message: stderr: sudo: no tty present and no askpass program specified

I haven't yet been able to find a workaround for this, and in the short term I think we should revert the default to paramiko, keeping ssh2 as experimental until a proper way of running sudo commands is found.

@emmmile @Bj0rnen, wdyt?

The cstar issue related to using parameter --host-file

"The Last Pickle" suggested using cstar (https://github.com/spotify/cstar) as a Cassandra orchestration tool. The tool was installed on Cassandra LAB servers using the recommended pip3 installation. Testing was done with a simple run command (nodetool status); nodes were identified by the parameters --seed-host, --host and --host-file.
When the parameter --host-file was used with a list of nodes in a file (separated by newlines), an error was observed.
Error details:

cat list_nodes.txt
aaaa-aaaa-xca2
baaa-aaaa-xca2

cstar run --command='sudo systemctl restart cassandra' --host-file=list_nodes.txt
Job id is 65316fa8-ffe5-461d-ae7c-9161997ae588
Running /usr/local/lib/python3.6/site-packages/cstar/resources/commands/run.sh
Starting setup
Strategy: topology
DC parallel: True
Cluster parallel: True
Loading cluster topology
Traceback (most recent call last):
  File "/usr/local/bin/cstar", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 132, in main
    namespace.func(namespace)
  File "/usr/local/lib/python3.6/site-packages/cstar/args.py", line 100, in <lambda>
    command_parser.set_defaults(func=lambda args: execute_command(args), command=command)
  File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 116, in execute_command
    ssh_lib=args.ssh_lib)
  File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 209, in setup
    hosts_ip_set = set(socket.gethostbyname(host) for host in hosts)
  File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 209, in <genexpr>
    hosts_ip_set = set(socket.gethostbyname(host) for host in hosts)
socket.gaierror: [Errno -2] Name or service not known

After closer investigation, it appears the error is related to retrieving the newline as part of the host (name or IP). An ad hoc solution was to replace line 81 in the file "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py" so that it reads lines without newlines: "hosts = f.read().splitlines()"
Diff details:

diff /usr/local/lib/python3.6/site-packages/cstar/cstarcli.py /usr/local/lib/python3.6/site-packages/cstar/cstarcli.py_bak
81,82c81,82
<             hosts = f.read().splitlines()
<
---
>             hosts = f.readlines()
>

[devel][jablonovskyr.dba@lcoa-lbif-xca1 ~]$ grep -n "f.read().splitlines()" /usr/local/lib/python3.6/site-packages/cstar/cstarcli.py
81:            hosts = f.read().splitlines()
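The trailing-newline pitfall this patch addresses can be reproduced in bash; mapfile here is only an analogy for the difference between Python's f.readlines() and f.read().splitlines():

```shell
# Two hostnames, one per line, like the host file in the report.
printf 'node-a\nnode-b\n' > /tmp/list_nodes.txt

# Like f.readlines(): each element keeps its trailing newline...
mapfile raw < /tmp/list_nodes.txt
# ...while mapfile -t, like f.read().splitlines(), strips it.
mapfile -t clean < /tmp/list_nodes.txt

printf '%s' "${raw[0]}"   | wc -c    # -> 7 ('node-a' plus the newline)
printf '%s' "${clean[0]}" | wc -c    # -> 6
```

Passing the 7-byte form to gethostbyname is what triggers the "Name or service not known" error above.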

Test modification:

cat list_nodes.txt
aaaa-aaaa-xca2
baaa-aaaa-xca2

cstar run --command='nodetool status' --host-file=list_nodes.txt
Job id is 322bb609-8d14-498f-9d78-b0f33221b2fe
Running /usr/local/lib/python3.6/site-packages/cstar/resources/commands/run.sh
Starting setup
Strategy: topology
DC parallel: True
Cluster parallel: True
Loading cluster topology
Preheating DNS cache
Preheating done
Done loading cluster topology
Generating endpoint mapping
Done generating endpoint mapping
Setup done
 +  Done, up      * Executing, up      !  Failed, up      . Waiting, up
 -  Done, down    / Executing, down    X  Failed, down    : Waiting, down
Cluster: Cluster_DBA
DC: A
+
DC: B
+
2 done, 0 failed, 0 executing
Job 322bb609-8d14-498f-9d78-b0f33221b2fe finished successfully

cstar fails on getting topology

Hi,
I am trying to use cstar from a jump host; I installed cstar version 0.8.0.
On my first try connecting with a seed host, I get this message:
sigaliths@packer:/home/sigaliths/awsDba >cstar run --command='df -hP' --seed-host 10.162.XX.XX --ssh-username=ec2-user --ssh-identity-file=../my_file -v
Job id is cd4ff9f9-711b-43a1-814d-f9610ecd479c
Running /usr/lib/python3.6/site-packages/cstar/resources/commands/run.sh
Starting setup
Strategy: topology
Cluster parallel: True
DC parallel: False
Loading cluster topology
Command nodetool describecluster failed with status 127 on host 10.162.XX.XX
Command nodetool status failed with status 127 on host 10.162.XX.XX
Traceback (most recent call last):
File "/usr/bin/cstar", line 10, in <module>
sys.exit(main())
File "/usr/lib/python3.6/site-packages/cstar/cstarcli.py", line 214, in main
namespace.func(namespace)
File "/usr/lib/python3.6/site-packages/cstar/args.py", line 117, in <lambda>
command_parser.set_defaults(func=lambda args: execute_command(args), command=command)
File "/usr/lib/python3.6/site-packages/cstar/cstarcli.py", line 132, in execute_command
resolve_hostnames=args.resolve_hostnames)
File "/usr/lib/python3.6/site-packages/cstar/job.py", line 269, in setup
current_topology = current_topology | self.get_cluster_topology(seeds)
File "/usr/lib/python3.6/site-packages/cstar/job.py", line 126, in get_cluster_topology
", ".join([x.ip for x in tried_hosts]))
File "/usr/lib/python3.6/site-packages/cstar/job.py", line 126, in <listcomp>
", ".join([x.ip for x in tried_hosts]))
AttributeError: 'str' object has no attribute 'ip'

The IP address is reachable, and connecting to the server with the ssh key works.
Running nodetool status locally on that machine also succeeds.
What am I missing?
Shouldn't it connect to the server over ssh to retrieve the nodetool status output? Is it doing that remotely?

Cstarpar hosts_variables error

I've been unable to get cstarpar to run any commands due to what looked like a mis-assigned variable in the job initialisation:

$ cstarpar --seed-host $CASS_HOST --strategy=all  "hostname"
Job id is 29e97c20-e68e-4110-9764-f7f91aca797c
Starting setup
Strategy: all
Cluster parallel: False
DC parallel: False
Loading cluster topology
Traceback (most recent call last):
  File "/Users/et/.local/bin/cstarpar", line 8, in <module>
    sys.exit(main())
  File "/Users/et/.local/pipx/venvs/cstar/lib/python3.9/site-packages/cstar/cstarparcli.py", line 78, in main
    job.setup(
  File "/Users/et/.local/pipx/venvs/cstar/lib/python3.9/site-packages/cstar/job.py", line 271, in setup
    current_topology = current_topology | self.get_cluster_topology(seeds)
  File "/Users/et/.local/pipx/venvs/cstar/lib/python3.9/site-packages/cstar/job.py", line 105, in get_cluster_topology
    conn = self._connection(host)
  File "/Users/et/.local/pipx/venvs/cstar/lib/python3.9/site-packages/cstar/job.py", line 456, in _connection
    self._connections[host] = cstar.remote.Remote(host, self.ssh_username, self.ssh_password, self.ssh_identity_file, self.ssh_lib, self.get_host_variables(host))
  File "/Users/et/.local/pipx/venvs/cstar/lib/python3.9/site-packages/cstar/job.py", line 471, in get_host_variables
    if hostname in self.hosts_variables.keys():
AttributeError: 'NoneType' object has no attribute 'keys'

As far as I can tell this can be fixed by adding the same processing of --hosts-variables to cstarparcli.py that already exists in cstarcli.py:

    if args.hosts_variables:
        with open(args.hosts_variables) as f:
            hosts_variables = json.loads(f.read())

I'm happy to submit a PR for this, but I wanted to check whether I'm along the right lines, as I've never used the hosts-variables arg and don't know if this would produce the intended behaviour.

cstar failing on "Failed getting data from cache" during "continue"

Hi,
I'm using cstar 0.8.0.
I executed cstar and tried to resume/continue, but the command failed with "Failed getting data from cache".
First run:

/usr/local/bin/cstar run --command=/home/dba/cassandra/scripts/fstrim_run.sh --seed-host=reco001.tab.com --ignore-down-nodes --topology-per-dc --max-concurrency=14 --ssh-lib=ssh2 --ssh-identity-file=/var/lib/jenkins/.ssh/id_rsa  --ssh-username=root -v
Job id is e8507178-a99e-4765-a967-458e5f3d8892

I see that a cache file was created:

/var/lib/jenkins/.cstar/cache/endpoint_mapping-a566699b-a182-3247-a9e6-95dba8308b58-b164c8933dc0fdc44c4f030609bc25f6

As I understand it, the cache file is created based on schema_versions and status_topology_hash:

[root@dba e8507178-a99e-4765-a967-458e5f3d8892]# grep schema_versions -A1 job.json
    "schema_versions": [
        "a566699b-a182-3247-a9e6-95dba8308b58"
[root@dba e8507178-a99e-4765-a967-458e5f3d8892]# grep status_topology_hash -A1 job.json
    "status_topology_hash": [
        "b164c8933dc0fdc44c4f030609bc25f6"
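For reference, the cache file name is just these values joined with dashes, a sketch of the naming scheme from the get_cache_file_path line quoted in the traceback further down (assuming a single schema version and topology hash, as in this job):

```shell
# Reconstruct the cache file name from the job.json values above.
cache_type='endpoint_mapping'
schema_version='a566699b-a182-3247-a9e6-95dba8308b58'
topology_hash='b164c8933dc0fdc44c4f030609bc25f6'

printf '%s-%s-%s\n' "$cache_type" "$schema_version" "$topology_hash"
# -> endpoint_mapping-a566699b-a182-3247-a9e6-95dba8308b58-b164c8933dc0fdc44c4f030609bc25f6
```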

And "continue" failed:

/usr/local/bin/cstar continue e8507178-a99e-4765-a967-458e5f3d8892 -v

The error:

08:30:22 Failed getting data from cache : <traceback object at 0x7f3eb3bd3688>
08:35:22 Traceback (most recent call last):
08:35:22   File "/usr/local/bin/cstar", line 8, in <module>
08:35:22     sys.exit(main())
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 214, in main
08:35:22     namespace.func(namespace)
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 57, in execute_continue
08:35:22     output_directory=args.output_directory, retry=args.retry_failed)
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/jobreader.py", line 35, in read
08:35:22     return _parse(f.read(), file, output_directory, job, job_id, stop_after, max_days, endpoint_mapper, retry)
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/jobreader.py", line 92, in _parse
08:35:22     endpoint_mapping = endpoint_mapper(original_topology)
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 222, in get_endpoint_mapping
08:35:22     pickle.dump(dict(endpoint_mappings), open(self.get_cache_file_path("endpoint_mapping"), 'wb'))
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 130, in get_cache_file_path
08:35:22     return os.path.join(self.cache_directory, "{}-{}-{}".format(cache_type, "-".join(sorted(self.schema_versions)), "-".join(sorted(self.status_topology_hash))))
08:35:22 AttributeError: 'Job' object has no attribute 'cache_directory'
...
08:35:22 Summary:
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/jobreader.py", line 35, in read
08:35:22     return _parse(f.read(), file, output_directory, job, job_id, stop_after, max_days, endpoint_mapper, retry)
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/jobreader.py", line 92, in _parse
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 222, in get_endpoint_mapping
08:35:22   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 130, in get_cache_file_path
08:35:22 AttributeError: 'Job' object has no attribute 'cache_directory'

Thank you,
Yakir Gibraltar

Nice error when dead nodes are passed to cstar

From @Yarin78 on July 12, 2018 12:29

When a dead node is passed to cstar, cstar will complain about empty JSON, not about the dead node. Dead nodes are fairly common; they happen when nodes are taken out of the ring but kept alive, which allows disco to report them, and when such a disco entry gets into cstar, we get a JSON complaint.

Add commands for nodetool operations that make sense

From @Bj0rnen on July 23, 2018 11:12

We only install one command, run, with cstar. It is of course the most versatile kind of command, but having some commands that do something more specific would showcase the intended way of using cstar (not just using cstar run all the time).

In theory there are lots of commands that could make sense, like nodetool cleanup, nodetool disableautocompaction, nodetool enableautocompaction, nodetool stop COMPACTION, nodetool flush, etc.

Not sure if we have to strike a balance here, not being too spammy with commands, or if more is simply better.

Resume/Continue failing

Hi @adejanovski ,
Continue is failing with a Python error:

10:37:25 unbuffer /usr/local/bin/cstar continue 470ef4f5-d6a2-4372-a9e6-2430c30e22a9 -v 2>&1 | tee -a /var/log/cstar/cstar_RESUME_20210210-083724.log
10:37:25 Retry :  False
10:37:25 Resuming job 470ef4f5-d6a2-4372-a9e6-2430c30e22a9
10:37:25 Running  /usr/local/lib/python3.6/site-packages/cstar/resources/commands/run.sh
10:37:54 Traceback (most recent call last):
10:37:54   File "/usr/local/bin/cstar", line 8, in <module>
10:37:54     sys.exit(main())
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 225, in main
10:37:54     namespace.func(namespace)
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/cstarcli.py", line 70, in execute_continue
10:37:54     job.resume()
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 332, in resume
10:37:54     self.run()
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 344, in run
10:37:54     self.schedule_all_runnable_jobs()
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/job.py", line 433, in schedule_all_runnable_jobs
10:37:54     next_host = self.state.find_next_host()
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/state.py", line 75, in find_next_host
10:37:54     ignore_down_nodes=self.ignore_down_nodes)
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/strategy.py", line 69, in find_next_host
10:37:54     return _strategy_mapping[strategy](remaining, endpoint_mapping, progress.running)
10:37:54   File "/usr/local/lib/python3.6/site-packages/cstar/strategy.py", line 83, in _topology_find_next_host
10:37:54     for next in endpoint_mapping[h]:
10:37:54 KeyError: Host(fqdn='1.1.1.1', ip='1.1.1.1', dc='UND', cluster='UND Cluster', rack='RAC1', is_up=True, host_id='259fc144-0a35-4e92-9cb6-09f099c911df')
12:32:46 
12:32:46 Aborted

job.json file looks like:

{
    "cache_directory": "/var/lib/jk/.cstar/cache",
    "command": "/usr/local/lib/python3.6/site-packages/cstar/resources/commands/run.sh",
    "creation_timestamp": 1612946274,
    "env": {
        "COMMAND": "/home/cassandra/scripts/cassandra_restart.sh \"/home/cassandra/scripts/find_expired_files.sh\""
    },
    "errors": [],
    "hosts_variables": {},
    "is_preheated": false,
    "jmx_username": null,
    "job_runner": "RemoteJobRunner",
    "key_space": null,
    "output_directory": "/var/lib/jk/.cstar/jobs/470ef4f5-d6a2-4372-a9e6-2430c30e22a9",
    "resolve_hostnames": false,
    "returned_jobs": [],
    "schema_versions": [
        "b1066b3a-c020-3e13-afde-b977369eb723"
    ],
    "sleep_after_done": null,
    "sleep_on_new_runner": 0.5,
    "ssh_identity_file": "/var/lib/jk/.ssh/id_rsa",
    "ssh_lib": "ssh2",
    "ssh_password": null,
    "ssh_username": "root",
    "state": {
        "cluster_parallel": true,
        "current_topology": [

....long list
                [
                    "4.4.4.4",
                    "4.4.4.4",
                    "UND",
                    "UND Cluster",
                    "RAC1",
                    true,
                    "9544d7e9-ebdc-40fa-9532-ca37b0ed7d80"
                ],
                [
                    "3.3.3.3",
                    "3.3.3.3",
                    "UND",
                    "UND Cluster",
                    "RAC1",
                    true,
                    "4e8b75c9-e3c0-4447-bc5f-a5198a51d416"
                ]
            ],
            "failed": [
                [
                    "2.2.2.2",
                    "2.2.2.2",
                    "UND",
                    "UND Cluster",
                    "RAC1",
                    true,
                    "52660ead-5bd4-4bb5-9d27-7dc5f31b8d1a"
                ]
            ],
            "running": [
                [
                    "1.1.1.1",
                    "1.1.1.1",
                    "UND",
                    "UND Cluster",
                    "RAC1",
                    true,
                    "259fc144-0a35-4e92-9cb6-09f099c911df"
                ]
            ]
        },
        "strategy": "topology"
    },
    "status_topology_hash": [
        "1bac2bdafcc8201f0fbf2f1f023f8cba"
    ],
    "timeout": null,
    "version": 8
}

Thank you, Yakir Gibraltar
