The yugabyte-boshrelease from aegershman

yb test techniques and profiles

profiling and sample apps for clusters in this blog post: https://blog.yugabyte.com/achieving-sub-ms-latencies-on-large-data-sets-in-public-clouds

sample apps could work for testing https://github.com/yugabyte/yb-sample-apps

peruse Docker Hub for load tests? https://hub.docker.com/u/yugabytedb

their Explore YSQL guide, too: https://docs.yugabyte.com/latest/quick-start/explore-ysql/

break out python packaging types in folder hierarchy

don't nest as python/python3/...; use top-level python3/..., python2/..., etc. in the blobstore, which makes it easier to organize

similar to #73

evaluate and add in ulimits, process limits, sysctl reqs, etc.

As recommended by the YugabyteDB docs under 'manual deployment', applied via bpm. See this comment in this PR for more details about limits and bpm.

also, the security-checklist may have helpful info https://docs.yugabyte.com/latest/secure/security-checklist/

see:

https://ro-che.info/articles/2017-03-26-increase-open-files-limit
https://gemfire.docs.pivotal.io/99/geode/managing/heap_use/lock_memory.html
usage of /proc/sys/fs/file-nr?
need to bump the timeout on monit way up? https://starkandwayne.com/blog/quick-guide-to-using-monit-in-bosh/
https://github.com/pivotal-cf/ulimit-release/blob/master/jobs/ulimit/spec maybe it needs to be specified in /etc/security/limits.conf
cloudfoundry-incubator/docker-boshrelease#37
maartensl/cf-release-ulimits@f78dae0
some interesting things that bosh cassandra release does: https://github.com/orange-cloudfoundry/cassandra-boshrelease/blob/master/jobs/cassandra/templates/bpm-prestart
and pxc-release does: https://github.com/cloudfoundry-incubator/pxc-release/blob/master/jobs/pxc-mysql/spec
good inspiration: https://github.com/cloudfoundry/nats-release/blob/develop/jobs/nats/templates/pre-start.erb
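A minimal pre-start sketch for the limits above: compare the limits the process actually gets against chosen minimums and fail loudly. The 1048576 open-files and 12000 processes minimums are placeholder assumptions, not values taken from the YugabyteDB docs, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start helper: check process limits before yb-master or
# yb-tserver starts. Minimum values are assumptions, not YB-documented numbers.
set -euo pipefail

# check_min_limit NAME CURRENT MINIMUM
# Prints a status line and returns 1 when CURRENT is below MINIMUM.
# "unlimited" (as printed by `ulimit`) always passes.
check_min_limit() {
  local name="$1" current="$2" minimum="$3"
  if [[ "$current" == "unlimited" ]]; then
    echo "ok: ${name}=unlimited"
    return 0
  fi
  if (( current < minimum )); then
    echo "too-low: ${name}=${current} (want >= ${minimum})"
    return 1
  fi
  echo "ok: ${name}=${current}"
}

# run_limit_checks: checks this shell's soft limits; returns 1 if any is too low.
run_limit_checks() {
  local rc=0
  check_min_limit "open_files" "$(ulimit -n)" 1048576 || rc=1   # placeholder minimum
  check_min_limit "processes" "$(ulimit -u)" 12000 || rc=1       # placeholder minimum
  return "$rc"
}
```

Usage would be `run_limit_checks || exit 1` from a pre-start script. Since bpm applies its own per-process `limits` (e.g. `open_files`) to what it launches, the check is probably most meaningful when run inside the bpm container, e.g. from a bpm `pre_start` hook, rather than in the plain BOSH pre-start shell.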

fail if certain configurations are wrong/change, e.g. scaling masters to even number

optionally fail if config doesn't seem right, help put in some validation safeguards

https://github.com/cloudfoundry-incubator/bits-service-release/blob/master/jobs/bits-service/templates/bits_config.yml.erb
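One way to sketch the even-master safeguard, assuming the master addresses reach the script as a single comma-separated host:port string (a hypothetical input format). The same check could also be done at template-render time in ERB; it's shown in shell here.

```shell
#!/usr/bin/env bash
# Hypothetical validation helper: reject an even (or zero) number of masters.
# Raft needs a majority, so N masters tolerate floor((N-1)/2) failures; an
# even N tolerates no more failures than N-1 would.
set -euo pipefail

# count_masters "h1:7100,h2:7100,..." -> number of non-blank entries
count_masters() {
  local n
  # grep -c still prints 0 when nothing matches; `|| true` absorbs its exit 1
  n="$(tr ',' '\n' <<< "$1" | grep -c '[^[:space:]]' || true)"
  echo "$n"
}

# validate_master_count "h1:7100,..." -> prints "ok: N masters" or fails with a reason on stderr
validate_master_count() {
  local n
  n="$(count_masters "$1")"
  if (( n == 0 )); then
    echo "invalid: no masters configured" >&2
    return 1
  fi
  if (( n % 2 == 0 )); then
    echo "invalid: ${n} masters; use an odd count (3, 5, ...)" >&2
    return 1
  fi
  echo "ok: ${n} masters"
}
```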

erb count

fix formatting with editorconfig

probably should switch editorconfig to tabs, remove redundant declarations, reformat all files, etc.

consider switching from flagfiles to pure cli args

because the flags library that YugabyteDB uses will FAIL validation if an --argument=like_this is passed directly to the yb-{master,tserver} binary via args, but will ALLOW unknown or invalid flags when resolving a flagfile
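If the switch happens, here's a minimal sketch for turning an existing flagfile into CLI arguments, assuming one `--flag=value` per line and `#` comment lines. It does not handle gflags' other flagfile features. Keep the trade-off above in mind: once flags are passed as arguments, an unknown flag makes the binary fail validation at startup instead of being tolerated. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the flags from a gflags-style flagfile, one argument per line.
# Assumes one "--flag=value" per line; blank lines and "#" comments are skipped.
set -euo pipefail

flagfile_to_args() {
  local line
  # `|| [[ -n "$line" ]]` keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
    [[ -z "$line" || "$line" == \#* ]] && continue
    printf '%s\n' "$line"
  done < "$1"
}
```

In a start script this could be consumed with `mapfile -t args < <(flagfile_to_args "$FLAGFILE")` followed by `exec /var/vcap/packages/yugabyte/bin/yb-master "${args[@]}"`, where `$FLAGFILE` stands in for whatever path the job renders.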

not a huge deal

see #78 for an interesting example of this (--use-cassandra-auth passed to masters)

prometheus and/or indicator protocol integration

see:

review ubuntu support checklist from yb

https://github.com/yugabyte/yugabyte-db/blob/1d669db4f8062c3064006aa71ef5d4d448d5ce9a/docs/content/latest/quick-start/binary/linux-install.md

nodes reporting in have hosts of localhost

doesn't appear to be affecting the cluster at the moment, but I'm curious why it's happening

EDIT: spotted in the wild during an upgrade. Notice how it's using the bosh-dns hostname here. Interesting.

break out yb-sample-apps.jar from yugabyte blobstore subfolder

maybe. I don't know.

cockroachdb ideas

https://github.com/cppforlife/cockroachdb-release/blob/master/jobs/cockroachdb/spec

Use bosh -d cockroachdb ssh cockroachdb/0 --opts=" -L 8080:127.0.0.1:8080" to expose service locally.

aside:

https://www.cockroachlabs.com/blog/unpacking-competitive-benchmarks/

consider adding cql binding flags, cql transactional by default, etc.

contacting follower master nodes on :7000 hangs, only resolves on leader

It might be intended, or it might have something to do with #53, since it's doing a redirect of some sort. It doesn't really matter, tbh, but it means you have to guess and check each master to find the one that resolves, i.e. the leader.
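Instead of guessing, the leader could be looked up with `yb-admin -master_addresses ... list_all_masters`. This sketch assumes that command prints one row per master containing a host:port column and a role column (LEADER/FOLLOWER); check the exact layout against the yb-admin version in the release. The parser name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the leader's RPC address out of `yb-admin list_all_masters` output.
# Assumed row shape: <uuid> <host:port> <state> <role>, with role LEADER/FOLLOWER.
set -euo pipefail

# leader_rpc_address < list_all_masters-output -> prints the leader's host:port
leader_rpc_address() {
  awk '{
    for (i = 1; i <= NF; i++) if ($i == "LEADER") {
      # print the first field that looks like host:port, then stop
      for (j = 1; j <= NF; j++) if ($j ~ /^[^:]+:[0-9]+$/) { print $j; exit }
    }
  }'
}
```

Usage, assuming yb-admin ships next to ysqlsh in `/var/vcap/packages/yugabyte/bin`: `yb-admin -master_addresses "$MASTERS" list_all_masters | leader_rpc_address`. The admin UI would then be on port 7000 of that host.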

actually useful collection of operators and manifest

see:

#30
#49

remove yugabyted and yb-ctl jobs

In favor of just yb-master and yb-tserver for now. It makes more sense to simplify and cut everything else out for the time being.

yugabyted and yb-ctl are for local clusters. We could do some interesting stuff, like using bosh ssh options to forward localhost connections to a locally spun-up yugabyted cluster, but tbh, just get rid of them in favor of a single-master, single-tserver deployment option for goofing around.

clean up packaging scripts, make them less ugly, etc.

use globbing and such

This will make #30 much easier since it'll be easier to grab packages associated to jobs, etc.

remove capabilities like NET_BIND_SERVICE; there's no need, since all ports are above 1024
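A small sketch of the reasoning: on Linux, binding a port below 1024 is what normally needs `CAP_NET_BIND_SERVICE` (assuming the default `net.ipv4.ip_unprivileged_port_start` of 1024). The port list in the usage note is the set of ports that show up elsewhere in these notes. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does any configured port require CAP_NET_BIND_SERVICE?
# Assumes the default unprivileged-port threshold of 1024.
set -euo pipefail

# needs_net_bind_service PORT... -> prints "yes" if any port is < 1024, else "no"
needs_net_bind_service() {
  local port
  for port in "$@"; do
    if (( port < 1024 )); then
      echo "yes"
      return 0
    fi
  done
  echo "no"
}
```

For example, `needs_net_bind_service 7000 7100 9000 9100 9042 6379 5433 12000` prints `no`, which is why the capability can go.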

consider adding yedis proxy binding flags, etc.

note: from an example tserver, it looks like these are already filled in:

--cql_proxy_bind_address=q-m97997n3s0.q-g96704.bosh:9042
--cql_proxy_webserver_port=12000
--enable_direct_local_tablet_server_call=true
--inbound_rpc_memory_limit=0
--pgsql_proxy_bind_address=
--redis_proxy_bind_address=q-m97997n3s0.q-g96704.bosh:6379

actual doc

clean up pre-/post-start shebangs and set lines

just use set -euxo pipefail instead of individual set -e / set -u lines
prefer #!/usr/bin/env bash

https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
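A sketch of the single-line header, plus a tiny hypothetical function showing what `pipefail` adds: without it, `false | true` reports success because only the last stage's status counts.

```shell
#!/usr/bin/env bash
# Single set line replacing separate `set -e` / `set -u` lines:
#   -e           exit when a command fails and the failure isn't handled
#   -u           treat expansion of unset variables as an error
#   -x           trace each command to stderr before running it
#   -o pipefail  a pipeline's status is the last non-zero status of any stage
set -euxo pipefail

# pipeline_status: prints the exit status of `false | true` under these options.
pipeline_status() {
  local rc=0
  false | true || rc=$?
  echo "$rc"
}
```

One design note: `-x` echoes every expanded command, so any credentials rendered into a script's commands will end up in the job logs; scripts that handle secrets may want `set +x` around those lines.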

yb sample app logging redirection

This could be a separate issue, but it also applies to other tools like yb-admin, etc.

YSQL general

some reading material which may be generally beneficial:

tserver/e954712b-3feb-47cf-b917-c730eca00895:/var/vcap/jobs/yb-tserver# /var/vcap/packages/yugabyte/bin/ysqlsh -h 10.156.89.41 -p 5433
ysqlsh: FATAL:  Not found: Error loading table with oid 1260 in database with oid 1: The object does not exist: table_id: "000000010000300080000000000004ec"

https://www.postgresql.org/docs/11/config-setting.html#CONFIG-SETTING-CONFIGURATION-FILE

auth, general

Inbound connection requests coming from CF syslog-scheduler

inbound yb_rpc calls from 10.156.86.21, which appear to come from syslog_scheduler/59ac6012-0da8-4c84-9e7d-fe016f2e92fd in the cf deployment

from tserver logs at http://10.156.89.37:9000/logs

W0211 18:23:02.524586    12 connection.cc:281] Connection (0x000000000222f8d0) server 10.156.86.21:35906 => 10.156.89.37:9100: Command sequence failure: Network error (yb/rpc/yb_rpc.cc:141): Invalid connection header: 1603010101010000FD03033052F1AB8F4DC20D88135C77B735F13E19090F441CAB71D7C905FC09079A3D0320D8F867C2AED349FD20E14760971EFFC749FD904C2204DB46BC9C40B3119102E80026C02FC030C02BC02CCCA8CCA9C013C009C014C00A009C009D002F0035C012000A1301130313020100008E00000013001100000E73797374656D2D6D657472696373000500050100000000000A000A0008001D001700180019000B00020100000D001A0018080404030807080508060401050106010503060302010203FF0100010000120000002B00050403040303003300260024001D00203FD19BC43DFD73E0DBEF13E59A04BC8B9618DF8EB2AB0B99CECC15F020B96276
W0211 18:23:02.524672    12 tcp_stream.cc:130] { local: 10.156.89.37:9100 remote: 10.156.86.21:35906 }: Shutting down with pending inbound data ({ capacity: 131072 pos: 0 size: 262 }, status = Network error (yb/rpc/yb_rpc.cc:141): Invalid connection header: 1603010101010000FD03033052F1AB8F4DC20D88135C77B735F13E19090F441CAB71D7C905FC09079A3D0320D8F867C2AED349FD20E14760971EFFC749FD904C2204DB46BC9C40B3119102E80026C02FC030C02BC02CCCA8CCA9C013C009C014C00A009C009D002F0035C012000A1301130313020100008E00000013001100000E73797374656D2D6D657472696373000500050100000000000A000A0008001D001700180019000B00020100000D001A0018080404030807080508060401050106010503060302010203FF0100010000120000002B00050403040303003300260024001D00203FD19BC43DFD73E0DBEF13E59A04BC8B9618DF8EB2AB0B99CECC15F020B96276)
W0211 18:23:02.524732    12 tcp_stream.cc:130] { local: 10.156.89.37:9100 remote: 10.156.86.21:35906 }: Shutting down with pending inbound data ({ capacity: 131072 pos: 0 size: 262 }, status = Service unavailable (yb/rpc/reactor.cc:91): Shutdown connection (system error 108))

Found logs from the scheduler, and that's hilarious: it scrapes :9100, and I believe this is what causes the tservers to puke:

<14>1 2020-02-11T18:31:02.866711Z 10.156.86.21 loggr-metric-scraper rs2 - [instance@47450 director="" deployment="cf-52b8aeeeda6f562e05f9" group="syslog_scheduler" az="us-west-2a" id="59ac6012-0da8-4c84-9e7d-fe016f2e92fd"] [id: syslog_scheduler, instance_id: , metric_url: https://10.156.89.37:9100/metrics]: Get https://10.156.89.37:9100/metrics: EOF

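The rejected "connection header" bytes start with 16 03 01, which is a TLS handshake record header, and they include a server_name (SNI) extension for system-metrics. So the scheduler is opening an HTTPS connection to the tserver's yb_rpc port (9100), and yb_rpc rejects it because it isn't an RPC connection header.

So... I think we could try changing the binding ports to something different? Or find some way to not fail on those requests?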
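The hex in the "Invalid connection header" lines above can be inspected without extra tooling. This sketch checks for a TLS record prefix (content type 0x16 = handshake, major version 0x03) and renders printable bytes, which surfaces the `system-metrics` hostname from the scraper's ClientHello. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect the hex "connection header" that yb_rpc rejected.
set -euo pipefail

# classify_rpc_header HEX -> "tls-handshake" if the bytes start like a TLS
# record (0x16 handshake, 0x03 major version), else "unknown".
classify_rpc_header() {
  local hex="${1^^}"
  if [[ "$hex" == 1603* ]]; then
    echo "tls-handshake"
  else
    echo "unknown"
  fi
}

# hex_to_printable HEX -> printable ASCII, with every other byte shown as "."
hex_to_printable() {
  local hex="$1" out="" i byte
  for (( i = 0; i + 1 < ${#hex}; i += 2 )); do
    byte=$(( 16#${hex:i:2} ))
    if (( byte >= 32 && byte < 127 )); then
      out+="$(printf '%b' "\\x${hex:i:2}")"
    else
      out+="."
    fi
  done
  printf '%s\n' "$out"
}
```

Running `hex_to_printable` on the full header from the log prints mostly dots, with `system-metrics` readable in the middle.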

status, wipe restart, etc., "generic-cmd" as errands

https://bosh.io/docs/errands/

Just expose the ability to define commands as errands, configured via properties that get templated into run.sh.

can do the same thing but using yb-admin https://docs.yugabyte.com/latest/admin/yb-admin/

validate whether masters are confused about connecting to themselves

seeing these kinds of log lines on master servers:

I0212 20:52:53.386559    16 reactor.cc:450] Master_R001: Timing out connection Connection (0x000000000291e010) server 10.156.89.36:55155 => 10.156.89.36:7100 - it has been idle for 65.0004s (delta: 65.0004, current time: 996.106, last activity time: 931.106)

makes me wonder if we need to be more clever about the master connection string, and have each master filter its own hostname out and replace it with localhost? Just thoughts.

does the yugabyte helm chart do it? https://github.com/yugabyte/charts/blob/master/stable/yugabyte/templates/_helpers.tpl#L57
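A sketch of the "replace my own entry" idea, assuming the master list is a comma-separated host:port string and that the instance knows its own address (both hypothetical inputs). One caveat from the bind-address notes below: services currently bind to the private IP, so 127.0.0.1 would only be reachable if the masters also listened on loopback.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite this master's own entry in the master address
# list (default replacement 127.0.0.1). Entries may be host:port or bare host.
set -euo pipefail

# replace_self_in_masters LIST SELF_HOST [REPLACEMENT]
replace_self_in_masters() {
  local list="$1" self="$2" replacement="${3:-127.0.0.1}"
  local -a entries=() out=()
  local entry host port
  [[ -n "$list" ]] || return 0
  IFS=',' read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    if [[ "$entry" == *:* ]]; then
      host="${entry%:*}" port=":${entry##*:}"
    else
      host="$entry" port=""
    fi
    if [[ "$host" == "$self" ]]; then
      host="$replacement"
    fi
    out+=("${host}${port}")
  done
  # join with commas in a subshell so IFS doesn't leak
  (IFS=','; printf '%s\n' "${out[*]}")
}
```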

override node/universe uuids to match bosh-managed uuids

No clue if this is a good or awful idea, but it would be nice if these params matched the BOSH instance/deployment UUIDs

--instance_uuid_override=
--cluster_uuid=

see:

eval if services should bind to all network interfaces or also localhost

currently we have every service bind to the private IP, but perhaps they should bind to 0.0.0.0, or to something more configurable than just the private IP

for example, with the yedis-cli you can get on a tserver node and connect to that tserver's yedis API using the host's private IP, but not localhost. Is that a problem? Probably not, but worth making a little note about.

https://docs.yugabyte.com/latest/troubleshoot/cluster/connect-yedis/#root

get prestart logs

check if bpm hooks provide the same functionality

https://bosh.io/docs/pre-start/#logs

consume/provide links could be updated to have named peers

https://github.com/cppforlife/cockroachdb-release/blob/master/jobs/cockroachdb/spec#L13-L26

would enable implicit linking

package python for clis such as cqlsh

if going to use cqlsh on tservers directly, they'll need python.

if not, as in if we're going to run initial setup as a job/errand in a different instance group, then that job will need python

will figure it out a bit later

./cqlsh 
No appropriate python interpreter found.

cat cqlsh

# bash code here; finds a suitable python interpreter and execs this file.
# prefer unqualified "python" if suitable:
python -c 'import sys; sys.exit(not (0x020700b0 < sys.hexversion < 0x03000000))' 2>/dev/null \
    && exec python "`python -c "import os;print(os.path.dirname(os.path.realpath('$0')))"`/cqlsh.py" "$@"
for pyver in 2.7; do
    which python$pyver > /dev/null 2>&1 && exec python$pyver "`python$pyver -c "import os;print(os.path.dirname(os.path.realpath('$0')))"`/cqlsh.py" "$@"
done
echo "No appropriate python interpreter found." >&2
exit 1

which makes sense, since it execs cqlsh.py, which is the basis of the Cassandra CLI, and the check only accepts Python 2.7 (0x020700b0 < sys.hexversion < 0x03000000)

https://pypi.org/project/cqlsh/
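A sketch that mirrors the interpreter check in the cqlsh wrapper above, useful for verifying whatever Python gets packaged. The function names are hypothetical; the accepted range is copied from the wrapper.

```shell
#!/usr/bin/env bash
# Sketch mirroring cqlsh's interpreter check: accept an interpreter only if
# 0x020700b0 < sys.hexversion < 0x03000000 (i.e. Python 2.7).
set -euo pipefail

# is_cqlsh_compatible_hexversion N -> exit 0 if N (decimal or 0x-hex) is in range
is_cqlsh_compatible_hexversion() {
  local v="$1"
  [[ "$v" =~ ^(0x[0-9a-fA-F]+|[1-9][0-9]*)$ ]] || return 1
  (( 0x020700b0 < v && v < 0x03000000 ))
}

# find_cqlsh_python [CANDIDATE...] -> prints the first usable interpreter
find_cqlsh_python() {
  local -a candidates=("$@")
  local py v
  (( ${#candidates[@]} > 0 )) || candidates=(python python2.7)
  for py in "${candidates[@]}"; do
    command -v "$py" >/dev/null 2>&1 || continue
    v="$("$py" -c 'import sys; print(sys.hexversion)' 2>/dev/null)" || continue
    if is_cqlsh_compatible_hexversion "$v"; then
      echo "$py"
      return 0
    fi
  done
  echo "no Python 2.7 interpreter found for cqlsh" >&2
  return 1
}
```

On a tserver this could run as `PATH="/var/vcap/packages/python2/bin:$PATH" find_cqlsh_python`, where the `python2` package path is a hypothetical name following the python2/... blob layout idea.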

Originally posted by @aegershman in #56 (comment)

go back to yugabyte 2.0.0 and create bosh-releases to test upgrade chain to latest

low priority; just a way to validate upgradeability

optional deactivation of diagnostic reporting being sent to YB

see: https://docs.yugabyte.com/latest/manage/diagnostics-reporting/#configuration-options

evaluate backup options in general

bpm config for ulimits, locale, open files, etc., as recommended by YB

actual ci/cd, blobs location, etc.

errand to deactivate the redis setup... is that a thing? do we care?

need to perform setup_redis_table via yb-admin to enable yedis, but can we disable it? does it matter?

eval other stemcell lines

might be irrelevant, tbh.

bump update/canary timeout even higher

in case they need a little extra time to come back up

yb-admin as a job/errand

remove unnecessary "ephemeral_disk: false" lines in bpm

It should be off by default, so these lines are useless and add confusion; just... get rid of them.
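A sketch of the cleanup as a filter, deleting only lines that are exactly `ephemeral_disk: false` (any indentation) and leaving `ephemeral_disk: true` alone. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop redundant "ephemeral_disk: false" lines from bpm YAML on stdin.
set -euo pipefail

strip_ephemeral_disk_false() {
  sed -E '/^[[:space:]]*ephemeral_disk:[[:space:]]*false[[:space:]]*$/d'
}
```

Applied across the release, assuming the usual `jobs/<job>/templates/` layout: `for f in jobs/*/templates/bpm.yml*; do strip_ephemeral_disk_false < "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done`.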

finish yb-sample-apps

running them on CF doesn't work, why gdangit

#19

make sure your bosh-dns links are actually returning bosh-dns values via features

https://bosh.io/docs/manifest-v2/#features

you should test at least having the following in the bosh deployment manifest:

features:
  use_dns_addresses: true

but these are all fun:

features:
  use_dns_addresses: true
  randomize_az_placement: true
  use_short_dns_addresses: true
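To confirm the features actually took effect, a rendered link address can be classified. This sketch assumes the default `.bosh` TLD, which the short `q-...` names seen in the tserver flags use; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify an address rendered from a BOSH link, to confirm that
# bosh-dns names (not raw IPs) are being handed out.
set -euo pipefail

# address_kind ADDR[:PORT] -> "bosh-dns" for names under the .bosh TLD,
# "ipv4" for dotted-quad IPs, otherwise "other"
address_kind() {
  local addr="${1%:*}"   # drop a trailing :port, if any
  if [[ "$addr" == *.bosh ]]; then
    echo "bosh-dns"
  elif [[ "$addr" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]]; then
    echo "ipv4"
  else
    echo "other"
  fi
}
```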

Packages are compiled on demand during the deployment. The director first checks whether there is already a compiled version of the package for the stemcell version it is being deployed to. If one doesn't exist, the director instantiates a compile VM (using the same stemcell version it is going to be deployed to), which gets the package source from the blobstore, compiles it, then packages the resulting binaries and stores them in the blobstore.

The packaging script is responsible for the compilation and is run on the compile VM. The script gets two environment variables set by the BOSH agent:

BOSH_INSTALL_TARGET : Tells where to install the files the package generates. It is set to /var/vcap/data/packages//.

BOSH_COMPILE_TARGET : Tells the directory containing the source (it is the current directory when the packaging script is invoked).

When the package is installed a symlink is created from /var/vcap/packages/ which points to the latest version of the package. This link should be used when referring to another package in the packaging script.
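A sketch of a globbing-based packaging script using the two variables above. The blob layout `yugabyte/yugabyte-*.tar.gz` and the function name are hypothetical, and extracting with `--strip-components=1` assumes the tarball has a single top-level versioned directory.

```shell
#!/usr/bin/env bash
# Sketch of a packaging script for a hypothetical "yugabyte" package.
# BOSH runs it on the compile VM with the package's files in the current
# directory (BOSH_COMPILE_TARGET) and expects results under BOSH_INSTALL_TARGET.
set -euo pipefail

# install_yugabyte_tarball COMPILE_DIR INSTALL_DIR
# Globs for exactly one tarball and unpacks it into INSTALL_DIR,
# dropping the top-level versioned directory.
install_yugabyte_tarball() {
  local compile_dir="$1" install_dir="$2"
  local -a tarballs=("$compile_dir"/yugabyte/yugabyte-*.tar.gz)
  # with no match the glob stays literal, so the -f test catches it
  if [[ ${#tarballs[@]} -ne 1 || ! -f "${tarballs[0]}" ]]; then
    echo "expected exactly one yugabyte tarball, found: ${tarballs[*]}" >&2
    return 1
  fi
  tar -xzf "${tarballs[0]}" -C "$install_dir" --strip-components=1
}

# Only act when invoked by BOSH (BOSH_INSTALL_TARGET is set by the agent).
if [[ -n "${BOSH_INSTALL_TARGET:-}" ]]; then
  install_yugabyte_tarball "${BOSH_COMPILE_TARGET:-$PWD}" "$BOSH_INSTALL_TARGET"
fi
```

If the tarball ships `bin/post_install.sh` (the YugabyteDB Linux install docs call for running it after extraction), whether it belongs in packaging or in a pre-start step is still an open question.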

https://docs.yugabyte.com/latest/contribute/core-database/build-from-src/#ubuntu18

here's a bunch of symlinking happening in the release manifest https://github.com/yugabyte/yugabyte-db/blob/master/yb_release_manifest.json

sample apps should be able to override hosts property

in order to re-use the sample apps as smoke tests or as standalone jobs, they shouldn't be required to consume yb-tserver links directly and should be able to accept manually configured host strings
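A sketch of the fallback order: an explicitly configured host string wins, then link-provided hosts, otherwise fail. In the job's ERB this maps naturally onto `if_p`/`if_link`; it's shown in shell here, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: choose the tserver hosts a sample-app job should target.
set -euo pipefail

# resolve_tserver_hosts OVERRIDE [LINK_HOSTS]
# Prints OVERRIDE if non-empty, else LINK_HOSTS; fails if both are empty.
resolve_tserver_hosts() {
  local override="$1" from_link="${2:-}"
  if [[ -n "$override" ]]; then
    echo "$override"
  elif [[ -n "$from_link" ]]; then
    echo "$from_link"
  else
    echo "no tserver hosts: set an explicit host list or provide a yb-tserver link" >&2
    return 1
  fi
}
```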

consider ability to compile yb from source in packaging

not a priority at all, just writing down as a reminder that it could be an option
