etcd-io / zetcd Goto Github PK

View Code? Open in Web Editor NEW

1.1K 49.0 91.0 2.71 MB

Serve the Apache Zookeeper API but back it with an etcd cluster

License: Apache License 2.0

Go 93.35% Shell 2.86% Makefile 0.76% Dockerfile 2.79% Procfile 0.24%

zookeeper etcd

zetcd's People

Contributors

Stargazers

Watchers

Forkers

julienbalestra rajdavies heyitsanthony devopsbox gophersgang opsline cecuesta wrouesnel zhanglei nicholasjackson joe2hpimn etsangsplk zhangf911 hadoop835 hanyouyou13 521314 kruppel will2love xiaopal ibuystuff zhwentao mnjstwins anatolyzimin ignoreant jrossi rushit tsuraan oleewere dforsyth mirans ericchiang lihua8712051 alexxnica kryndex infini-loop cluo wakwanza evilmcjerkface jpiper stone-sail huoyinghui witgo skymysky edgao trustedkeep dalavancloud ph14 namehaibinzhang kshvakov iamnewspecies yangliang9004 angopher lloydmeta rzs840707 lucidworks jangocheng neoblackcap andrewhao888 jackxueman dalent hackeren raghu999 wang-shun laashub-soa wyl-hit real420og buni papayofen isgasho devbrom ggordan lmangani hsaputra piyushmor rrepo isabella232 renowncoder sighingnow lenosi kumkeehyun frankfanslc samkenxstream my-git9 lwandsyj iq-scm xuanswe luis-pinto-fanduel zhaoruze fanyang89

zetcd's Issues

Mesos-slave cannot register

Thanks for 13e6ff383b4; now Leader Election in Mesos seems to work.

The next step is attaching a Mesos-slave to the Master, but the slave never successfully registers. I tried using zetcd in zkbridge mode (didn't specify -oracle, don't know if it's relevant?), but couldn't find any error or warning in zetcd logs while doing so, and the slave registered correctly.

Mesos-master log:

I1118 21:29:46.485221    76 main.cpp:263] Build: 2016-08-26 23:00:07 by ubuntu
I1118 21:29:46.485282    76 main.cpp:264] Version: 1.0.1
I1118 21:29:46.485285    76 main.cpp:267] Git tag: 1.0.1
I1118 21:29:46.485287    76 main.cpp:271] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
I1118 21:29:46.486155    76 main.cpp:370] Using 'HierarchicalDRF' allocator
I1118 21:29:46.486438    76 leveldb.cpp:174] Opened db in 227954ns
I1118 21:29:46.486568    76 leveldb.cpp:181] Compacted db in 116695ns
I1118 21:29:46.486580    76 leveldb.cpp:196] Created db iterator in 4884ns
I1118 21:29:46.486585    76 leveldb.cpp:202] Seeked to beginning of db in 370ns
I1118 21:29:46.486589    76 leveldb.cpp:271] Iterated through 0 keys in the db in 193ns
I1118 21:29:46.486613    76 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1118 21:29:46.486932    86 log.cpp:107] Attempting to join replica to ZooKeeper group
2016-11-18 21:29:46,486:76(0x7f73e021a700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-18 21:29:46,486:76(0x7f73e021a700):ZOO_INFO@log_env@730: Client environment:host.name=mesosmaster-master-1
2016-11-18 21:29:46,486:76(0x7f73e021a700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-18 21:29:46,486:76(0x7f73e021a700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-18 21:29:46,486:76(0x7f73e021a700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@730: Client environment:host.name=mesosmaster-master-1
I1118 21:29:46.487156    88 recover.cpp:451] Starting replica recovery
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@730: Client environment:host.name=mesosmaster-master-1
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
I1118 21:29:46.487267    76 main.cpp:543] Starting Mesos master
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@730: Client environment:host.name=mesosmaster-master-1
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-18 21:29:46,487:76(0x7f73e021a700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-18 21:29:46,487:76(0x7f73e021a700):ZOO_INFO@log_env@755: Client environment:user.home=/home/mesos
2016-11-18 21:29:46,487:76(0x7f73e021a700):ZOO_INFO@log_env@767: Client environment:user.dir=/etc/service/mesos
2016-11-18 21:29:46,487:76(0x7f73e021a700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f73e9f706d0 sessionId=0 sessionPasswd=<null> context=0x7f73c0001060 flags=0
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@755: Client environment:user.home=/home/mesos
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@755: Client environment:user.home=/home/mesos
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@log_env@767: Client environment:user.dir=/etc/service/mesos
2016-11-18 21:29:46,487:76(0x7f73e121c700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f73e9f706d0 sessionId=0 sessionPasswd=<null> context=0x7f73bc000930 flags=0
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@log_env@767: Client environment:user.dir=/etc/service/mesos
2016-11-18 21:29:46,487:76(0x7f73dea17700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f73e9f706d0 sessionId=0 sessionPasswd=<null> context=0x7f73c8001960 flags=0
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@755: Client environment:user.home=/home/mesos
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@log_env@767: Client environment:user.dir=/etc/service/mesos
2016-11-18 21:29:46,487:76(0x7f73e1a1d700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f73e9f706d0 sessionId=0 sessionPasswd=<null> context=0x7f73d0001740 flags=0
I1118 21:29:46.488404    76 master.cpp:375] Master 9f8b1b5c-c8d4-4e4b-bbd0-c03213ab4385 (mesosmaster-master-1) started on 172.16.102.133:5050
I1118 21:29:46.488423    76 master.cpp:377] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="mesosmaster-master-1" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --port="5050" --quiet="false" --quorum="2" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/data" --zk="zk://zookeeper:2181/mesos" --zk_session_timeout="10secs"
I1118 21:29:46.488560    76 master.cpp:429] Master allowing unauthenticated frameworks to register
I1118 21:29:46.488576    76 master.cpp:443] Master allowing unauthenticated agents to register
I1118 21:29:46.488590    76 master.cpp:457] Master allowing HTTP frameworks to register without authentication
I1118 21:29:46.488608    76 master.cpp:499] Using default 'crammd5' authenticator
W1118 21:29:46.488617    76 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1118 21:29:46.488629    76 authenticator.cpp:519] Initializing server SASL
I1118 21:29:46.491000    92 recover.cpp:477] Replica is in EMPTY status
2016-11-18 21:29:46,491:76(0x7f73b6423700):ZOO_INFO@check_events@1728: initiated connection to server [172.16.88.196:2181]
2016-11-18 21:29:46,491:76(0x7f73b6c24700):ZOO_INFO@check_events@1728: initiated connection to server [172.16.88.196:2181]
2016-11-18 21:29:46,491:76(0x7f73affff700):ZOO_INFO@check_events@1728: initiated connection to server [172.16.52.4:2181]
2016-11-18 21:29:46,491:76(0x7f73b4c20700):ZOO_INFO@check_events@1728: initiated connection to server [172.16.52.4:2181]
I1118 21:29:46.493103    90 contender.cpp:152] Joining the ZK group
2016-11-18 21:29:46,493:76(0x7f73b6c24700):ZOO_INFO@check_events@1775: session establishment complete on server [172.16.88.196:2181], sessionId=0x4cfa587954761aab, negotiated timeout=10000
2016-11-18 21:29:46,493:76(0x7f73b6423700):ZOO_INFO@check_events@1775: session establishment complete on server [172.16.88.196:2181], sessionId=0x4cfa587954761aad, negotiated timeout=10000
I1118 21:29:46.493445    90 group.cpp:349] Group process (group(3)@172.16.102.133:5050) connected to ZooKeeper
I1118 21:29:46.493461    90 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
2016-11-18 21:29:46,493:76(0x7f73affff700):ZOO_INFO@check_events@1775: session establishment complete on server [172.16.52.4:2181], sessionId=0x6bf85879547643ef, negotiated timeout=10000
2016-11-18 21:29:46,493:76(0x7f73b4c20700):ZOO_INFO@check_events@1775: session establishment complete on server [172.16.52.4:2181], sessionId=0x6bf85879547643f1, negotiated timeout=10000
I1118 21:29:46.493489    90 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1118 21:29:46.493490    91 group.cpp:349] Group process (group(4)@172.16.102.133:5050) connected to ZooKeeper
I1118 21:29:46.493501    91 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1118 21:29:46.493505    91 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1118 21:29:46.493578    89 group.cpp:349] Group process (group(1)@172.16.102.133:5050) connected to ZooKeeper
I1118 21:29:46.493587    88 group.cpp:349] Group process (group(2)@172.16.102.133:5050) connected to ZooKeeper
I1118 21:29:46.493743    89 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1118 21:29:46.493754    89 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
I1118 21:29:46.493784    88 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1118 21:29:46.493793    88 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
I1118 21:29:46.497033    91 detector.cpp:152] Detected a new leader: (id='1')
I1118 21:29:46.497112    92 group.cpp:706] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I1118 21:29:46.499739    86 zookeeper.cpp:259] A new leading master ([email protected]:5050) is detected
I1118 21:29:46.499851    86 master.cpp:1847] The newly elected leader is [email protected]:5050 with id 01d0998e-8d9b-4791-af19-227693b85d0c
I1118 21:29:46.504482    93 contender.cpp:268] New candidate (id='4') has entered the contest for leadership
E1118 21:29:48.699439    94 process.cpp:2022] Failed to shutdown socket with fd 23: Transport endpoint is not connected
I1118 21:29:55.822985    90 detector.cpp:152] Detected a new leader: (id='4')
I1118 21:29:55.823264    89 group.cpp:706] Trying to get '/mesos/json.info_0000000004' in ZooKeeper
I1118 21:29:55.830425    91 zookeeper.cpp:259] A new leading master ([email protected]:5050) is detected
I1118 21:29:55.830564    89 master.cpp:1847] The newly elected leader is [email protected]:5050 with id 9f8b1b5c-c8d4-4e4b-bbd0-c03213ab4385
I1118 21:29:55.830605    89 master.cpp:1860] Elected as the leading master!
I1118 21:29:55.830620    89 master.cpp:1547] Recovering from registrar
I1118 21:29:55.830691    91 registrar.cpp:332] Recovering registrar
E1118 21:29:56.479151    94 process.cpp:2022] Failed to shutdown socket with fd 25: Transport endpoint is not connected
I1118 21:29:56.492329    94 recover.cpp:110] Unable to finish the recover protocol in 10secs, retrying
E1118 21:30:06.721837    94 process.cpp:2022] Failed to shutdown socket with fd 25: Transport endpoint is not connected
E1118 21:30:26.533833    94 process.cpp:2022] Failed to shutdown socket with fd 25: Transport endpoint is not connected
E1118 21:30:34.553061    94 process.cpp:2022] Failed to shutdown socket with fd 25: Transport endpoint is not connected
F1118 21:30:55.832010    93 master.cpp:1536] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
    @     0x7f73ea4ea39d  google::LogMessage::Fail()
    @     0x7f73ea4ec1cd  google::LogMessage::SendToLog()
    @     0x7f73ea4e9f8c  google::LogMessage::Flush()
    @     0x7f73ea4ecac9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f73e9ad8f7c  mesos::internal::master::fail()
    @     0x7f73e9b209c0  _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
    @           0x42a426  process::Future<>::fail()
    @     0x7f73e9b4a615  process::internal::thenf<>()
    @     0x7f73e9ba7a56  _ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos8internal8RegistryEEEEEJRS7_EEEvRKSt6vectorIT_SaISE_EEDpOT0_
    @     0x7f73e9bba9cf  process::Future<>::fail()
    @     0x7f73e99056c6  process::internal::run<>()
    @     0x7f73e9bba9bc  process::Future<>::fail()
    @     0x7f73e9b9d854  mesos::internal::master::RegistrarProcess::_recover()
    @     0x7f73ea479cf1  process::ProcessManager::resume()
    @     0x7f73ea479ff7  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f73e8b1da60  (unknown)
    @     0x7f73e833a184  start_thread
    @     0x7f73e806737d  (unknown)

Mesos-slave log:

ARNING: Logging before InitGoogleLogging() is written to STDERR
I1118 21:45:16.571516    49 main.cpp:243] Build: 2016-08-26 23:00:07 by ubuntu
I1118 21:45:16.571648    49 main.cpp:244] Version: 1.0.1
I1118 21:45:16.571655    49 main.cpp:247] Git tag: 1.0.1
I1118 21:45:16.571660    49 main.cpp:251] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
I1118 21:45:16.681454    49 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1118 21:45:16.693532    49 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@730: Client environment:host.name=mesosslave-worker-1
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-18 21:45:16,696:49(0x7f04a697c700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
I1118 21:45:16.697005    49 main.cpp:434] Starting Mesos agent
2016-11-18 21:45:16,697:49(0x7f04a697c700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-11-18 21:45:16,697:49(0x7f04a697c700):ZOO_INFO@log_env@767: Client environment:user.dir=/etc/service/mesos
2016-11-18 21:45:16,697:49(0x7f04a697c700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f04b16d46d0 sessionId=0 sessionPasswd=<null> context=0x7f0490001090 flags=0
I1118 21:45:16.698128    62 slave.cpp:198] Agent started on 1)@172.16.169.4:5051
I1118 21:45:16.698164    62 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="10secs" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="5mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname="mesosslave-worker-1" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/usr/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://zookeeper:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/mnt/poc/cluster-data/mesosslave-worker"
I1118 21:45:16.699134    62 slave.cpp:519] Agent resources: cpus(*):8; mem(*):63313; disk(*):27048; ports(*):[31000-32000]
I1118 21:45:16.699211    62 slave.cpp:527] Agent attributes: [  ]
I1118 21:45:16.699220    62 slave.cpp:532] Agent hostname: mesosslave-worker-1
I1118 21:45:16.706522    58 state.cpp:57] Recovering state from '/mnt/poc/cluster-data/mesosslave-worker/meta'
I1118 21:45:16.706658    58 state.cpp:697] No checkpointed resources found at '/mnt/poc/cluster-data/mesosslave-worker/meta/resources/resources.info'
I1118 21:45:16.706874    58 state.cpp:100] Failed to find the latest agent from '/mnt/poc/cluster-data/mesosslave-worker/meta'
I1118 21:45:16.707129    58 status_update_manager.cpp:200] Recovering status update manager
I1118 21:45:16.707309    59 docker.cpp:775] Recovering Docker containers
I1118 21:45:16.707326    58 containerizer.cpp:522] Recovering containerizer
2016-11-18 21:45:16,711:49(0x7f049fdf9700):ZOO_INFO@check_events@1728: initiated connection to server [172.16.52.4:2181]
I1118 21:45:16.711513    60 provisioner.cpp:253] Provisioner recovery complete
2016-11-18 21:45:16,717:49(0x7f049fdf9700):ZOO_INFO@check_events@1775: session establishment complete on server [172.16.52.4:2181], sessionId=0x6bf85879547673af, negotiated timeout=10000
I1118 21:45:16.717540    56 group.cpp:349] Group process (group(1)@172.16.169.4:5051) connected to ZooKeeper
I1118 21:45:16.717618    56 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1118 21:45:16.717635    56 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1118 21:45:16.726135    56 detector.cpp:152] Detected a new leader: (id='8')
I1118 21:45:16.726408    59 group.cpp:706] Trying to get '/mesos/json.info_0000000008' in ZooKeeper
I1118 21:45:16.731709    55 zookeeper.cpp:259] A new leading master ([email protected]:5050) is detected
I1118 21:45:16.782778    62 slave.cpp:4782] Finished recovery
I1118 21:45:16.783553    60 status_update_manager.cpp:174] Pausing sending status updates
I1118 21:45:16.783586    57 slave.cpp:895] New master detected at [email protected]:5050
I1118 21:45:16.783640    57 slave.cpp:916] No credentials provided. Attempting to register without authentication
I1118 21:45:16.783668    57 slave.cpp:927] Detecting new master
I1118 21:46:16.700013    57 slave.cpp:4591] Current disk usage 51.11%. Max allowed age: 2.722231364545590days
I1118 21:46:54.897162    62 slave.cpp:3732] [email protected]:5050 exited
W1118 21:46:54.897258    62 slave.cpp:3737] Master disconnected! Waiting for a new master to be elected
I1118 21:47:04.767259    61 detector.cpp:152] Detected a new leader: (id='9')
I1118 21:47:04.767319    61 group.cpp:706] Trying to get '/mesos/json.info_0000000009' in ZooKeeper
I1118 21:47:04.768859    60 zookeeper.cpp:259] A new leading master ([email protected]:5050) is detected
I1118 21:47:04.768920    60 slave.cpp:895] New master detected at [email protected]:5050
I1118 21:47:04.768929    60 slave.cpp:916] No credentials provided. Attempting to register without authentication
I1118 21:47:04.768939    60 slave.cpp:927] Detecting new master
I1118 21:47:04.768957    60 status_update_manager.cpp:174] Pausing sending status updates
I1118 21:47:16.701213    60 slave.cpp:4591] Current disk usage 51.13%. Max allowed age: 2.721178202893114days

Sample of zetcd log:

I1118 21:49:13.493881      42 conn.go:133] conn.Send(xid=1479504503, zxid=142, &{Children:[json.info_0000000010 json.info_0000000011 log_replicas log_replicas0000000010 log_replicas0000000011]})
I1118 21:49:13.494000      42 zketcd.go:425] GetChildren(1479505765) = (zxid=143, resp={Children:[]})
I1118 21:49:13.494552      42 conn.go:133] conn.Send(xid=1479505765, zxid=142, &{Children:[]})
I1118 21:49:13.660479      42 server.go:110] zkreq={xid:4738 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:13.660520      42 zklog.go:78] GetChildren2(4738,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:13.660859      42 zketcd.go:165] GetChildren2(143) = (zxid=4738, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:13.661333      42 conn.go:133] conn.Send(xid=4738, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:13.661884      42 server.go:110] zkreq={xid:4739 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:13.661907      42 zklog.go:43] GetData(4739,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:13.662279      42 zketcd.go:312] GetData(4739) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:13.662767      42 conn.go:133] conn.Send(xid=4739, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:14.175323      42 server.go:110] zkreq={xid:4740 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:14.175350      42 zklog.go:78] GetChildren2(4740,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:14.175721      42 zketcd.go:165] GetChildren2(143) = (zxid=4740, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:14.176448      42 conn.go:133] conn.Send(xid=4740, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:14.177547      42 server.go:110] zkreq={xid:4741 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:14.177575      42 zklog.go:43] GetData(4741,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:14.178028      42 zketcd.go:312] GetData(4741) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:14.178265      42 conn.go:133] conn.Send(xid=4741, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:14.700613      42 server.go:110] zkreq={xid:4742 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:14.700677      42 zklog.go:78] GetChildren2(4742,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:14.701415      42 zketcd.go:165] GetChildren2(143) = (zxid=4742, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:14.703378      42 conn.go:133] conn.Send(xid=4742, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:14.703934      42 server.go:110] zkreq={xid:4743 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:14.703965      42 zklog.go:43] GetData(4743,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:14.704545      42 zketcd.go:312] GetData(4743) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:14.705244      42 conn.go:133] conn.Send(xid=4743, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:15.230430      42 server.go:110] zkreq={xid:4744 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:15.230481      42 zklog.go:78] GetChildren2(4744,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:15.231662      42 zketcd.go:165] GetChildren2(143) = (zxid=4744, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:15.232142      42 conn.go:133] conn.Send(xid=4744, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:15.232605      42 server.go:110] zkreq={xid:4745 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:15.232646      42 zklog.go:43] GetData(4745,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:15.233181      42 zketcd.go:312] GetData(4745) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:15.234508      42 conn.go:133] conn.Send(xid=4745, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:15.766557      42 server.go:110] zkreq={xid:4746 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:15.766609      42 zklog.go:78] GetChildren2(4746,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:15.767609      42 zketcd.go:165] GetChildren2(143) = (zxid=4746, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:15.769301      42 conn.go:133] conn.Send(xid=4746, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:15.770105      42 server.go:110] zkreq={xid:4747 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:15.770147      42 zklog.go:43] GetData(4747,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:15.771021      42 zketcd.go:312] GetData(4747) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:15.771336      42 conn.go:133] conn.Send(xid=4747, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:16.272710      42 server.go:110] zkreq={xid:-2 req:*zetcd.PingRequest:&{}}
I1118 21:49:16.272750      42 zklog.go:73] Ping(-2,{})
I1118 21:49:16.273108      42 conn.go:133] conn.Send(xid=-2, zxid=142, &{})
I1118 21:49:16.287996      42 server.go:110] zkreq={xid:4748 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:16.288026      42 zklog.go:78] GetChildren2(4748,{Path:/marathon/leader-curator Watch:false})
I1118 21:49:16.288454      42 zketcd.go:165] GetChildren2(143) = (zxid=4748, resp={Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:16.289076      42 conn.go:133] conn.Send(xid=4748, zxid=142, &{Children:[_c_934bdef9-90ff-42d3-a10a-4bd4f11d2913-latch-0000000005 _c_9640b680-f8fc-4e08-bf10-4465125f95c0-latch-0000000002 _c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 _c_d4d75d06-4b20-42ff-b537-876c499e5994-latch-0000000004] Stat:{Czxid:28 Mzxid:43 Ctime:1479504464966800 Mtime:1479504482688506 Version:2 Cversion:8 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:4 Pzxid:46}})
I1118 21:49:16.289585      42 server.go:110] zkreq={xid:4749 req:*zetcd.GetDataRequest:&{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false}}
I1118 21:49:16.289605      42 zklog.go:43] GetData(4749,{Path:/marathon/leader-curator/_c_b01520e1-6cad-48de-81de-23e873dc0fc8-latch-0000000000 Watch:false})
I1118 21:49:16.289953      42 zketcd.go:312] GetData(4749) = (zxid=143, resp={Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:16.290137      42 conn.go:133] conn.Send(xid=4749, zxid=142, &{Data:[109 97 114 97 116 104 111 110 109 97 115 116 101 114 45 109 97 115 116 101 114 45 51 58 56 48 56 48] Stat:{Czxid:29 Mzxid:29 Ctime:1479504464993953 Mtime:1479504464993953 Version:0 Cversion:0 Aversion:0 EphemeralOwner:7780065634413853912 DataLength:28 NumChildren:0 Pzxid:29}})
I1118 21:49:16.811115      42 server.go:110] zkreq={xid:4750 req:*zetcd.GetChildren2Request:&{Path:/marathon/leader-curator Watch:false}}
I1118 21:49:16.811164      42 zklog.go:78] GetChildren2(4750,{Path:/marathon/leader-curator Watch:false})
...

support multiple etcd endpoints

-endpoint only supports one endpoint right now

Follow semantic versioning

Per http://semver.org/, I suggest adopting MAJOR.MINOR.PATCH versioning for tagging releases.

Version numbers like 0.0.1 and 0.0.2 do not tell anything useful and easily parsable to users.

However 0.1.0 tells that it's a pre-stable release, and 0.2.0 is new features added to it, but still pre-release. 0.2.1 would mean it's a bugfix release etc.

use etcd v3.2 stm prefetching

Will save a round-trip in some cases.

etcd3 STM client does not Support Range fetch over keys?

I am trying to use STM client for my project. I need to:

1.) Query keys between particular range
2.) Query keys with specified prefix

But i wasn't able to find a function which provides this functionality in STM interface. There is Get(keys ...string) function but it does not have any OpOptions parameter.

Is there some particular reason that it has been left out and what is the best way to do this.

document examples for running kafka, etc

There are some procfiles that do this now, but they're not even very good for local development. Possible solution would be /hacks/ full of Dockerfile setups and some documentation around it.

Does Zetcd 0.0.4 compatible with Zookeeper 3.4.11

How do we figure out the compatibility matching?
Thanks.

print version and SHA on start

etcd-style boot information so it's clear what version is running

linting in CI

unused, go vet, etc

multiop

ZK's transactions. Haven't seen it used in the wild.

Keys not visible in etcd backend?

This may be me not understanding a feature, but I can't see a good reason why not - when testing zetcd against an etcd cluster, although zkctl lets me see the keys I've created in the Zookeeper side, I do not see any representation of those keys appearing as listable items in etcd, even though it seems like they should appear under /zk/*

Can not work with mesos.

I got this from /var/log/mesos/mesos-master.WARNING

W0628 11:42:02.760716 32456 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=55445ce9c488b379) expiration
W0628 11:42:02.760763 32453 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=55445ce9c488b37b) expiration
W0628 11:42:02.760797 32439 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=4ae35ce9bdcaeb28) expiration
W0628 11:42:02.761150 32441 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=4ae35ce9bdcaeb2a) expiration
W0628 11:43:43.088770 32445 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=13455ce9babd6d6a) expiration
W0628 11:43:43.088863 32454 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=13455ce9babd6d6b) expiration
W0628 11:43:43.089854 32452 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=d315ce9c1c4982b) expiration
W0628 11:43:43.090287 32450 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=d315ce9c1c4982d) expiration

Mesos cluster could init correctly, but master will change in 2 minutes, and repeat again and again.

With 5 etcd nodes proxy by 2 zetcd nodes.

Mesos version: 1.3.1

vendoring

Add vendoring so zetcd's not building against master all the time.

zetcd errors with etcd master branch

I have simple local etcd server + zetcd process and tried to send requests to zetcd but almost all requests error...

Here's how to reproduce

# build etcd
./install-zetcd.sh
./start-server-zetcd.sh
./test-zk.sh

https://github.com/gyuho/zookeeper-tests
https://github.com/gyuho/zookeeper-tests/blob/master/start-server-zetcd.sh
https://github.com/gyuho/zookeeper-tests/blob/master/test-zk.sh

Apache Drill does not work

$ zetcd -zkbridge 0.0.0.0:2181 -zkaddr 0.0.0.0:2182 -endpoint localhost:2379 -oracle zk -logtostderr -v 10

https://gist.github.com/polvi/02415f83d334ba5852b0faac29532302

This is from the drillbit.out log:

17:15:17.833 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (localhost:2182) and timeout (5000) / elapsed (29639)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.7.1.jar:na]
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [curator-client-2.7.1.jar:na]
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.7.1.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806) [curator-framework-2.7.1.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:792) [curator-framework-2.7.1.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:62) [curator-framework-2.7.1.jar:na]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:257) [curator-framework-2.7.1.jar:na]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

I just followed the instructions here: https://drill.apache.org/docs/starting-drill-in-distributed-mode/

sanitize inputs like zookeeper

Right now zetcd cleans up the key path before doing anything with it. Zookeeper will just reject it.

cf. https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel

3.5 opcodes

Create2
CreateContainer
DeleteContainer
CheckWatches
DeleteContainer
RemoveWatches
Reconfig

Removing a child after parent's directory has been removed

ZNode data in etcd is left while zk considers there's no more such ZNode. Does it work as intended?

$ ./go/bin/zkctl create /abc test
2017/05/23 22:09:02 Connected to 127.0.0.1:2181
2017/05/23 22:09:02 Authenticated: id=7587822333160445519, timeout=10001
$ ./go/bin/zkctl create /abc/xyz test
2017/05/23 22:09:27 Connected to 127.0.0.1:2181
2017/05/23 22:09:27 Authenticated: id=7587822333160445526, timeout=1000
$ ./go/bin/zkctl delete /abc
2017/05/23 22:09:42 Connected to 127.0.0.1:2181
2017/05/23 22:09:42 Authenticated: id=7587822333160445532, timeout=1000
$ ./go/bin/zkctl delete /abc/xyz
2017/05/23 22:09:43 Connected to 127.0.0.1:2181
2017/05/23 22:09:44 Authenticated: id=7587822333160445537, timeout=1000
zk: node does not exist
$ ./go/bin/etcdctl get --prefix /zk
/zk//abc/xyz
test
/zk//abc/xyz

/zk//

/zk//abc/xyz

/zk//abc/xyz
...

correctly account for ephemeral node expiration in parent znode stats

Spun off of #88.

zetcd uses the CVersion key's revision and version to compute the znode's Pzxid and CVersion respectively. When a child changes (e.g., creation, deletion), it touches the CVersion key to bump these values. Ephemeral key expiration uses etcd lease expiration, so it does not touch CVersion when it is deleted.

One possible solution involves extending etcd to associate a transaction with a lease (cf. etcd-io/etcd#8842). Ideally, each ephemeral key would have a lease transaction that would touch its parent's CVersion key. This is probably expecting too much since it is too invasive on the etcd side; the txn logic would have to permit multiple updates to a key in the same revision and likely require deep mvcc changes. Alternatively, new "deleted ephemeral" keys could be created in the lease txn to mark tombstones for each expired key; the tombstones would then be used for reconciling the fields. Tombstones avoid multi-updates, but would need STM extensions for ranges (a feature request made a few times in the past, but only possible in 3.3+).

An approach with reconciliation but without lease txns: maintain a per-znode list of ephemeral children (elist), a per-ephemeral node key with a matching ephemeral owner (ekey), and a global revision offset key:

When creating an ephemeral key, add name to elist and create ekey if key does not exist. Wait on reconciliation if already in the elist.
When computing Stat, fetch the elist and compare with the child keys to detect expiry and wait for reconciliation.
A reconciliation goroutine watches for ekey deletion events. For each set of deleted ekeys under the same znode, set CVersion's count to the count-1, its zxid to the deletion event zxid and the current revision offset version, remove the keys from the elist, and touch the revision offset key. Notify waiters.
The revision offset is subtracted from the current zxid to compensate for the extra revisions from reconciliation txns.
Record the current revision offset in the mtime and ctime keys for computing mzxid and czxid. Compute via etcdrev-offset.
Record a count and the current revision offset in CVersion.
Compute CVersion by adding the stored count value to the key version.
Compute PZxid by using the stored CVersion zxid if no changes since last expiry
Will need some way to handle losing the reconciliation watch due to compaction.

ruok support

Lots of tutorials use this; I assume it's baked into scripts too.

ZK MultiOp provided by zetcd not working with Kafka 2.1.0

I am trying to run zetcd(tried with both v0.0.5 and v0.0.4, getting same error) with Kafka 2.1.0. It's giving NullPointerException and Kafka

[2018-12-18 15:02:46,559] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:os.version=4.15.0-36-generic (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:user.name=arindam (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:user.home=/home/arindam (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,559] INFO Client environment:user.dir=/home/arindam/work/ikea/k/kafka_2.11-2.1.0 (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,561] INFO Initiating client connection, connectString=10.242.152.156:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@747ddf94 (org.apache.zookeeper.ZooKeeper)
[2018-12-18 15:02:46,576] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2018-12-18 15:02:46,578] INFO Opening socket connection to server slc06upx.us.oracle.com/10.242.152.156:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-12-18 15:02:46,935] INFO Socket connection established to slc06upx.us.oracle.com/10.242.152.156:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-12-18 15:02:47,241] WARN Connected to an old server; r-o mode will be unavailable (org.apache.zookeeper.ClientCnxnSocket)
[2018-12-18 15:02:47,242] INFO Session establishment complete on server slc06upx.us.oracle.com/10.242.152.156:2181, sessionid = 0x150e67c097898268, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2018-12-18 15:02:47,249] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
[2018-12-18 15:02:51,607] INFO Cluster ID = _354W2LrSw2cKpAOpbQy7g (kafka.server.KafkaServer)
[2018-12-18 15:02:51,997] INFO KafkaConfig values: 
	advertised.host.name = null
	advertised.listeners = null
	advertised.port = null
	alter.config.policy.class.name = null
	alter.log.dirs.replication.quota.window.num = 11
	alter.log.dirs.replication.quota.window.size.seconds = 1
	authorizer.class.name = 
	auto.create.topics.enable = true
	auto.leader.rebalance.enable = true
	background.threads = 10
	broker.id = 0
	broker.id.generation.enable = true
	broker.rack = null
	client.quota.callback.class = null
	compression.type = producer
	connection.failed.authentication.delay.ms = 100
	connections.max.idle.ms = 600000
	controlled.shutdown.enable = true
	controlled.shutdown.max.retries = 3
	controlled.shutdown.retry.backoff.ms = 5000
	controller.socket.timeout.ms = 30000
	create.topic.policy.class.name = null
	default.replication.factor = 1
	delegation.token.expiry.check.interval.ms = 3600000
	delegation.token.expiry.time.ms = 86400000
	delegation.token.master.key = null
	delegation.token.max.lifetime.ms = 604800000
	delete.records.purgatory.purge.interval.requests = 1
	delete.topic.enable = true
	fetch.purgatory.purge.interval.requests = 1000
	group.initial.rebalance.delay.ms = 0
	group.max.session.timeout.ms = 300000
	group.min.session.timeout.ms = 6000
	host.name = 
	inter.broker.listener.name = null
	inter.broker.protocol.version = 2.1-IV2
	kafka.metrics.polling.interval.secs = 10
	kafka.metrics.reporters = []
	leader.imbalance.check.interval.seconds = 300
	leader.imbalance.per.broker.percentage = 10
	listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
	listeners = null
	log.cleaner.backoff.ms = 15000
	log.cleaner.dedupe.buffer.size = 134217728
	log.cleaner.delete.retention.ms = 86400000
	log.cleaner.enable = true
	log.cleaner.io.buffer.load.factor = 0.9
	log.cleaner.io.buffer.size = 524288
	log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
	log.cleaner.min.cleanable.ratio = 0.5
	log.cleaner.min.compaction.lag.ms = 0
	log.cleaner.threads = 1
	log.cleanup.policy = [delete]
	log.dir = /tmp/kafka-logs
	log.dirs = /tmp/kafka-logs
	log.flush.interval.messages = 9223372036854775807
	log.flush.interval.ms = null
	log.flush.offset.checkpoint.interval.ms = 60000
	log.flush.scheduler.interval.ms = 9223372036854775807
	log.flush.start.offset.checkpoint.interval.ms = 60000
	log.index.interval.bytes = 4096
	log.index.size.max.bytes = 10485760
	log.message.downconversion.enable = true
	log.message.format.version = 2.1-IV2
	log.message.timestamp.difference.max.ms = 9223372036854775807
	log.message.timestamp.type = CreateTime
	log.preallocate = false
	log.retention.bytes = -1
	log.retention.check.interval.ms = 300000
	log.retention.hours = 168
	log.retention.minutes = null
	log.retention.ms = null
	log.roll.hours = 168
	log.roll.jitter.hours = 0
	log.roll.jitter.ms = null
	log.roll.ms = null
	log.segment.bytes = 1073741824
	log.segment.delete.delay.ms = 60000
	max.connections.per.ip = 2147483647
	max.connections.per.ip.overrides = 
	max.incremental.fetch.session.cache.slots = 1000
	message.max.bytes = 1000012
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	min.insync.replicas = 1
	num.io.threads = 8
	num.network.threads = 3
	num.partitions = 1
	num.recovery.threads.per.data.dir = 1
	num.replica.alter.log.dirs.threads = null
	num.replica.fetchers = 1
	offset.metadata.max.bytes = 4096
	offsets.commit.required.acks = -1
	offsets.commit.timeout.ms = 5000
	offsets.load.buffer.size = 5242880
	offsets.retention.check.interval.ms = 600000
	offsets.retention.minutes = 10080
	offsets.topic.compression.codec = 0
	offsets.topic.num.partitions = 50
	offsets.topic.replication.factor = 1
	offsets.topic.segment.bytes = 104857600
	password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
	password.encoder.iterations = 4096
	password.encoder.key.length = 128
	password.encoder.keyfactory.algorithm = null
	password.encoder.old.secret = null
	password.encoder.secret = null
	port = 9092
	principal.builder.class = null
	producer.purgatory.purge.interval.requests = 1000
	queued.max.request.bytes = -1
	queued.max.requests = 500
	quota.consumer.default = 9223372036854775807
	quota.producer.default = 9223372036854775807
	quota.window.num = 11
	quota.window.size.seconds = 1
	replica.fetch.backoff.ms = 1000
	replica.fetch.max.bytes = 1048576
	replica.fetch.min.bytes = 1
	replica.fetch.response.max.bytes = 10485760
	replica.fetch.wait.max.ms = 500
	replica.high.watermark.checkpoint.interval.ms = 5000
	replica.lag.time.max.ms = 10000
	replica.socket.receive.buffer.bytes = 65536
	replica.socket.timeout.ms = 30000
	replication.quota.window.num = 11
	replication.quota.window.size.seconds = 1
	request.timeout.ms = 30000
	reserved.broker.max.id = 1000
	sasl.client.callback.handler.class = null
	sasl.enabled.mechanisms = [GSSAPI]
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.principal.to.local.rules = [DEFAULT]
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism.inter.broker.protocol = GSSAPI
	sasl.server.callback.handler.class = null
	security.inter.broker.protocol = PLAINTEXT
	socket.receive.buffer.bytes = 102400
	socket.request.max.bytes = 104857600
	socket.send.buffer.bytes = 102400
	ssl.cipher.suites = []
	ssl.client.auth = none
	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
	ssl.endpoint.identification.algorithm = https
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLS
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
	transaction.max.timeout.ms = 900000
	transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
	transaction.state.log.load.buffer.size = 5242880
	transaction.state.log.min.isr = 1
	transaction.state.log.num.partitions = 50
	transaction.state.log.replication.factor = 1
	transaction.state.log.segment.bytes = 104857600
	transactional.id.expiration.ms = 604800000
	unclean.leader.election.enable = false
	zookeeper.connect = 10.242.152.156:2181
	zookeeper.connection.timeout.ms = 6000
	zookeeper.max.in.flight.requests = 10
	zookeeper.session.timeout.ms = 6000
	zookeeper.set.acl = false
	zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2018-12-18 15:02:52,269] INFO KafkaConfig values: 
	advertised.host.name = null
	advertised.listeners = null
	advertised.port = null
	alter.config.policy.class.name = null
	alter.log.dirs.replication.quota.window.num = 11
	alter.log.dirs.replication.quota.window.size.seconds = 1
	authorizer.class.name = 
	auto.create.topics.enable = true
	auto.leader.rebalance.enable = true
	background.threads = 10
	broker.id = 0
	broker.id.generation.enable = true
	broker.rack = null
	client.quota.callback.class = null
	compression.type = producer
	connection.failed.authentication.delay.ms = 100
	connections.max.idle.ms = 600000
	controlled.shutdown.enable = true
	controlled.shutdown.max.retries = 3
	controlled.shutdown.retry.backoff.ms = 5000
	controller.socket.timeout.ms = 30000
	create.topic.policy.class.name = null
	default.replication.factor = 1
	delegation.token.expiry.check.interval.ms = 3600000
	delegation.token.expiry.time.ms = 86400000
	delegation.token.master.key = null
	delegation.token.max.lifetime.ms = 604800000
	delete.records.purgatory.purge.interval.requests = 1
	delete.topic.enable = true
	fetch.purgatory.purge.interval.requests = 1000
	group.initial.rebalance.delay.ms = 0
	group.max.session.timeout.ms = 300000
	group.min.session.timeout.ms = 6000
	host.name = 
	inter.broker.listener.name = null
	inter.broker.protocol.version = 2.1-IV2
	kafka.metrics.polling.interval.secs = 10
	kafka.metrics.reporters = []
	leader.imbalance.check.interval.seconds = 300
	leader.imbalance.per.broker.percentage = 10
	listener.security.protocol.map = PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
	listeners = null
	log.cleaner.backoff.ms = 15000
	log.cleaner.dedupe.buffer.size = 134217728
	log.cleaner.delete.retention.ms = 86400000
	log.cleaner.enable = true
	log.cleaner.io.buffer.load.factor = 0.9
	log.cleaner.io.buffer.size = 524288
	log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
	log.cleaner.min.cleanable.ratio = 0.5
	log.cleaner.min.compaction.lag.ms = 0
	log.cleaner.threads = 1
	log.cleanup.policy = [delete]
	log.dir = /tmp/kafka-logs
	log.dirs = /tmp/kafka-logs
	log.flush.interval.messages = 9223372036854775807
	log.flush.interval.ms = null
	log.flush.offset.checkpoint.interval.ms = 60000
	log.flush.scheduler.interval.ms = 9223372036854775807
	log.flush.start.offset.checkpoint.interval.ms = 60000
	log.index.interval.bytes = 4096
	log.index.size.max.bytes = 10485760
	log.message.downconversion.enable = true
	log.message.format.version = 2.1-IV2
	log.message.timestamp.difference.max.ms = 9223372036854775807
	log.message.timestamp.type = CreateTime
	log.preallocate = false
	log.retention.bytes = -1
	log.retention.check.interval.ms = 300000
	log.retention.hours = 168
	log.retention.minutes = null
	log.retention.ms = null
	log.roll.hours = 168
	log.roll.jitter.hours = 0
	log.roll.jitter.ms = null
	log.roll.ms = null
	log.segment.bytes = 1073741824
	log.segment.delete.delay.ms = 60000
	max.connections.per.ip = 2147483647
	max.connections.per.ip.overrides = 
	max.incremental.fetch.session.cache.slots = 1000
	message.max.bytes = 1000012
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	min.insync.replicas = 1
	num.io.threads = 8
	num.network.threads = 3
	num.partitions = 1
	num.recovery.threads.per.data.dir = 1
	num.replica.alter.log.dirs.threads = null
	num.replica.fetchers = 1
	offset.metadata.max.bytes = 4096
	offsets.commit.required.acks = -1
	offsets.commit.timeout.ms = 5000
	offsets.load.buffer.size = 5242880
	offsets.retention.check.interval.ms = 600000
	offsets.retention.minutes = 10080
	offsets.topic.compression.codec = 0
	offsets.topic.num.partitions = 50
	offsets.topic.replication.factor = 1
	offsets.topic.segment.bytes = 104857600
	password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
	password.encoder.iterations = 4096
	password.encoder.key.length = 128
	password.encoder.keyfactory.algorithm = null
	password.encoder.old.secret = null
	password.encoder.secret = null
	port = 9092
	principal.builder.class = null
	producer.purgatory.purge.interval.requests = 1000
	queued.max.request.bytes = -1
	queued.max.requests = 500
	quota.consumer.default = 9223372036854775807
	quota.producer.default = 9223372036854775807
	quota.window.num = 11
	quota.window.size.seconds = 1
	replica.fetch.backoff.ms = 1000
	replica.fetch.max.bytes = 1048576
	replica.fetch.min.bytes = 1
	replica.fetch.response.max.bytes = 10485760
	replica.fetch.wait.max.ms = 500
	replica.high.watermark.checkpoint.interval.ms = 5000
	replica.lag.time.max.ms = 10000
	replica.socket.receive.buffer.bytes = 65536
	replica.socket.timeout.ms = 30000
	replication.quota.window.num = 11
	replication.quota.window.size.seconds = 1
	request.timeout.ms = 30000
	reserved.broker.max.id = 1000
	sasl.client.callback.handler.class = null
	sasl.enabled.mechanisms = [GSSAPI]
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.principal.to.local.rules = [DEFAULT]
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism.inter.broker.protocol = GSSAPI
	sasl.server.callback.handler.class = null
	security.inter.broker.protocol = PLAINTEXT
	socket.receive.buffer.bytes = 102400
	socket.request.max.bytes = 104857600
	socket.send.buffer.bytes = 102400
	ssl.cipher.suites = []
	ssl.client.auth = none
	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
	ssl.endpoint.identification.algorithm = https
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLS
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
	transaction.max.timeout.ms = 900000
	transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
	transaction.state.log.load.buffer.size = 5242880
	transaction.state.log.min.isr = 1
	transaction.state.log.num.partitions = 50
	transaction.state.log.replication.factor = 1
	transaction.state.log.segment.bytes = 104857600
	transactional.id.expiration.ms = 604800000
	unclean.leader.election.enable = false
	zookeeper.connect = 10.242.152.156:2181
	zookeeper.connection.timeout.ms = 6000
	zookeeper.max.in.flight.requests = 10
	zookeeper.session.timeout.ms = 6000
	zookeeper.set.acl = false
	zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2018-12-18 15:02:52,306] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2018-12-18 15:02:52,307] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2018-12-18 15:02:52,308] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2018-12-18 15:02:52,693] INFO Loading logs. (kafka.log.LogManager)
[2018-12-18 15:02:52,699] INFO Logs loading complete in 5 ms. (kafka.log.LogManager)
[2018-12-18 15:02:52,715] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2018-12-18 15:02:52,717] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2018-12-18 15:02:53,057] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2018-12-18 15:02:53,105] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
[2018-12-18 15:02:53,128] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:53,129] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:53,131] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:53,153] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2018-12-18 15:02:53,542] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2018-12-18 15:02:53,888] INFO Result of znode creation at /brokers/ids/0 is: OK (kafka.zk.KafkaZkClient)
[2018-12-18 15:02:53,890] INFO Registered broker 0 at path /brokers/ids/0 with addresses: ArrayBuffer(EndPoint(arindam-laptop,9092,ListenerName(PLAINTEXT),PLAINTEXT)) (kafka.zk.KafkaZkClient)
[2018-12-18 15:02:53,943] INFO [ExpirationReaper-0-topic]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:53,944] INFO [ExpirationReaper-0-Heartbeat]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:53,946] INFO [ExpirationReaper-0-Rebalance]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2018-12-18 15:02:54,319] INFO [GroupCoordinator 0]: Starting up. (kafka.coordinator.group.GroupCoordinator)
[2018-12-18 15:02:54,320] INFO [GroupCoordinator 0]: Startup complete. (kafka.coordinator.group.GroupCoordinator)
[2018-12-18 15:02:54,321] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 15:02:54,857] INFO [ProducerId Manager 0]: Acquired new producerId block (brokerId:0,blockStartProducerId:1000,blockEndProducerId:1999) by writing to Zk with path version 2 (kafka.coordinator.transaction.ProducerIdManager)
[2018-12-18 15:02:55,153] INFO [TransactionCoordinator id=0] Starting up. (kafka.coordinator.transaction.TransactionCoordinator)
[2018-12-18 15:02:55,155] INFO [Transaction Marker Channel Manager 0]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2018-12-18 15:02:55,155] INFO [TransactionCoordinator id=0] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2018-12-18 15:02:55,195] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2018-12-18 15:02:56,662] INFO [SocketServer brokerId=0] Started processors for 1 acceptors (kafka.network.SocketServer)
[2018-12-18 15:02:56,666] INFO Kafka version : 2.1.0 (org.apache.kafka.common.utils.AppInfoParser)
[2018-12-18 15:02:56,666] INFO Kafka commitId : 809be928f1ae004e (org.apache.kafka.common.utils.AppInfoParser)
[2018-12-18 15:02:56,667] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
[2018-12-18 15:02:58,502] ERROR Caught unexpected throwable (org.apache.zookeeper.ClientCnxn)
java.lang.NullPointerException
	at kafka.zookeeper.ZooKeeperClient$$anon$10.processResult(ZooKeeperClient.scala:234)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)

After a topic creation when I am trying to describe the topic I am getting

	Topic: my-topic	Partition: 0	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 1	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 2	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 3	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 4	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 5	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 6	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 7	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 8	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 9	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 10	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 11	Leader: none	Replicas: 0	Isr: 
	Topic: my-topic	Partition: 12	Leader: none	Replicas: 0	Isr:

It's working fine with Kafka 2.0.1. The potential root cause of this issue is in Kafka 2.1.0 added MultiOp request and response in ZookeeperClient. Please refer https://issues.apache.org/jira/browse/KAFKA-6082.(Relevent commit - apache/kafka@297fb39)
I can see in PR - #57 initial multiop support added for zetcd. However, it's broken for Kafka 2.1.0

Does not seem to work on Windows 10

I downloaded etcd-v3.2.5-windows-amd64 from Releases and ran etcd.exe.

I then followed the instructions to build and run zetcd:

go get github.com/coreos/zetcd/cmd/zetcd
zetcd --zkaddr 0.0.0.0:2181 --endpoints localhost:2379

Output of zetcd:

C:\Users\jeffr\go\bin>zetcd --zkaddr 0.0.0.0:2181 --endpoints localhost:2379
Running zetcd proxy
Version: Version not provided (use make instead of go build)
SHA: SHA not provided (use make instead of go build)

I then ran the zkctl simple test:

go get github.com/coreos/zetcd/cmd/zkctl
zkctl watch / &
zkctl create /abc "foo"

The zkctl watch / command produced this output:

C:\Users\jeffr\go\bin>zkctl watch /
watch dir /
2017/08/10 14:50:16 Connected to 127.0.0.1:2181
2017/08/10 14:50:16 Authenticated: id=7587824084266938116, timeout=1000
[] &{Czxid:0 Mzxid:0 Ctime:0 Mtime:0 Version:0 Cversion:-1 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:1 Pzxid:0}

The zkctl create /abc "foo" command produced this output:

C:\Users\jeffr\go\bin>zkctl create /abc "foo"
2017/08/10 14:50:43 Connected to 127.0.0.1:2181
2017/08/10 14:50:43 Authenticated: id=7587824084266938119, timeout=1000

But then the zkctl watch / command seemed to have a problem and quit running (I'm assuming this is unexpected output?):

{Type:EventNodeChildrenChanged State:Unknown Path:/ Err:<nil> Server:}

I am completely new to both etcd and zetcd, so I apologize in advance for not knowing what I am doing. Ought this example work?

My go version is:

go version go1.8.3 windows/amd64

2017-08-10 14:50:16.678468 W | etcdserver: apply entries took too long [408.5913ms for 1 entries]
2017-08-10 14:50:16.680467 W | etcdserver: avoid queries with large range/delete range!
2017-08-10 14:50:43.719299 W | etcdserver: apply entries took too long [256.5433ms for 1 entries]
2017-08-10 14:50:43.719299 W | etcdserver: avoid queries with large range/delete range!
2017-08-10 14:50:45.520147 W | etcdserver: apply entries took too long [158.998ms for 1 entries]
2017-08-10 14:50:45.521148 W | etcdserver: avoid queries with large range/delete range!

Memory/leak issue

We're using zetcd (running in a container - from the quay.io/repository/coreos/zetcd tag v0.0.5) as a middleware between etcd and Mesos. We're experiencing a behavior where the memory usage of the zetcd seems to continue to climb and climb and climb gradually. It was overflowing a 4gig ram instance very quickly, so moved it to a host with 8 gigs, but the zetcd container still seems to continue to grow and grow in memory usage.

I'd be interested in helping solve this.. is there anything that I can provide to help expose the memory leak? Is there any automatic garbage collecting or anything like that that can be implemented? Are there any docker container launch parameters to contain its hunger for memory?

Key child prefixing naming makes debugging difficult

The current prefixing of a binary byte length to key names makes a lot of debugging activities with the CLI quite difficult. Since the value is constrained to 255, would it be terribly inefficient to switch to using a hex-prefix instead - i.e. /zk/keys/\0x02/somekey would become something like /zk/keys/02/somekey (readable from fuse mounts / etc.).

i.e. would a PR along these lines be considered?

jepsen

Kyle Kingsbury/Aphyr did Jepsen testing of zookeeper. That could probably be a good baseline for testing the zetcd functionality.

https://aphyr.com/posts/291-jepsen-zookeeper

https://github.com/jepsen-io/jepsen/blob/master/zookeeper/src/jepsen/zookeeper.clj

fix CI

CI's been kind of hand-wavy to avoid directly vendoring the full etcd code for integration testing. Now it's broken as of the 3.2 branch. Oops

New version/future plans?

I noticed that at least for a couple of month development of zetcd stopped.
Does it mean that zetcd is stable enough - so are there any plans for new version?
Or does it mean that zetcd will not be developed any more?
Thanks!

fd leak

Started to get these logs after running for awhile:

I0410 17:54:57.997000    3672 server.go:145] Accept()=accept tcp [::]:2182: accept: too many open files
I0410 17:54:58.062469    3672 server.go:145] Accept()=accept tcp [::]:2182: accept: too many open files
I0410 17:54:58.133796    3672 server.go:145] Accept()=accept tcp [::]:2182: accept: too many open files
I0410 17:54:58.235657    3672 server.go:145] Accept()=accept tcp [::]:2182: accept: too many open files
I0410 17:54:58.370626    3672 server.go:145] Accept()=accept tcp [::]:2182: accept: too many open files

four letter word commands

Support all of it (DoS stuff aside)

Document -oracle vs -bridge options?

Neither of these two options are really documented by their command line help nor anywhere else I can see at the moment. Is there a clearer explanation of what they do?

efficient send message buffering

The encoder expects a full-sized buffer, but the size of the message isn't known until encode time. Currently zetcd allocates a pool of maximum sized buffers and writes to those, somewhat mitigating the problem of repeatedly allocating huge buffers. However, there's still a sizeable memory cost.

Ideally the encoder will either allocate data as it needs it or copy into a larger, pooled buffer if the current buffer winds up being too small.

Bundle "zkctl" binary as well in the docker image of zetcd

I'm trying to use zetcd for the first time using Docker+Kubernetes.
I wanted to debug/try and see whether the zetcd proxy routes the requests to etcd servers.

However, the utility "zkctl" listed in the README.md of the project does not seem to be bundled in the docker image. It would be helpful if it is bundled in the docker image so that we can quickly verify whether the zetcd proxy is routing the requests to the server or not.

docker image

docker image for pulling zetcd

3rd party component testing with xchk

Right now all the xchk testing is done by hand. In the future it would be nice to have infrastructure to automatically xchk against popular zk projects as part of CI.

Ephemeral node deleted by connection loss prevents parent node deletion

This one's a bit weird, but basically if you create some node, and then create a sequential node within it, the node goes away on client connection loss, but the parent cannot be deleted because it thinks it isn't empty. Here's a sample (using the python kazoo library):

Basic setup:

Python 3.4.5 (default, Jan 26 2017, 00:57:26) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from kazoo.client import KazooClient
In [2]: c=KazooClient()
In [3]: c.start()
In [4]: c.get_children('/')
Out[4]: ['']

Okay, now we can create foo, create foo/bar as ephemeral, fail to delete foo as expected, delete foo/bar, and then deleting /foo works also as expected:

In [5]: c.create('/foo')
Out[5]: '/foo'
In [6]: c.create('/foo/bar', ephemeral=True)
Out[6]: '/foo/bar'
In [7]: c.get_children('/foo')
Out[7]: ['bar']
In [8]: c.delete('/foo')
---------------------------------------------------------------------------
NotEmptyError                             Traceback (most recent call last)
...
NotEmptyError: 

In [9]: c.delete('/foo/bar')
Out[9]: True

In [10]: c.delete('/foo')
Out[10]: True

In [11]:

Now for the weird; create /foo as before, create /foo/bar ephemeral as before, but disconnect and reconnect the client, then try to delete /foo (which appears to be empty):

In [11]: c.create('/foo')
Out[11]: '/foo'

In [12]: c.create('/foo/bar', ephemeral=True)
Out[12]: '/foo/bar'

In [13]: c.stop()

In [14]: c.close()

In [15]: c.start()

In [16]: c.get_children('/foo')
Out[16]: []

In [17]: c.delete('/foo')
---------------------------------------------------------------------------
NotEmptyError                             Traceback (most recent call last)
...
NotEmptyError:

Well, that was weird. /foo looks empty, but deleting it says it isn't empty. Can we fix things? Yes, re-creating /foo/bar and then explicitly deleting it lets us delete /foo again:

In [18]: c.create('/foo/bar')
Out[18]: '/foo/bar'

In [19]: c.delete('/foo/bar')
Out[19]: True

In [20]: c.delete('/foo')
Out[20]: True

So, that's weird.

can zetcd work with ZK java client?

can zetcd work with ZK java client API?
is zetcd ready to replace ZK? for example can I use zetcd in my Kafka or Spark cluster instead of ZK?

superblock

No way to track/reject format changes; an invitation for corruption. Read/write a superblock on server boot to configure and check version/feature information.

/zookeeper/quota

quota support
https://zookeeper.apache.org/doc/r3.1.2/zookeeperQuotas.html
http://people.apache.org/~larsgeorge/zookeeper-1215258/build/docs/dev-api/org/apache/zookeeper/Quotas.html#procZookeeper

Deleting kafka dir yields an exception

When using zkCli to delete kafka chroot dir, getting an error:

[zk: zetcd(CONNECTED) 1] rmr /kafka
Node not empty: /kafka/brokers/ids

[zk: zetcd(CONNECTED) 2] ls /kafka/brokers/ids 
[]

However, as you see the node is empty. When doing the same operation with native zookeeper things work as expected.

From zetcd logs:

I0706 06:15:54.018151       1 zketcd.go:439] GetChildren(10) = (zxid=285, resp={Children:[]})
I0706 06:15:54.018189       1 conn.go:139] conn.Send(xid=10, zxid=284, &{Children:[]})
I0706 06:15:54.019391       1 server.go:110] zkreq={xid:11 req:*zetcd.DeleteRequest:&{Path:/kafka/brokers/ids Version:-1}}
I0706 06:15:54.019424       1 zklog.go:33] Delete(11,{Path:/kafka/brokers/ids Version:-1})
I0706 06:15:54.026331       1 conn.go:139] conn.Send(xid=11, zxid=285, 0xc420200110)

Zk Client version:

Client environment:zookeeper.version=3.4.9-1757313, built on 08/23/2016 06:50 GMT

Kafka fails to start: Zookeeper namespace does not exist

I have a Kafka 0.11 and Zookeeper 3.4 setup running flawlessly in docker.

I want to replace Zookeeper with etcd and zetcd.

This is the error I get when I start kafka:

m9edd51-zetcd.m9edd51 (172.18.0.3:2181) open
[2017-07-11 01:32:09,769] INFO KafkaConfig values: 
	advertised.host.name = null
	advertised.listeners = null
	advertised.port = null
	alter.config.policy.class.name = null
	authorizer.class.name = 
	auto.create.topics.enable = true
	auto.leader.rebalance.enable = true
	background.threads = 10
	broker.id = -1
	broker.id.generation.enable = true
	broker.rack = null
	compression.type = producer
	connections.max.idle.ms = 600000
	controlled.shutdown.enable = true
	controlled.shutdown.max.retries = 3
	controlled.shutdown.retry.backoff.ms = 5000
	controller.socket.timeout.ms = 30000
	create.topic.policy.class.name = null
	default.replication.factor = 1
	delete.records.purgatory.purge.interval.requests = 1
	delete.topic.enable = true
	fetch.purgatory.purge.interval.requests = 1000
	group.initial.rebalance.delay.ms = 0
	group.max.session.timeout.ms = 300000
	group.min.session.timeout.ms = 6000
	host.name = 
	inter.broker.listener.name = null
	inter.broker.protocol.version = 0.11.0-IV2
	leader.imbalance.check.interval.seconds = 300
	leader.imbalance.per.broker.percentage = 10
	listener.security.protocol.map = SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,TRACE:TRACE,SASL_SSL:SASL_SSL,PLAINTEXT:PLAINTEXT
	listeners = null
	log.cleaner.backoff.ms = 15000
	log.cleaner.dedupe.buffer.size = 134217728
	log.cleaner.delete.retention.ms = 86400000
	log.cleaner.enable = true
	log.cleaner.io.buffer.load.factor = 0.9
	log.cleaner.io.buffer.size = 524288
	log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
	log.cleaner.min.cleanable.ratio = 0.5
	log.cleaner.min.compaction.lag.ms = 0
	log.cleaner.threads = 1
	log.cleanup.policy = [delete]
	log.dir = /tmp/kafka-logs
	log.dirs = /var/lib/kafka/data
	log.flush.interval.messages = 9223372036854775807
	log.flush.interval.ms = null
	log.flush.offset.checkpoint.interval.ms = 60000
	log.flush.scheduler.interval.ms = 9223372036854775807
	log.flush.start.offset.checkpoint.interval.ms = 60000
	log.index.interval.bytes = 4096
	log.index.size.max.bytes = 10485760
	log.message.format.version = 0.11.0-IV2
	log.message.timestamp.difference.max.ms = 9223372036854775807
	log.message.timestamp.type = CreateTime
	log.preallocate = false
	log.retention.bytes = -1
	log.retention.check.interval.ms = 300000
	log.retention.hours = 168
	log.retention.minutes = null
	log.retention.ms = null
	log.roll.hours = 168
	log.roll.jitter.hours = 0
	log.roll.jitter.ms = null
	log.roll.ms = null
	log.segment.bytes = 1073741824
	log.segment.delete.delay.ms = 60000
	max.connections.per.ip = 2147483647
	max.connections.per.ip.overrides = 
	message.max.bytes = 1000012
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	min.insync.replicas = 1
	num.io.threads = 8
	num.network.threads = 3
	num.partitions = 1
	num.recovery.threads.per.data.dir = 1
	num.replica.fetchers = 1
	offset.metadata.max.bytes = 4096
	offsets.commit.required.acks = -1
	offsets.commit.timeout.ms = 5000
	offsets.load.buffer.size = 5242880
	offsets.retention.check.interval.ms = 600000
	offsets.retention.minutes = 1440
	offsets.topic.compression.codec = 0
	offsets.topic.num.partitions = 50
	offsets.topic.replication.factor = 1
	offsets.topic.segment.bytes = 104857600
	port = 9092
	principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
	producer.purgatory.purge.interval.requests = 1000
	queued.max.requests = 500
	quota.consumer.default = 9223372036854775807
	quota.producer.default = 9223372036854775807
	quota.window.num = 11
	quota.window.size.seconds = 1
	replica.fetch.backoff.ms = 1000
	replica.fetch.max.bytes = 1048576
	replica.fetch.min.bytes = 1
	replica.fetch.response.max.bytes = 10485760
	replica.fetch.wait.max.ms = 500
	replica.high.watermark.checkpoint.interval.ms = 5000
	replica.lag.time.max.ms = 10000
	replica.socket.receive.buffer.bytes = 65536
	replica.socket.timeout.ms = 30000
	replication.quota.window.num = 11
	replication.quota.window.size.seconds = 1
	request.timeout.ms = 30000
	reserved.broker.max.id = 1000
	sasl.enabled.mechanisms = [GSSAPI]
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.principal.to.local.rules = [DEFAULT]
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.mechanism.inter.broker.protocol = GSSAPI
	security.inter.broker.protocol = PLAINTEXT
	socket.receive.buffer.bytes = 102400
	socket.request.max.bytes = 104857600
	socket.send.buffer.bytes = 102400
	ssl.cipher.suites = null
	ssl.client.auth = none
	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
	ssl.endpoint.identification.algorithm = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLS
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
	transaction.max.timeout.ms = 900000
	transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
	transaction.state.log.load.buffer.size = 5242880
	transaction.state.log.min.isr = 1
	transaction.state.log.num.partitions = 50
	transaction.state.log.replication.factor = 1
	transaction.state.log.segment.bytes = 104857600
	transactional.id.expiration.ms = 604800000
	unclean.leader.election.enable = false
	zookeeper.connect = m9edd51-zetcd.m9edd51:2181/kafka
	zookeeper.connection.timeout.ms = 6000
	zookeeper.session.timeout.ms = 6000
	zookeeper.set.acl = false
	zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2017-07-11 01:32:09,934] INFO starting (kafka.server.KafkaServer)
[2017-07-11 01:32:09,936] INFO Connecting to zookeeper on m9edd51-zetcd.m9edd51:2181/kafka (kafka.server.KafkaServer)
[2017-07-11 01:32:09,958] INFO Starting ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2017-07-11 01:32:09,971] INFO Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:host.name=m9edd51-kafka1.m9edd51 (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.version=1.8.0_131 (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.home=/usr/lib/jvm/java-1.8-openjdk/jre (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.class.path=:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b05.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/connect-api-0.11.0.0.jar:/opt/kafka/bin/../libs/connect-file-0.11.0.0.jar:/opt/kafka/bin/../libs/connect-json-0.11.0.0.jar:/opt/kafka/bin/../libs/connect-runtime-0.11.0.0.jar:/opt/kafka/bin/../libs/connect-transforms-0.11.0.0.jar:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b05.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b05.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b05.jar:/opt/kafka/bin/../libs/jackson-annotations-2.8.5.jar:/opt/kafka/bin/../libs/jackson-core-2.8.5.jar:/opt/kafka/bin/../libs/jackson-databind-2.8.5.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.8.5.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.8.5.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.8.5.jar:/opt/kafka/bin/../libs/javassist-3.21.0-GA.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b05.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.0.1.jar:/opt/kafka/bin/../libs/jersey-client-2.24.jar:/opt/kafka/bin/../libs/jersey-common-2.24.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.24.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.24.jar:/opt/kafka/bin/../libs/jersey-guava-2.24.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.24.jar:/opt/kafka/bin/../libs/jersey-server-2.24.jar:/opt/kafka/bin/../libs/jetty-continuation-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-http-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-io-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-security-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-server-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-servlet-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-servlets-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jetty-util-9.2.15.v20160210.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.3.jar:/opt/kafka/bin/../libs/kafka-clients-0.11.0.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-0.11.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-0.11.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-0.11.0.0.jar:/opt/kafka/bin/../libs/kafka-tools-0.11.0.0.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.0-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.0-test-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-0.11.0.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-1.3.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.0.24.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.0.1.jar:/opt/kafka/bin/../libs/scala-library-2.12.2.jar:/opt/kafka/bin/../libs/scala-parser-combinators_2.12-1.0.4.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.2.6.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.10.jar (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.library.path=/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server:/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64:/usr/lib/jvm/java-1.8-openjdk/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:os.version=4.10.0-26-generic (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:user.name=kafka (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:user.home=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,972] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:09,973] INFO Initiating client connection, connectString=m9edd51-zetcd.m9edd51:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@75329a49 (org.apache.zookeeper.ZooKeeper)
[2017-07-11 01:32:10,046] INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
[2017-07-11 01:32:10,048] INFO Opening socket connection to server m9edd51-zetcd.m9edd51/172.18.0.3:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2017-07-11 01:32:10,105] INFO Socket connection established to m9edd51-zetcd.m9edd51/172.18.0.3:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2017-07-11 01:32:10,113] WARN Connected to an old server; r-o mode will be unavailable (org.apache.zookeeper.ClientCnxnSocket)
[2017-07-11 01:32:10,113] INFO Session establishment complete on server m9edd51-zetcd.m9edd51/172.18.0.3:2181, sessionid = 0x694d5d2f465b6c07, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2017-07-11 01:32:10,114] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2017-07-11 01:32:10,147] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.config.ConfigException: Zookeeper namespace does not exist
	at kafka.utils.ZkPath.checkNamespace(ZkUtils.scala:1019)
	at kafka.utils.ZkPath.createPersistent(ZkUtils.scala:1034)
	at kafka.utils.ZkUtils.makeSurePersistentPathExists(ZkUtils.scala:456)
	at kafka.server.KafkaServer.$anonfun$initZk$2(KafkaServer.scala:333)
	at kafka.server.KafkaServer.$anonfun$initZk$2$adapted(KafkaServer.scala:327)
	at scala.Option.foreach(Option.scala:257)
	at kafka.server.KafkaServer.initZk(KafkaServer.scala:327)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:191)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
	at kafka.Kafka$.main(Kafka.scala:65)
	at kafka.Kafka.main(Kafka.scala)
[2017-07-11 01:32:10,151] INFO shutting down (kafka.server.KafkaServer)
[2017-07-11 01:32:10,159] INFO shut down completed (kafka.server.KafkaServer)
[2017-07-11 01:32:10,159] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2017-07-11 01:32:10,161] INFO shutting down (kafka.server.KafkaServer)

If I use zkui to browse the zetcd server, it gives this warning when connecting to zetcd: get(/) failed, err=node not exists.

I think this is because zetcd does not create the root node by default, resulting in kafka not being able to create its chroot node.

If I run zookeeper and access it using zkui, I do not get that error, and there is also a zookeeper child node.

xchk build tag to xchk integration tests

Abstract out newZKCluster so it creates a cross checking cluster with a fresh external zk server.

ZNode metadata in etcd: physical representation and documentation

In zetcd introduction, there is a nice illustration how ZNode metadata is stored in etcd. However, I can't see the same after these simple steps:

$ ./go/bin/zkctl watch / &
$ ./go/bin/zkctl create /abc "test"
$ ./go/bin/zkctl create /abc test
2017/05/23 22:36:17 Connected to 127.0.0.1:2181
2017/05/23 22:36:17 Authenticated: id=7587822333160445584, timeout=1000
$ export ETCDCTL_API=3
$ ./go/bin/etcdctl get --prefix /zk
/zk//abc
test
/zk//abc

/zk//

/zk//abc

/zk//abc
�ޗ�׌�
/zk//abc
�ޗ�׌�
/zk//abc
������-��ACL��PermsScheme
                         ID
                           ��>worldanyone
/zk//abc

/zk/e
1
$

A lot of «/zk//abc» are displayed instead of /zk/key/…, /zk/acl/…, etc I've been expecting (according to the illustration and its description). Am I doing something wrong or this documentation is somehow misleading?

against actual zookeeper-3.4.9/bin/zkServer.sh start: the basic lock test finishes quickly

~/go/src/github.com/glycerine/ezk/recipes (master) $ go test -v
2017/01/20 08:41:00 Connected to 127.0.0.1:2181
2017/01/20 08:41:00 Authenticated: id=97316385888796672, timeout=10000
2017/01/20 08:41:00 Re-submitting `0` credentials after reconnect
=== RUN   TestLock
--- PASS: TestLock (0.13s)
=== RUN   TestLockWithCleaner
--- PASS: TestLockWithCleaner (0.12s)
=== RUN   TestTryLock
--- PASS: TestTryLock (0.11s)
=== RUN   TestTryLockWithCleaner
--- PASS: TestTryLockWithCleaner (0.12s)
PASS
2017/01/20 08:41:01 Recv loop terminated: err=EOF
2017/01/20 08:41:01 Send loop terminated: err=<nil>
ok  	github.com/glycerine/ezk/recipes	0.618s
~/go/src/github.com/glycerine/ezk/recipes (master) $

against zetcd, the lock test just hangs forever:

~/go/src/github.com/glycerine/ezk/recipes (master) $ go test -v
2017/01/20 08:42:44 Connected to 127.0.0.1:2181

where zetcd was started as:

$ zetcd -zkaddr 0.0.0.0:2181 -endpoint localhost:2379
Running zetcd proxy

mesos xchk failure

mesos xchk failure via goreman -f scripts/Procfile.mesos.xchk start:

$ grep -i xchk zketcd.xchk 
I1116 22:23:19.634919   24947 conn.go:131] sendXchk Xid:1479363800 ZXid:2 Resp:0xc4201ac050
I1116 22:23:19.636225   24947 conn.go:131] sendXchk Xid:1479363801 ZXid:2 Resp:0xc4201993b0
I1116 22:23:19.654726   24947 conn.go:131] sendXchk Xid:1479363802 ZXid:3 Resp:&{Path:/mesos}
I1116 22:23:19.664422   24947 conn.go:131] sendXchk Xid:1479363803 ZXid:4 Resp:0xc4201ac050
I1116 22:23:19.667112   24947 conn.go:131] sendXchk Xid:1479363804 ZXid:4 Resp:&{Children:[]}
I1116 22:23:19.687624   24947 conn.go:112] xchkSendOOB response {Type:4 State:3 Path:/mesos}
W1116 22:23:19.687727   24947 zk.go:326] xchk failed (path mismatch)
I1116 22:23:19.687738   24947 conn.go:131] sendXchk Xid:1479363805 ZXid:5 Resp:&{Path:/mesos/json.info_0000000000}
I1116 22:23:19.693699   24947 conn.go:131] sendXchk Xid:1479363806 ZXid:5 Resp:&{Children:[json.info_0000000000]}
I1116 22:23:19.695883   24947 conn.go:131] sendXchk Xid:1479363807 ZXid:5 Resp:&{Children:[json.info_0000000000]}