
go-zookeeper's Introduction

Native Go Zookeeper Client Library


License

3-clause BSD. See LICENSE file.

This Repository is No Longer Maintained

Please use https://github.com/go-zookeeper/zk for an actively maintained fork.

go-zookeeper's People

Contributors

alxzh, davidreynolds, dmitshur, dougm, hdonnay, horkhe, jdef, jeffbean, jhump, mattrobenolt, mkaczanowski, nemith, nomis52, noxiouz, nsd20463, samuel, santosh653, spenczar, tailhook, theatrus, theckman, vespian, yunxianghuang, zellyn

go-zookeeper's Issues

Slow zookeeper server can cause a crash in client library

We recently ran into an issue in which every single server in our go cluster crashed at about the same time due to the presence of a log.Fatal in this library. The general sequence of operations was:

  1. The go-zk library connects to a slow ZK server.
  2. Authentication with the server is slow (but works). The go-zk library sends a "set watch" request and spawns a goroutine that calls log.Fatal if the request fails.
  3. The go-zk library enters a receive-loop goroutine, which hits a read timeout (it sets a read deadline on each read). The timeout propagates to the pending "set watch" request from (2), and the call to log.Fatal() kills the entire process.

I'm not sure what the correct behavior here is, but it seems like this library shouldn't be calling log.Fatal just because a read was slow. Ideally the library would just back off and retry again on the new connection sometime later.

For more details, here's an analysis of the log lines that we saw in stderr on the affected servers:

2014/11/07 06:51:45 Failed to connect to zoo09:2181: dial tcp x.x.x.x:2181: i/o timeout
2014/11/07 06:51:55 read tcp y.y.y.y:2181: i/o timeout
2014/11/07 06:51:55 zk: connection closed

This first log line ("Failed to connect...") corresponds to a call to c.connect() in the (*Conn).loop() method:

func (c *Conn) connect() {
    // ...
    for {
        zkConn, err := c.dialer("tcp", c.servers[c.serverIndex], c.connectTimeout)
        if err == nil {
            c.conn = zkConn
            c.setState(StateConnected)
            return
        }

        log.Printf("Failed to connect to %s: %+v", c.servers[c.serverIndex], err)
        // ...
    }
}

So it looks like the library tried to connect to zoo09 and failed, printing the "Failed to connect" error. It then slept for a while and successfully connected to another ZK server (which happened to be slow for whatever reason), exiting this loop.

The second log line corresponds to a read error in recvLoop():

func (c *Conn) recvLoop(conn net.Conn) error {
    buf := make([]byte, bufferSize)
    for {
        // package length
        conn.SetReadDeadline(time.Now().Add(c.recvTimeout))
        _, err := io.ReadFull(conn, buf[:4])
        if err != nil {
            return err
        }

When loop() receives this error, it prints it out, then returns the same error to any outstanding requests:

        // Yeesh
        if err != io.EOF && err != ErrSessionExpired && !strings.Contains(err.Error(), "use of closed network connection") {
            log.Println(err)
        }

        select {
        case <-c.shouldQuit:
            c.flushRequests(ErrClosing)
            return
        default:
        }

The problem is, there is an outstanding request being sent concurrently in sendSetWatches(), which is called by authenticate() upon successfully connecting to a server. (Note that the read that occurs in authenticate() does not time out like the later read in recvLoop() because go-zk does not set a read deadline for the read that occurs in authenticate().) Here is the relevant snippet of code:

func (c *Conn) sendSetWatches() {
    // ...
    go func() {
        res := &setWatchesResponse{}
        _, err := c.request(opSetWatches, req, res, nil)
        if err != nil {
            log.Fatal(err)
        }
    }()
}

Because every outstanding request is failed with the timeout error, the log.Fatal is triggered and the whole process dies.
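
One possible direction, sketched here with only the standard library: the background request hands its error back on a channel and lets the caller decide whether to log, back off, or retry. The names sendAsync and errReadTimeout are hypothetical and are not part of go-zookeeper.

```go
package main

import (
	"errors"
	"fmt"
)

// errReadTimeout stands in for the i/o timeout seen in the logs (hypothetical name).
var errReadTimeout = errors.New("read tcp: i/o timeout")

// sendAsync runs request in its own goroutine and delivers the result on the
// returned channel instead of calling log.Fatal. The channel is buffered so
// the goroutine never blocks if nobody is listening.
func sendAsync(request func() error) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		errCh <- request()
	}()
	return errCh
}

func main() {
	errCh := sendAsync(func() error { return errReadTimeout })
	if err := <-errCh; err != nil {
		// The caller chooses the policy: log, back off, retry on a new connection.
		fmt.Println("set watches failed, will retry:", err)
	}
}
```

With this shape a slow read costs one failed request rather than the process.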

Clean shutdown

I'm investigating how to clean up all watcher goroutines when the ZooKeeper connection is closed.

The following snippet shows that the ChildrenW loop is still running after c.Close().

I'm wondering whether it would be a good idea to have conn.Close shut down all running watch loops and close their channels. ZooKeeper sessions can survive between connections, so maybe the current behavior is intended?

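package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"

    "github.com/samuel/go-zookeeper/zk"
)

func main() {
    c, _, err := zk.Connect([]string{"localhost:2181"}, time.Second)
    if err != nil {
        panic(err)
    }
    defer c.Close()
    defer func() { fmt.Println("bye.") }()
    c.ChildrenW("/") // register a child watch on the root; results are ignored

    // Wait for SIGINT or SIGTERM.
    exit := make(chan os.Signal, 1)
    defer signal.Stop(exit)
    signal.Notify(exit, syscall.SIGINT, syscall.SIGTERM)
    <-exit

    // Deferred calls (including c.Close) run before the panic ends the process.
    panic("")
}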
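
To make the question concrete, here is a minimal sketch of what "Close shuts down all watch loops and closes their channels" could look like. It uses only the standard library; the watcher type and its methods are hypothetical stand-ins, not the library's Conn.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical stand-in for a connection that owns watch loops.
type watcher struct {
	done chan struct{}
	wg   sync.WaitGroup
}

func newWatcher() *watcher {
	return &watcher{done: make(chan struct{})}
}

// watch starts a loop that forwards events until Close is called, then closes
// its output channel so receivers can observe the shutdown.
func (w *watcher) watch(events <-chan string) <-chan string {
	out := make(chan string)
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		defer close(out)
		for {
			select {
			case ev := <-events:
				select {
				case out <- ev:
				case <-w.done:
					return
				}
			case <-w.done:
				return
			}
		}
	}()
	return out
}

// Close signals every watch loop to stop and waits for them to exit.
func (w *watcher) Close() {
	close(w.done)
	w.wg.Wait()
}

func main() {
	w := newWatcher()
	out := w.watch(make(chan string)) // no events ever arrive
	w.Close()
	_, ok := <-out
	fmt.Println("watch channel open after Close:", ok)
}
```

Whether this is desirable for sessions that survive a reconnect is exactly the open question above; the sketch only covers an explicit Close.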

Add TreeCache?

As part of implementing service discovery of Twitter Serversets in Prometheus, I did a basic implementation of a TreeCache (http://curator.apache.org/curator-recipes/tree-cache.html) in Go.

As this is generally useful and not tied to Prometheus, I think it would be a good idea to make it more widely available in some directory of this repository.

The code is at https://github.com/prometheus/prometheus/blob/master/retrieval/discovery/serverset.go#L194-372

Would you be willing in principle to accept this code into this repository?

Also, are there any stubs for the zk client to aid unit testing of something like this? I couldn't find anything from a quick look.

Conn.SetLogger() race condition

I'd like to use Conn.SetLogger(Logger) to change the behavior of the default logger. go test -race reports a race between the call to this function and calls to c.logger.Printf(...) in conn.go (e.g., in the connect(...) function). I'm using SetLogger(...) as follows:

zkConn, zkConnEventChl, err := zk.Connect(zkServerList, timeoutMs)
nullLogger := nullZKLogger{}
zkConn.SetLogger(zk.Logger(nullLogger))

Am I using this capability incorrectly or is it a bug?
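
For illustration, a race of this kind is usually avoided by guarding the logger field with a lock, so that a background loop can log while another goroutine swaps the logger. The sketch below uses only the standard library; conn, logf and countingLogger are hypothetical names, and this is not how the library is known to be implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// logger mirrors the shape of a Printf-style logging interface.
type logger interface {
	Printf(format string, args ...interface{})
}

// countingLogger counts calls instead of writing output.
type countingLogger struct {
	mu sync.Mutex
	n  int
}

func (l *countingLogger) Printf(format string, args ...interface{}) {
	l.mu.Lock()
	l.n++
	l.mu.Unlock()
}

// conn is a hypothetical stand-in for a connection with a swappable logger.
type conn struct {
	mu     sync.RWMutex
	logger logger
}

// SetLogger replaces the logger under the write lock.
func (c *conn) SetLogger(l logger) {
	c.mu.Lock()
	c.logger = l
	c.mu.Unlock()
}

// logf reads the logger under the read lock, then logs outside the lock.
func (c *conn) logf(format string, args ...interface{}) {
	c.mu.RLock()
	l := c.logger
	c.mu.RUnlock()
	if l != nil {
		l.Printf(format, args...)
	}
}

func main() {
	c := &conn{}
	l := &countingLogger{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // background loop that logs while the logger is being replaced
		defer wg.Done()
		for i := 0; i < 100; i++ {
			c.logf("attempt %d", i)
		}
	}()
	c.SetLogger(l)
	wg.Wait()
	c.logf("done")
	fmt.Println("logged at least once:", l.n >= 1)
}
```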

Possible race condition in basic usage.

When I run the basic.go example with Go's race detector, I get the following output:

Desktop $ go run -race basic.go 
==================
WARNING: DATA RACE
Write by goroutine 4:
  github.com/samuel/go-zookeeper/zk.(*Conn).authenticate()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:359 +0x9a8
  github.com/samuel/go-zookeeper/zk.(*Conn).loop()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:178 +0xab
  github.com/samuel/go-zookeeper/zk.func·001()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:126 +0x36
  gosched0()
      /usr/local/go/src/pkg/runtime/proc.c:1218 +0x9f

Previous read by goroutine 1:
  sync/atomic.AddUint32()
      /usr/local/go/src/pkg/sync/atomic/race.go:99 +0x4b
  sync/atomic.AddInt32()
      /usr/local/go/src/pkg/sync/atomic/race.go:92 +0x3a
  github.com/samuel/go-zookeeper/zk.(*Conn).nextXid()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:530 +0x3f
  github.com/samuel/go-zookeeper/zk.(*Conn).queueRequest()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:545 +0x37
  github.com/samuel/go-zookeeper/zk.(*Conn).request()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:557 +0x83
  github.com/samuel/go-zookeeper/zk.(*Conn).ChildrenW()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:579 +0x325
  main.main()
      /Users/Dmitri/Desktop/basic.go:15 +0x125
  runtime.main()
      /usr/local/go/src/pkg/runtime/proc.c:182 +0x91

Goroutine 4 (running) created at:
  github.com/samuel/go-zookeeper/zk.Connect()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:129 +0x5b2
  main.main()
      /Users/Dmitri/Desktop/basic.go:11 +0xbb
  runtime.main()
      /usr/local/go/src/pkg/runtime/proc.c:182 +0x91

Goroutine 1 (running) created at:
  _rt0_amd64()
      /usr/local/go/src/pkg/runtime/asm_amd64.s:87 +0x106

==================
==================
WARNING: DATA RACE
Read by goroutine 1:
  github.com/samuel/go-zookeeper/zk.(*Conn).ChildrenW()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:583 +0x3bb
  main.main()
      /Users/Dmitri/Desktop/basic.go:15 +0x125
  runtime.main()
      /usr/local/go/src/pkg/runtime/proc.c:182 +0x91

Previous write by goroutine 7:
  github.com/samuel/go-zookeeper/zk.func·005()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:577 +0x9f
  github.com/samuel/go-zookeeper/zk.(*Conn).recvLoop()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:518 +0x12d3
  github.com/samuel/go-zookeeper/zk.func·003()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:194 +0x99
  gosched0()
      /usr/local/go/src/pkg/runtime/proc.c:1218 +0x9f

Goroutine 1 (running) created at:
  _rt0_amd64()
      /usr/local/go/src/pkg/runtime/asm_amd64.s:87 +0x106

Goroutine 7 (running) created at:
  github.com/samuel/go-zookeeper/zk.(*Conn).loop()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:200 +0x926
  github.com/samuel/go-zookeeper/zk.func·001()
      /Users/Dmitri/Local/GoTrLand/src/github.com/samuel/go-zookeeper/zk/conn.go:126 +0x36
  gosched0()
      /usr/local/go/src/pkg/runtime/proc.c:1218 +0x9f

==================
[folder new_folder] &{Czxid:0 Mzxid:0 Ctime:0 Mtime:0 Version:0 Cversion:123 Aversion:0 EphemeralOwner:0 DataLength:0 NumChildren:2 Pzxid:1234}

Any thoughts?

Let outer know state changes immediately

I know we can call State() to check the current connection state, but there is no way to learn about state changes immediately.

setState sends on an eventChan, but as far as I can tell that channel cannot be used from outside the package, only in tests. Could you expose it? Perhaps like this:

func (c *Conn) StateCh() <-chan Event {
    return c.eventChan
}

I need this because when the session expires and the client then reconnects successfully, the caller has no way to find out. For now, all we can do is create a new connection whenever the session expires.

removing/disabling logging

I wanted to open an issue to get feedback on this before going further and modifying the code in preparation for a PR.

When using the library, it logs failures and the like by default:

2015/06/04 21:07:54 Failed to connect to 127.0.0.1:2181: dial tcp 127.0.0.1:2181: connection refused

Without redirecting all logging to a writer other than the default one, there doesn't appear to be a way to disable this logging. Is there a reason logging was enabled in the library by default?

Multi GetDataRequest

Is it possible to use Multi with a GetDataRequest? I've seen the Op, but I'm not sure whether it's valid.

Delete() doesn't work if Auth doesn't contain the ACL of the child path

Steps to reproduce:

  1. Given a directory with world anyone ACL, create a child with one or more ACLs;
  2. Try to delete the child path without any Auth set.

Expected result: Child should be deleted as the parent has world anyone ACL, and the permission for deletion is managed by the parent, not the child, as stated in docs: "DELETE: you can delete a child node"
Actual result: I get this error: zk: not authenticated

bug in function handler

I encountered a repeated failure that I traced down into conn.go, function handler, in the receive loop where the log message "Response for unknown request with xid #" is emitted.

existing code is

} else {
            c.requestsLock.Lock()
            req, ok := c.requests[res.Xid]
            if ok {
                delete(c.requests, res.Xid)
            }
            c.requestsLock.Unlock()

            if !ok {
>>              log.Printf("Response for unknown request with xid %d", res.Xid)
            } else {
                _, err := decodePacket(buf[:blen], req.recvStruct)
                req.recvChan <- err
            }
        }

changed the

log.Printf("Response for unknown request with xid %d", res.Xid)

to be

req.recvChan <- fmt.Errorf("UNKWNREQ: Response for unknown request with xid %d", res.Xid)

Basically, the channel needs a reply; otherwise we just hang.

The error message I'm sending back is something you will probably want to align with your own error handling, but I needed to see this specific error so I could attempt recovery in my client code. In my case, I just retry if I get this error.

Bill

How can I watch a node at all time

I have a problem watching a node. When I call the ChildrenW function to get the list of children of the parent node, I get a change notification only once. If I want to watch a node at all times, how can I do it?
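
ZooKeeper watches are one-time triggers, so the usual pattern is to set the watch again after every notification. The sketch below shows that loop with a simplified, hypothetical function type: the real ChildrenW also returns a Stat and delivers Event values, and the max parameter exists only so the example terminates.

```go
package main

import "fmt"

// childrenW is a hypothetical stand-in with the same idea as a
// "list children and set a watch" call: it returns the current children
// and a channel that fires once when they change.
type childrenW func(path string) ([]string, <-chan struct{}, error)

// watchForever calls get again after every notification, because each
// watch fires only once. It stops after max rounds so the sketch terminates.
func watchForever(get childrenW, path string, max int, onChange func([]string)) error {
	for i := 0; i < max; i++ {
		children, fired, err := get(path)
		if err != nil {
			return err
		}
		onChange(children)
		<-fired // block until the one-shot watch triggers, then loop to re-arm it
	}
	return nil
}

func main() {
	snapshots := [][]string{{"a"}, {"a", "b"}, {"b"}}
	call := 0
	fake := func(path string) ([]string, <-chan struct{}, error) {
		ch := make(chan struct{})
		close(ch) // pretend the children change immediately
		s := snapshots[call]
		call++
		return s, ch, nil
	}
	_ = watchForever(fake, "/parent", len(snapshots), func(c []string) {
		fmt.Println("children:", c)
	})
}
```

Because the list is fetched again on every round, changes that happen between a notification and the next watch still show up in the next snapshot.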

EventSession vs EventNotWatching

What should I look for on an event when I get disconnected from Zookeeper?

I see an EventNotWatching when I get disconnected but I see StateDisconnected for EventSession as well.

client will receive session expires error, when zk leader shut down

If a client is connected to the zk leader and the leader shuts down, the client retries with another zk server and connects, but then io.ReadFull gets EOF.

The zk log shows:
-07-29 15:28:09,846 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2283:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:35420
2015-07-29 15:28:09,848 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2283:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2015-07-29 15:28:09,849 [myid:3] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2283:NIOServerCnxn@366] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:931)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:237)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-07-29 15:28:09,850 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2283:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:35420 (no session established for client)

The zk follower shuts down because the leader is down.
The go-zookeeper client then gets an EOF error when it reconnects.

Should the client wait a little before reconnecting?

Warning about 'old client' missing 'readOnly' flag

We are using the client to connect to a 3.4.6 zookeeper server. On connect, we get these warnings in the zk server log:

- Connection request from old client /127.0.0.1:51335; will be dropped if server is in r-o mode

Looking at the source code in ZooKeeperServer.java, there's this:

        boolean readOnly = false;
        try {
            readOnly = bia.readBool("readOnly");
            cnxn.isOldClient = false;
        } catch (IOException e) {
            // this is ok -- just a packet from an old client which
            // doesn't contain readOnly field
            LOG.warn("Connection request from old client "
                    + cnxn.getRemoteSocketAddress()
                    + "; will be dropped if server is in r-o mode");
        }

It appears this is ok, but the client should probably be updated to look like a current client.

sorry

I find my mistake

Application crashes on zk.Connect() if servers slice is empty

It should return an error instead of panicking and crashing the app. I managed a simple workaround by validating the servers list myself before calling zk.Connect(). A small validation inside ConnectWithDialer() would probably be nicer.
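
The workaround amounts to something like the following sketch. validateServers and errNoServers are hypothetical names; a check inside the library could look different.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoServers is a hypothetical error value; the library may name it differently.
var errNoServers = errors.New("zk: server list must not be empty")

// validateServers rejects an empty list and empty entries before any dial is attempted.
func validateServers(servers []string) error {
	if len(servers) == 0 {
		return errNoServers
	}
	for i, s := range servers {
		if s == "" {
			return fmt.Errorf("zk: server %d is an empty address", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateServers(nil))
	fmt.Println(validateServers([]string{"localhost:2181"}))
}
```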

needed a way to get data out of a node

So I added the following function to conn.go; maybe you want to add it as well? It's a modification of your Children function.

func (c *Conn) Data(path string) (data []byte, stat *Stat, err error) {
    xid := c.nextXid()
    ch := make(chan error)
    rs := &getDataResponse{}
    req := &request{
        xid: xid,
        pkt: &getDataRequest{
            requestHeader: requestHeader{
                Xid:    xid,
                Opcode: opGetData,
            },
            Path:  path,
            Watch: false,
        },
        recvStruct: rs,
        recvChan:   ch,
    }
    c.sendChan <- req
    err = <-ch
    data = rs.Data
    stat = &rs.Stat
    return
}

Change Conn.Multi() to take a slice argument vs. varargs?

(c *Conn) Multi(ops ...interface{}) ([]MultiResponse, error) is a variadic function. Is there a reason it was done this way vs. having it take a []interface{} instead? My reason for asking is that I'd like to pass 50 or so ops into the function and this would make for a quite long and awkward function call.
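
For what it's worth, Go lets a slice be passed to a variadic parameter with the ... suffix, so a long list of ops can be built up in a []interface{} first. The sketch below uses a stand-in function with the same variadic shape as Multi; multi and buildOps are hypothetical names.

```go
package main

import "fmt"

// multi is a stand-in with the same variadic shape as Conn.Multi;
// it only reports how many operations it received.
func multi(ops ...interface{}) int {
	return len(ops)
}

// buildOps assembles n placeholder operations in a slice.
func buildOps(n int) []interface{} {
	ops := make([]interface{}, 0, n)
	for i := 0; i < n; i++ {
		ops = append(ops, fmt.Sprintf("op-%d", i))
	}
	return ops
}

func main() {
	ops := buildOps(50)
	fmt.Println(multi(ops...)) // the slice is passed as the variadic argument
}
```

Note that the slice must already have the element type of the variadic parameter, here []interface{}.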

zookeeper client session timeout not respected

It looks like the client session timeout value that is encoded and submitted in the conn authorize method isn't being parsed correctly by ZooKeeper 3.3.x and 3.4.x, which results in the session timeout being set to the default (2 x tick interval, which on default installations is only 40 seconds). When the connection is established, zk returns a timeout value, which then overrides the user-specified value for future connections as well. Is this the expected behavior?

need a way to set data on a node

So I added the following function to conn.go, based on the set and get functions:

func (c *Conn) Set(path string, data []byte) (stat *Stat, err error) {
    xid := c.nextXid()
    ch := make(chan error)
    rs := &setDataResponse{}
    req := &request{
        xid: xid,
        pkt: &setDataRequest{
            requestHeader: requestHeader{
                Xid:    xid,
                Opcode: opSetData,
            },
            Path:    path,
            Data:    data,
            Version: -1,
        },
        recvStruct: rs,
        recvChan:   ch,
    }
    c.sendChan <- req
    err = <-ch
    stat = &rs.Stat
    return
}

Children method fails on receiving big list of children

In github.com/samuel/go-zookeeper/zk/conn.go, on line 545, we grow the buffer to blen if blen is bigger than the current buffer. Then in several places we decode data like this:
_, err := decodePacket(buf[16:16+blen], res)
which panics, because the slice expression goes past the end of the buffer.

The fix is to change line 545 to:
buf = make([]byte, blen+16)

I can't create a pull request right now, but I would appreciate it if somebody could fix this upstream.
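
A small standalone sketch of the sizing rule, assuming the 16-byte offset from the slice expression quoted above. growFor and headerLen are hypothetical names, and the helper does not copy old contents, so it only fits a receive loop that grows the buffer before reading the packet into it.

```go
package main

import "fmt"

// headerLen is the 16-byte offset used in the slice expression quoted above.
const headerLen = 16

// growFor returns a buffer large enough for buf[headerLen:headerLen+blen].
// It reuses buf when it is already big enough. Old contents are not copied.
func growFor(buf []byte, blen int) []byte {
	need := headerLen + blen
	if len(buf) >= need {
		return buf
	}
	return make([]byte, need)
}

func main() {
	buf := make([]byte, 64)
	buf = growFor(buf, 1000)
	body := buf[headerLen : headerLen+1000] // in range after growing
	fmt.Println(len(buf), len(body))
}
```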

Failed ZK connection retry strategy

I'm testing connection failure scenarios between ZK (3.4.6) and a client written with go-zookeeper. When I kill the ZK server go-zookeeper retries the connection to ZK, but the retry loop seems to have no limit to the number of retries.

2015/10/02 11:42:16 Failed to connect to ...: dial tcp ...: connection refused

Is there any way to influence how many times or how long go-zookeeper will continue to retry connecting to ZK? What will go-zookeeper return to the client if it exceeds the retry limit?

Thanks!
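
To illustrate the kind of knob being asked for, here is a sketch of a bounded retry loop with a doubling wait, written against the standard library only. dialWithRetry and errGaveUp are hypothetical; nothing here describes what go-zookeeper actually does.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errGaveUp is a hypothetical sentinel returned when every attempt failed.
var errGaveUp = errors.New("gave up connecting")

// dialWithRetry calls dial up to maxAttempts times, doubling the wait
// between attempts, and returns errGaveUp wrapped with the last failure.
func dialWithRetry(dial func() error, maxAttempts int, wait time.Duration) error {
	var last error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if last = dial(); last == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("%w after %d attempts: %v", errGaveUp, maxAttempts, last)
}

func main() {
	refused := errors.New("dial tcp: connection refused")
	err := dialWithRetry(func() error { return refused }, 3, time.Millisecond)
	fmt.Println(err, errors.Is(err, errGaveUp))
}
```

A caller can then use errors.Is(err, errGaveUp) to tell "retries exhausted" apart from other failures.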

When you set on an empty path, the zookeeper will crash

When you try setting data on an empty path, it will crash and burn your zookeeper server. And by this I mean data corruption, you need to empty your data directory and then restart it, otherwise nobody will be able to connect to it any more and will get a connection refused.

It would be nice to have an error if the path is empty, rather than letting the set happen and watching your zookeeper go down in style.
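
A client-side guard could be as small as the sketch below. validatePath is a hypothetical helper and deliberately minimal: it covers the empty path from this report plus the leading and trailing slash rules, not every ZooKeeper path rule.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validatePath performs a few client-side checks before a request is sent.
// It is not a full implementation of ZooKeeper's path rules.
func validatePath(path string) error {
	switch {
	case path == "":
		return errors.New("zk: path must not be empty")
	case !strings.HasPrefix(path, "/"):
		return errors.New("zk: path must start with /")
	case len(path) > 1 && strings.HasSuffix(path, "/"):
		return errors.New("zk: path must not end with /")
	}
	return nil
}

func main() {
	for _, p := range []string{"", "foo", "/foo/", "/", "/foo"} {
		fmt.Printf("%q -> %v\n", p, validatePath(p))
	}
}
```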

Can't set connectTimeout

Hi,

In zk/conn.go:145: connectTimeout: 1 * time.Second,
Under very poor network conditions, or when DNS responds very slowly, the connection will fail.

I recommend using recvTimeout as the connectTimeout instead of the hard-coded 1 * time.Second.

Multi

I have just started using Multi call and have run into two issues.

Issues:

  1. When a Multi call is performed against ZooKeeper, it seems I do not get a response.
    -Not sure what is causing this; I will dig a little deeper and see what comes up.
  2. Because I never get a response, I never return from the call.
    -There should be a timeout on the receive channel. Do you want me to add it?
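
The timeout suggested in (2) could be sketched like this, using a select over the response channel and a timer. waitResponse and errTimeout are hypothetical names, not part of the library.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical error for a response that never arrived.
var errTimeout = errors.New("timed out waiting for response")

// waitResponse returns the value received on recv, or errTimeout if nothing
// arrives within d.
func waitResponse(recv <-chan error, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case err := <-recv:
		return err
	case <-timer.C:
		return errTimeout
	}
}

func main() {
	never := make(chan error) // a response that never arrives
	fmt.Println(waitResponse(never, 10*time.Millisecond))
}
```

A real implementation would also need to remove the pending request so a late response is not delivered to nobody.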

EventNodeDeleted is missing

According to the ZooKeeper documentation, if a children watch is set on a node and that node is deleted, EventNodeDeleted should be received on the watcher. But this event is only sent to Conn.eventChan.

example code:

package main

import (
        "fmt"
        "github.com/samuel/go-zookeeper/zk"
        "time"
)

func main() {
        c, zkChan, err := zk.Connect([]string{"127.0.0.1"}, time.Second) //*10)
        if err != nil {
                panic(err)
        }
        children, stat, wCh, err := c.ChildrenW("/foo")
        if err != nil {
                panic(err)
        }
        fmt.Printf("%+v %+v\n", children, stat)
        for {
                select {
                case e := <-wCh:
                        fmt.Printf("get event from watcher: %+v\n", e)
                case e := <-zkChan:
                        fmt.Printf("get event from session: %+v\n", e)
                }
        }
}

output:

go run watcher.go 
[] &{Czxid:25769803778 Mzxid:25769803778 Ctime:1394091427992 Mtime:1394091427992 Version:0 Cversion:0 Aversion:0 EphemeralOwner:0 DataLength:2 NumChildren:0 Pzxid:25769803778}
get event from session: {Type:EventSession State:StateConnecting Path: Err:<nil>}
get event from session: {Type:EventSession State:StateConnected Path: Err:<nil>}
get event from session: {Type:EventSession State:StateHasSession Path: Err:<nil>}
get event from session: {Type:EventNodeDeleted State:StateSyncConnected Path:/foo Err:<nil>}
^Cexit status 2

seems a minor modification is needed on conn.go:

@@ -497,7 +497,7 @@ func (c *Conn) recvLoop(conn net.Conn) error {
            case EventNodeCreated:
                wTypes = append(wTypes, watchTypeExist)
            case EventNodeDeleted, EventNodeDataChanged:
 -              wTypes = append(wTypes, watchTypeExist, watchTypeData)
 +              wTypes = append(wTypes, watchTypeExist, watchTypeData, watchTypeChild)
            case EventNodeChildrenChanged:
                wTypes = append(wTypes, watchTypeChild)
            }

Lock fails when sibling nodes have dashes in the names

If you have sibling nodes with dashes in their names, the code for "Lock" fails when it tries to parse a non-existent integer for the name.

Looking at the code, it also seems like it will fail if there are any siblings at all.

Do locks have to be done on a completely separate directory to ensure it works correctly?

Possible goroutine leak

After a watch has fired, the event is sent down the Event chan and the watch is deleted from the watchers list here: https://github.com/samuel/go-zookeeper/blob/master/zk/conn.go#L496. When watchers are invalidated, the map is recreated here: https://github.com/samuel/go-zookeeper/blob/master/zk/conn.go#L258

In both cases the Event channels are effectively lost: go-zookeeper drops all references to them. Ultimately this could leak goroutines on the calling side, or block them indefinitely, as there will never be anything to receive on the channel.

Ideally we'd close the channel before we lose our references to it. That notifies the caller that there will be no subsequent sends on the channel, which can be detected via ev, ok := <-eventChan so that the caller can continue on.

Thoughts?
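
A minimal sketch of the proposal, with a hypothetical registry type standing in for the watchers map: every pending channel is closed before the map is replaced, so a blocked receiver wakes up with ok == false.

```go
package main

import "fmt"

// event is a simplified stand-in for the library's Event type.
type event struct{ Path string }

// registry is a hypothetical stand-in for the map of pending watch channels.
type registry struct {
	watchers map[string][]chan event
}

// add registers a new watch channel for path and returns its receive side.
func (r *registry) add(path string) <-chan event {
	ch := make(chan event, 1)
	r.watchers[path] = append(r.watchers[path], ch)
	return ch
}

// invalidate closes every pending channel before dropping the references.
func (r *registry) invalidate() {
	for _, chans := range r.watchers {
		for _, ch := range chans {
			close(ch)
		}
	}
	r.watchers = make(map[string][]chan event)
}

func main() {
	r := &registry{watchers: make(map[string][]chan event)}
	ch := r.add("/foo")
	r.invalidate()
	ev, ok := <-ch
	fmt.Println(ev, ok)
}
```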

Chroot support?

A feature supported by some other Zookeeper clients, including the canonical
JVM one, is chroot support. This allows you to create (or move) a connection
that's relative to some fixed path prefix, for example "/apps/myapp". Chrooting
is discussed in the Zookeeper admin guide here:

https://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkSessions

Here's the chroot interface for the canonical Java client:

https://zookeeper.apache.org/doc/r3.3.3/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper(java.lang.String, int, org.apache.zookeeper.Watcher)

What do you think about chroot support in this client?

It would be tricky to support something like this against the current interface
of the Go client because we expect a []string of server addrs, instead of the
single connection string expected by the Java client. But perhaps we could
figure something out. An alternative would be a Chroot(path string) *Conn
method on *Conn that returned a new connection chrooted to the given
path.

If you're interested in supporting chrooting, I'd consider trying to write a
patch for it.

(BTW, thanks for writing and sharing this code!)

frequently receive a StateDisconnected

log:

comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:21.827998 33139 zk.go:50] zookeeper get a event: StateSyncConnected
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.047343 33139 zk.go:50] zookeeper get a event: StateDisconnected
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.295198 33139 zk.go:50] zookeeper get a event: StateConnecting
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.295206 33139 zk.go:50] zookeeper get a event: StateConnected
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.369996 33139 zk.go:50] zookeeper get a event: StateExpired
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.370007 33139 zk.go:50] zookeeper get a event: StateDisconnected
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.386103 33139 zk.go:50] zookeeper get a event: StateConnecting
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.386112 33139 zk.go:50] zookeeper get a event: StateConnected
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.742684 33139 zk.go:50] zookeeper get a event: StateHasSession
comet103.yw-0-0.root.log.INFO.20140528-035809.33139:I0528 06:03:22.818002 33139 zk.go:50] zookeeper get a event: StateDisconnected

It seems go-zookeeper retries connecting, the session expires, and then a StateHasSession is received. What does this mean?

flipped string, substring in func Connect

In conn.go

func Connect(servers []string, recvTimeout time.Duration) (*Conn, <-chan Event, error) {

the Contains parameters should be (string, substring)

so

if !strings.Contains(":", addr) {

should be

if !strings.Contains(addr, ":") {
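
A standalone sketch of the corrected check. It assumes the check exists to append a default port to addresses that lack one, and that the default is 2181, the port seen throughout these reports; withDefaultPort and defaultPort are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPort is assumed here; 2181 is the port used throughout these reports.
const defaultPort = "2181"

// withDefaultPort returns a copy of servers in which every address that has
// no ":" gets the default port appended.
func withDefaultPort(servers []string) []string {
	out := make([]string, len(servers))
	for i, addr := range servers {
		if !strings.Contains(addr, ":") { // string to search first, substring second
			addr += ":" + defaultPort
		}
		out[i] = addr
	}
	return out
}

func main() {
	fmt.Println(withDefaultPort([]string{"zoo01", "zoo02:2182"}))
}
```

With the arguments flipped, the condition is true for every address longer than one character, so a port would be appended even to addresses that already have one.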

Operation type limitation in the Multi.

First, thanks for your fixes to Multi; I am now using the new Multi and it works well. I have a small question, though: I noticed that Multi only accepts four kinds of operations:

  • CreateRequest
  • SetDataRequest
  • DeleteRequest
  • CheckVersionRequest

Is there any special reason for this limitation?

I think in some cases it would be nice to get a znode's stat change within a transaction. If Multi accepted operations like GetData and Exists, this could easily be done.

Resetting watches silently fails

Applications can depend on watchers to do work (I'm currently trying to debug wvanbergen/kafka#76). If the connection to ZooKeeper is interrupted and re-established, and resetting the watchers subsequently fails (which it does silently, as it is not possible to catch the error), then goroutines wait on the ChildrenW() channel and never make any meaningful progress.

I'm not sure what the official/Apache way of dealing with this is, but if the connection is interrupted then either (1) all watchers should be closed, or (2) if the reset-watchers behavior must be kept for backward compatibility, there should be a way to catch this error so the application can deal with the failure.

is it a problem that zk.Multi is imposing an order to the operations?

I notice that https://github.com/samuel/go-zookeeper/blob/master/zk/conn.go#L749 appears to be imposing a distinct order on the operations, first create, then setData, then Delete, then Check. Doesn't that cause a problem if someone needs to perform multiple operations in a different order?

E.g., if someone wants to Delete and then Create a given node (say because it is owned by a previous session which will time out shortly), in that order?

added defer funcs to 'capture' panics

In structs.go I added the following to the top of each of the functions decodePacket, decodePacketValue, encodePacket and encodePacketValue, so that they pass back an error that my app can handle and keep running, instead of aborting via the panic.

    defer func() {
        if r := recover(); r != nil {
            e = errors.New("decodePacket: " + fmt.Sprintf("%v", r))
        }
    }()

The string in the errors.New call is the function name.

I also named the variables being returned, i.e. (i int, e error):

func decodePacket(buf []byte, st interface{}) (i int, e error) {
func decodePacketValue(buf []byte, v reflect.Value) (i int, e error) {
func encodePacket(buf []byte, st interface{}) (i int, e error) {
func encodePacketValue(buf []byte, v reflect.Value) (i int, e error) {

Hopefully it's clear what I did.
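
For readers who want to try the pattern in isolation, here is a self-contained sketch: a named error return that a deferred recover fills in when decoding a short buffer panics. decodeUint32 is a hypothetical example function, not one of the functions in structs.go.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeUint32 reads a big-endian uint32 from buf. A buffer that is too short
// makes the slice expression panic; the deferred function turns that panic
// into an error through the named return value.
func decodeUint32(buf []byte) (v uint32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("decodeUint32: %v", r)
		}
	}()
	v = binary.BigEndian.Uint32(buf[:4])
	return v, nil
}

func main() {
	fmt.Println(decodeUint32([]byte{0, 0, 0, 7}))
	fmt.Println(decodeUint32([]byte{1, 2}))
}
```

The named results matter: the deferred function can only change what the caller sees because err is a named return value.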

frequently receive "connection refused" logs

If I set the wrong zk host or port, I frequently receive logs like the ones below:

2014/07/25 19:02:19 Failed to connect to :2181: dial tcp :2181: connection refused
2014/07/25 19:02:20 Failed to connect to :2181: dial tcp :2181: connection refused
....

I don't know whether I'm missing something or this is an issue.

If I'm missing something, please tell me what I should do. Thanks very much!

session always expired....

2014/03/28 17:17:53 [INFO] zk.go:181 zk path: "/gopush-cluster" receive a event {EventNotWatching StateDisconnected /gopush-cluster zk: session has been expired by the server}
2014/03/28 17:18:25 [INFO] zk.go:181 zk path: "/gopush-cluster" receive a event {EventNotWatching StateDisconnected /gopush-cluster zk: session has been expired by the server}

need a way to create a node

so I added the following function to conn.go

func (c *Conn) Create(path string, data []byte) (rpath *string, err error) {
    xid := c.nextXid()
    ch := make(chan error)
    rs := &createResponse{}
    req := &request{
        xid: xid,
        pkt: &createRequest{
            requestHeader: requestHeader{
                Xid:    xid,
                Opcode: opCreate,
            },
            Path:  path,
            Data:  data,
            Acl:   []acl{acl{0x1f, id{"world", 0}}},
            Flags: 0,
        },
        recvStruct: rs,
        recvChan:   ch,
    }
    c.sendChan <- req
    err = <-ch
    rpath = &rs.Path
    return
}

I changed the case reflect.Slice: branch of encodePacketValue to:

case reflect.Slice:
        switch v.Type().Elem().Kind() {
        default:
            // count := int(binary.BigEndian.Uint32(buf[n : n+4]))
            count := v.Len()
            n += 4
            for i := 0; i < count; i++ {
                // n2, err := decodePacketValue(buf[n:], v.Index(i))
                n2, err := encodePacketValue(buf[n:], v.Index(i))
                n += n2
                if err != nil {
                    return n, err
                }
            }
            // add the size for the structure
            binary.BigEndian.PutUint32(buf[0:4], uint32(n-4))
        case reflect.Uint8:
            ...
            }
        }

Sadly, this didn't work as well as the get and set functions we added earlier. When I run it I get the following error, which I'm having trouble figuring out. Maybe you have an idea?

Here I'm trying to create a node /test/create with the data 'jobone':

./gozoo -p/test/create -c jobone

   path: /test/create
calling create
2012/12/07 09:06:42 ---> in Create
2012/12/07 09:06:42 jobone
2012/12/07 09:06:42 ----------create request------------------------
2012/12/07 09:06:42 &{xid:1 pkt:0xf840085040 recvStruct:0xf8400398d0 recvChan:0xf840089050}
2012/12/07 09:06:42 &{requestHeader:{Xid:1 Opcode:1} Path:/test/create Data:[106 111 98 111 110 101] Acl:[{Perms:31 Id:{Scheme:world Id:0}}] Flags:0}
2012/12/07 09:06:44 --------- +||||+ -----> req =>&{xid:1 pkt:0xf840085040 recvStruct:0xf8400398d0 recvChan:0xf840089050}
2012/12/07 09:06:44 --------- +||||+ -----> req.pkt =>&{requestHeader:{Xid:1 Opcode:1} Path:/test/create Data:[106 111 98 111 110 101] Acl:[{Perms:31 Id:{Scheme:world Id:0}}] Flags:0}
2012/12/07 09:06:44 --------- +||||+ -----> n =>59
2012/12/07 09:06:44 --------- +||||+ -----> buf =>[0 0 0 59 0 0 0 1 0 0 0 1 0 0 0 12 47 116 101 115 116 47 99 114 101 97 116 101 0 0 0 6 106 111 98 111 110 101 0 0 0 17 0 0 0 31 0 0 0 5 119 111 114 108 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...  0 0]
2012/12/07 09:06:44 ----------create response------------------------
2012/12/07 09:06:44 &{responseHeader:{Xid:1 Zxid:12886431305 Err:-5} Path:7v-?????ͪ?}
2012/12/07 09:06:44 err:<nil>
2012/12/07 09:06:44 rpath:7v-?????ͪ?
2012/12/07 09:06:44 <--- out Create

I think I have the acl struct set right. The error returned is -5, which, as near as I've been able to uncover, indicates trouble with unmarshalling. Hope it's clear what I'm trying to do.

Bill

Is log.Fatal in sendSetWatches necessary?

Hi,

At the end of sendSetWatches there is the following code:

    go func() {
        res := &setWatchesResponse{}
        _, err := c.request(opSetWatches, req, res, nil)
        if err != nil {
            log.Fatal(err)
        }
    }()
}

There are two issues with the code:

  1. I believe that when the request times out, or some other networking error happens, the process simply exits. This is probably a bug, as all other networking errors are handled gracefully.
  2. The log message should be more descriptive. In particular, if I see: 2014/11/17 16:57:43 read tcp 198.118.130.75:2181: i/o timeout, can it be attributed to this particular log.Fatal call?

Is there a way to tell if a previously acquired lock has been lost?

I'm trying to use the Lock.Lock() functionality, and it looks like there's no (easy?) way for the caller of Lock.Lock() to know whether the lock is subsequently lost after it has been acquired. For example: what if ZK goes down?

If we could get the key path of the znode that the lock is holding onto, we could, in principle, watch it and make sure we still have the lock. Any suggestions on how to implement this?

Thanks!
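
One way to approach this, sketched under assumptions: if the lock is backed by an ephemeral znode (an assumption about the lock recipe, not something stated above), the lock disappears when the session expires, so the caller can treat a session-expired event as "lock lost". The types below (sessionState, sessionEvent) are local stand-ins for the library's event types, and lockLost is a hypothetical helper.

```go
package main

import "fmt"

// sessionState and sessionEvent are local stand-ins for the client
// library's session event types.
type sessionState int

const (
	stateConnected sessionState = iota
	stateDisconnected
	stateExpired
)

type sessionEvent struct{ State sessionState }

// lockLost returns a channel that is closed when the session expires or the
// event stream ends. Disconnects alone do not close it, because the session
// (and with it an ephemeral znode) may survive a reconnect.
func lockLost(events <-chan sessionEvent) <-chan struct{} {
	lost := make(chan struct{})
	go func() {
		defer close(lost)
		for ev := range events {
			if ev.State == stateExpired {
				return
			}
		}
	}()
	return lost
}

func main() {
	events := make(chan sessionEvent, 3)
	events <- sessionEvent{State: stateDisconnected}
	events <- sessionEvent{State: stateConnected}
	events <- sessionEvent{State: stateExpired}

	<-lockLost(events)
	fmt.Println("session expired: the lock can no longer be assumed held")
}
```

A worker holding the lock would select on this channel alongside its own work and stop as soon as it is closed.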
