Comments (10)

brndnmtthws commented on August 17, 2024

Instead of using lock(), maybe we should use tryLock() with a timeout? I'd rather overwrite some data than have no data written at all.
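A minimal sketch of what that suggestion could look like. It uses `java.util.concurrent.locks.Lock` as a stand-in for the ZooKeeper-backed lock, and the class name `TryLockSketch`, the method `runWithBestEffortLock`, and the timeout values are hypothetical, not Secor code:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: a plain java.util.concurrent Lock stands in for the distributed lock.
final class TryLockSketch {
    /**
     * Waits up to the given timeout for the lock, then runs the task either way.
     * Returns true if the task ran while holding the lock, false if the wait
     * timed out and the task ran unprotected (best effort, may overwrite data).
     */
    static boolean runWithBestEffortLock(Lock lock, long timeout, TimeUnit unit, Runnable task) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and give up instead of running the task.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for lock", e);
        }
        try {
            task.run();
            return acquired;
        } finally {
            // Only release what was actually acquired.
            if (acquired) {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        boolean held = runWithBestEffortLock(lock, 100, TimeUnit.MILLISECONDS,
                () -> System.out.println("uploading"));
        System.out.println("ran under lock: " + held);
    }
}
```

The trade-off is that a timed-out wait lets two writers proceed at once, so this gives up the exclusivity that lock() provides.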

from secor.

pgarbacki commented on August 17, 2024

Underneath, locking is implemented on top of Twitter's DistributedLock. You can read about the details of that algorithm here: http://twitter.github.io/commons/apidocs/index.html#com.twitter.common.zookeeper.DistributedLockImpl

Since locking is based on ephemeral nodes, no cleanup is required. It is possible that intermediate paths will stay in zk after the thread holding the lock dies, but I don't think it's a big deal since those paths will be reused.

brndnmtthws commented on August 17, 2024

Indeed, it doesn't seem right. That's why I reported the issue here.

pgarbacki commented on August 17, 2024

What doesn't seem right?

brndnmtthws commented on August 17, 2024

The part where the ZK locks were stuck, and the only solution was to manually remove them, is not right.

pgarbacki commented on August 17, 2024

What are the circumstances under which that would happen?

brndnmtthws commented on August 17, 2024

In my particular case, it happened while upgrading the secor cluster from the previous build (which was c982615 at the time) to the latest build (which was 96969a8 at the time).

pgarbacki commented on August 17, 2024

Did you wait long enough for the zk lease to expire before diagnosing the problem?

According to the Java API linked above, a deadlock would happen if the worker is hanging but the zk thread is alive. I don't see a code path that would lead to this unless the consumer somehow gets stuck on s3 operations.

brndnmtthws commented on August 17, 2024

I think I waited as long as 60 minutes, which should have been more than enough. The other possibility is that there was a Java process hung somewhere (holding the lock), but I never thought to check. If that was the case, it's no longer running.

pgarbacki commented on August 17, 2024

We need a better understanding of the issue before attempting a fix. The next time it happens, go to zookeeper and check the values under /secor/locks///. If there is a process holding a lock, ephemeralOwner should be set to a non-zero value. You should also be able to see mtimes and ctimes. Those should provide a bit more context.
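As a rough illustration of how those fields could be read, here is a small sketch. It assumes the three values have already been fetched from the lock node's stat (in the ZooKeeper Java client they are exposed as `Stat.getEphemeralOwner()`, `getCtime()` and `getMtime()`, with times in milliseconds since the epoch). The class `LockNodeReport` and its methods are hypothetical helpers, not part of Secor or ZooKeeper:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for interpreting the stat fields of a lock node.
final class LockNodeReport {
    /** A non-zero ephemeralOwner is the id of the session that owns the node. */
    static boolean isHeld(long ephemeralOwner) {
        return ephemeralOwner != 0L;
    }

    /** Summarizes who owns the node and how long ago it was created. */
    static String describe(long ephemeralOwner, long ctimeMillis, long mtimeMillis, long nowMillis) {
        Duration age = Duration.ofMillis(nowMillis - ctimeMillis);
        String owner = isHeld(ephemeralOwner)
                ? "held by session 0x" + Long.toHexString(ephemeralOwner)
                : "no owning session";
        return String.format("%s, created %s (%d min ago), modified %s",
                owner,
                Instant.ofEpochMilli(ctimeMillis),
                age.toMinutes(),
                Instant.ofEpochMilli(mtimeMillis));
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        long now = System.currentTimeMillis();
        System.out.println(describe(0x1234L, now - 3_600_000L, now - 3_600_000L, now));
    }
}
```

A node that still reports an owning session long after the expected lease expiry would point to a live process holding the lock rather than a stale path.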

For the use case at Pinterest, it is instrumental that there is no message duplication or loss, so changing the semantics to best-effort locking is not a preferred solution. FWIW, in the last 6 months or so since we started using Secor, I didn't see an issue with locking but of course this doesn't prove anything.
