
Comments (7)

puneetkhanduri avatar puneetkhanduri commented on September 24, 2024

The use case you've proposed makes sense but let me share the motivation behind our current implementation.

We currently return an exception to respond as quickly as possible to the client without blocking on responses from any of the test instances (primary, secondary, candidate). Our premise is that some resource (thread/task/memory) is blocked on the client side (whoever is sending test requests to diffy) and should be freed as quickly as possible.

This is especially relevant when you eavesdrop on your production clusters to sample and emit large volumes of one-way "dark" traffic to Diffy. Whatever instrumentation sends this traffic to Diffy does not care about the response, as its only purpose is to send a sufficiently large volume of traffic while consuming a minimal amount of resources.

Ultimately, the constant exception is just about being able to respond to any type of request without any blocking on the Diffy side.

from diffy.

cross311 avatar cross311 commented on September 24, 2024

Just to make sure I understand: given the current implementation of Diffy, the only way to truly utilize its power is to add an additional piece of software/infrastructure to record and replay traffic against the Diffy proxy, since Diffy does not return anything that is usable for upstream consumers.

Does the Diffy team plan on releasing a tool to help with this? That seems like a crucial part of the puzzle.

I can see other approaches:

  • Set up another proxy in front of this one that sends duplicate traffic to Diffy and ignores the response.
  • Place a load balancer in front of Diffy that sends part of the traffic to it, and make sure production systems retry on an empty response, hoping the load balancer will route the retry to the production URL.
  • Modify Diffy to return one of the three responses.
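The first approach above can be sketched quite simply: the caller "tees" a copy of each request to Diffy and deliberately drops the response. This is a minimal sketch using Java's built-in HttpClient from Scala; the Diffy address is a placeholder, not a real endpoint.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object DiffyTee {
  val client: HttpClient = HttpClient.newHttpClient()

  // Fire-and-forget: the returned CompletableFuture is dropped on purpose,
  // so the caller never blocks on (or even sees) Diffy's constant exception.
  // "diffy.example:8880" is a hypothetical host/port for illustration.
  def teeToDiffy(path: String): Unit = {
    val req = HttpRequest
      .newBuilder(URI.create(s"http://diffy.example:8880$path"))
      .build()
    client.sendAsync(req, HttpResponse.BodyHandlers.discarding())
  }
}
```

The caller then invokes `DiffyTee.teeToDiffy(path)` alongside the real request; because the future is discarded, no client-side resource stays blocked on Diffy.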

I would really love to use Diffy; I am just having a hard time justifying the additional infrastructure needed to do so.

from diffy.

cross311 avatar cross311 commented on September 24, 2024

Can you expand on how Twitter uses Diffy? I realize you might not be able to because it is not authorized public knowledge. I would find it helpful.

Thanks for your response!

from diffy.

coderunner avatar coderunner commented on September 24, 2024

What about passing a command line flag to determine if a response is to be returned or not?

Looking at the code, I saw that the current implementation calls all the backing services in sequence. I also saw a parallel implementation, but it seems to be unused.

Question: why was sequential access chosen?

If passing a flag is acceptable, then in order to respond quickly, I think Diffy should return the primaryService response (or the flag could specify which response to return) without waiting for anything else to be processed.

As a prototype, I slightly modified the DifferenceProxy#proxy method to query all services in parallel and return the future of the primary service's response.

Does that make sense? Would you accept a PR that introduces a flag to return a response?

def proxy = new Service[Req, Rep] {
    override def apply(req: Req): Future[Rep] = {
      // Issue all three requests in parallel; liftToTry keeps an individual
      // failure from failing the whole collected sequence.
      val rawResponses = Seq(primary.client, candidate.client, secondary.client) map { service =>
        service(req).liftToTry
      }

      val responses: Future[Seq[Message]] =
        Future.collect(rawResponses) flatMap { reps =>
          Future.collect(reps map liftResponse) respond {
            case Return(rs) =>
              log.debug(s"success lifting ${rs.head.endpoint}")

            case Throw(t) => log.debug(t, "error lifting")
          }
        }

      // Feed the lifted request/response triple to the analyzer purely as a
      // side effect; the client is never blocked on this.
      responses foreach {
        case Seq(primaryResponse, candidateResponse, secondaryResponse) =>
          liftRequest(req) respond {
            case Return(m) =>
              log.debug(s"success lifting request for ${m.endpoint}")

            case Throw(t) => log.debug(t, "error lifting request")
          } foreach { req =>
            analyzer(req, candidateResponse, primaryResponse, secondaryResponse)
          }
      }

      // Return the primary response as soon as it is available, without
      // waiting on candidate or secondary.
      rawResponses.head.flatMap { Future.const }
    }
  }

from diffy.

puneetkhanduri avatar puneetkhanduri commented on September 24, 2024

I was in the process of composing a response proposing the approach of adding a command line flag. A pull request is welcome.

Regarding parallel vs. sequential: we actually have code that does that in the proxy package, but we moved from parallel to sequential because it serves as a noise-reduction trick when used in the primary -> candidate -> secondary sequence. This helps when the underlying data may be live, and it skews the odds of noise in favor of candidate: primary is now more likely to disagree with secondary than with candidate, because the request to secondary is delayed longer than the request to candidate (relative to primary), and underlying data is more likely to change over a longer time interval than a shorter one.
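The ordering argument above can be sketched with plain Scala futures (not Diffy's actual client code; the backend names and call stub are illustrative). Chaining the calls in a for-comprehension issues them strictly in primary -> candidate -> secondary order, so primary and candidate observe more closely spaced snapshots of any live data than primary and secondary do.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object SequentialSketch {
  // Hypothetical stand-in for a call to one backend instance.
  def call(name: String): Future[String] = Future { s"$name-response" }

  // Each flatMap step starts only after the previous response arrives,
  // so candidate is queried right after primary, and secondary last.
  val responses: Future[Seq[String]] =
    for {
      p <- call("primary")
      c <- call("candidate")
      s <- call("secondary")
    } yield Seq(p, c, s)
}
```

A parallel version would instead start all three calls before sequencing them, minimizing latency but spacing the snapshots less predictably, which is exactly the noise trade-off described above.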

from diffy.

derEremit avatar derEremit commented on September 24, 2024

That was pretty confusing for me, as I always thought I should get a response from a proxy! Please at least mention this in the README.

from diffy.

derEremit avatar derEremit commented on September 24, 2024

By the way, for duplicating traffic, at least in a test setup:
https://github.com/agnoster/duplicator

from diffy.
