Comments (5)

trentm commented on August 10, 2024

Note that [email protected] has server.inflightRequests() and a simple plugin to respond with 429s if the number of inflight requests goes above a hard threshold: https://github.com/restify/node-restify/blob/master/lib/plugins/inflightRequestThrottle.js
This seems equiv to the "concurrency + queueTolerance" described in the current proposal.

A discussion of or comparison with restify's current throttle (http://restify.com/docs/plugins-api/#throttle) plugin would be useful. Sorry if I missed that discussion elsewhere.
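
For reference, here is a minimal sketch of wiring up the inflight-request plugin mentioned above. The option names (`limit`, `server`) follow the plugin source linked in the comment, but they should be checked against whatever restify version is actually in use.

```js
// Minimal sketch: shed load with 429s once the number of inflight
// requests crosses a hard threshold, using restify's
// inflightRequestThrottle plugin (restify 5.x plugin layout assumed).
var restify = require('restify');

var server = restify.createServer({ name: 'throttle-demo' });

// Run the throttle in server.pre() so overloaded requests are
// rejected before routing and body parsing happen.
server.pre(restify.plugins.inflightRequestThrottle({
    limit: 100,     // hard ceiling on concurrent inflight requests
    server: server  // the plugin reads server.inflightRequests()
}));

server.get('/ping', function (req, res, next) {
    res.send(200, { inflight: server.inflightRequests() });
    next();
});

server.listen(8080);
```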

jmwski commented on August 10, 2024

At a manta call a couple of weeks back I proposed using a similar restify plugin, but we decided not to use it because (1) not all services that we want to throttle use restify, and (2) we didn't want to depend on restify in the first place. I can include some notes about this in the RFD itself.

rmustacc commented on August 10, 2024

Thanks for putting this together.

Is it possible to get a discussion of why we ended up going with a per-process model with no state shared between processes? I'm not sure if this was just for simplicity, if other options were already ruled out, or something else. I mostly ask because other major throttling cases that have come up are per-user throttles, where the stateless nature can break down right away. In addition, it can be harder to reason about what the per-process throttle should be, since that ends up meaning the throttle scales with the number of zones. Also, was there any consideration of doing the throttling based on backend resources? For example, the global request rate the system can handle may vary a lot depending on the resources actually being consumed.
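
To make the per-user point concrete, here is a hypothetical per-process token bucket (nothing like this is in the current proposal). With N independent processes each holding its own buckets, a user's effective limit quietly becomes N times the configured rate, which is where the stateless, per-process model breaks down.

```js
// Hypothetical per-user token bucket kept entirely in process memory.
// Each muskie process would hold its own copy, so the effective
// per-user limit scales with the number of processes/zones.
function TokenBucket(ratePerSec, burst) {
    this.rate = ratePerSec;   // tokens added per second
    this.capacity = burst;    // maximum bucket size
    this.tokens = burst;
    this.last = Date.now();
}

TokenBucket.prototype.tryTake = function () {
    var now = Date.now();
    this.tokens = Math.min(this.capacity,
        this.tokens + ((now - this.last) / 1000) * this.rate);
    this.last = now;
    if (this.tokens >= 1) {
        this.tokens -= 1;
        return (true);
    }
    return (false);
};

// Buckets keyed by account uuid, visible only to this process.
var buckets = {};

function allowRequest(account) {
    if (!buckets.hasOwnProperty(account)) {
        buckets[account] = new TokenBucket(10, 20);
    }
    return (buckets[account].tryTake());
}
```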

Is throttling going to be enabled by default, or will it be disabled by default and left up to operators to enable somehow? If it's the latter, how do we plan on managing this across the fleet, even statically? Would these be sapi properties that config-agent picks up, or is there some other means we're expecting to use?

jmwski commented on August 10, 2024

The per-process muskie throttle was just an initial iteration, intended mostly as a proof of concept but also as a base implementation to build a more generic library from.

I think making throttling decisions based on the availability of backend resources is a good idea. Do you have any suggestions as to what specific resources would be useful to take into account and how I might be able to interface with such metrics from a node module?
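
Not an answer to the question, but as one illustration of what a node module can read without extra dependencies: the built-in os module exposes load averages and memory figures that a resource-aware throttle could compare against configured ceilings. These particular metrics are only an example of the interface, not a recommendation.

```js
// Illustrative only: coarse host-level signals available to any
// node process through the built-in os module.
var os = require('os');

function backendPressure() {
    var load1 = os.loadavg()[0];               // 1-minute load average
    var ncpu = os.cpus().length;
    var memUsed = 1 - (os.freemem() / os.totalmem());

    return ({
        cpuSaturation: load1 / ncpu,           // > 1.0: runnable work exceeds CPUs
        memSaturation: memUsed                 // fraction of physical memory in use
    });
}

// Example policy: shed load when either signal crosses a ceiling.
function shouldThrottle(limits) {
    var p = backendPressure();
    return (p.cpuSaturation > limits.cpu || p.memSaturation > limits.mem);
}
```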

The eventual plan is to have a globally aware, stateful throttling service. If I recall correctly, in an early discussion at the manta calls we decided that sapi might not be the right place to push these tunables out, because sapi is intended for long-lived properties that don't change frequently (I'm not 100% certain, but I seem to recall dap saying something along those lines).

In the RFD I propose the idea of having an http endpoint for setting and getting throttle parameters dynamically. I guess what I really want is for it to play the role of an agent that could be queried periodically by a global throttling service that makes decisions based on information from all the muskie processes.
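
A rough sketch of what that agent-style endpoint could look like, using only node's built-in http module so it carries no restify dependency. The path and parameter names here are illustrative, not what the RFD specifies.

```js
// Hypothetical throttle-parameter agent: GET returns the current
// parameters, PUT replaces them. A global throttling service could
// poll GET periodically and push new values with PUT.
var http = require('http');

var params = { concurrency: 50, queueTolerance: 25, enabled: false };

http.createServer(function (req, res) {
    if (req.url !== '/throttle/params') {
        res.writeHead(404);
        res.end();
        return;
    }
    if (req.method === 'GET') {
        res.writeHead(200, { 'content-type': 'application/json' });
        res.end(JSON.stringify(params));
    } else if (req.method === 'PUT') {
        var body = '';
        req.on('data', function (chunk) { body += chunk; });
        req.on('end', function () {
            try {
                params = JSON.parse(body);
                res.writeHead(204);
            } catch (e) {
                res.writeHead(400);
            }
            res.end();
        });
    } else {
        res.writeHead(405);
        res.end();
    }
}).listen(9090);
```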

I think leaving throttling off by default might be less invasive, but I could also see having a boolean per-service parameter that decides whether throttling is on or not.

UPDATE: I've modified the RFD to (hopefully) address the above feedback! Thanks to everyone who has read through it so far!

kellymclaughlin commented on August 10, 2024

Returning a 429 for every throttled request seems potentially confusing, since the throttle is not actually looking at which user account is making the requests when it makes its decisions. It seems like there are two ideas here: user rate-limiting and backpressure to prevent system overload. That makes me think about two different things:

  1. It seems like most of the cases we're trying to handle with the initial iteration are actually server-side problems, and something like a 503 would be more appropriate. For internal services like moray, where database replication is laggy, that isn't a user problem; it's our internal service that is unable to keep up. If the return status code stays static, maybe it should at least be configurable.

  2. Maybe the throttle (now or in the future) should have a backpressure component for managing overload that receives feedback from the servers it communicates with, as well as a user rate-limiting component. It would be great to be able to distinguish these two cases and return different status codes based on the problem (see the sketch after this list).
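
A small sketch of the distinction suggested in item 2 (status codes, reason strings, and field names here are illustrative, not part of the proposal): the throttle records why it rejected a request, and the responder maps that reason onto a status code.

```js
// Illustrative mapping of throttle decisions to status codes:
// per-user rate limiting is the client's problem (429), while
// backpressure from overloaded backends is ours (503).
function sendThrottleResponse(res, decision) {
    if (decision.reason === 'user-rate-limit') {
        res.writeHead(429, { 'retry-after': String(decision.retryAfterSec || 1) });
        res.end('request rate limit exceeded');
    } else if (decision.reason === 'overload') {
        res.writeHead(503);
        res.end('service temporarily unavailable');
    } else {
        // Unknown reason: fail safe rather than guess.
        res.writeHead(500);
        res.end();
    }
}
```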

As an aside, I wonder if it would be a good idea to standardize our manta components to all use either an HTTP framework like restify or the RPC framework. I can't claim to know the history about why different pieces use one or the other, but having more commonality has a lot of advantages when learning and debugging and also means less software (and accompanying bugs) to support.
