# Systems and Self-Defense

_Published 2025-04-11_

Despite what the first Avengers movie would tell us, a system can
protect itself _from itself_ if built intentionally. To understand how,
let's start with the following:

**Tyler's Law**:
> "Any system will inevitably be used to 100% of its authorized capacity."

**Tyler's Corollary**:
> "If your authorized capacity is equal to your available capacity, your
> system will fail."

"Authorized" capacity and "available" capacity are different terms, and
they are not interchangeable.

## Denial of Service (DoS)

Authorized capacity for a computer is not generally controllable by the
end user (unless you've got `root` access), so a single process can
consume as many resources as the computer has available (with very few
limits). If one runs a command to duplicate a movie file of "Plan 9
from Outer Space" fifty thousand times, like

```
seq 50000 | xargs -I{} cp plan9.mov plan9copy{}.mov
```

the computer will dutifully use all its resources toward that objective
until the job completes or the disk is full.

Note that there is _no interactivity_ once executed -- the command
accepts one set of instructions, stops taking input, then executes
without any indication of progress until the job is done.

A service, on the other hand, accepts input from another source and
_persists._ As the service does work on a job, it may be coded to accept
more inputs and reply to the requestor with already-completed work.

This poses a problem for managing resources: how many resources should
be used for in-flight operations? Does the service have enough resources
to accept new work while processing a current job? How does the service
tell the requestor it's not ready yet?
If the system is using all of its _available_ resources to do work, then
there are no resources left to respond to a client/user, to process
[`signals`][9] (like `kill`/`term`), or even to provide telemetry to an
observer.

If a service or a computer "goes silent," how can we be sure it is
functioning correctly, if at all?

## The Problem

The key point of the previous section is this:

>If we are able to give a service enough work that it uses all of its
>available resources, then we've achieved a [Denial of Service][1]
>condition.

This is bad. To allow a program to "cancel" an erroneous command or be
triggered to produce telemetry / feedback, the program must be able to
listen for [`signals`][9] from the operating system and act accordingly.
The only exception is [`SIGKILL`][2], which cannot be blocked or
handled.

The goal of the operating system / kernel is to ensure that "authorized
resources" never exceed "available resources", or *the system will
crash*. This is why `SIGKILL` is unblockable -- it's an action of last
resort by the operating system (OS) to protect itself.

But what if we have an interactive _service_? We don't want to terminate
the process if it gets stuck -- we want it to keep running. So how do we
protect it?

## Self-defense

All programs and services practice a form of self-defense known as
response codes or error codes. These provide signals to an operator or
requestor that vary from "I'm still here" to "please try again later",
"this broke something", or even "your request is broken and I won't do
it."

What does this look like in practice though?

### Service Example

Let's say I have a web server that accepts text, appends it to a file,
and returns a line number to the client.
```
(Step 1) ClientA -- "foo" --> <Service> --> [write to disk]
(Step 2) ClientA <-- "1" -- <Service>
```

Because this service has to write values sequentially, while it is
performing work for Client-A it cannot do anything else. This means that
a client who gives the service sufficient work can monopolize it.

So let's look at what happens when we introduce Client-B:

```
(Step 1) ClientA -- "foo" --> <Service> --> [write to disk]
(Step 2) ClientB -- "bar" --> <Service> --> [BLOCKED]
(Step 3) ClientB <-- "TIMEOUT" -- <Service>
(Step 4) ClientA <-- "1" -- <Service> -- [write completes]
```

Because writes are sequential, if Client-B wants to send "bar" to the
service while it is working for Client-A, we must tell Client-B to wait
or come back later.

By default, a TCP connection to a service will `CONNECT`, send data, and
then receive a response. The operating system multiplexes TCP `CONNECT`
requests, so if the service is busy, the kernel queues the pending
connection until the service can accept it.

Your application, however, can't see this "wait" condition -- it just
goes silent until either the kernel accepts your connection, or you
time out and the kernel evicts you from the queue.

This is not a great solution. What could we do instead?

### Do Nothing?

No really, what if we do nothing?

That's a pretty great situation for the developer -- zero work needed,
and the kernel/OS does the multiplexing. It is, however, a _horrible_
experience for clients and service operators. Before specific solutions
like running a dedicated HTTP service or providing an HTTP stack
in-process, the answer to making a service network-available was
[inetd][10]. It was (and still is) incredibly slow and does not scale
beyond very low traffic rates.
So, in doing nothing, the service appears inconsistent, with
periodically high latency, and the problem is not actually fixed (see
"Denial of Service" above).

### Application Layer Defense

Most network services operate on HTTP, even ones that [bind to unix
sockets][5]. Even [grpc operates on http2][6], so I think it's safe to
say I can use HTTP status codes as an example of how to respond to a
client without requiring _too much_ translation to other stacks.

When an HTTP server is "busy", [rfc6585][8] suggests responding with
[code `429`][4], which maps to "Too Many Requests."

Of note is this paragraph:

>Note that this specification does not define how the origin server
>identifies the user, nor how it counts requests. For example, an
>origin server that is limiting request rates can do so based upon
>counts of requests on a per-resource basis, across the entire server,
>or even among a set of servers. Likewise, it might identify the user
>by its authentication credentials, or a stateful cookie.

Let's refer back to our Service example. If we have Client-A's request
in hand, then any *additional* requests that arrive while Client-A's
request is being processed would be "too many requests" for the server
to handle.

So what we should do is configure the server to do two things at once
(or, at least, two things concurrently).
As pseudocode:

```
var locked bool

fn writeLineToFile(s string, f file) -> (success bool, l int) {
    locked = true;
    n, err = os.Write(s, f);
    locked = false;

    if err != nil {
        return false, 0; // We failed
    }
    return true, n;
}

main() {
    f = open("filename")
    requests = http.Listen(port)

    for r in requests {
        if locked {
            r.reply(http-429) // Too Many Requests
        } else {
            ok, line = writeLineToFile(r.payload, f)
            if ok {
                r.reply(line)
            } else {
                r.reply(http-500) // Error
            }
        }
    }
}
```

Let's take a second to look at this -- the main code opens a file to
persist the data and listens for requests on the HTTP port. All seems
normal until we get to the `if locked` section. If the lock is active,
we immediately stop what we're doing and reply with an [HTTP-429][4].
(For those using gRPC, you can swap "HTTP-429" out for metadata [status
code][7] `UNAVAILABLE(14)`.)

You'll notice that we haven't done any additional checking -- no
computation, no querying the file; we just check the boolean value, then
act.

This is, computationally speaking, very cheap to do. The positioning
also means that we short-circuit the operation _before we even start
down the computationally-expensive path_ of accepting the payload and
doing anything with the file.

Looking further into the `writeLineToFile` function, you'll also see
that we "lock" the file for the minimum part of the operation -- the
part where we actually write to the file. We don't lock the
error-checking portion of the code and we don't lock the reply to the
client. This means that while the code is doing
things-not-writing-to-files, we can go as fast and as concurrently as we
want.
If we increase our activity to three clients (A, B, C), our new
operation graph looks something like this:

```
(Step 1) ClientA -- "foo" --> <Service> --> [write to disk]
(Step 2) ClientB -- "bar" --> <Service> --> [BLOCKED]
(Step 3) ClientC -- "quux" --> <Service> --> [BLOCKED]
(Step 4) ClientB <-- "E: 429" -- <Service>
(Step 5) ClientC <-- "E: 429" -- <Service>
(Step 6) ClientA <-- "1" -- <Service> -- [write completes]
(Step 7) ClientC -- "quux" --> <Service> --> [write to disk]
(Step 8) ClientB -- "bar" --> <Service> --> [BLOCKED]
(Step 9) ClientB <-- "E: 429" -- <Service>
(Step 10) ClientC <-- "2" -- <Service> -- [write completes]
```

You'll notice that at Step-7, ClientC tried again faster than ClientB
and won the race! As we reply to clients, they can decide when to try
again, and since ClientA's request was fulfilled in Step-6, the lock was
lifted, allowing ClientC to make a write.

Regardless of who wins the race to the next write, the service can only
handle one active request at a time, and **it protects itself** from
being forced to handle additional work by telling clients to try again.

## "Authorized Capacity" != "Available Capacity"

The service described previously has an "Authorized Capacity" of one
in-flight request. If the computer running this service were a
single-core tiny computer, it might not have enough power to serve
additional requests (even the HTTP-429 replies), but most computing
devices have either sufficient speed (allowing for concurrent
processing) or additional cores to handle concurrent requests. As a
result, the authorized capacity is less than the total available
capacity of the server.

When running an unmodified, uncontained process on a computer, the
process is "authorized" to use _nearly_ the entire set of resources
available to the OS.
In a cloud environment, a process can run inside a virtual machine (VM)
where the constraints are similar to a bare-metal computer, but (with
few exceptions) it executes within even higher-capacity bare-metal
hardware. _This means the VM is authorized to operate in a
higher-availability environment._

If running in a container or BSD jail, the constraints are set by the
container manager (sometimes just a command-line argument!), and the
kernel then grants (and constrains) those resources to the process
inside.

If a process exceeds its authorized limits in any of these environments,
the supervisor process (OS, container manager, VM, etc.) will forcibly
end the offending process ("kill"/"terminate") and reclaim its
resources. It doesn't matter whether **additional resources exist** that
the process could use -- the supervisory system will act.

## Conclusions

All of this leads to a few heuristics for defining the runtime
environment for an application:

- Determine the maximum resource usage for a given process, then add
  excess capacity to ensure the system has more than it needs by an
  appreciable margin. A heuristic is a minimum of 15-25% headroom at
  lower capacities, and as little as 5% for larger environments (to
  avoid wasting significant capacity).
- Stress-test / load-test your application to its "maximum" throughput
  for the resources you want to use, then set your maximum in-flight
  requests to 90% of that value. This will ensure that your instance is
  always capable of processing its maximum in-flight work within its
  limits.

Both of these ensure that your *authorized* resources (for the
application, your container, and your process) never exceed your
*available* capacity, keeping your service functioning at maximum
throughput in the face of overwhelming requests.
[1]:https://en.wikipedia.org/wiki/Denial-of-service_attack
[2]:https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html
[3]:https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/413
[4]:https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429
[5]:https://aweirdimagination.net/2024/04/07/http-over-unix-sockets/
[6]:https://grpc.io/blog/grpc-on-http2/
[7]:https://grpc.io/docs/guides/status-codes/
[8]:https://www.rfc-editor.org/rfc/rfc6585#section-4
[9]:https://www.man7.org/linux/man-pages/man7/signal.7.html
[10]:https://en.wikipedia.org/wiki/Inetd