And Lo,
for the Earth was empty of Form,
and void.
And Darkness was all over the Face of the Deep.
And We said: 'Look at that fucker Dance.'
― David Foster Wallace, Infinite Jest
Early Bukowski is so weird and pretentious and sad.
Far from the image of Bukowski recognized in certain b-tier art consumer circles today: the tired but ultimately complacent and entertainingly vulgar, foul-mouthed, grotesque, post-sexual old fuck.
It is a frustrated and resentful young Buk, aggressively naive, lobbing trite and tired old observations about consumerism and superficiality at a society he sees himself as somehow not belonging to, if not outright superior to. Criticizing, unsubtly justifying himself and his obvious misery instead of trying to merely 'convey' and 'express'.
We see a man utterly, yet understandably, incapable of conceiving of a reader who is genuinely curious to read about who he is, what he sees and says and does, to get to know him as opposed to his thoughts and opinions and, strangely, words.
A sort of like, archetypal adolescent lamentation, representative of types that he would, himself, eventually come to describe as: 'writing about life as if they had a real angle on it.'
He, to put it rather mildly, bores. If his trick is being such a disastrous mess that you can not, for the life of you, look away, then clearly he isn't there yet, as he writes.
The aspiration to come out on top is still there, and it's so familiarly tedious and dull, another ambitious striver making ostensibly pointed remarks on the sourness of the proverbial grapes. He loathes and resents us for having and enjoying and taking for granted all these things which he will, eventually, come to confess a profound pain of deprivation and unmet need for. The deathly-tired old toad now brooding, ruminating, reflecting on all those things he never had, will never have. The deep and festering wounds of rejection, isolation and pain ever wide open, never healed, nor closed or scarred. A Bukowski who, in due time, wearing his colossal gash on the proverbial sleeve, would gather small and rowdy crowds in small rooms translucent with smoke and loud with jeering and swear words, crowds that never missed a good laugh at a bodily-function type joke but would often seem to fail to perceive the endings of his rather more serious pieces and, like, forget to applaud, or respond, the way that failing to add the appropriate inflection at the end of a sentence can leave your interlocutor like, hanging, waiting for you to go on
Because, still, at that point, we see a man, who, like so many of us, still hopes and strives, kicks back and wants, grasps for it, frustrated and sad.
A while back I was looking into HTTP GETs timing out over the loopback interface. Big server, plenty of headroom, no network in the path — just curl talking to nginx on the same box. Should have taken under a second. Was hitting my 10s timeout instead. Not every request. Just some of them.
First thing you do is check the error log. Nothing. Access log? Everything looks normal. syslog? Nothing of interest. dmesg? Clean. No OOM, no kernel weirdness, the box wasn't even sweating.
So I did the most basic thing imaginable. Wrote a small bash loop that fired GETs at the server forever and printed how many requests it had done since the last timeout. Then I sat there and watched logs scroll by in another terminal, waiting to see if anything suspicious happened in syslog at the moment a timeout occurred.
If I have ever felt like more of a monkey, then I'm not sure when that was.
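For the record, the monkey loop was roughly this; reconstructed from memory, so treat it as a sketch rather than the exact script:

# first version: a dumb counter, printed whenever a request blows the timeout
i=0
ok=0
while true; do
    i=$((i+1))
    if curl -m 10 -s -o /dev/null http://127.0.0.1/; then
        ok=$((ok+1))
    else
        echo "iter $i timed out ($ok good requests since the last one)"
        ok=0
    fi
done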
And then it occurred to me that there was a way to match curl's timeouts to nginx's access log lines. The server was already logging the User-Agent. So instead of just printing the iteration number to my terminal, I started sending it as the User-Agent of each request:
i=0
while true; do
    i=$((i+1))
    curl -m 10 --user-agent "iter:$i" -s -o /dev/null \
         -w "%{http_code} %{time_total} iter:$i\n" \
         http://127.0.0.1/
done
Now when curl reported a timeout I had a unique iteration number I could grep for in the access log. I waited for one, took the iteration from curl's output, grepped the access log.
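The grep itself is nothing clever; something like this, assuming the stock log path and that the timed-out iteration was, say, 1234:

grep 'iter:1234' /var/log/nginx/access.log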
Status: 200.
Ok. Probably typo'd the grep. Try again. Wait for another timeout. Grep.
200.
Every single timed-out request was logged as 200 by nginx. Every. Single. One. The server believed it had served a request that the client never received.
So then I stopped and stared into the abyss of the terminal.
Once you stop staring, the explanation isn't even that exotic. nginx logs the request when it considers it complete, and "complete" for nginx means it has finished writing the response into the kernel send buffer. It does not mean the client acknowledged anything. It does not mean the connection closed cleanly. There is no FIN-ACK in this picture. From nginx's point of view the transaction ended when the last writev / sendfile / SSL_write returned and there was nothing more to send. Status: 200. bytes_sent: whatever. Log line written. Move on.
Whatever happens on the wire after that is somebody else's problem. And on loopback, apparently, somebody else had a problem.
I think a lot of people read an access log the way they'd read the receipt at a restaurant — status 200, you got the food. But it's more like reading the kitchen ticket. The kitchen says the food went out. Whether it ever made it to your table is a separate question, and the access log is not going to answer it for you.
Once I understood this I stopped chasing nginx and went to look at the actual TCP state, which is where the answer was always going to be. Send buffer was full, nothing was being acked from the other end of loopback (yes, loopback can do this — that's a separate rabbit hole), eventually curl gave up. nginx never noticed because nginx had already moved on.
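For concreteness, the kind of look at TCP state I mean; a sketch assuming nginx is listening on port 80:

# per-socket state straight from the kernel: queue depths, unacked bytes, retransmits
ss -tnoi '( sport = :80 )'
# a Send-Q that sits non-zero while the access log keeps saying 200 is the whole
# mismatch in one line of output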
I'd love to tell you the next chapter and what was actually wrong on the kernel side, but that's a different post and honestly I'm still not 100% sure. The point of this one is simpler: stop trusting the access log.
If your access log shows 200 and your client says it timed out, your client is right. Always. The access log is telling you what the server intended; it is not telling you what happened to the bytes.
A few things I do differently now: I keep $bytes_sent and $request_length in my log format and sanity-check them when I have weird timeouts. If bytes_sent equals the full content length, the server believes it did its job. If it's short, that's a clue.

The lesson here is not really about nginx. nginx is doing exactly what it's documented to do. The lesson is that every log line is triggered by some specific event in the code, and if you don't know which event, you don't know what the line means. Otherwise you end up where I was — staring at a terminal at 11pm wondering why a server that says it succeeded is also the one that's making your client time out.
Ask me how I know.
Hot take: HTTP/3 / QUIC is one of the cleanest case studies of dev<>ops tension I can think of, and almost nobody frames it that way. Everyone wants to talk about head-of-line blocking and 0-RTT. Fine. But the actual organizational consequence of QUIC is that the transport layer just walked out of the kernel and into your application binary, and that means it walked out of the ops team's hands and into the developers' laps.
For three decades the deal was: developers write the app, ops owns the network. TCP lived in the kernel. Congestion control, retransmits, backoff, RTO — all of that was somebody else's problem. You sysctl-tuned a few knobs, you upgraded the kernel for a new congestion controller, and the same TCP stack served every process on the box. One place to observe it. One place to fix it. ss, tcpdump, netstat, the /proc/net/* counters, nstat — an entire ecosystem of tools that assume the kernel is the source of truth about your connections.
QUIC throws that contract out. QUIC runs over UDP, and the entire reliable-transport state machine — streams, flow control, congestion control, loss detection, retransmits, ACKs, the works — lives in userspace, inside whatever library the application happens to link against. nginx-quic, ngtcp2, quiche, msquic, lsquic, the Go stdlib's implementation, Chromium's implementation. They are not the same code. They do not behave the same under loss. They do not expose the same counters. There is no ss for QUIC. The kernel sees UDP datagrams and shrugs.
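If you want to see the shrug for yourself, compare what the standard tooling can tell you in each case; a sketch assuming a service on port 443:

# TCP: the kernel owns the transport, so it can show cwnd, rtt, retransmits per connection
ss -tnoi '( sport = :443 )'

# QUIC: the kernel sees a UDP socket and its queues, nothing about streams, loss,
# or congestion; that state lives inside whatever library the application linked
ss -uan '( sport = :443 )'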
That is a transfer of responsibility, dressed up as a protocol upgrade.
Concretely: when something is wrong at the transport layer in a QUIC deployment, the ops-side answer ("let me check the kernel, the NIC, the sysctls") gets you basically nowhere. The behavior you care about is now a function of which library your developers chose, which version of it they pinned, what congestion controller it implements, and what it decided to log. If the dev team picks one library and the CDN in front picks another, those are two different transports that happen to interoperate on the wire. The properties you used to get for free — one TCP stack, kernel-tuned, observable with standard tools — are gone, and getting them back is a development task, not an ops task.
This is also why a lot of organizations quietly hate operating QUIC even when they love serving it. The thing that made TCP boring — that it was somebody else's code, running in somebody else's address space, observable with somebody else's tools — is exactly the thing QUIC gives up. In exchange you get to ship transport-level changes without waiting for a kernel release, which is genuinely great if you are Google and terrible if you are a four-person ops team trying to figure out why p99 went sideways on Tuesday.
None of this is an argument against QUIC. It's an argument that QUIC is not just a protocol change, it's an org-chart change, and the people who have to debug it at 3am are usually not the ones who got to vote on it.
If your nginx alerting is built on $upstream_response_time because you read somewhere that it's "how long the request took to be served", you are quietly underreporting a class of incident. Specifically: any incident where requests pile up behind proxy_cache_lock.
The thing I had to learn the slow way: time spent waiting on the cache lock is counted in $request_time but not in $upstream_response_time. They are not the same metric and they don't differ by a constant. Under contention they can drift apart by seconds, and the variable that "feels" like the right one to alert on is the variable that is hiding the problem.
proxy_cache_lock exists to coalesce parallel cache-fill requests. If ten clients ask for the same uncached object at the same time, you don't want ten parallel fetches against your origin — you want one fetch and nine followers waiting for the result. The followers, while they wait, are blocked. That wait is real wall time. It's user-visible. It's in $request_time. It is not in $upstream_response_time, because $upstream_response_time measures only the upstream call: connect plus send plus read. Time the request spent pinned to a lock isn't part of that, and nginx is not going to retroactively add it for your convenience.
Single-worker nginx, cache-locked location, a lua origin that sleeps before responding. Same cache key for every request so the lock is shared.
daemon off;
master_process on;
worker_processes 1;
error_log stderr;

events {}

http {
    log_format cachetest
        '$time_iso8601 conn=$connection req#$connection_requests '
        '"$request" $status cache=$upstream_cache_status '
        'rt=$request_time urt=$upstream_response_time';

    proxy_cache_path /mnt/disk1/cd levels=1:2 keys_zone=normal:10m
                     max_size=10m inactive=1h use_temp_path=off;

    upstream origin { server 127.0.0.1:8081; }

    server {
        listen 8080;
        access_log /dev/stderr cachetest;

        location / {
            proxy_pass http://origin;
            proxy_cache normal;
            proxy_cache_key $uri;
            proxy_cache_valid 200 1h;
            proxy_cache_lock on;
            proxy_cache_lock_timeout 2s;
            proxy_cache_lock_age 1h;
        }
    }

    server {
        listen 127.0.0.1:8081;

        location / {
            content_by_lua_block {
                ngx.sleep(3)
                ngx.say("ok")
            }
        }
    }
}
Drive it with five parallel curl calls to the same URL.
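Something along these lines; the path is arbitrary as long as all five requests share it, since the cache key is $uri:

# five concurrent requests for the same cold object, all contending for the same cache lock
for i in $(seq 1 5); do
    curl -s -o /dev/null http://127.0.0.1:8080/obj &
done
wait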
Origin sleeps 3s, proxy_cache_lock_timeout 2s. The leader fetches upstream. The four followers sit on the lock for 2s, the lock times out (the leader hasn't finished — it'll finish at 3s), and each follower then goes upstream itself.
conn=2 cache=MISS rt=3.010 urt=3.006 # leader
conn=4 cache=MISS rt=5.010 urt=3.006 # follower
conn=6 cache=MISS rt=5.010 urt=3.006
conn=7 cache=MISS rt=5.010 urt=3.006
conn=8 cache=MISS rt=5.010 urt=3.006
For each follower, rt - urt = 2.0s exactly. That's the lock wait. urt reflects only the upstream call — identical for all five.
If your dashboard is graphing urt p99, this incident is invisible. Every request "took 3 seconds upstream", which is technically true and operationally useless. The clients waited 5.
Now make the origin uncacheable. Same 3s sleep, but the response sets Cache-Control: no-store. Push the lock timeout high enough that it doesn't fire (proxy_cache_lock_timeout 60s). The cache never populates, so each upstream call doesn't satisfy any followers; the lock just keeps getting passed down the line.
content_by_lua_block {
    ngx.header["Cache-Control"] = "no-store"
    ngx.sleep(3)
    ngx.say("ok")
}
rt=3 urt=3 # leader
rt=6 urt=3 # waited 3s
rt=9 urt=3 # waited 6s
rt=12 urt=3
rt=15 urt=3
urt stays flat at 3s. rt climbs by the cumulative lock wait. If you're alerting on the upstream metric, the box looks fine: every upstream call takes 3 seconds, which is what it always takes. Meanwhile the fifth client waited fifteen.
The reason this surprises people, I think, is that "upstream response time" sounds like the longest-running, most-distal thing nginx waits on, and intuitively the longest-running thing should be the dominant component of latency. But nginx is bookkeeping, not narrating. $upstream_response_time is "time spent talking to the upstream server", measured from the perspective of the connect/send/read calls. Time spent waiting on internal coordination — locks, queues, anything that isn't the upstream socket — is not in there. It's a measurement of one specific subsystem, and the name oversells it.
The fix is boring: alert on $request_time if you want to know what the client experienced. Keep $upstream_response_time in the log so you can compute rt - urt and see which component is moving. A high rt with a flat urt across many parallel requests for the same key is the tell-tale of cache-lock contention, and there isn't really anything else that produces that exact shape.
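If you'd rather eyeball it from the raw log than build a panel, a quick pass over the cachetest format above does the subtraction; a sketch that assumes the access log is going to a file instead of /dev/stderr:

# print the lock-wait component (rt - urt) in front of each log line
awk '{
    rt = ""; urt = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^rt=/)  rt  = substr($i, 4)
        if ($i ~ /^urt=/) urt = substr($i, 5)
    }
    if (rt != "" && urt ~ /^[0-9]/)
        printf "lock_wait=%.3f  %s\n", rt - urt, $0
}' access.log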
Same lesson as the access-log post, slightly different shape: every metric is the answer to a specific question. If you don't know which question, you don't know what you're looking at.
nginx can be incredibly unintuitive in some of its behaviors. Variables will often mean something either subtly or entirely different from what you think they mean, directives will do something other than what you expected them to, and function names will suggest behaviors the function does not exhibit at all.
But I want to reflect on something deeper than the myriad peculiarities of any one caching proxy server or another, and that's our inherent lack of intuition about parallelism.