HAProxy, Chrome, tcp-preconnect and error 408

This is my guide to making haproxy work reasonably with the deployment model I’m constructing.

In broad strokes, I’m looking to have haproxy running on a host which is acting as an http and https front end for a large number of web servers living on a private network, each hosting an individual web site.

But first I had to make haproxy work with the modern web browser Chrome.

Starting with default settings led to a series of weird failures, where I would either get flashes of grey blank pages while loading, or error pages when I clicked on content that I knew existed. Of course, this wasn’t generating logs on the webserver, and the logs from haproxy were equally weird, with nothing showing up that was temporally aligned with the action that created the result.

I knew this couldn’t be a just-me problem yet. That generally takes me a few more hours of digging into a corner nobody else has ever gone into.

So off to google I go and type “haproxy chrome” and it suggests “haproxy chrome 408”. Well, that’s an ominous sign.

The first hit is https://www.haproxy.com/blog/haproxy-and-http-errors-408-in-chrome/ which starts to explain what’s going on.

Related is this Chromium issue: https://bugs.chromium.org/p/chromium/issues/detail?id=377581

A brief aside. Standing around with provisional sockets open burns resources on the server and the client, all for a few round trips of latency in client reaction. Well, I can totally understand reducing client latency by 1.5 round trips, but getting broken pages back when I click links is a terrible result. Additionally, responding with an error when you’ve never seen a request is a wonderfully grey area of the http spec, as noted in the Chromium issue.

Why would google think this isn’t a big deal? In @dakami’s Defcon presentation from 5ish years ago, he described scanning the internet and realizing that only one side of the connection actually had to keep state. His high volume scanner decided to be the side of a tcp link that wasn’t keeping state. This worked great, until he scanned google, where he discovered that google isn’t actually tracking state for tcp either. TCP doesn’t work right when neither party is tracking state. That said, there’s the reason provisional connections aren’t considered a big deal. If your tcp stack doesn’t keep state for clients on the internet, there’s no cost for having them open provisional sockets. So it all hangs together, though I don’t know that it’s the official reasoning.

This comment is enlightening as well: https://bugs.chromium.org/p/chromium/issues/detail?id=377581#c47
The implication I see in this is that you’ve added a new race condition, the question of what’s a ‘fresh socket’. That’s likely a timeout of some sort, or an event of some sort, but that’s not clear.

Right then, what to do about it.

As of December 2017, when I’m writing this, HAProxy 1.7.9 has the following configuration option:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#option%20http-ignore-probes
which fixes most things.

Is this perfect? No, hiding timeouts in the logs can cover up all sorts of errors and hide actual activity. Additionally, I still experience flashes of the grey error page when Chrome has to re-connect to the server because it used an already closed socket.

It seems Chrome will only sit on a provisional socket for 5 minutes, so if you are willing to carry the load of idle client sockets for 5 minutes, I’d suggest also setting your timeout client to more than 5 minutes. Current Chrome will open up to 6 sockets to a server, so scale the number of sockets you are willing to let sit idle accordingly.

That said, that still can create failures, because if your server timeout isn’t as long, the live client connection might be attached to an already timed out server socket, thus requiring the whole error 408 thing again. Thus you need to set your server timeouts to be just as long, and that means possibly holding open sockets on the servers for the same length of time. I suspect this is one case of what is being covered in the manual by “it is highly recommended that the client timeout remains equal to the server timeout in order to avoid complex situations to debug.” – https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-timeout%20server

Summary:
If you are running small websites (1 server or less of load, and less than 100 simultaneous clients), I suggest the following settings.

timeout server 302s
timeout client 302s
timeout connect 10s
option http-ignore-probes
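
For context, here is a rough sketch of where the settings above could live in a haproxy.cfg. This is only an illustration, not my actual config: the frontend name www, the backend name site_example, the server name web1 and the address 10.0.0.11:80 are all made up, and I'm assuming the settings go in a defaults section so that every frontend and backend inherits them.

```
defaults
    mode http
    # Don't log or answer with a 408 when a preconnected socket
    # closes or times out without ever sending a request.
    option http-ignore-probes
    timeout connect 10s
    # Slightly longer than the 5 minutes Chrome seems to hold a
    # provisional socket open. Keep client and server equal.
    timeout client  302s
    timeout server  302s
    # Rough sizing: 100 clients x 6 sockets each means up to
    # 600 idle client connections held for up to 302s.

# Hypothetical names and address below, for illustration only.
frontend www
    bind :80
    default_backend site_example

backend site_example
    server web1 10.0.0.11:80
```

Putting the timeouts in defaults rather than in each section is just the easiest way I know to keep the client and server values from drifting apart.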

Other things that I learned along the way:
The first workaround was to make the error 408 page /dev/null, resulting in no data being forwarded to Chrome and triggering a reload. That still works to some extent.
You can get haproxy to serve single static small pages by declaring them to be a backend at a particular url on error 200. This is a terrible plan.
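
As a minimal sketch of that first workaround (the /dev/null one), assuming it is placed in the defaults section; it should work equally well in a single frontend:

```
defaults
    mode http
    # On a 408, send nothing at all and just close the socket,
    # so Chrome retries instead of rendering an error page.
    errorfile 408 /dev/null
```

Unlike option http-ignore-probes, this only suppresses the response sent to the browser, so as far as I can tell the timeouts still show up in the haproxy logs.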