Re: [squid-users] trying to track down a bug

From: Henrik Nordstrom <[email protected]>
Date: Thu, 20 Jan 2005 23:46:35 +0100 (CET)

On Wed, 19 Jan 2005, Robert Borkowski wrote:

> I think I'm onto a fix. I suspect the TCP write buffer was exhausted trying
> to send data to the aborted client. Not sure why this affects the connection
> to the origin server, but it does.

Yes.

The real bug here is that your server does not allow the client sufficient
time to process the reply. When sending a largish reply there can be a
significant delay between the server finishing the reply (its FIN) and the
client having fully processed it (the FIN in the other direction). If the
server is impatient and aborts the connection between these two events, the
data the client has not yet processed is discarded on the client side.

In this situation the client is Squid, and the processing done by Squid is
forwarding the response to the original client, which cannot accept the data
at the same rate your server is sending it to Squid.

Squid has little choice but to hold off on processing the rest of the data.
If your server does not accept this, it will abort the responses before they
can even reach the originally requesting client.
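
To illustrate what a "patient" close looks like on the server side, here is a
minimal Python sketch. It is not how your web server or Squid implements
this; the function name lingering_close and the 30 second default are my own
assumptions. It sends the reply, sends the FIN with a half-close, and then
waits for the client's FIN instead of aborting the connection.

```python
import socket
import time


def lingering_close(conn, payload, linger_seconds=30.0):
    """Send payload, half-close, then wait for the peer's FIN before closing.

    Returns True if the peer closed its side within linger_seconds,
    False if we gave up waiting. (Hypothetical helper for illustration.)
    """
    conn.sendall(payload)
    # Half-close: this sends our FIN, but we can still read from the socket.
    conn.shutdown(socket.SHUT_WR)

    deadline = time.monotonic() + linger_seconds
    peer_closed = False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # out of patience
        conn.settimeout(remaining)
        try:
            chunk = conn.recv(4096)
        except socket.timeout:
            break  # peer still busy when the linger time ran out
        except OSError:
            break  # connection reset or similar error
        if not chunk:
            peer_closed = True  # empty read: the peer sent its FIN
            break
        # Anything the peer still sends after our FIN is simply discarded.
    conn.close()
    return peer_closed
```

A real server would normally do this waiting asynchronously, or leave it to
the kernel's orphan handling, rather than blocking a worker like this.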

> Doubling the defaults causes my test to pass.
> echo "4096 174760 349520" > /proc/sys/net/ipv4/tcp_wmem

This only raises the response size needed to trigger the problem; it is not
a cure.

Note: You can achieve the same result by setting the READ_AHEAD_GAP define
in the Squid sources to a somewhat bigger value.

The real problem is that your server is not patient enough to let clients
process the response after it has sent the FIN. In most OSes this
patience comes almost for free (only slight memory usage) by letting
sockets linger in an "orphan" state, and it is usually the default mode of
operation. There is no good reason for servers to behave as you describe.

It is possible your web server OS is not properly tuned for the load it is
receiving, possibly due to misguided attempts to harden it against DoS
situations. If it is a Linux server, look into

    - tcp_max_orphans
    - tcp_fin_timeout
    - and maybe other related parameters

Neither of these should be set too low, and certainly never lower than
their default values, which are very aggressive to begin with. But
don't overdo it either...
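
To check what the server is currently using, something like the following
Python sketch should do. It assumes the usual Linux location
/proc/sys/net/ipv4 for these settings; the function name read_tcp_tunables is
just something I made up for the example. It only reads the values, so compare
them yourself against the defaults for your kernel.

```python
from pathlib import Path


def read_tcp_tunables(names=("tcp_max_orphans", "tcp_fin_timeout"),
                      base="/proc/sys/net/ipv4"):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        try:
            # Each of these files holds a single integer followed by a newline.
            text = (Path(base) / name).read_text()
            values[name] = int(text.split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values
```

Calling read_tcp_tunables() on the web server returns a dict with the two
current values, or None for any setting that could not be read.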

Also look into any "close linger" parameters in your web server software
settings.

Regards
Henrik
Received on Thu Jan 20 2005 - 15:46:38 MST
