Re: [squid-users] Tuning Squid for large user base

From: Henrik Nordstrom <[email protected]>
Date: Sun, 7 Mar 2004 10:23:00 +0100 (CET)

On Sat, 6 Mar 2004, James MacLean wrote:

>
> This is _definitely_ the case. We have a rate limited (QoS via CISCO) 6MBs
> link. It's on a 100Mb/s ethernet circuit. It runs _full_ all day. Hence
> the idea to apply Squid. We are using squid now at over 300 educational
> sites and have had great success with it.

Is the above 6 Mbit/s or 6 MByte/s?

6 Mbit/s is not very much for a Squid to handle in terms of traffic
volume. 6 MByte/s is a different story.

The number of concurrent connections is a significant issue. Due to the
way the poll/select loops in Squid are designed, performance drops
rapidly from increased CPU usage as the number of filedescriptors
grows, so you want to keep this at a minimum. Even building Squid with
support for very many filedescriptors may be counter-productive.

Use of "half_closed_clients off", "quick_abort_min 0 KB" and
"quick_abort_max 0 KB" recommended in such situations. If extreme then
disabling the use of persistent connections also help.
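
A minimal squid.conf sketch of the above, assuming Squid 2.5 era
directive names (the persistent connection directives are only for the
extreme case):

  # Drop connections the client has half-closed instead of keeping
  # them around consuming filedescriptors.
  half_closed_clients off

  # Abort the server-side fetch as soon as the client goes away,
  # rather than completing the download "just in case".
  quick_abort_min 0 KB
  quick_abort_max 0 KB

  # Extreme case only: trades the latency benefit of persistent
  # connections for a lower concurrent filedescriptor count.
  client_persistent_connections off
  server_persistent_connections off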

> for their web pages is quicker than everyone being proxied by squid. This
> was the first time we had actually seen Squid act this way and obviously
> have been trying to pick out what we have been doing wrong. Some slight
> delay because of the proxy is fine, but as you watch, the traffic to the
> Internet drops and client response time jumps :(.

You need to determine why this happens. There are a couple of different
scenarios requiring different actions.

Things you need to monitor are (see the example after this list):

 * CPU usage
 * Number of active filedescriptors
 * vmstat activity
 * cache.log messages
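
A sketch of how to watch these from a shell, assuming squidclient is
installed, the cachemgr interface is answering on the default port, and
cache.log lives in /var/log/squid (adjust the path to your install):

  # CPU usage and filedescriptor counters from the cache manager
  squidclient mgr:info | egrep -i 'cpu|file desc'

  # Memory, swap and CPU activity, one sample every 60 seconds
  vmstat 60

  # Watch for warnings such as running out of filedescriptors
  tail -f /var/log/squid/cache.log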

> Interesting about the maximum_object_size. I had that large, but thought
> smaller would be better and just not cache large objects... They are long
> time transactions and the client expects them to take longer :). What is
> _really_ the benefit of a larger number here?

If you have a lot of cache space then there are only benefits to
allowing large objects to be cached.

If your cache space is limited then large objects may take up a
disproportionately large share of the cache, reducing the overall hit
ratio.
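
For illustration, with plenty of cache space the limit can simply be
raised; the figure below is an arbitrary example, not a recommendation:

  # With ample cache_dir space a large limit only adds potential hits.
  maximum_object_size 200 MB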

> . Squid slows way down when its upstream request pipe is full, or
> . There is a certain number of open FD's that when we go beyond it, Squid
> start to stall?

Both apply, often together, making things even worse:

 * the pipe is full, causing lag when fetching objects
 * the lag causes more and more clients to queue up, increasing
filedescriptor usage
 * the increased filedescriptor usage increases CPU usage, and when this
reaches 100% Squid starts to lag from a shortage of CPU time
 * the increased filedescriptor usage may also make Squid or your system
run short of filedescriptors, forcing Squid to stop accepting new
requests for a while, further increasing the lag. This condition is
always logged in cache.log should it occur.

The same symptoms can also be seen due to swap activity. If your system
for some reason runs short on memory it will start to swap, and Squid is
hit very badly by swap activity; a quick check is sketched below.
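
Assuming a Linux-style vmstat (column names vary between systems):

  # Sample every 5 seconds. Sustained non-zero si (swap in) or
  # so (swap out) while Squid is slow means the box is paging,
  # not that the pipe is full.
  vmstat 5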

> I am not doing any redirecting... Or shouldn't be. That line was there,
> but I understood it had no effect if you had no redirectors... Arg. Is
> that not the case? Maybe that's causing us havoc :(?

The directive is only relevant if you have specified a redirect_program.

> Are the 3 cache_dir's per box on different channels... for speed?

Squid does not use very much bandwidth to the drives, so multiple
channels rarely make much difference.

What the cache mostly consumes is seek time, so it is important that
each drive can operate independently. This means certain IDE
configurations, where only one drive per channel can be processing
commands at a time, are not suitable for Squid. See the sketch below.
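
As an illustration, one cache_dir per independent physical drive; the
paths and sizes are made-up examples (syntax: cache_dir <type> <path>
<Mbytes> <L1> <L2>):

  # One spool per physical drive so seeks can proceed in parallel.
  # aufs (async disk I/O) is preferred over plain ufs where available.
  cache_dir aufs /cache1 10000 16 256
  cache_dir aufs /cache2 10000 16 256
  cache_dir aufs /cache3 10000 16 256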

> Does this suggest you are servicing 7835 clients and need no more than 946
> FDs? 'Cause that looks like the opposite of mine. For example, the last
> time we tested it, I see :
>
> Number of clients accessing cache: 938
> Number of HTTP requests received: 450489
> Maximum number of file descriptors: 32768
> Largest file desc currently in use: 2358

It is not very easy to give a relation between these two figures.
"Number of clients" is the number of unique IP addresses seen by your
Squid since startup and does not really tell you how many clients are
accessing your Squid right now.

The filedescriptor usage on the other hand represents what is going on
right now and is not impacted by history.

But the filedescriptor usage does look a bit high even assuming all
those clients are very active right now.
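
To see what those descriptors are doing right now, the cache manager
has a per-descriptor listing; a sketch assuming squidclient and the
cachemgr interface are available:

  # Lists each open filedescriptor with its type, timeout and remote
  # address, useful for spotting piled-up client connections.
  squidclient mgr:filedescriptors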

Regards
Henrik