Re: [squid-users] Performance problems - need some advice

From: john allspaw <[email protected]>
Date: Wed, 8 Feb 2006 06:47:37 -0800 (PST)

Jeremy -

I suspect your setup should work just fine. We're running similar hardware (6-disk SCSI, 4GB RAM) on Linux with aufs and ext2/noatime.

There are a couple of things I would look at, some of which Kinkie has already brought up.
First, take a look at cachemgr, starting with /menu. These things might seem like no-brainers, but they are worth checking:

- do you have any large or complicated ACLs?
- how much time are you spending doing DNS lookups?
- aufs on Linux (and ext2) is the way to go; I wouldn't spend time on anything else at that layer, except maybe the epoll patch.
- how big is your cache_mem? You have some fast disks there, and part of the problem could be having too much disk cache: too many objects to index for the cache_dirs while still setting aside memory for the Hot Object cache. Don't be afraid to reduce the size of your cache_dirs if it means better performance (which it can).
- look at /storedir in cachemgr. what is the LRU reference age on each of your spindles? this shows how old your cache gets before it starts to evict objects. if the age is less than you think it should be, then maybe you're spending too much time evicting objects and not enough time putting/getting them.
- look at /refresh in cachemgr. how much effort are you making in validating, and for what reason?
- take a look at your working set, and check your refresh_pattern directive. are you keeping things fresh long enough? in fact, should they ever go stale? if you're only caching WMV files, how often would one be changed? if you're like us (we serve photos, and a lot of them), the actual photo never changes, so there is no need to ever think of it as stale. it's either in the cache, or it's not.
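if you decide your objects should never go stale, a refresh_pattern along these lines could express that (a hedged sketch; the file extensions and the one-year lifetime are illustrative, not from anyone's actual config):

```
# hypothetical: treat media files as fresh for a year (525600 minutes,
# effectively squid's ceiling) so they only leave the cache by eviction
refresh_pattern -i \.(wmv|jpg|gif|png)$ 525600 100% 525600
# everything else falls through to a conservative default
refresh_pattern . 0 20% 4320
```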

think of your origin (apache) servers as the persistent store, and squid as the mechanism that smooths out the peak demands for objects, like a standing wave.
I would strongly suggest NOT thinking of it as Apache versus Squid, and I won't comment on which is faster, because in each case it can be quite different, depending on the working set.

For my work, if we were to try to serve our content directly via apache/tux/mathopd/thttpd, we would fall over. Our content is about as cacheable as you can get, and squid gets us serving about 500 images/sec without too much effort at all.

Oh, one other thing I would ask/suggest: how are you balancing these two caches with your Foundry? round-robin? if your persistent store is growing that large, then you could be double-caching too much, and at some point your efficiency drops. Does your particular LB have any layer-7 capabilities? Nothing beats URL-hash balancing, when you can get hit ratios >90% :)
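if the Foundry can't hash on URL, the same idea can be approximated inside squid itself with CARP peering from a small front tier; a rough sketch (hostnames and ports are hypothetical, and carp support depends on how your squid was built):

```
# hypothetical front-tier squid hashing request URLs across the two caches,
# so each object lives on exactly one spindle set instead of both
cache_peer cache1.example.com parent 3128 0 carp no-query
cache_peer cache2.example.com parent 3128 0 carp no-query
never_direct allow all
```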

good luck,
--john

On Tue, 2006-02-07 at 16:29 -0800, Jeremy Utley wrote:
> On 2/7/06, Kinkie <kinkie-squid@kinkie.it> wrote:
> > On Tue, 2006-02-07 at 12:49 -0800, Jeremy Utley wrote:
> > > On 2/7/06, Kinkie <kinkie-squid@kinkie.it> wrote:
> > >
> > > > Profiling your server would be the first step.
> > > > How does it spend its CPU time? Within the kernel? Within the squid
> > > > process? In iowait? What's the number of open filedescriptors in Squid
> > > > (you can gather that from the cachemgr)? And what about disk load? How
> > > > much RAM does the server have, how much of it is used by squid?
> > >
> > > I was monitoring the servers as we brought them online last night in
> > > most respects - I wasn't monitoring file descriptor usage, but I do
> > > have squid patched to support more than the standard number of file
> > > descriptors, and am using the ulimit command according to the FAQ.
> >
> > That can be a bottleneck if you're building up a SYN backlog. Possible
> > but relatively unlikely.
> >
> > > When I was monitoring, squid was still building its cache, and squid
> > > was using most of the system memory at that time. It seems our major
> > > bottleneck is in Disk I/O - if squid can fulfill a request out of
> > > memory, everything is fine, but if it has to go to the disk cache,
> > > performance suffers.
> >
> > That can be expected to a degree. So are you seeing lots of IOWait in
> > the system stats?
>
> During our last test run, the machines were running at around 30-50%
> in iowait time, according to iostat.

Which means lots of disk activity.
You might try to squeeze even more performance out of your disks by
using even more cache_dirs.

> > > Right now, we have 5 18GB SCSI disks placing our
> > > cache, 2 of those are on the primary SCSI controller with the OS disk,
> > > the other 3 on the secondary.
> >
> > How are the cache disks arranged? RAID? No RAID (aka JBOD)?
>
> Right now, no raid is involved at all. Each cache disk has a single
> partition on it, occupying the entire disk, and each partition is
> mounted to a separate directory:
>
> /dev/sdb1 -> /cache1
> /dev/sdc1 -> /cache2
> /dev/sdd1 -> /cache3
> /dev/sde1 -> /cache4
> /dev/sdf1 -> /cache5

Excellent.

> Each one has its own cache_dir line in the squid.conf file.

You might want to double them: each cache_dir has its own server thread
AFAIK. Your high iowait stats mean that the threads get blocked while
waiting for i/o. Having more worker threads might mean higher
parallelism and less iowait.

So:
cache_dir aufs /cache1/a <blah>
cache_dir aufs /cache1/b <blah>
cache_dir aufs /cache2/a <blah>
etc etc etc.
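with concrete numbers filled in, it might look like this (sizes are hypothetical: roughly half of each 18GB spindle per cache_dir, with the default 16/256 L1/L2 directory layout):

```
# illustrative only: two cache_dirs per physical disk
cache_dir aufs /cache1/a 8000 16 256
cache_dir aufs /cache1/b 8000 16 256
cache_dir aufs /cache2/a 8000 16 256
cache_dir aufs /cache2/b 8000 16 256
```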

> > > Could there perhaps be better
> > > performance with one larger disk on one controller with the OS disk,
> > > and another larger disk on the secondary controller?
> >
> > No, in general more spindles are good because they can perform in
> > parallel. What kind of cache_dir system are you using? aufs? diskd?
>
> Our initial testing last night used the normal ufs - we just switched
> over to aufs (posts I found on the squid ML said this would be better
> for Linux systems), and it made a very noticeable improvement in
> performance.

Definitely. With ufs squid would be spending all that iowait time
blocked.

> > > We're also
> > > probably a little low on RAM in the machines - each of the 2 current
> > > squid servers have 2GB of ram installed.
> >
> > I assume that you're serving much more content than that, right?
>
> Of course. Our total content being served by this cluster is close to
> 200GB right now, and it's expected to grow further.
>
> >
> > > Right now, we have 4 Apache servers in a cluster, and these machines
> > > currently max out at about 300Mb/s. Our hope is to utilize squid to
> > > push this up to about 500Mb/s, if possible. Has anyone out there ever
> > > gotten a squid server to push that kind of traffic? Again, the files
> > > served from these servers range from a few hundred KB to around 4MB in
> > > size.
> >
> > In raw terms, Apache should outperform Squid due to more specific OS
> > support. Squid outperforms Apache in flexibility, manageability and by
> > offering more control over the server and what the clients can and
> > cannot do.
>
> This seems surprising to me, honestly. If Squid, utilizing its
> caching ability, can't push out data faster than Apache, then it seems
> there wouldn't be any reason to use it as an http accelerator like
> this. Maybe there's something I'm missing in your statement.

Well, both the squid cache and the raw data served reside on disk, so
there's really no big reason why squid should be any faster than Apache
if that's the bottleneck ;)
Squid can help with the hot object cache in RAM, but that cache can
cover maybe 0.5% of your total served content, which is not much
anyways. Paradoxically it might help you to DECREASE your cache_dir
sizes, so that squid keeps the warmer data on disk (and in RAM) and
goes to the backend server to fetch less-popular content. In other
words, decrease your cached objects' lifetime.
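as a sketch of that idea (the numbers are hypothetical, not a recommendation for this particular setup):

```
# hypothetical: shrink each cache_dir so only the warm set stays on disk,
# and give the hot object cache a bigger share of the 2GB of RAM
cache_mem 512 MB
cache_dir aufs /cache1 4000 16 256
cache_dir aufs /cache2 4000 16 256
```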

Regarding the performance comparison between squid and Apache, it's a
sad truth: squid interacts with the OS via one big poll(2) array, while
Apache uses wake-up-one accept handling and sendfile(2) for its i/o.
That means fewer context switches in and out of the kernel per request,
and less table passing and scanning in both squid and the kernel. Also,
thanks to its multiprocess design, Apache can make better use of
multiprocessor systems.

> > Please keep the discussion on the mailing-list. It helps get more ideas
> > and also it can provide valuable feedback for others who might be
> > interested in the same topics.
>
> I never intended to take the discussion off-list, but when I hit
> reply, it went to you instead of the list :(

No harm done

    Kinkie
Received on Wed Feb 08 2006 - 07:47:43 MST
