Re: [squid-users] What is decent/good squid performance and architecture

From: Jos Houtman <jos@hyves.nl>
Date: Wed, 06 Jul 2005 17:01:16 +0200

Robert Borkowski wrote:

> Chris Robertson wrote:
>
>>> -----Original Message-----
>>> From: jos houtman [mailto:jos@hyves.nl]
>>> Sent: Saturday, July 02, 2005 3:08 PM
>>> To: squid-users@squid-cache.org
>>> Subject: [squid-users] What is decent/good squid performance and
>>> architecture
>>>
>>>
>>> hello list,
>>>
>>> I am running a website and have set up 3 squid servers as reverse
>>> proxies to handle the images on the website.
>>> And before I try to tweak even more, I am wondering what is
>>> considered good performance in requests/min.
>>>
>>> some basic stats to get an idea:
>>> - only image files are served
>>> - average size 40KB
>>> - possible number of files somewhere between 10 and 15 million (and
>>> growing).
>>> - the variety of files that are accessed? ...
>>> I got these stats from a squid server that has been running for 2-3
>>> days now.
>>> Internal Data Structures:
>>> 2024476 StoreEntries
>>> 146737 StoreEntries with MemObjects
>>> 146721 Hot Object Cache Items
>>> 2000067 on-disk objects
>>>
>>> Is it safe to assume that the number of images actually accessed is
>>> about 2 million?
>>>
>>
>>
>> That is a fairly safe assumption (give or take a few thousand). I
>> love this list. Some of the service requirements just make me gawk.
>> 10-15 million images...
>>
>>
>>> on our dual Xeon servers with 4GB RAM and SATA disks I can get about
>>> 250 hits/second
>>> on our dual Xeon server with 8GB RAM and SCSI disks I can get about
>>> 550 hits/second
>>> are these decent numbers?
>>
>
> 550 hits/second * 40KB average object size * 3 squids = 515 Mbps
> Make sure you have enough upstream bandwidth before worrying about
> further performance. Even at 250 hits/second you'd be close to
> saturating 100BaseT on each squid box (If that's what you're using).
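
(Working his math out: 550 hits/s * 40KB * 8 bits * 3 squids = 528,000
Kbit/s, which is about 515 Mbit/s when you divide by 1024.)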

Currently we are doing about 15Mbit/s upstream on a 100Mbit line.
Upgrading this to 1Gbit, if we need it, won't be a problem.

There are 2 reasons that I want each squid server to perform optimally:
- failover: I want to be able to keep running on only 1 or 2 squid servers
- the growth factor is large (our member count grew 10x, and pageviews 3x,
in the last 6 months)

 
I guess I overestimated the average file size. A second look shows that
the images you see most on the website by far (thumbnails) range from
0.5KB to 2KB; the next size up is 40KB, which is accessed a lot less.

So, weighting the files served by how often they are requested, I guess
the average file size would be closer to 10KB, maybe even less.
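
(For example, with proportions that are purely my guess: if 90% of hits
are 1.5KB thumbnails and 10% are 40KB images, the weighted average is
0.9 * 1.5KB + 0.1 * 40KB, roughly 5.4KB, so well under 10KB.)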

>> Given just the information above (and assuming that the OS and number
>> of cache disks are the same between servers), I would guess that it's
>> just a function of memory and disk speed (more objects cached in RAM,
>> faster access to those not cached).
>>
>> In any case,
>> http://www.squid-cache.org/mail-archive/squid-users/200505/0974.html
>> is an example of 700 hits per second. No hardware specifics in the
>> email. There is a patch for squid to use epoll on linux that at
>> least one person had a good experience with
>> http://www.squid-cache.org/mail-archive/squid-users/200504/0422.html.
>>
>> Here's an email from Kinkie (one of the Squid Devs if I'm not
>> mistaken) describing 500 hits/sec on a Pentium IV 3.2GHz w/2GB RAM as
>> "not really too bad." He also has a HowTo set up describing running
>> multiple instances of Squid on a single box:
>> http://squidwiki.kinkie.it/squidwiki/MultipleInstances. If you are
>> running out of CPU on one processor (Squid doesn't take full
>> advantage of Multi-CPU installations), this might be something to
>> look into.
>>
Thanks, I will look into epoll when I find the time.
As for running 2 squid instances per box: I don't know, I don't really
want to split the memory used for caching.
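
For reference, my understanding of the multiple-instances trick from that
HowTo is that each instance simply gets its own port, cache_dir and pid
file; the ports, paths and sizes below are only an illustration, not our
real config:

    # squid1.conf
    http_port 3128
    cache_dir ufs /cache1 20000 16 256
    pid_filename /var/run/squid1.pid

    # squid2.conf
    http_port 3129
    cache_dir ufs /cache2 20000 16 256
    pid_filename /var/run/squid2.pid

Each instance is then started with its own config file, e.g.
squid -f /etc/squid1.conf.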

>> One method would be to set the cache servers up as cache-peers using
>> the proxy-only option. The message at
>> http://www.squid-cache.org/mail-archive/squid-users/200506/0175.html
>> is all about clustering squids for internet caching, but it does
>> imply that ICP peering should work just fine up to 8 servers. If you
>> want to limit what each squid caches based on hierarchy, a
>> combination of urlpath_regex acls and the no_cache directive are
>> capable. No promises on what that will do to performance. For more
>> explicit suggestions it would help to know how your caches are set up
>> currently (separate IPs w/RR DNS? Using a HW load balancer?
>> Software cluster?).
>
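For what it's worth, my reading of that suggestion is something like the
following on each squid (IPs invented for illustration; 3128/3130 are the
default HTTP/ICP ports):

    # on squid1, pointing at the other two caches
    cache_peer 10.0.0.2 sibling 3128 3130 proxy-only
    cache_peer 10.0.0.3 sibling 3128 3130 proxy-only

The proxy-only option stops a box from keeping a local copy of what it
fetches from a sibling, so each object should stay cached on one server
only.
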
Our setup is as follows:
an LVS load balancer with a weighted round-robin scheduler,
and behind that the 3 squid servers (x.x.x.124).
If it is a cache miss, the request goes on to the webservers (x.x.x.125,
behind the same load balancer),
because they most probably need to render a new image size from the
original.

The webservers get the image from a NAS, which has a directory structure
like this:
MEDIA (originals)
MEDIA/1-50000
MEDIA/50001-100000
......
MEDIARENDERED (rendered)
MEDIARENDERED/1-50000
....
MEDIARENDERED/950001-1000000
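
If we ever want each squid to cache only a slice of that tree, I imagine
the urlpath_regex/no_cache combination would look roughly like this
(the regex is untested and just a sketch against the directory names
above):

    # on squid1: only cache the lower MEDIARENDERED buckets
    acl myslice urlpath_regex ^/MEDIARENDERED/[1-3]
    no_cache allow myslice
    no_cache deny all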

>>
>
> Another method would be CARP. I haven't used it myself, but it's used
> to split the load between peers based on URL. Basically a hash based
> load balancing algorithm.
>
<cut from carp manual>

     When the hosts receive an ARP request for 192.168.1.10, they both select
     one of the virtual hosts based on the source IP address in the request.
     The host which is master of that virtual host will reply to the request,
     the other will ignore it.

</carp>
From that, the load balancing looks like it is based on source IP instead
of URL. Although, that manual describes OpenBSD's Common Address
Redundancy Protocol, which is a different CARP from the Cache Array
Routing Protocol (the URL-hash scheme) that squid supports, so maybe it
is an option after all.
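
(Squid's own CARP is a compile-time option (--enable-carp) and is
configured on cache_peer lines; in 2.5 I believe the option is spelled
carp-load-factor, so something like:

    cache_peer 10.0.0.2 parent 3128 3130 carp-load-factor=0.5

but check the squid.conf documentation for your version, I may have the
option name wrong.)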

> If you have a load balancer with packet inspection capabilities you
> can also direct traffic that way. On F5 BigIPs the facility is called
> iRules. I'm pretty sure NetScaler can do that too.
>
That is the kind of solution I am looking for, but without the cost: we
are a pretty new company without the money to buy expensive solutions, so
we prefer open source.

Another point: what is your experience with ext2/3 versus reiserfs?
Our ext3 partitions tend to get corrupted when used for squid caches or
similar purposes.
I am inclined to switch everything to reiserfs, but that is just a guess.
Does anyone have the same experience?