Squid use/enhancement questions

From: Clifton Royston <cliftonr@lava.net>
Date: Wed, 30 Jun 1999 13:51:42 -1000 (HST)

  Now that I've got this working, I have several questions, triggered
by thinking about how to deploy it practically:

1) Initial connection performance:

  I've got a couple of people who use the Squid proxy
non-transparently from dial-up complaining that their first connection
seems to take several seconds. Is there any known reason for this? It
certainly isn't affecting all users - my own initial connections seem
very fast. I'll try to narrow this down further, but I wondered if
anyone else sees something similar.

2) Sharing transparent vs. non-transparent proxy cache data:

  Almost the first thing I realized, once I got it going, was that
sharing the server between transparent and non-transparent use was
not the win I'd thought it would be.

  The cached data for non-transparent use is hashed and stored under
the URL based on its FQDN, e.g.

squid.nlanr.net/Squid/FAQ/index.html

With transparent use the FQDN is not available, so the object is
stored under the numeric IP instead, or the equivalent of (e.g.)

192.172.226.146/Squid/FAQ/index.html

The two are therefore stored as completely different files, so there
ends up being no benefit at all to sharing the server, right?
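  To make the mismatch concrete: since the store key is a hash of the
full URL, the FQDN spelling and the numeric-IP spelling of the same
object yield unrelated keys. A quick stand-alone illustration in C,
using OpenSSL's MD5 as a stand-in for the hash (Squid's real store
key also folds in the request method):

    /* Illustration only, not Squid source: hash two spellings of the
     * same object the way a URL-keyed cache would.
     * Build with: cc keydemo.c -lcrypto */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>

    static void show_key(const char *url)
    {
        unsigned char digest[MD5_DIGEST_LENGTH];
        int i;
        MD5((const unsigned char *) url, strlen(url), digest);
        printf("%-45s -> ", url);
        for (i = 0; i < MD5_DIGEST_LENGTH; i++)
            printf("%02x", digest[i]);
        putchar('\n');
    }

    int main(void)
    {
        /* same object, two names, two different cache keys */
        show_key("http://squid.nlanr.net/Squid/FAQ/index.html");
        show_key("http://192.172.226.146/Squid/FAQ/index.html");
        return 0;
    }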

  That's led me to wonder - would it work out to add code to Squid to
store the non-transparent cached files under the numeric IP (which it
has to look up anyway, to query the server)? This would be less
efficient in space in cases where a given FQDN is load-balanced across
several replicated web servers with distinct numeric IP addresses, but
still more efficient than storing all of those *plus* the FQDN. Where
there's one IP address for a given FQDN, it's a significant win.
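  A minimal sketch of the canonicalization I have in mind
(hypothetical helper code, not anything in the Squid source; it
ignores ports, non-http schemes, and error-handling niceties for
brevity):

    /* Rewrite "http://host/path" as "http://a.b.c.d/path" so the
     * non-transparent store key matches what a transparent request
     * to the same server would produce. */
    #include <stdio.h>
    #include <string.h>
    #include <netdb.h>
    #include <arpa/inet.h>

    static int canonicalize_url_host(const char *url, char *out, size_t outlen)
    {
        char host[256];
        const char *start, *slash;
        struct hostent *he;

        if (strncmp(url, "http://", 7) != 0)
            return -1;
        start = url + 7;
        slash = strchr(start, '/');
        if (slash == NULL || (size_t)(slash - start) >= sizeof(host))
            return -1;
        memcpy(host, start, slash - start);
        host[slash - start] = '\0';

        he = gethostbyname(host);  /* Squid itself would use its async DNS */
        if (he == NULL || he->h_addrtype != AF_INET)
            return -1;
        /* Takes the first A record; with round-robin DNS the ordering
         * changes between lookups, so a real patch would have to sort
         * the addresses or pin one choice to keep the key stable. */
        snprintf(out, outlen, "http://%s%s",
                 inet_ntoa(*(struct in_addr *) he->h_addr_list[0]), slash);
        return 0;
    }

    int main(void)
    {
        char canon[1024];
        if (canonicalize_url_host("http://squid.nlanr.net/Squid/FAQ/index.html",
                                  canon, sizeof canon) == 0)
            printf("%s\n", canon);
        return 0;
    }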

  Is anyone working on this as an option or hack? If code were
written for it as an option behind an enable flag, would the Squid
developers be willing to take it into the mainstream source?

3) Boosting response speed:

  Am I correct in understanding that a Squid server should correctly
"pass through" requests larger than its maximum object size to a
parent cache server? If so, would it provide a significant gain to
run a small cluster of front-end Squid servers with no real cache
disk and a maximum cache object size of <= 13 KB, running entirely
out of a large cache (e.g. 800 MB) on a Memory File System? These
would have to talk to a regular cache (Squid or other) with the usual
large disk array as their parent.
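  For concreteness, the front-end configuration I'm imagining would
look roughly like this (directive names from the Squid 2.x
squid.conf; the hostname, path, and sizes are made up for
illustration):

    http_port 3128
    # keep only small, hot objects on the front end
    maximum_object_size 13 KB
    # the "disk" cache lives on an 800 MB MFS mount, so it's all RAM
    cache_dir /mfs/cache 800 16 256
    # anything else gets fetched through the big disk-based cache
    cache_peer parent-cache.example.net parent 3128 3130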

  This was prompted by thinking about the claim that Squid can
deliver 45 Mb/s out of a non-disk configuration, together with the
supposed Zipf distribution of web hits. It would seem like this
cluster configuration could deliver 100+ Mb/s of throughput on the
moderate percentage of requests that are hits, falling back to a mere
10+ Mb/s when the front ends had to go to the main disk-based Squid
cache.
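  As a back-of-envelope check (illustrative numbers, not
measurements): suppose three front ends at 45 Mb/s give an aggregate
front-end capacity $B_m = 135$ Mb/s, the disk-based parent is good
for $B_d \approx 10$ Mb/s, and a fraction $h$ of bytes are front-end
hits. Every byte passes through a front end, but only misses touch
the parent, so total throughput is capped at roughly

    $T \le \min\left( B_m, \; B_d / (1 - h) \right)$

which at $h = 0.9$ gives $\min(135, 100) = 100$ Mb/s - so the 100+
Mb/s figure holds as long as the byte hit ratio stays high.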

  Is there some obvious and fatal flaw with this two-level cache
design?

  -- Clifton

-- 
 Clifton Royston  --  LavaNet Systems Architect --  cliftonr@lava.net
        "An absolute monarch would be absolutely wise and good.  
           But no man is strong enough to have no interest.  
             Therefore the best king would be Pure Chance.  
              It is Pure Chance that rules the Universe; 
          therefore, and only therefore, life is good." - AC