RE: [squid-users] Caching Web sites

From: <[email protected]>
Date: Mon, 10 Dec 2001 20:54:08 -0800

Or the answer would be to have a redirector write the redirected (cached)
URLs to a database, which can be queried for unique values by a script
automating wget. Or use the redirector to fork off wget to preemptively follow links.
Serves the same end. A script based redirector like Pyredir should be easy
enough to hack to do this.
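
A rough sketch of the logging half of that idea, in Python. This is not
Pyredir's code. The SQLite table, the database path and the function names
are all made up for illustration. It assumes the classic redirector
interface, where Squid writes one request per line on stdin with the URL as
the first field, and a blank reply leaves the URL unchanged.

```python
import sqlite3
import sys


def open_db(path):
    """Open (or create) the URL log. The table layout is an assumption."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)")
    return db


def record_url(db, line):
    """Store the URL from one redirector input line; return it, or None."""
    fields = line.split()
    if not fields:
        return None
    url = fields[0]  # assumed: the URL is the first whitespace-separated field
    # The PRIMARY KEY plus OR IGNORE keeps each URL only once.
    db.execute("INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,))
    db.commit()
    return url


def unique_urls(db):
    """Return every distinct URL recorded so far, sorted."""
    return [row[0] for row in db.execute("SELECT url FROM seen ORDER BY url")]


def main():
    """Redirector loop: log each URL, then tell Squid to leave it alone."""
    db = open_db("/var/tmp/seen_urls.db")  # hypothetical path
    for line in sys.stdin:
        record_url(db, line)
        sys.stdout.write("\n")  # blank reply: no rewrite
        sys.stdout.flush()      # Squid waits for each answer, so never buffer
```

To run it as a redirector, call main() at the bottom of the script. A
separate cron script could then feed the output of unique_urls() to wget
through the proxy.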

Sean

-----Original Message-----
From: Andrew Reid [mailto:andrew.reid@plug.cx]
Sent: Sunday, December 09, 2001 9:05 PM
To: squid-users@squid-cache.org
Cc: mapunda@muchs.ac.tz; squid-users@squid-cache.org
Subject: Re: [squid-users] Caching Web sites

On Sun, Dec 09, 2001 at 08:54:19PM -0800, sean.upton@uniontrib.com wrote:

> Prime with wget -r and a cron job on a client. This is frequently
> recommended as the easiest way to do this.

But, you see, this isn't what he's after.

Perhaps with some log analysis, you could generate a list of
frequently used sites and cache them to the proxy server.
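
As a sketch of what that log analysis might look like, assuming Squid's
native access.log format, where the requested URL is the seventh
whitespace-separated field. The function name top_sites is made up for
illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_sites(log_lines, limit=10):
    """Count requests per host in native-format access.log lines."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        # Assumed layout: time elapsed client code/status bytes method URL ...
        host = urlsplit(fields[6]).hostname
        if host:  # entries without a scheme://host part are skipped
            hits[host] += 1
    return hits.most_common(limit)
```

Feeding it an open access.log file gives (host, count) pairs, most
requested first, which could seed a list for wget.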

I'd guess that you could implement some sort of redirection (such as
squidGuard) that redirects users to locally cached copies of the
data.

... but that's not the ideal way of doing it. It would be good if it
was cached to the Squid cache store. However, many objects would
probably expire (or worse still, never make it to the cache) before
they were able to be used.

The thing that is interesting about this issue in general is that if
the frequently used sites were cacheable, Squid would cache them
appropriately, giving quick access to the objects from its cache
store. I don't see the point in re-downloading objects that may not be
cacheable or are already cached.

Perhaps the answer is some sort of preemptive caching. For example,
when I pull up freshmeat.net, Squid should automatically start
downloading the images and banner ads before the client makes the
request. Perhaps it could even cache some of the pages that
freshmeat.net is directly linking to, n levels deep.
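
To make the "n levels deep" idea concrete, here is a rough sketch of an
external prefetcher in Python. It is not something Squid does internally.
The names LinkCollector and prefetch are made up, and fetch is a
caller-supplied function (hypothetical) that returns the HTML for a URL,
presumably by requesting it through the proxy so the objects land in the
cache.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def prefetch(start_url, fetch, depth):
    """Breadth-first walk from start_url, following links `depth` levels deep.

    Returns the set of every URL discovered, including start_url itself.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            parser = LinkCollector(url)
            parser.feed(fetch(url))
            for link in parser.urls:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

With depth set to 1 this picks up only the images and links on the page
itself. Each extra level multiplies the number of fetches, so n would want
to stay small.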

Reading "Web Caching"[1], I almost got the impression that Squid does
that already. I know that the book categorically points out individual
commercial cache products that do so, but I'm not really sure about
Squid. Perhaps hno could comment, as I'm not near any Squid code at
the present moment.

   - andrew

[1] Duane Wessels, 2001 (O'Reilly & Associates). It's a good book,
    too. Fantastic work, Duane.

-- 
Andrew J. Reid                    "Catapultam habeo. Nisi pecuniam omnem  
andrew.reid@plug.cx               mihi dabis, ad caput tuum saxum immane 
+61 401 946 813                   mittam"                                
Received on Mon Dec 10 2001 - 21:53:09 MST
