[SQU] squid as archiver, and why are some files not cached?

From: Gerald Oskoboiny <gerald@impressive.net>
Date: Wed, 30 Aug 2000 21:09:04 -0400

Hi,

I installed Squid on my home machine and wrote a Perl script to
copy files out of Squid's cache into a persistent archive, so I
have a permanent copy of any files I browse. Details/code:

    http://impressive.net/people/gerald/1999/01/http-archive/

Unfortunately, this doesn't quite capture everything because
Squid (understandably) doesn't cache certain dynamic resources.

However, it misses a lot of resources that I think it really
ought to be caching, and I can't figure out why.

I checked the FAQ, particularly:

    12.23 How come some objects do not get cached?
    http://www.squid-cache.org/Doc/FAQ/FAQ-12.html#ss12.23

and read through the squid.conf file fairly carefully, but didn't
see any reason why this might be happening. I'm basically using the
default configuration shipped with the current squid package on
Debian Linux (woody); my config file is here:

    http://impressive.net/people/gerald/2000/08/30/squid.conf

Here are some entries from my store.log that I think should be
cached but aren't:

    967678174.962 RELEASE FFFFFFFF 304 967678174 -1
      967764574 unknown -1/0 GET http://larve.net/stylesheets/base

    967681619.244 RELEASE FFFFFFFF 304 967681607 -1 -1 unknown -1/0
      GET http://www.photo.net/photo/pcd0196/railroad-tracks-35.3.jpg

The only other entry matching that last resource in my squid logs
is in my access.log:

    967681619.244 149 127.0.0.1 TCP_MISS/304 225
      GET http://www.photo.net/photo/pcd0196/railroad-tracks-35.3.jpg
      - DIRECT/www.photo.net -

Most URIs are cached fine; samples:

    967681816.192 SWAPOUT 00000043 200 967681815 966465784 967768216
      text/html 4468/4468 GET http://www.squid-cache.org/
    967681816.334 SWAPOUT 00000044 200 967681816 935384563 967768216
      image/gif 493/493 GET
      http://www.squid-cache.org/Icons/cache_now.gif
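For anyone comparing entries like these, here is a minimal Python sketch
that splits the RELEASE and SWAPOUT lines quoted above into named fields.
It assumes the Squid 2.x store.log layout (time, action, swap file number,
HTTP status, Date, Last-Modified, Expires, content type, expected/real
length, method, URL) and that wrapped lines have been rejoined. The
function and field names are made up for this illustration; they are not
taken from Squid or from the archive script mentioned earlier.

```python
from typing import NamedTuple


class StoreLogEntry(NamedTuple):
    """One store.log line; field names are labels chosen for this sketch."""
    timestamp: float
    action: str          # e.g. RELEASE or SWAPOUT
    file_number: str     # FFFFFFFF in the RELEASE samples above
    status: int          # HTTP status code of the reply
    date: int            # Date header as epoch seconds, -1 if absent
    last_modified: int   # Last-Modified as epoch seconds, -1 if absent
    expires: int         # Expires as epoch seconds, -1 if absent
    content_type: str
    expected_len: int
    real_len: int
    method: str
    url: str


def parse_store_log_line(line: str) -> StoreLogEntry:
    """Split one unwrapped store.log line on whitespace."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    expected_len, real_len = fields[8].split("/")
    return StoreLogEntry(
        float(fields[0]), fields[1], fields[2], int(fields[3]),
        int(fields[4]), int(fields[5]), int(fields[6]), fields[7],
        int(expected_len), int(real_len), fields[9], fields[10],
    )


# The samples from this message, with the wrapped lines rejoined.
SAMPLES = [
    "967678174.962 RELEASE FFFFFFFF 304 967678174 -1 967764574 "
    "unknown -1/0 GET http://larve.net/stylesheets/base",
    "967681816.192 SWAPOUT 00000043 200 967681815 966465784 967768216 "
    "text/html 4468/4468 GET http://www.squid-cache.org/",
]

if __name__ == "__main__":
    for raw in SAMPLES:
        entry = parse_store_log_line(raw)
        print(entry.action, entry.status, entry.url)
```

Grouping the parsed entries by action and status makes it easier to see
which reply codes go with the RELEASE lines.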

Checking the HTTP headers for one of these resources, I don't see
what is causing it not to be cached:

    devo: gerald> telnet larve.net 80
    Trying 18.29.5.151...
    Connected to flo.larve.net.
    Escape character is '^]'.
    GET /stylesheets/base HTTP/1.0
    Host: larve.net

    HTTP/1.1 200 OK
    Date: Thu, 31 Aug 2000 00:53:48 GMT
    Opt: "http://www.w3.org/2000/P3Pv1";ns=11
    Content-Length: 86
    Content-Location: http://larve.net/stylesheets/base.css
    Content-Type: text/css
    Etag: "ca4ftc:rob1u6o8"
    Expires: Fri, 01 Sep 2000 00:53:48 GMT
    Last-Modified: Thu, 10 Aug 2000 21:46:17 GMT
    Server: Jigsaw/2.1-20000814 jre/1.2.2_006 javacomp/1.2.15
    11-PolicyRef: /2000/08/p3p-policyref

And this happens on many different kinds of servers, including
Apache, NaviServer, and Jigsaw.
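As a quick check of the headers quoted above: the response carries an
explicit Expires one day after Date, so under the usual HTTP rule
(freshness lifetime = Expires minus Date) it should count as fresh for
24 hours. The Python sketch below only does that arithmetic on the
quoted header values. It does not model Squid's own refresh logic, and
the function name is hypothetical.

```python
from email.utils import parsedate_to_datetime

# Header values copied from the Jigsaw response quoted above.
HEADERS = {
    "Date": "Thu, 31 Aug 2000 00:53:48 GMT",
    "Expires": "Fri, 01 Sep 2000 00:53:48 GMT",
    "Last-Modified": "Thu, 10 Aug 2000 21:46:17 GMT",
}


def freshness_lifetime_seconds(headers: dict) -> float:
    """Return Expires minus Date in seconds for one response."""
    date = parsedate_to_datetime(headers["Date"])
    expires = parsedate_to_datetime(headers["Expires"])
    return (expires - date).total_seconds()


if __name__ == "__main__":
    lifetime = freshness_lifetime_seconds(HEADERS)
    print(f"freshness lifetime: {lifetime / 3600:.0f} hours")
```

A positive lifetime here means the headers themselves do not rule out
caching, which is what makes the RELEASE entries puzzling.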

Can anyone see something I'm missing? Or is this due to a bug?

(Also, any suggestions for something else to use as a proxy/archiver,
or configuration file tweaks I should make, would be very welcome.)

thanks,

-- 
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/
Received on Wed Aug 30 2000 - 19:11:30 MDT