[squid-users] Disk full problem in cache_dir

From: Daniel Baldoni <[email protected]>
Date: Sat, 09 Feb 2002 21:56:15 +0800

G'day folks,

Recently, we installed three new Squid (2.4STABLE3) servers for a client,
all running Solaris 8 on Ultra 10s. Each machine has 1GB of RAM with the
cache_dirs sitting on external SCSI drives.

One of the machines has started complaining about one of the filesystems
being full. Here's a snippet from cache.log:

> 2002/02/09 00:17:48| Store rebuilding is 17.7% complete
> 2002/02/09 00:17:54| comm_accept: FD 15: (130) Software caused connection
> abort
> 2002/02/09 00:17:54| httpAccept: FD 15: accept failure: (130) Software
> caused connection abort
> 2002/02/09 00:18:42| Store rebuilding is 18.0% complete
> 2002/02/09 00:19:34| diskHandleWrite: FD 13: disk write error: (28) No
> space left on device
> FATAL: Write failure -- check your disk space and cache.log
> Squid Cache (Version 2.4.STABLE3): Terminated abnormally.
> CPU Usage: 138.000 seconds = 49.400 user + 88.600 sys
> Maximum Resident Size: 0 KB
> Page faults with physical i/o: 205567

Similarly, here's the relevant log entry from /var/adm/messages:

> Feb 9 21:32:09 <HOST> ufs: [ID 845546 kern.notice] NOTICE: alloc:
> /var/spool/cache3: file system full
> Feb 9 21:32:09 <HOST> squid[20504]: [ID 702911 user.alert] Write failure
> -- check your disk space and cache.log

But a df on the relevant filesystems shows the fullest of them at only about
85% capacity:

Filesystem           kbytes      used     avail  capacity  Mounted on
/dev/dsk/c1t1d0s2  15683444    315964  15210646      3%    /var/spool/cache1
/dev/dsk/c1t2d0s2  15683444  12037584   3489026     78%    /var/spool/cache2
/dev/dsk/c1t3d0s2  15683444  13022361   2494129     84%    /var/spool/cache3

Actually, the above df output was taken after restarting squid with a "clean"
swap.state file to keep it going. That keeps the machine running for about a
week (or until another cache_dir fills). The swap.state file on a "full"
cache_dir sits at about 52MB.
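For the record, the "clean swap.state" restart I'm doing looks roughly like
the following (paths are our cache_dirs; the exact squid invocation and the
30-second pause are just what I've been doing, not gospel):

```shell
# Stop squid cleanly, drop the (possibly corrupt/bloated) swap.state
# index files, then restart; squid rebuilds the index by scanning the
# cache_dirs ("Store rebuilding ..." in cache.log).
squid -k shutdown
sleep 30                       # give squid time to flush and exit
for d in /var/spool/cache1 /var/spool/cache2 /var/spool/cache3; do
    rm -f "$d/swap.state" "$d/swap.state.new"
done
squid -sY                      # restart, logging to syslog
```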

Whenever the failure occurs, the cache_dir in question (it could be any of
them) is at about 84-85% capacity. I don't know what relevance, if any, that
may have.

The cache_mem setting is 256MB and the three cache_dirs are all defined as:
        cache_dir ufs /var/spool/cache1 14000 128 256
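For what it's worth, here's a quick back-of-the-envelope comparison of that
14000 MB limit against the df numbers above (my own arithmetic, not anything
from the logs, so treat it accordingly):

```shell
# How much of each filesystem does the configured cache_dir limit claim?
fs_kb=15683444                     # kbytes column from df above
limit_kb=$((14000 * 1024))         # cache_dir size limit, 14000 MB in KB
echo "limit is $((limit_kb * 100 / fs_kb))% of the filesystem"
# prints: limit is 91% of the filesystem
```

That's before counting the ~52MB swap.state and any UFS metadata overhead,
though df was only showing 84-85% used when the kernel reported full.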

I don't believe it's an inode limitation either, as a 'df -e' shows plenty of
free inodes. If it were a problem with the "spread" values (128 and 256)
we've used in defining the cache_dirs, I'd expect squid to complain, not the
kernel.

We were using aufs, but I changed to ufs last night in a last-ditch effort to
keep things running until I could track the problem down.

The above problem is only happening on one of the three systems.
Unfortunately, it happens to be the heaviest of the three - and is running
Solaris 8 7/01 (the other two are running 10/01).

So, has anybody seen the above problem? Better yet, can anybody point me at
a solution? Any help would be much appreciated.

By the way, does anybody know how to reduce the incidence of the "httpAccept"
and "comm_accept" pair of errors? Not critical - just annoying.

As always, thanks for your time. Ciao.

-- 
-------------------------------------------------------+---------------------
Daniel Baldoni BAppSc, PGradDipCompSci                 |  Technical Director
require 'std/disclaimer.pl'                            |  LcdS Pty. Ltd.
-------------------------------------------------------+  856B Canning Hwy
Phone/FAX:  +61-8-9364-8171                            |  Applecross
Mobile:     041-888-9794                               |  WA 6153
URL:        http://www.lcds.com.au/                    |  Australia
-------------------------------------------------------+---------------------
"Any time there's something so ridiculous that no rational systems programmer
 would even consider trying it, they send for me."; paraphrased from "King Of
 The Murgos" by David Eddings.  (I'm not good, just crazy)
Received on Sat Feb 09 2002 - 06:52:08 MST

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 17:06:13 MST