Re: Raw vs. cooked file systems for cache?

From: Henrik Nordstrom <[email protected]>
Date: Thu, 07 Jan 1999 02:19:36 +0100

Williams Jon wrote:
>
> I've been doing some reading on caching proxies, and most people state that
> you'll get better performance if you don't rely on the OS for your file
> system management and instead use raw filesystem space for storing the
> cache.

This very much depends on the OS used and how it is used. The way Squid
currently uses a filesystem to store a cache is far from optimal and can
be improved a lot without switching to a raw filesystem.

Designing an efficient store is far from obvious. The main reasons why
Squid uses a filesystem store are:
1. Simplicity. It is very easy to work with a filesystem.
2. Recoverability. The OS usually has good measures to keep the
filesystem intact if the software fails (core dumps and other nasty
things). It also has well tested routines for recovering a filesystem
after a system failure (loss of power, kernel panic or similar).
3. Not much tuning of the code needed for I/O performance. In theory the
OS manages all buffers and similar things needed to get a good I/O
performance.

The reasons why one may consider a custom filesystem store are:
1. I/O performance. Keeping a filesystem has some unneeded I/O overhead
for inodes, directories, and so on.
2. Not much tuning of the operating system needed for I/O performance.
In theory most of the buffer management and I/O planning issues are the
responsibility of the custom file system.
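
As a rough illustration of the custom store idea (a sketch only, not code
from Squid and not any of the implementations people have presented): all
objects live in one large file, which stands in for a raw partition, and an
in-memory index maps each key to an offset and length. The names
SingleFileStore, put, get and demo are hypothetical.

```python
import os
import tempfile


class SingleFileStore:
    """Append-only object store kept in one file, with an in-memory index."""

    def __init__(self, path):
        # One big file stands in for a raw partition in this sketch.
        self.f = open(path, "w+b")
        self.index = {}  # key -> (offset, length)
        self.end = 0     # next free byte; writes are strictly sequential

    def put(self, key, data):
        # Append the object at the end of the file and remember where it is.
        self.f.seek(self.end)
        self.f.write(data)
        self.index[key] = (self.end, len(data))
        self.end += len(data)

    def get(self, key):
        # One seek and one read; no directory lookup or inode fetch per object.
        offset, length = self.index[key]
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


def demo():
    """Store two objects and read them back from a temporary file."""
    with tempfile.TemporaryDirectory() as d:
        store = SingleFileStore(os.path.join(d, "cache.dat"))
        store.put("http://example.com/a", b"first object")
        store.put("http://example.com/b", b"second")
        result = (
            store.get("http://example.com/a"),
            store.get("http://example.com/b"),
            store.index["http://example.com/b"],
        )
        store.close()
        return result
```

The sketch also shows what is given up: the index exists only in memory, so
after a crash nothing can locate the objects again. A real custom store would
have to provide the recovery that the OS filesystem otherwise gives for free.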

There is a lot to do to improve Squid's I/O patterns, both in the Squid
code and in tuning of the operating system.

OS tunings:
* no-atime mount option (if available)
* VM tuning to avoid swapping Squid out to disk.
* VM tuning to keep a large enough free list (the vm freelist should
NEVER reach minfree).
* Buffer tuning to have the right filesystem structures cached in
memory.
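
The no-atime item above is a mount option set by the administrator. On
Linux a program can ask for similar behaviour per file with the O_NOATIME
flag to open(2). The sketch below assumes Linux and is not something Squid
is known to do; open_noatime and read_cached are hypothetical helper names.

```python
import os


def open_noatime(path):
    """Open path read-only, asking the kernel not to update its access time.

    O_NOATIME is Linux-specific; where it is missing this is a plain open.
    """
    flags = os.O_RDONLY
    noatime = getattr(os, "O_NOATIME", 0)
    try:
        return os.open(path, flags | noatime)
    except PermissionError:
        # The kernel refuses O_NOATIME unless the caller owns the file
        # (or is privileged), so retry without it.
        return os.open(path, flags)


def read_cached(path):
    """Read a whole cache file without triggering an atime update if possible."""
    fd = open_noatime(path)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Either way the point is the same: every cache hit would otherwise cost an
inode write just to record that the file was read.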

Squid issues:
* Determine if it is possible to bring order out of chaos somehow. Linear
I/O performs far better than random I/O.
* How disk storage / files are maintained.
* VM usage.
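
One possible reading of "order out of chaos" is to batch pending disk
requests and service them in offset order instead of arrival order. The
sketch below is only that reading, under the assumption that requests can be
described as (offset, length) pairs within one store file; plan_reads and
seek_distance are hypothetical names, and nothing here is taken from Squid.

```python
def plan_reads(pending):
    """Return pending (offset, length) reads in ascending offset order,
    merging requests that touch or overlap so they become one linear read."""
    planned = []
    for offset, length in sorted(pending):
        if planned and offset <= planned[-1][0] + planned[-1][1]:
            # This request starts inside or right after the previous one.
            start, prev_len = planned[-1]
            end = max(start + prev_len, offset + length)
            planned[-1] = (start, end - start)
        else:
            planned.append((offset, length))
    return planned


def seek_distance(reads):
    """Total distance the file position moves between consecutive reads."""
    position = 0
    total = 0
    for offset, length in reads:
        total += abs(offset - position)
        position = offset + length
    return total
```

For example, with pending = [(9000, 100), (0, 512), (4096, 512), (512, 512),
(4608, 100)], plan_reads(pending) gives [(0, 1024), (4096, 612), (9000, 100)],
and seek_distance drops from 29364 to 7364 bytes of head movement.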

> Has anyone played with this for squid that is not reselling it (i.e.
> don't consider their code to be proprietary or trade secret)?

People have talked about it, and some code has been presented.

My opinion is that there are other things of far higher importance than
a custom store "filesystem". Some things are listed below (in no
particular order):

* Do not waste bandwidth. Currently there are a number of different
requests that can get Squid to spend a lot of bandwidth for nothing.
Some are configuration issues, but others are purely design / coding
issues.
* Proper management of persistent connections / retries.
* HTTP compliance. Get HTTP 1.1 up to speed. There is a lot to be
gained from having HTTP 1.1 deployed.
* Caching of partial objects
* Caching of IMS queries
* Give Squid some kind of structure. Today there are a lot of
interdependencies between the various source files, and close to no
documentation on how this beast works under the hood, making it a
nightmare to maintain and hard for independent developers to contribute
anything useful. The code is slowly getting better, but there is no
(documented) overall structure.
* A useful testsuite (other than having a handful of volunteers quickly
testing new releases).
* Security audit the code.
* Look into how Squid may be scaled to cope with very high I/O rates.
The async-io code is one attempt at this, but it is far from perfect.

---
Henrik Nordstrom
Spare time Squid hacker
Received on Wed Jan 06 1999 - 18:00:40 MST
