[squid-users] If-Modified-Since requests: optimizing squid's behaviour

From: Stefan Kuech <[email protected]>
Date: Tue, 11 Nov 2003 12:52:38 +0100

First of all, please excuse my English and my limited knowledge of
programming and squid's internal functions. I am trying hard to understand
all of this, although I suppose I will never be a programmer. (So I am
posting here instead of on the squid-dev mailing list.)

I may have discovered two effects that could help optimize squid. During the
last few weeks I noticed that web performance can be greatly improved.

1.) IMS requests delete last-modification-time in squid's object store

2.) Client's IMS requests always let squid contact the origin server

------------------------------------------------------------------------

1.) IMS requests delete last-modification-time in squid's object store

------------------------------------------------------------------------

My configuration:

Topology: [Client] -FastEthernet-> [Squid] -SDSL/1MBps-> [Internet]

Client: Windows XP prof., MSIE6, configured statically to use squid as proxy

Squid: squid-3.0-PRE3-20031022, SuSE Linux 8.0 (i386), Kernel 2.4.18-4GB

If the web server is configured well, it sends an Expires header in every
"200 OK" response. Most web servers do not, but many of them at least send
a Last-Modified header, so squid can calculate the object's age. In
combination with a refresh_pattern rule, squid can then decide whether a
requested cached object should be considered fresh or whether squid has to
send an If-Modified-Since request to the web server.

In the following description and examples, the web server

 - never sends an Expires header.

 - sends a Last-Modified header in "200 OK" responses.

 - sends neither Expires nor Last-Modified in any "304 Not Modified"
responses.

I noticed that squid seems to "forget" the object's last modification time
under the following (not very special) circumstances:

When squid receives a client request for an object that is already cached
but stale, it initiates an If-Modified-Since request to the web server. If
squid then receives a Not-Modified response from the web server, it updates
the timestamp values (DateHeader, Last-Modified, Expires) of the already
stored object. None of the Not-Modified responses I have seen in my network
contains a Last-Modified header. As a result, squid overwrites the stored
Last-Modified value with -1. That means that no Last-Modified timestamp is
available for that object anymore. All subsequent client requests for that
object will result in squid initiating an If-Modified-Since request to the
web server, regardless of any refresh_pattern rules, because squid can no
longer calculate the object's age.

In my opinion, the time of last modification can safely be considered
unchanged as long as the object itself has not been modified. So why can't
squid simply keep the Last-Modified value as long as the object is
unmodified? I am not a programmer, but I tried to test this and modified
src/store.cc, lines 1518 and 1520 (version squid-3.0-PRE3-20031022). Now
squid only updates its cached time values if new timestamps are available
in the "304 Not Modified" response:

    if (reply->expires > 0)

        entry->expires = reply->expires;

    if (reply->last_modified > 0)

        entry->lastmod = reply->last_modified;

But, as I am a newbie to squid code, I am not sure what other (unwanted)
effects this modification may have, or whether it violates HTTP standards.
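
To make the difference easier to see outside of squid, here is a minimal
stand-alone sketch. The Reply and Entry structs and both function names are
invented for this illustration and are not squid's real types; only the two
"if" checks mirror the change shown above.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical, simplified stand-ins for squid's reply and store entry.
// As in the text above, -1 means "value not available".
struct Reply {
    std::time_t date;
    std::time_t last_modified;
    std::time_t expires;
};

struct Entry {
    std::time_t timestamp;
    std::time_t lastmod;
    std::time_t expires;
};

// Observed behaviour: every value is copied, so a header that is missing
// from the 304 response wipes the stored value.
void updateAlways(Entry &entry, const Reply &reply) {
    entry.timestamp = reply.date;
    entry.expires = reply.expires;
    entry.lastmod = reply.last_modified;
}

// Modified behaviour: stored values are only replaced when the 304
// response actually carries new ones.
void updateIfPresent(Entry &entry, const Reply &reply) {
    entry.timestamp = reply.date;
    if (reply.expires > 0)
        entry.expires = reply.expires;
    if (reply.last_modified > 0)
        entry.lastmod = reply.last_modified;
}

// A 304 response that carries a Date header only (made-up time values).
void checkExample() {
    const Reply notModified = {2000, -1, -1};
    Entry unpatched = {1000, 400, -1};
    Entry patched = unpatched;
    updateAlways(unpatched, notModified);
    updateIfPresent(patched, notModified);
    assert(unpatched.lastmod == -1);  // Last-Modified is lost
    assert(patched.lastmod == 400);   // Last-Modified is kept
}
```

With the second variant the stored Last-Modified value survives the
revalidation, so the age calculation still works on the next request.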

------------------------------------------------------------------------

2.) Client's IMS requests always let squid contact the origin server

------------------------------------------------------------------------

Configuration: [Client/WinXP/MSIE6] -> [squid-3.0-PRE3] -> [WebServer]

The web server holds a file "test.gif", last modified 08:00h today.

The web server sends Last-Modified headers, but no Expires headers.

My squid proxy server has the following refresh_pattern rule:

refresh_pattern -i \.gif$ 0 50% 1440

08:20:

 - user inputs URL into MSIE

 - MSIE sees that the requested object is not cached locally

 - client requests "test.gif" (GET)

 - squid requests and receives the file from the web server

 - squid stores the object (DateHdr=08:20, Last-Modified=08:00,
Expires=-1=N/A)

 - client receives response (200 OK)

 - client saves object to local disc cache (with last-modified=08:00,
valid-until=unknown)

08:25:

 - user inputs URL into MSIE

 - MSIE sees that the requested object is cached locally, but no expiration
time is available

 - client requests "test.gif" (GET, If-Modified-Since: 08:00)

 - squid is HTTP compliant, so the IMS request has to be forwarded to the
web server

 - squid receives response (304 Not Modified) and serves the client

 - client receives response (304 Not Modified)

If the client's local disc cache had been cleared after the first and
before the second request, squid would have been able to serve the object
directly from its cache without contacting the web server, because the
object was 20 minutes old when squid fetched it at 08:20 and should be
considered fresh until 08:30 (see below). So a normal GET request without
an IMS header would not result in squid contacting the origin web server.

Squid's calculation, until which time the object is "fresh":

   DateHeader + ((DateHeader - LastModified) * 50%)

= 08:20 + ((08:20 - 08:00) * 0.5)

= 08:20 + (00:20 * 0.5)

= 08:20 + 00:10

= 08:30
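
The same arithmetic as a small stand-alone sketch, with times expressed as
minutes since midnight. The function names are invented for this
illustration, and the min and max fields of the refresh_pattern rule are left
out on the assumption that they do not come into play in this example. This
is not squid's actual refresh code.

```cpp
#include <cassert>

// Convert a wall-clock time to minutes since midnight, e.g. 08:20 -> 500.
int hhmm(int hours, int minutes) {
    return hours * 60 + minutes;
}

// Formula from the text:
//   DateHeader + ((DateHeader - LastModified) * percent)
int freshUntil(int dateHeader, int lastModified, int percent) {
    assert(lastModified <= dateHeader);
    return dateHeader + (dateHeader - lastModified) * percent / 100;
}

// The object counts as fresh as long as "now" lies before that point.
bool isFresh(int now, int dateHeader, int lastModified, int percent) {
    return now < freshUntil(dateHeader, lastModified, percent);
}

// The test.gif example: fetched 08:20, last modified 08:00, rule says 50%.
void checkExample() {
    const int date = hhmm(8, 20);
    const int lastModified = hhmm(8, 0);
    assert(freshUntil(date, lastModified, 50) == hhmm(8, 30));
    assert(isFresh(hhmm(8, 25), date, lastModified, 50));   // second request
    assert(!isFresh(hhmm(8, 35), date, lastModified, 50));  // after 08:30
}
```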

As I have a fast (FastEthernet) connection between the client and squid and
a high-latency connection (because it is the internet) between squid and the
web server, I want to prevent squid from unnecessarily contacting the web
server for some IMS requests. As a quick and very dirty solution (for
testing purposes only, and well aware of the HTTP violation), I successfully
tried the following:

squid-3.0-PRE3-20031022/src/client_side_request.cc Line 552:

changed from

    request->ims = httpHeaderGetTime(req_hdr, HDR_IF_MODIFIED_SINCE);

to:

    request->ims = -1;

As a result, squid handles all IMS requests from clients as if they were
normal requests without an If-Modified-Since header. The refresh_pattern
rule is consulted to see whether a request to the web server is necessary or
whether the cached object should be sent to the client. Due to the dirty
code manipulation, squid's response will never be a "304 Not Modified".
Fortunately, the client can handle a "200 OK" response after an IMS request.

The main effects of this modification:

a) Every object that is requested by the client is transferred from squid to
the client, regardless of IMS headers and of whether it has been modified.

b) If an object is "fresh" (according to the refresh_pattern rules), squid
never asks the web server whether the object has been modified. In an
environment with a very fast client-squid connection, this is exactly what I
want. But if the connection between clients and squid is slower, it may slow
down almost all web traffic, because more data has to be sent from squid to
the clients.
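
The behaviour described in this section can be summarised as a small decision
function. This is only a model of what I observed and of the effect of the
hack, not squid's real decision code; the enum, the function and its
parameters are all invented for this illustration.

```cpp
#include <cassert>

enum class Action { ServeFromCache, AskOriginServer };

// clientSentIms:         the client request carries If-Modified-Since
// ignoreIms:             the "request->ims = -1" hack is active
// freshByRefreshPattern: the cached object is fresh according to the rule
Action handleRequest(bool clientSentIms, bool ignoreIms,
                     bool freshByRefreshPattern) {
    // Observed behaviour: a client IMS request makes squid ask the server.
    if (clientSentIms && !ignoreIms)
        return Action::AskOriginServer;
    // Otherwise only the refresh_pattern rule decides.
    return freshByRefreshPattern ? Action::ServeFromCache
                                 : Action::AskOriginServer;
}

// The 08:25 request for test.gif, without and with the hack.
void checkExample() {
    assert(handleRequest(true, false, true) == Action::AskOriginServer);
    assert(handleRequest(true, true, true) == Action::ServeFromCache);
}
```

Note that with the hack active, the answer served from the cache is always a
full "200 OK" response, which is effect a) above.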

Unfortunately, due to my lack of programming experience, I am not able to
implement the following ideas myself. As I can see that my
"request->ims=-1" hack is not acceptable in any way, I suggest the following
two new options for the refresh_pattern tag:

ignore-ims: Ignores clients' If-Modified-Since headers and lets squid handle
such requests as normal requests. Doing this VIOLATES the HTTP standard.

(That would be a more selective approach than my dirty "request->ims=-1"
hack described above.)

lastmod-into-expires: If the object does not have an Expires header, squid
will calculate one by applying the given percentage value to the object's
last modification time. Squid will then treat the resulting value as if it
had been sent by the origin web server. That means the calculated
expiration time will be used both in the object store and in the
client-side reply. Doing this VIOLATES the HTTP standard.

The new lastmod-into-expires option would have the additional effect that
MSIE would receive the computed expiration time as an HTTP Expires header,
write it to the local disc cache and not send IMS requests for this object
until the expiration date or max-age (derived from squid's refresh_pattern)
of the object is reached.
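
A rough sketch of what the proposed lastmod-into-expires option would have to
compute, assuming it reuses the freshness formula from section 2. The option
does not exist in squid; both function names are invented for this
illustration.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Expiration time derived from the Date and Last-Modified values:
//   Date + ((Date - LastModified) * percent)
std::time_t computedExpires(std::time_t date, std::time_t lastmod,
                            int percent) {
    return date + (date - lastmod) * percent / 100;
}

// Format the computed value as an HTTP Expires header line (GMT).
std::string expiresHeader(std::time_t expires) {
    char buf[64];
    const std::tm *utc = std::gmtime(&expires);
    if (utc == nullptr)
        return std::string();
    const std::size_t len =
        std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", utc);
    return "Expires: " + std::string(buf, len);
}

// The test.gif example, taking the times as GMT on 11 Nov 2003.
void checkExample() {
    const std::time_t date = 1068538800;     // 11 Nov 2003 08:20:00 GMT
    const std::time_t lastmod = 1068537600;  // 11 Nov 2003 08:00:00 GMT
    assert(expiresHeader(computedExpires(date, lastmod, 50)) ==
           "Expires: Tue, 11 Nov 2003 08:30:00 GMT");
}
```

A client that receives such a header would treat its local copy as valid
until 08:30 and would not send an IMS request before then.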

------------------------------------------------------------------------

As squid has been in use by thousands of users around the world for many
years, I suppose I am not the only one thinking about things like these. So
first of all I would like to know what others think. Am I mistaken? Have I
overlooked something? Did I read the source code the wrong way? Or is there
perhaps another, better way to solve the problems I described?
Received on Tue Nov 11 2003 - 04:52:42 MST