commBind/commResetFD problems in Solaris 2.5.1

From: <[email protected]>
Date: Wed, 14 Jul 1999 14:14:32 +0200 (MET DST)

Hi!

I have a machine running Solaris 2.5.1, with squid-2.2 on it.
I have used 1.1.x, 2.0patch2, 2.1patch2, 2.2stable2, 2.2stable3,
and currently 2.2stable4 (in this order).
A few days after I first started using 2.2, I spotted the
following error messages in cache.log:

commBind: Cannot bind socket FD 100 to x.x.x.x:0: (22) Invalid argument

There were actually thousands of these messages, usually in runs of
ten or so with the same FD number, a few seconds apart. Occasionally
I also got an error message in my browser.
The errors seem to be clustered: sometimes half a day passes
without one, at other times there are plenty of them.

Then I started to check the source. The log message comes from commBind,
when squid tries to bind the socket before doing a connect.
This happens because I'm using tcp_outgoing_address to set the source IP
address of all outgoing connections.
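A minimal C sketch of what binding the source address before a connect involves, assuming IPv4 and a dotted-quad address string. `bind_outgoing()` is a hypothetical helper for illustration, not Squid's commBind(). Port 0 lets the kernel choose the local port, which matches the `:0` in the log line above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Squid code): create a TCP socket and bind it
 * to src_ip with port 0, so the kernel picks the local port. A later
 * connect() on the returned FD then uses src_ip as its source address.
 * Assumes IPv4. Returns the FD, or -1 on error. */
int bind_outgoing(const char *src_ip)
{
    struct sockaddr_in sa;
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);              /* any local port */
    if (inet_pton(AF_INET, src_ip, &sa.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
        close(fd);                       /* don't leak the FD on failure */
        return -1;
    }
    return fd;
}
```

A caller would then connect() the returned FD to the remote server, and the connection leaves from src_ip.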

Before 2.2, only comm_open called commBind, when a new outgoing connection
was initiated. Since 2.2, however, commBind is also called from commResetFD.
After staring at this function for a few minutes, I realized how it
works. When a connect fails, squid wants to re-connect using the same FD.
The BSD socket API does not support this: after a failed connect() the
state of the socket is unspecified, so it cannot portably be reused for
another connect(). Therefore squid needs to "reset" the FD before it can
connect again. To do that, squid first allocates a new FD, then dup2's it
onto the old one (this, in theory, should atomically close the old FD and
make the new socket available under the old FD number), and then closes
the new FD number, leaving a fresh, unconnected socket under the old
FD number. Squid then tries to bind this fresh FD to the configured address.

In short:

new = socket()
dup2(new, old)
close(new)
bind(old)
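The same four steps written out in plain C, as a sketch only: `reset_fd_bind_after()` is a hypothetical stand-in for commResetFD(), assumes IPv4, and omits Squid's FD bookkeeping (fdAdjustReserved(), close-on-exec, non-blocking mode).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Squid code): replace the socket behind the
 * open descriptor old_fd with a fresh, unconnected TCP socket, then bind
 * it to src_ip:0. Binding happens AFTER dup2(), as in the original
 * commResetFD. Assumes IPv4. Returns 0 on success, -1 on error. */
int reset_fd_bind_after(int old_fd, const char *src_ip)
{
    struct sockaddr_in sa;
    int new_fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);
    if (inet_pton(AF_INET, src_ip, &sa.sin_addr) != 1)
        return -1;

    new_fd = socket(AF_INET, SOCK_STREAM, 0);     /* new = socket() */
    if (new_fd < 0)
        return -1;
    if (dup2(new_fd, old_fd) < 0) {               /* dup2(new, old) */
        close(new_fd);
        return -1;
    }
    close(new_fd);                                /* close(new) */
    /* bind(old): on Solaris 2.5.1 this is the step that reportedly
     * failed with (22) Invalid argument */
    if (bind(old_fd, (struct sockaddr *) &sa, sizeof(sa)) < 0)
        return -1;
    return 0;
}
```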

I am not sure whether this is defined behaviour of the BSD socket API,
but, as it turned out, the buggy socket emulation code in Solaris 2.5.1
does not implement it correctly.
After playing with this code a little, I found a rearrangement
of the code that makes the error messages disappear:

new = socket()
bind(new)
dup2(new, old)
close(new)
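A sketch of the rearranged order, again with a hypothetical `reset_fd_bind_before()` helper standing in for commResetFD() and assuming IPv4. Like the patch below, it closes the new FD on every error path so that a failed reset does not leak a descriptor.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not Squid code): create a fresh TCP socket, bind
 * it to src_ip:0 BEFORE moving it under old_fd with dup2(), then release
 * the temporary descriptor. Assumes IPv4 and that old_fd is open.
 * Returns 0 on success, -1 on error. */
int reset_fd_bind_before(int old_fd, const char *src_ip)
{
    struct sockaddr_in sa;
    int new_fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(0);
    if (inet_pton(AF_INET, src_ip, &sa.sin_addr) != 1)
        return -1;

    new_fd = socket(AF_INET, SOCK_STREAM, 0);       /* new = socket() */
    if (new_fd < 0)
        return -1;
    if (bind(new_fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
        close(new_fd);                              /* bind(new) failed */
        return -1;
    }
    if (dup2(new_fd, old_fd) < 0) {                 /* dup2(new, old) */
        close(new_fd);
        return -1;
    }
    close(new_fd);                                  /* close(new) */
    return 0;
}
```

Because dup2() makes the old FD number refer to the already-bound socket, getsockname() on the old FD reports the kernel-chosen local port afterwards.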

In theory, this should not change the outcome; on Solaris 2.5.1, however,
it does. Since this change solves my problem, is small, and may help
other Solaris 2.5.1 users too, I would like to have it included in
mainstream squid.

Also, if someone knows why this change solves the problem, please
let me know.

Laszlo Valko
e-mail: valko@linux.karinthy.hu

My patch:

diff -rc squid-2.2.STABLE4/src/comm.c squid-2.2.STABLE4-mod/src/comm.c
*** squid-2.2.STABLE4/src/comm.c Tue Apr 20 19:55:03 1999
--- squid-2.2.STABLE4-mod/src/comm.c Wed Jul 14 11:56:23 1999
***************
*** 311,318 ****
--- 311,326 ----
          fdAdjustReserved();
          return 0;
      }
+     if (Config.Addrs.tcp_outgoing.s_addr != no_addr.s_addr) {
+         if (commBind(fd2, Config.Addrs.tcp_outgoing, 0) != COMM_OK) {
+             close(fd2);
+             fdAdjustReserved();
+             return 0;
+         }
+     }
      if (dup2(fd2, cs->fd) < 0) {
          debug(5, 0) ("commResetFD: dup2: %s\n", xstrerror());
+         close(fd2);
          fdAdjustReserved();
          return 0;
      }
***************
*** 323,333 ****
       * the original socket
       */
      commSetCloseOnExec(cs->fd);
-     if (Config.Addrs.tcp_outgoing.s_addr != no_addr.s_addr) {
-         if (commBind(cs->fd, Config.Addrs.tcp_outgoing, 0) != COMM_OK) {
-             return 0;
-         }
-     }
      commSetNonBlocking(cs->fd);
  #ifdef TCP_NODELAY
      commSetTcpNoDelay(cs->fd);
--- 331,336 ----
Received on Wed Jul 14 1999 - 06:09:09 MDT
