Solaris NIS/DNS evils

From: Chris Tilbury <[email protected]>
Date: Sat, 6 Jun 1998 14:43:20 +0100

Since this cropped up a while ago, with reference to nscd slowing things
down, I thought I'd post details of a problem we were seeing here and the
solution.

Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute
authentication and local hosts information and, in common with our multiuser
systems, we run a slave NIS server on the cache machine to improve the
response time of NIS queries.

We were seeing very high name->ip lookup times (avg ~2 sec) and ip->name
lookup times (avg ~8 sec), although there didn't seem to be much of a
problem with response times for valid sites until the cache was placed
under high load. Then, performance went down the toilet.
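
For anyone wanting to take comparable measurements on their own machine, here
is a minimal sketch of one way to time lookups through the system resolver.
It is not how the figures above were obtained; the helper names time_lookup
and average_lookup_time are made up for illustration.

```python
import time


def time_lookup(resolver, query):
    """Return (elapsed_seconds, result_or_None) for one resolver call."""
    start = time.perf_counter()
    try:
        result = resolver(query)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses,
        # so failed lookups land here and are still timed.
        result = None
    return time.perf_counter() - start, result


def average_lookup_time(resolver, queries):
    """Average elapsed time over all queries, failed lookups included."""
    timings = [time_lookup(resolver, q)[0] for q in queries]
    return sum(timings) / len(timings) if timings else 0.0
```

Passing socket.gethostbyname with a list of host names gives a name->ip
average, and socket.gethostbyaddr with a list of addresses gives an ip->name
average. Failed lookups are deliberately counted, since those are the ones
that hurt.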

After some time, and a bit of detective work, we found the problem. On
Solaris 2.6, if you have a local NIS server running (ypserv) and you have
NIS in your /etc/nsswitch.conf hosts entry, then check the flags ypserv is
being started with. The 2.6 ypstart script checks to see if there is a
resolv.conf file present when it starts ypserv. If there is, then it starts
ypserv with the "-d" option.

This has the same effect as putting the "YP_INTERDOMAIN" key in the hosts
table - namely, that failed NIS host lookups are tried against the DNS by
the NIS server.
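
A rough sketch of how the two conditions described above could be checked
from a script. It only inspects text you hand it (the contents of
nsswitch.conf and a ypserv command line as reported by ps); the function
names are hypothetical, and it assumes the flag appears as a separate "-d"
argument.

```python
import re


def hosts_sources(nsswitch_text):
    """Return the sources on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            # Remove bracketed criteria such as [NOTFOUND=return].
            entry = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return entry.split()
    return []


def ypserv_has_d_flag(command_line):
    """True if this is a ypserv command line carrying a separate -d."""
    args = command_line.split()
    if not args or not args[0].endswith("ypserv"):
        return False
    return "-d" in args[1:]
```

If hosts_sources() lists "nis" ahead of "dns" and ypserv_has_d_flag() is
true for the running ypserv, the machine is in the configuration this post
is about.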

This is a bad thing(tm)! If NIS itself tries to resolve names using the DNS,
then the requests are serialised through the NIS server, creating a
bottleneck (this is the same basic problem that is seen with nscd). Thus,
one failing or slow lookup can, if you have NIS before DNS in the service
switch file (which is the most common setup), hold up every other lookup
taking place.
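
To make the bottleneck concrete, here is a toy model, not a measurement. It
assumes all lookups arrive at the same moment and that the serialised case
handles them strictly one at a time in arrival order; the times are invented
and given in milliseconds.

```python
def serialized_completion_times(service_times_ms):
    """Completion time of each lookup when handled one at a time."""
    finished = []
    clock = 0
    for t in service_times_ms:
        clock += t  # each lookup waits for everything queued before it
        finished.append(clock)
    return finished


def independent_completion_times(service_times_ms):
    """Completion time of each lookup when none waits on another."""
    return list(service_times_ms)
```

With service times of [8000, 100, 100], the serialised case finishes at
[8000, 8100, 8200] while the independent case finishes at [8000, 100, 100]:
one slow lookup at the head of the queue adds its full delay to every lookup
behind it.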

If you're running in this kind of setup, then you will want to make sure
that

1. ypserv doesn't start with the "-d" flag.
2. you don't have the YP_INTERDOMAIN key in the hosts table (find the "B=-b"
   line in the yp Makefile and change it to "B=")
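
The Makefile change in point 2 could be scripted along these lines. This is
a sketch that works on the Makefile text only; the function name is
hypothetical, and it assumes the setting sits on a line of its own in the
form "B=-b". Commented-out lines are left alone.

```python
import re

# Matches an active "B=-b" assignment, allowing optional whitespace.
B_LINE = re.compile(r"^\s*B\s*=\s*-b\s*$")


def disable_interdomain(makefile_text):
    """Return (new_text, changed) with any 'B=-b' line rewritten to 'B='."""
    changed = False
    out = []
    for line in makefile_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        if B_LINE.match(body):
            out.append("B=" + ending)
            changed = True
        else:
            out.append(line)
    return "".join(out), changed
```

Since the key lives in the built map rather than in the Makefile, the hosts
maps would presumably need rebuilding afterwards for the change to take
effect.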

We changed these here, and saw our average lookup times drop by up to an
order of magnitude (to ~150 msec for name->ip queries and ~1.5 sec for
ip->name queries; the latter is still so high, I suspect, because more of
these fail and time out, since they are not made so often and the entries
are frequently non-existent anyway).

Cheers,

Chris

-- 
Chris Tilbury, UNIX Systems Administrator, IT Services, University of Warwick
EMAIL: cudch+s@csv.warwick.ac.uk PHONE: +44 1203 523365(V)/+44 1203 523267(F)
                            URL: http://www.warwick.ac.uk/staff/Chris.Tilbury
Received on Sat Jun 06 1998 - 06:44:39 MDT
