Banner ad blocker ? Anyone with a good blocklist or better code

From: Armistead, Jason <[email protected]>
Date: Thu, 12 Mar 1998 17:36:00 -0500

Hi

I've been running a Perl redirector script which, with a careful
analysis of the Squid access.log file, has helped me filter out URLs
which are nothing more than banner GIF files, and then redirect each
such request to a locally stored and served GIF image (I use several
different URLs, though I should probably eventually stick to a single,
easily changed one for all ad sites). This is a valuable upstream
bandwidth saver for us, and also has the advantage of keeping these
files out of the cache (disk/memory requirements are smaller and we get
a better hit ratio).
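
For anyone wanting to repeat the access.log analysis, here is a rough
sketch of one way to do it, not the script I actually used. It assumes
Squid's native log format, where the URL is the seventh
whitespace-separated field, and it simply counts image requests so the
most requested ones can be inspected by hand. The sub name and the
sample log lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each image URL appears in a set of access.log lines.
# Assumed native format: time elapsed client code/status bytes method URL ...
sub tally_image_urls {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        my @field = split ' ', $line;
        next unless @field >= 7;
        my $url = $field[6];
        # Only GIF/JPEG requests are candidates for banner images.
        next unless $url =~ m{\.(?:gif|jpe?g)(?:\?|\z)}i;
        $hits{$url}++;
    }
    return \%hits;
}

# Made-up log lines, for illustration only.
my @sample = (
    '889742160.123 250 10.0.0.5 TCP_MISS/200 9876 GET http://ads.example.com/ads/banner1.gif - DIRECT/ads.example.com image/gif',
    '889742161.456 120 10.0.0.6 TCP_MISS/200 9876 GET http://ads.example.com/ads/banner1.gif - DIRECT/ads.example.com image/gif',
    '889742162.789 300 10.0.0.5 TCP_MISS/200 1234 GET http://www.example.com/index.html - DIRECT/www.example.com text/html',
);

my $hits = tally_image_urls(@sample);

# List the most frequently requested images first.
for my $url (sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits) {
    printf "%6d %s\n", $hits->{$url}, $url;
}
```

On a real log you would feed the lines of access.log to the sub instead
of @sample. Frequently requested images from hosts or directories named
"ads" are the obvious candidates for a block pattern.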

For a small ISP, this could be a good cost saver, and a chance to put
your favourite motd (Message Of The Day) or similar onto some well hit
sites, as long as the viewing public don't then try to go to the
underlying URLs in the HTML (which could also be redirected I guess - I
just never bothered with it).

So, I've had a look on the net for a definitive list of blocked ad
sites. I found http://internet.junkbuster.com, but a search for their
recommended phrases on AltaVista revealed only one site,
http://www.teclata.es/junkbuster/english/blocklist.html (better than a
kick in the head I guess). The only other site which helped was
http://www.markwelch.com/bannerad/, whose author appears to be a banner
ad consultant (I have yet to go through his HTML pages and extract all
the URLs for inclusion into the ad breaker).

Sure, the advertisers will keep changing their host names and ad
directories, but with a bit of collective effort, we can keep it
up-to-date. Maybe someone with an external Internet site can post an ad
banner URL regexp list that we can add to using a form (as long as the
advertisers can't remove them !!!) ?
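
To make the idea of a shared regexp list concrete, here is a minimal
sketch of how a redirector might load one. The list format (a regexp,
whitespace, then a replacement image name, with "#" comment lines) is
purely an assumption for illustration, as are the sub names
parse_blocklist and rewrite_url. Nobody has agreed on a format yet.

```perl
use strict;
use warnings;

# Parse blocklist lines of the (hypothetical) form:
#   <regexp>   <replacement image name>
# Blank lines and lines starting with '#' are ignored.
sub parse_blocklist {
    my (@lines) = @_;
    my @rules;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                    # drop newline and trailing blanks
        next if $line =~ /^\s*(?:#.*)?\z/;     # skip blank lines and comments
        my ($pattern, $image) = split ' ', $line, 2;
        next unless defined $image;
        # eval guards against a malformed pattern in a shared list.
        my $re = eval { qr/$pattern/ };
        next unless defined $re;
        push @rules, [ $re, $image ];
    }
    return @rules;
}

# Return the replacement URL for the first matching rule,
# or the original URL when no rule matches.
sub rewrite_url {
    my ($url, $base, @rules) = @_;
    for my $rule (@rules) {
        my ($re, $image) = @$rule;
        return $base . $image if $url =~ $re;
    }
    return $url;
}

my @rules = parse_blocklist(
    '# pattern                     image',
    'dejanews\.com/ads             no_adds_dn.gif',
    '^http://ads\.lycos\.com/ads   no_lycos_ads.gif',
);

print rewrite_url('http://ads.lycos.com/ads/x.gif',
                  'http://alpha.my.net/ico/', @rules), "\n";
```

Keeping the patterns in a data file rather than in the script means the
list can be swapped or merged without anyone having to edit code.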

Anyone want to swap/share URLs or redirector scripts on this one ? Here
is my redirector code, in case anyone wants it. It's pretty primitive,
just a whole lot of pattern matching and IFs. Heck, I didn't even quote
the "." characters in the regexps !
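
For the record, the difference the unquoted "." makes can be seen in
this small sketch. It uses Perl's \Q...\E quoting. The lookalike URL is
made up, and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $host = 'ads.lycos.com';

my $loose  = qr{^http://$host/};        # each "." matches any character
my $strict = qr{^http://\Q$host\E/};    # \Q...\E makes the dots literal

my $lookalike = 'http://adsxlycos.com/ads/banner.gif';   # made-up URL

print "loose pattern matches the lookalike\n"  if $lookalike =~ $loose;
print "strict pattern matches the lookalike\n" if $lookalike =~ $strict;
```

Only the first message is printed. In practice the loose patterns
rarely misfire, but quoting the dots costs nothing.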

Regards

Jason

------------------8<---- snip ----8<-----------------------------

local# more redir.pl

#!/usr/local/bin/perl
 $|=1;

 $base = "http://alpha.my.net/ico/";
 $no_ads = $base . "no_ads.gif\n";

 $no_adds_dn = $base . "no_adds_dn.gif\n";
 $no_adds_dblclick = $base . "no_adds_dblclick.gif\n";
 $no_av_left = $base . "no_av_left.gif\n";
 $no_av_right = $base . "no_av_right.gif\n";
 $no_lycos_ads = $base . "no_lycos_ads.gif\n";
 $no_focalink_ads = $base . "no_focalink_ads.gif\n";
 $no_infoseek_ads = $base . "no_infoseek_ads.gif\n";
 $no_aol_ads = $base . "no_aol_ads.gif\n";
 $no_aol_mini_ads = $base . "no_aol_mini_ads.gif\n";
 $no_infospace_adman = $base . "no_infospace_adman.gif\n";
 $no_infospace_ads = $no_ads;
 $no_yimg_ads = $no_ads;
 $no_yahoo_ads = $no_ads;
 $no_yahoo_promo = $base . "no_yahoo_promo.gif\n";

 while (<>) {

# Start of ad removal process

     m@dejanews.com/ads@ && do {$_ = $no_adds_dn; };
     m@http://ad.doubleclick.net/@o && do {

# AltaVista ads
# There is a left and right badge for the AltaVista title line

       if (m@altavista@o) {
          if (m@left@o) {$_ = $no_av_left; }
          elsif (m@right@o) {$_ = $no_av_right; }
          else {$_ = $no_adds_dblclick; }
       }

     };

# Lycos has its own ads server (for the moment anyhow)

     if (m@http://ads.lycos.com/ads@o) {$_ = $no_lycos_ads; }

# Focalink use ads on a number of servers, all with the same SmartBanner
# CGI program

     if (m@focalink.com/SmartBanner@o) {$_ = $no_focalink_ads; }

# InfoSeek has its own ad directory too

     if (m@http://www.infoseek.com/ads@o) {$_ = $no_infoseek_ads; }

# AOL Netfind has an ad redirector of sorts

     if (m@http://ads.web.aol.com/@o) {

        if (m@/image/@o) {
           if (m@\?@o) {$_ = $no_aol_ads; }
           else {$_ = $no_aol_mini_ads; }
        }
        elsif (m@/content/@o) {$_ = $no_aol_ads; }
        else {$_ = $no_aol_ads; }
     }

     if (m@http://ads.infospace.com/adman@o) {$_ = $no_infospace_adman; }
     if (m@http://ads.infospace.com/adredir@o) {$_ = $no_infospace_adman; }

# Infospace use two servers called pic1 and pic2, but sometimes their IP
# address appears in the URL instead

     if ((m@http://199.242.24@o) || (m@http://pic\d.infospace.com@o)) {
       if (m@/ads/@o) {$_ = $no_infospace_ads; }
     }

# Yimg ads have the format us.yimg.com/a or /adv

     if (m@http://us.yimg.com/a@o) {$_ = $no_yimg_ads; }

# Yahoo ads have yahoo.com/a or yahoo.com/adv/ URLs
# or with appropriate country suffix in domain name

     if (m@\.yahoo\.(com|com\.au|ca|fe|de|no|se|co\.uk|com\.sg)/(a|adv)/@o)
       {$_ = $no_yahoo_ads; }

     if (m@\.yahoo\.(com|com\.au|ca|fe|de|no|se|co\.uk|com\.sg)/promotions/@o)
       {$_ = $no_yahoo_promo; }

# End of ad removal process

     print;
 }
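
The script above matches its patterns against the whole input line, so
a short pattern such as "left" could in principle hit the client
hostname rather than the URL. The sketch below shows one way to test
only the URL field. It assumes the classic Squid redirector interface
(one "URL client-ip/fqdn ident method" line per request, with a blank
reply meaning "leave the URL alone"). The names handle_request and
$rewrite are hypothetical.

```perl
use strict;
use warnings;

# Take one redirector input line, apply $rewrite to the URL field only,
# and return the line to send back to Squid.
sub handle_request {
    my ($line, $rewrite) = @_;
    my ($url) = split ' ', $line;
    return "\n" unless defined $url;   # blank reply: no change
    return $rewrite->($url) . "\n";
}

# Example rewriting rule, using one pattern and image from the script above.
my $rewrite = sub {
    my ($url) = @_;
    return 'http://alpha.my.net/ico/no_lycos_ads.gif'
        if $url =~ m{^http://ads\.lycos\.com/ads};
    return $url;
};

print handle_request("http://ads.lycos.com/ads/x.gif 10.0.0.5/- - GET\n", $rewrite);
```

In a live redirector you would set $| = 1 as above and call
handle_request on every line read from standard input.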
Received on Thu Mar 12 1998 - 14:48:15 MST
