Re: [squid-users] web filtering

From: ajm <[email protected]>
Date: Sat, 30 Sep 2006 11:37:21 -0500

On Sat, Sep 30, 2006 at 04:19:13PM +0200, Christoph Haas wrote:
> On Saturday 30 September 2006 05:11, Chuck Kollars wrote:
> > Our experience with web filtering is that the
> > differences in tools are _completely_ swamped by
> > the quality and depth of the blacklists. (The
> > reverse is of course also true: a lack of good
> > blacklists will doom _any_ filtering tool.)
> >
> > We currently have over 500,000 (!) sites listed in
> > just the porn section of our blacklist. With quality
> > lists like these, any old tool will do a decent job.
>
> And a large portion of those half million sites are probably already
> something other than porn sites, or the domains have been abandoned.
> I wouldn't judge the quality purely by the quantity.
>
> > Lots of folks need to get such lists reasonably and
> > regularly (quarterly?).
>
> Daily even.
>
> > Useful lists are far far too
> > large to be maintained by local staff. Probably what's
> > needed is a mechanism whereby everybody nationwide
> > contributes, some central site consolidates and
> > sanitizes, and then publishes the lists.
>
> I'd welcome such an effort. Some companies invest a lot of effort into URL
> categorisation - not just regarding porn sites. But they have several
> employees working full-time on that and run a kind of editor's office.
> For a free/open-source project you would need a lot of people and some
> mechanism (e.g. a web spider) that searches for further sites. And doing
> that job is boring. So compared to other free/open-source projects there
> is much less motivation to contribute constantly.
>
> > This would be a huge effort. It's not easily possible
> > even with lots of clever scripts and plenty of compute
> > power. We've already seen more than a handful of
> > "volunteers" swallowed up by similar efforts.
>
> I believe the only blacklist that has survived over the years is
> http://urlblacklist.com/ - except that it is non-free now. I may be
> mistaken about its history though.
>
> There already exist DNS-based blacklists that are very effective for
> mail spam detection. Perhaps a DNS-based register where you can look up
> whether a certain domain belongs to a certain category would help. Large
> installations like ISPs could mirror the DNS zone, and private users
> could just query it. Perhaps even the Squid developers could support
> such a blacklist.
>
> So IMHO we lack both a source (volunteers, a spider, a web-based
> contribution system) and a good way to use it. Huge static ACLs don't
> work well with Squid.
>
> Since I had to tell our managers at work how well URL filtering works
> (we use a commercial solution), I pulled some numbers. Around 3,000
> domains are registered at DeNIC (the German domain registry) alone every
> day. Now add the other registries and you get a rough idea of how many
> domains need to be categorised every day. That's the reason why it's so
> hard to create reasonable blacklists. (And also the reason for my rants
> when people expect decent filtering just from the currently publicly
> available blacklists.)
>
> You didn't tell much about your intentions though. :)
>
> Kindly
> Christoph
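
(Regarding the DNS-based register suggested above: here is a rough
sketch of how such a lookup could work, along the lines of the mail
DNSBLs. The zone name "category.bl.example.org" and the category codes
are made up purely for illustration; no such register exists.)

    # Python sketch: ask a hypothetical categorisation zone about a
    # domain, DNSBL-style. The zone name below is invented.
    import socket

    def lookup_category(domain, zone="category.bl.example.org"):
        try:
            # e.g. playboy.com -> playboy.com.category.bl.example.org
            answer = socket.gethostbyname("%s.%s" % (domain, zone))
            # DNSBL convention: encode the category in the reply,
            # e.g. 127.0.0.2 = porn, 127.0.0.3 = gambling, ...
            return answer
        except socket.gaierror:
            return None  # not listed in the register

ISPs could then mirror the zone locally, exactly as many of them
already do for the mail spam blacklists.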

As was already stated above, sites come and go daily. It would be very
difficult to keep any list current; even updating on a daily basis you
would still be behind. Perhaps in addition to Squid you could add other
filtering software that checks the wording of the pages themselves. I
use Squid + DansGuardian for my children; you could look for something
like that. I only block known URLs that are unlikely to change
overnight - playboy, for example. It is not a perfect solution, but it
helps keep my URL list small. Wish you well.
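
For the known-URL blocking, the relevant Squid piece is just a
dstdomain ACL that reads the domains from a file. A minimal squid.conf
fragment (the file path and domain are only examples):

    # block anything on the domains listed in the file
    acl blockedsites dstdomain "/usr/local/etc/squid/blocked_domains"
    http_access deny blockedsites

    # /usr/local/etc/squid/blocked_domains, one entry per line, e.g.
    #   .playboy.com    (leading dot matches subdomains too)

DansGuardian then sits in front of Squid and does the phrase-based
content checking on top of that.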

-- 
Alex
FreeBSD 6.0-RELEASE i386 GENERIC