MVPS HOSTS file 9/2/09

  • Thread starter: Anonymous Bob
Anonymous said:

I don't use pre-compiled hosts lists. However, at this high a count for
entries in a *text* file (which isn't cached and isn't a database file
that could provide, for example, quicker searches through a binary tree),
I'm wondering how the use of such a huge list will impact the
performance of the web browser.

I remember looking at the MVP hosts file a couple years ago. Back then
there were 52 entries for DoubleClick alone. The hosts file is just
that: it lists hosts. It doesn't list domains so each blocked *host*
must be specified (i.e., host.domain.tld, not domain.tld). Since anyone
can rename their own host in their own nameserver, and since some
nameservers are configured to return the same IP address for a
particular host no matter what hostname is specified, it seems an
ever-changing or untargetable method of identifying unwanted hosts.
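The distinction matters because the hosts file matches exact hostnames, with no wildcarding of domains. A sketch of what that looks like (the second hostname is invented for illustration):

```
# Each line blocks exactly one hostname; there is no pattern matching.
0.0.0.0  ad.doubleclick.net       # blocks this host only
0.0.0.0  ad2.doubleclick.net      # a second host must be listed separately
# A line like "0.0.0.0  doubleclick.net" would NOT block ad.doubleclick.net
```

This is why a single ad network can account for dozens of entries, and why renaming a host on the server side sidesteps the list entirely.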

Note that the above URL does not take you straight to a download link for
the MVP version of the hosts file (other entities also have pre-compiled
hosts lists where you relinquish control to someone else who has deemed a
site as something bad). The download is there but somewhat hidden. Click
on the circled "There's no place like 127.0.0.1" (to the left of the ad
promo for the PW site [who think alt.comp.freeware is their special-
interest newsgroup]). I have no idea why the MVP site doesn't make a link
to their hosts file very obvious. It's at:

http://www.mvps.org/winhelp2002/hosts.txt

They now have 82 entries where "doubleclick" appears in the domain
portion of the hostname. Does that mean Doubleclick now has 30 more
hosts for controlling their content than before (when I saw 52 listed)?
Not necessarily. It could mean that the author(s) of the list simply
found some more hosts that already existed that he/she didn't know about
before. Does Doubleclick actually have all 82 hosts actively handling
their content (i.e., do all those hosts exist)? Not necessarily. I'm not sure
that once a host gets listed in the MVP hosts list that it ever gets
removed even after that host no longer exists.

From their site, "In many cases using a well designed HOSTS file can
speed the loading of web pages by not having to wait for these ads,
annoying banners, hit counters, etc. to load." Okay, but after what
point in creating an ever-larger list does the list itself
end up slowing down the serial local lookup through a text file more
than the ad that it might block in a web page?
 
** reply interspersed

VanguardLH said:
I don't use pre-compiled hosts lists. However, at this high a count for
entries in a *text* file (which isn't cached and isn't a database file
that could provide, for example, quicker searches through a binary tree),
I'm wondering how the use of such a huge list will impact the
performance of the web browser.

Fine. That's your choice to make.

They now have 82 entries where "doubleclick" appears in the domain
portion of the hostname. Does that mean Doubleclick now has 30 more
hosts for controlling their content than before (when I saw 52 listed)?
Not necessarily. It could mean that the author(s) of the list simply
found some more hosts that already existed that he/she didn't know about
before. Does Doubleclick actually have all 82 hosts actively handling
their content (i.e., do all those hosts exist)? Not necessarily. I'm not sure
that once a host gets listed in the MVP hosts list that it ever gets
removed even after that host no longer exists.

I've never checked but I have faith in those who compile the list.
Should you care to verify the entries, you might consider:
http://majorgeeks.com/Sam_Spade_d594.html
From their site, "In many cases using a well designed HOSTS file can
speed the loading of web pages by not having to wait for these ads,
annoying banners, hit counters, etc. to load." Okay, but after what
point in creating an ever-larger list does the list itself
end up slowing down the serial local lookup through a text file more
than the ad that it might block in a web page?

It isn't only the download time for the ads. It's also the DNS lookups.

As stated on the site, you would want to disable the Windows DNS resolver
cache.

The primary purpose of the hosts file has evolved over the years and now is
used to block malicious sites. I consider that to be a part of my layered
defenses. Those of us who use it and those who compile the file consider it
worthwhile. I guess this is the point where I should thank those hard
working people, so thank you, you hard working people. ;-)

Respectfully,
Bob
 
Fine. That's your choice to make.

I'm not clear how this replies to the OP's wondering out loud whether the
use of a huge HOSTS file affects browser performance. There was some
problem with IE8 and users of a HOSTS file early on, although I
don't recall if the size of the list was the issue.

Gene
 
I'm not clear how this replies to the OP's wondering out loud whether the
use of a huge HOSTS file affects browser performance. There was some
problem with IE8 and users of a HOSTS file early on, although I
don't recall if the size of the list was the issue.

Gene

My first line only addressed his first line, that he doesn't use a hosts
file. I indicated later that he should disable the resolver cache. I also
mentioned later the blocking of malicious sites. Those sites could have
greater impacts than browser speed.

As to IE8, I haven't experienced that problem on XP and have no experience
with Vista.

Respectfully again,
Bob
 
Anonymous said:
The primary purpose of the hosts file has evolved over the years and now is
used to block malicious sites.

When did ads become malware? Most of the folks whose posts I've read
are using the hosts file to block ads. Do you actually
know of Doubleclick spreading malware?
 
VanguardLH said:
When did ads become malware? Most of the folks whose posts I've read
are using the hosts file to block ads. Do you actually
know of Doubleclick spreading malware?

I didn't say that ads are malware, but now that you mention it...
http://www.scmagazineus.com/malicious-banner-ads-hit-major-websites/article/35605/
http://www.smartcomputing.com/editorial/article.asp?article=articles/2008/s1905/25s05/25s05.asp
http://www.roseindia.net/community/spyware/malicious_advertising.shtml

As for doubleclick...
http://www.theregister.co.uk/2009/02/24/doubleclick_distributes_malware/

If you do a simple Google search for "ads" and "malicious" you get 1,350,000
hits:
http://www.google.com/search?hl=en&source=hp&q=ads+malicious

The FTC seems to frown on the practice:
http://www.ftc.gov/os/caselist/0723137/index.shtm

I would invite you to spend some time visiting one of the blogs at
msmvps.com:
http://msmvps.com/blogs/spywaresucks/

Please give my respects to Sandi.

Respectfully,
Bob
 
Infected banner ads have been, in the fairly recent past, a significant
source of malware. I don't know whether the providers have cleaned up their
act, but I've seen an infected banner ad at an entirely legitimate site at
some point in the last two years.

My understanding is that the hosts file also may list hosts whose entire
content is malware--the places where links in spam email take you, for
example.

I'll say parenthetically that I don't use the hosts file myself, and I don't
know the answer about the performance issue--but I suspect that it is
manageable. I guess if my office had been using the hosts file two years
ago, we might never have seen the virus alert from that infected banner ad.
And, if our antivirus vendor (Microsoft) had not been on the ball, we might
also have not seen that alert--and been infected!
 
Bill said:
Infected banner ads have been, in the fairly recent past, a significant
source of malware. I don't know whether the providers have cleaned up their
act, but I've seen an infected banner ad at an entirely legitimate site at
some point in the last two years.

My understanding is that the hosts file also may list hosts whose entire
content is malware--the places where links in spam email take you, for
example.

I'll say parenthetically that I don't use the hosts file myself, and I don't
know the answer about the performance issue--but I suspect that it is
manageable. I guess if my office had been using the hosts file two years
ago, we might never have seen the virus alert from that infected banner ad.
And, if our antivirus vendor (Microsoft) had not been on the ball, we might
also have not seen that alert--and been infected!

When I saw the 15K count for entries in the hosts file, it just popped
into my mind to wonder what performance impact such a huge,
serially read text file might have on the web browser. It
would be an effect that might be interesting if known or measurable. It
could be that its impact is so minuscule, say 0.1 second or
less, that no one would care (yet). Or it could be something that
noticeably impacts the web browser (like a huge number of entries in the
Restricted Sites security zone significantly impacts IE8's performance).

If the hosts file is not cached (I've never heard mention that it was),
then think about what code might be written to interrogate its
list of hosts and IP addresses: a function doing a readln() to
walk serially through a text file, a substr() to find the host
name, and a return() of the IP address might sound fast, except that
function gets slower as the list gets longer, and it would have to be
called on every DNS lookup. This is like having to start reading a book
from page 1 and read through every page when you want to look up
something in the book. Even humans don't do that when flipping back and
forth through a dictionary to zero in on a word.
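The readln()/substr() worry above can be sketched in a few lines. This is NOT the Windows resolver's actual code, just an illustration of the two strategies being contrasted: a serial scan paid on every lookup versus a one-time parse into a hash map (function names are made up here).

```python
def serial_lookup(path, hostname):
    """Walk the hosts file line by line (the 'readln/substr' approach):
    O(n) in the number of entries, and the cost is paid on EVERY lookup."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            parts = line.split()                  # [ip, host, host, ...]
            if hostname in parts[1:]:
                return parts[0]
    return None  # not listed; a real resolver would now query DNS


def load_cache(path):
    """Parse the file once into a dict: O(n) once, then O(1) per lookup."""
    cache = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            parts = line.split()
            for name in parts[1:]:
                cache[name] = parts[0]
    return cache
```

Even at 15K entries a serial scan of a file already sitting in the OS page cache is quick in absolute terms, but repeating it on every DNS lookup is exactly the kind of cost that grows linearly with the list.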

I was going to use the analogy that writing crib notes on your arm might
be handy, but finding a particular crib note would take longer as
more were added. Then I started to think that another disadvantage
would be running out of room. Might there possibly be a physical
limit (versus the practical limit to which I alluded above) to how large
the hosts file can grow? After all, the original intended purpose of
the hosts file meant it would be rather small in file size
and low in entry count. As examples only, what if the
limit were 2^11 (because, say, it was a 12-bit but signed binary value)
512-byte sectors for the hosts file (1024KB), which would mean the current
list is already at 60% of that limit? What if a max of 2^14 entries
(16,384) were allowed, of which 15,114 are now used? It
seems unlikely that there are no usable limits to this file's size and the
number of entries within. Another what-if is that the list possibly
already exceeds some max entry count, so it merely gets truncated (not
all of it is read), which means the list grows but has no further effect.
I just threw out a couple of what-ifs. There are always physical and
practical limits.
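The what-if arithmetic above can be spelled out. To be clear, these are the poster's hypothetical limits, not real Windows limits:

```python
# Hypothetical limits from the what-ifs above -- NOT real Windows limits.
SECTOR = 512                        # bytes per sector
max_sectors = 2 ** 11               # "12-bit but signed" leaves 2^11 usable
max_bytes = max_sectors * SECTOR    # 2048 * 512 bytes
print(max_bytes)                    # 1048576 bytes = 1024 KB

MAX_ENTRIES = 2 ** 14               # 16,384 hypothetical entry cap
used = 15_114                       # entries in the 9/2/09 list
print(f"{used / MAX_ENTRIES:.0%}")  # already ~92% of the entry cap
```

Interestingly, under these what-ifs the entry-count cap (92% used) would bite well before the 1024KB size cap (quoted at 60% used).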

There might be physical or usable maximums to the size of the hosts file
(in its file size and/or its entry count). There might be practical
limits in how a large list of entries might affect the web browser's
performance. I don't remember anyone delving into these considerations,
even if only to nullify them, or to show how unimportant they are
now and how much growth can be sustained.

On a side note, and unlike the manual method described in another post
(which would be an impossible task for a human), is there a utility that
can take this text list of hostnames and check whether a DNS lookup
actually returns a non-error result? That is, is there a speedy utility that
could validate this list to eliminate non-functional lookups? This
would also serve to show if this list is bloated and by how much with
superfluous entries.
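Sam Spade (mentioned earlier in the thread) is interactive; bulk validation is easy to script. A minimal sketch in Python using only the standard library (not an existing utility; `parse_hostnames` and `still_resolves` are names invented here):

```python
import socket


def parse_hostnames(text):
    """Pull the blocked hostnames out of hosts-file text.
    Each data line is: <redirect-ip> <hostname> [<hostname> ...]"""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        names.extend(line.split()[1:])        # element [0] is the redirect IP
    return names


def still_resolves(hostname):
    """True if public DNS still returns an address for this host.
    (Requires network access and honest upstream DNS.)"""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False
```

Caveats: running this serially over ~15,000 names is slow (a thread pool via concurrent.futures helps), some ISPs' resolvers wildcard NXDOMAIN responses and would make every name appear to resolve, and a name that fails today can come back tomorrow, so "doesn't resolve" isn't proof an entry is superfluous.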
 
On a side note, and unlike the manual method described in another post
(which would be an impossible task for a human), is there a utility that
can take this text list of hostnames and check whether a DNS lookup
actually returns a non-error result? That is, is there a speedy utility that
could validate this list to eliminate non-functional lookups? This
would also serve to show if this list is bloated and by how much with
superfluous entries.

There was once a time when I was more up to date on the activities in this
area but now I'm simply a consumer. I thought I'd look around a bit to see
if I could find anything that might pique your interest.
http://hype-free.blogspot.com/2007/08/malicious-hosts.html
There is a new study on the honeynet site, titled Know Your Enemy: Malicious
Web Servers. While the study is interesting, there isn't anything
particularly new about it. The methodology was very similar to other studies
in this area (the Google Ghost in the browser - warning, PDF - study or the
Microsoft HoneyMonkey project) - essentially it was a set of virtual
machines running unpatched versions of the OS which were directed to the
malicious links and any changes in them (created files, processes, etc) were
recorded.

The most interesting part (for me) however was the Defense Evaluation /
Blacklisting part. When applied on their dataset the very famous hosts file
maintained by winhelp2002 blocked all infections, although it contained only
a minority (12%) of the domains. This means that the majority of bad code
out there are redirectors and that these lists managed to include (at least
until now) the true sources of the infections. This is very interesting,
and it shows that while the number of different points of contact with
malicious intent on the Internet increases very rapidly, their variation
doesn't grow quite as rapidly, and blacklisting technologies are still
effective (and by the same logic, AV systems can still be effective).

The hosts file in question is the one mentioned above. I think it's still
maintained by Mike Burgess aka winhelp2002.
There is a huge amount of effort in the area of finding malicious sites. I
would imagine Mike would be a good source of information. He can be reached
via his blog:
http://msmvps.com/blogs/hostsnews/default.aspx
 
When I saw the 15K count for entries in the hosts file, it just popped
into my mind to wonder what performance impact such a huge,
serially read text file might have on the web browser. It
would be an effect that might be interesting if known or measurable. It
could be that its impact is so minuscule, say 0.1 second or
less, that no one would care (yet). Or it could be something that
noticeably impacts the web browser (like a huge number of entries in the
Restricted Sites security zone significantly impacts IE8's performance).
<snip>

You can have a large HOSTS file and enable the DNS client service at the
same time:
http://forum.abelhadigital.com/index.php?showtopic=637
"How to keep the DNS Client service enabled with a big hosts file"

more info here:
http://forum.hosts-file.net/viewtopic.php?f=5&t=684
"Large HOSTS file + DNS Client service = faster machine!"

I have used this method on occasion with a very large HOSTS file (over
100,000 entries!) and I can say that it works very well. There is a
delay at bootup when the HOSTS file is being cached, but after that
browsing is faster due to the caching.

Another HOSTS file tool to speed up browsing is "Homer", a local proxy:
http://www.funkytoad.com/index.php?option=com_content&view=article&id=14&Itemid=32

If none of this is to your liking, try a pac file:
http://www.schooner.com/~loverso/no-ads/
but you might have to maintain it yourself.
 
I have to say I haven't noticed a noteworthy degradation of performance in
IE8 associated with an 'over enthusiastic' hosts file. Granted, there must
be a minuscule degradation of performance while it slogs through the
lookups, but that is a small price for the world we live in today. Nano
rather than milliseconds would be more appropriate - home or business user?

Be thankful for small mercies and those who take the time ................

Stu
 
Then again, we haven't entered into the discussion of CPU and
motherboard/hardware performance .... that will undoubtedly impact a
particular system's ability to cope with the requirements of today's
number-crunching progs. There are a lot of users out there ........ old
and new. You need to remember that and take stock occasionally.

Stu
 
The hosts file is only scanned the moment you open the web browser. I use
mvp's hosts file and I recommend it in all of my blogs. I do not want to
force you to see what I see, but next time you get malware that hijacks
your machine, install mvps hosts and watch how easy it becomes to rescue
your machine.
Yes, it blocks advertisements, and it says that is what it does. It is
illegal to accuse a program designer of embedding a virus, so I will hint
that mvp's hosts file does more than block adverts.

VanguardLH said:
Anonymous said:

I don't use pre-compiled hosts lists. However, at this high a count for
entries in a *text* file (which isn't cached and isn't a database file
that could provide, for example, quicker searches through a binary tree),
I'm wondering how the use of such a huge list will impact the
performance of the web browser.

I remember looking at the MVP hosts file a couple years ago. Back then
there were 52 entries for DoubleClick alone. The hosts file is just
that: it lists hosts. It doesn't list domains so each blocked *host*
must be specified (i.e., host.domain.tld, not domain.tld). Since anyone
can rename their own host in their own nameserver, and since some
nameservers are configured to return the same IP address for a
particular host no matter what hostname is specified, it seems an
ever-changing or untargetable method of identifying unwanted hosts.

Note that the above URL does not take you straight to a download link for
the MVP version of the hosts file (other entities also have pre-compiled
hosts lists where you relinquish control to someone else who has deemed a
site as something bad). The download is there but somewhat hidden. Click
on the circled "There's no place like 127.0.0.1" (to the left of the ad
promo for the PW site [who think alt.comp.freeware is their special-
interest newsgroup]). I have no idea why the MVP site doesn't make a link
to their hosts file very obvious. It's at:

http://www.mvps.org/winhelp2002/hosts.txt

They now have 82 entries where "doubleclick" appears in the domain
portion of the hostname. Does that mean Doubleclick now has 30 more
hosts for controlling their content than before (when I saw 52 listed)?
Not necessarily. It could mean that the author(s) of the list simply
found some more hosts that already existed that he/she didn't know about
before. Does Doubleclick actually have all 82 hosts actively handling
their content (i.e., do all those hosts exist)? Not necessarily. I'm not sure
that once a host gets listed in the MVP hosts list that it ever gets
removed even after that host no longer exists.

From their site, "In many cases using a well designed HOSTS file can
speed the loading of web pages by not having to wait for these ads,
annoying banners, hit counters, etc. to load." Okay, but after what
point in creating an ever-larger list does the list itself
end up slowing down the serial local lookup through a text file more
than the ad that it might block in a web page?
 
TruXterTech said:
The hosts file is only scanned the moment you open the web browser. I use
mvp's hosts file and I recommend it in all of my blogs. I do not want to
force you to see what i see but next time you get malware that hijacks your
machine, install mvps hosts and watch how easy it becomes to rescue your
machine.
yes it blocks advertisements and it says that is what it does. it is illegal
to accuse a program designer of embedding a virus. so i will hint that mvp's
hosts file, does more than block adverts.

Hello:

Then just as a hint, where do you suppose someone should look for
something that does more in the mvps.org zip file (hosts.zip)?

Thank you.

NNTP posted. Not web posted.
 
1PW said:
Hello:

Then just as a hint, where do you suppose someone should look for
something that does more in the mvps.org zip file (hosts.zip)?

Thank you.

NNTP posted. Not web posted.
 