WLBS Fails When the Default Host Comes Back Online. Possible Affinity Bug!

Daniel Rasmussen

Hi Curtis!

What you describe here is exactly what we want to achieve. Unfortunately, we have now spent some time trying to get this to work. We set up the NLB cluster as you describe below (using Multiple host filtering and 0/100 load weights to achieve active/passive server behaviour). The problem is two-fold:

1) The passive server won't get any requests if its load weight is set to 0, so instead we have set it to 1.
2) When we do that, the following occurs:

When the active server comes back up after a temporary failure, requests from existing HTTP connections on the passive host are mistakenly passed on to the active host. This causes an access-denied error to be displayed, since the client does not have a valid session on the active host.

Do you have any idea what might be wrong here? Have we missed something in the configuration?

Best regards
Daniel Rasmussen


FWD: -->
Hi Zrb,
From what I understand, you want a 2-node NLB cluster to cluster IIS, and you want all the traffic to go to one node unless it fails, in which case you want the traffic to go to the second node.

This can be configured, but note that, unlike the Cluster service, the 'passive' node will not move active connections back to the 'primary' node when it comes back online. It will continue to service all the traffic it is handling; any new requests that arrive once the 'primary' node is back online will, however, go to it. As connections are closed, they will leave the 'passive' node as well.
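A toy model of this behaviour may help. It is an assumption for illustration, not NLB's actual driver logic: hosts are listed in preference order, a new TCP connection goes to the first host that is online, and an open connection stays on the host that accepted it until the client closes it. M0 and M1 are the machine names from the original question; the client addresses and ports are hypothetical.

```python
class FailoverModel:
    """Toy model of active/passive NLB failover (not the real NLB algorithm)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)      # preference order, e.g. ["M0", "M1"]
        self.up = set(self.hosts)     # hosts currently online
        self.conns = {}               # (client_ip, client_port) -> host

    def preferred(self):
        # The first online host in preference order takes new connections.
        for host in self.hosts:
            if host in self.up:
                return host
        raise RuntimeError("no host is online")

    def route(self, conn):
        host = self.conns.get(conn)
        if host not in self.up:       # new connection, or its host has failed
            host = self.preferred()
            self.conns[conn] = host
        return host

    def close(self, conn):
        # The client closed the connection (FIN); the mapping is forgotten.
        self.conns.pop(conn, None)


m = FailoverModel(["M0", "M1"])
m.up.discard("M0")                    # primary fails
print(m.route(("10.0.0.5", 1025)))    # M1
m.up.add("M0")                        # primary comes back online
print(m.route(("10.0.0.5", 1025)))    # M1: the open connection stays put
print(m.route(("10.0.0.5", 1026)))    # M0: a new connection goes to the primary
```

In this model, a browser that opens a fresh TCP connection after failback is sent to the primary, even though its earlier connection is still being served by the secondary.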

There are 2 ways to configure this through the "Filtering mode" options in "Port Rules". The first way is to use the "Multiple host" "Load weight" setting with an unequal load: remove the check mark for "Equal", set the "Load weight" on the 'primary' node to 100, and on the 'passive' node set the "Load weight" to 0.
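Assuming each online host receives new client traffic in proportion to its load weight divided by the sum of the weights of all online hosts (a simplification for illustration), the shares for the settings discussed in this thread can be worked out as below. The function name is hypothetical. With the 100/1 weights mentioned in the first post, this model gives the passive host about 1% (1/101) of new client traffic while both hosts are online.

```python
def load_shares(weights, online):
    """Fraction of new client traffic each online host would get, assuming
    traffic is split in proportion to load weight (simplified model)."""
    live = {host: w for host, w in weights.items() if host in online}
    total = sum(live.values())
    if total == 0:
        # What a lone 0-weight host does is the open question in this thread,
        # so the model refuses to guess.
        raise ValueError("no online host has a positive load weight")
    return {host: w / total for host, w in live.items()}


print(load_shares({"M0": 50, "M1": 50}, {"M0", "M1"}))   # {'M0': 0.5, 'M1': 0.5}
print(load_shares({"M0": 100, "M1": 1}, {"M0", "M1"}))
print(load_shares({"M0": 100, "M1": 1}, {"M1"}))         # {'M1': 1.0}
```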

The second way is to set the "Filtering mode" to "Single host" and then set the "Handling priority" to 1 on the host that is to be 'primary' and 2 on the node that is to be 'secondary'. This causes the available node with the highest priority to take any new requests.
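The selection rule for Single host filtering can be sketched as follows, under the simplifying assumption that the lowest handling priority number among the online hosts wins (1 beats 2). The function name is hypothetical.

```python
def handling_host(priorities, online):
    """Return the online host with the lowest handling priority number,
    or None if no host in the rule is online (simplified model)."""
    candidates = [host for host in priorities if host in online]
    if not candidates:
        return None
    return min(candidates, key=lambda host: priorities[host])


prio = {"M0": 1, "M1": 2}                 # M0 is 'primary', M1 'secondary'
print(handling_host(prio, {"M0", "M1"}))  # M0
print(handling_host(prio, {"M1"}))        # M1 after M0 fails
print(handling_host(prio, {"M0", "M1"}))  # M0 again for new requests once it returns
```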

Network Load Balancing does not restart the application on failover. It assumes that an instance of the application is running on each host in the cluster. This also allows you to load balance several different services on the nodes that use different ports and set a primary node for each application.

For Network Load Balancing to provide single-server failover support for a specific application, the files that the application uses must be simultaneously accessible to all hosts that run the application. These files normally reside on a back-end file server. Some applications require that these files be continuously open exclusively by one instance of the group; in a Network Load Balancing cluster, you cannot have two instances of a single file open for writing. These failover issues are addressed by server clusters, which run the Cluster service.

Other applications open files only on client request. For these applications, providing single-server failover support in a Network Load Balancing cluster works well. Again, the files must be visible to all cluster hosts. You can accomplish this by placing the files on a back-end file server or by replicating them across the Network Load Balancing cluster.

There are two alternatives for configuring the port rules for single-server failover support:
* Use no port rules. All the traffic goes to the host with the highest priority (the Host Priority ID with the lowest value). If that host fails, all the traffic switches to the host with the next-highest priority.
* For each application for which you're configuring single-server failover support, create a different port rule for the application's port range, in which:
  * Filtering Mode is set to Single.
  * Handling priorities are set according to the desired failover priority across the cluster hosts.
  This option overrides the Host Priority IDs with handling priorities for each application's port range. With this configuration, you can run two single-server applications on separate hosts and fail in opposite directions.
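The per-application variant can be modelled in the same spirit. The port rules below are hypothetical (port 80 for a web application preferring M0, port 8080 for a second application preferring M1), and the fallback to Host Priority IDs for ports without a rule is an assumption that mirrors the "no port rules" case in the list above.

```python
# Hypothetical Single-host port rules: each rule has its own handling priorities.
PORT_RULES = {
    (80, 80): {"M0": 1, "M1": 2},       # web application prefers M0
    (8080, 8080): {"M0": 2, "M1": 1},   # second application prefers M1
}
# Lower value = higher priority, as with Host Priority IDs.
HOST_PRIORITY_IDS = {"M0": 1, "M1": 2}


def best(priorities, online):
    live = [host for host in priorities if host in online]
    return min(live, key=priorities.get) if live else None


def host_for(port, online):
    for (low, high), priorities in PORT_RULES.items():
        if low <= port <= high:
            return best(priorities, online)
    # Assumption: ports without a rule fall back to the Host Priority IDs.
    return best(HOST_PRIORITY_IDS, online)


both = {"M0", "M1"}
print(host_for(80, both), host_for(8080, both))       # M0 M1
print(host_for(80, {"M1"}), host_for(8080, {"M0"}))   # M1 M0: opposite failover
```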
__
Curtis Koenig
Windows 2000 MCSA,MCSE
Security MCSA,MCSE
Microsoft Clustering Technologies Support

This posting is provided "AS IS" with no warranties and
confers no rights.
Please reply to the newsgroup so that others may benefit.
Thanks!
--------------------
| >From: (e-mail address removed) (zrb)
| >Newsgroups: microsoft.public.win2000.advanced_server
| >Subject: NLB, failover and failback
| >Date: 23 Jul 2003 15:56:06 -0700
| >Organization: http://groups.google.com/
| >
| >Hi,
| >
| > I am trying to configure a 2-node NLB cluster for active/passive
| >operation of IIS.
| >
| > 1) I have two machines, M0 and M1.
| > 2) I want all traffic to go to M0 until it fails.
| > 3) I want all traffic to go to M1 if M0 fails.
| > 4) Even if M0 comes back online, all traffic should go to M1 only.
| >
| > Is there a way to do this using NLB? I know this is easy with
| >Server Clusters, but for cost reasons I do not want to do that.
| >
| >Regards
| >
| >zrb
 
Hi Daniel,
You are correct: when the weight is set to 0, the passive node gets no traffic; however, if the main node goes down, it should take all traffic. In truth, the best way to configure this is to use the second way I described, Single host filtering mode. If you are NLB clustering just one application, the best way to achieve this is to use no port rules. All traffic goes to the host with the highest priority (the Host Priority ID with the lowest value). If that host fails, all the traffic switches to the host with the next-highest priority.

If you want to do this with 2 applications I can go into that, but that does not seem to be the case here and I would like to stay concise.

To really see what is going on with your configuration, I would need to see the output of wlbs display from the nodes.

--
Curtis Koenig
Support Professional
Microsoft Clustering Technologies Support
MCSA, MCSAS,MCSE, MCSES

 
Hi Curtis,

We have worked on this all day, and the current status is as follows:

1. We have changed the setting: we are now using Load Weight 50/50, with affinity set to Single.

Result:
The setup works 100% OK on two of our browsers; however, it fails on two others. We have spent a few hours trying to find the cause of this, with no luck.

Our questions are:
1. Are there any known bugs in IE 6.0 that might cause this variation in the WLBS clustering behaviour?
2. Can we manually set how long WLBS should keep connections alive? We would like to set this time ourselves. What is the default value? By connection we mean the mapping of client IP addresses to cluster host IDs.

Best regards
Daniel
 
Hi Daniel,
I will try to address all of the questions from your last 2 posts here.

Terminology used:
VIP = Virtual IP address, the IP address that is load balanced by NLB
DIP = Dedicated IP address, an IP address that is on the same adapter as NLB but is not balanced
ADMIN IP = IP address assigned to a second adapter that does not have NLB installed

I am going to begin with an analysis of the WLBS DISPLAY output. The first thing I notice is that all your IP addresses are in the same subnet. Microsoft recommends that the VIP and DIP be in the same subnet but that the ADMIN IP be in a different subnet. There are 2 recommended configurations that accomplish this. The first is called the Internet Topology: the NLB adapter is connected to the outgoing connection to the internet and has the default gateway, and the ADMIN adapter connects to a separate subnet that is the internal network. The second configuration is called the Private Topology: it has the same NLB NIC configuration, but the ADMIN adapters on all NLB hosts are plugged into a hub and speak only to one another.
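A quick way to check an address plan against this recommendation is sketched below with Python's standard ipaddress module. The function name and the addresses with their prefix lengths are hypothetical examples, not the values from the WLBS DISPLAY output.

```python
import ipaddress


def check_topology(vip, dip, admin_ip):
    """Warn if VIP and DIP are not in one subnet, or if the ADMIN IP
    overlaps the VIP's subnet. Addresses are given in CIDR notation."""
    vip, dip, admin = (ipaddress.ip_interface(a) for a in (vip, dip, admin_ip))
    warnings = []
    if vip.network != dip.network:
        warnings.append("VIP and DIP are in different subnets")
    if admin.network.overlaps(vip.network):
        warnings.append("ADMIN IP is in the same subnet as the VIP")
    return warnings


# Hypothetical addresses.
print(check_topology("192.168.10.50/24", "192.168.10.11/24", "10.0.0.11/24"))
# []
print(check_topology("192.168.10.50/24", "192.168.10.11/24", "192.168.10.21/24"))
# ['ADMIN IP is in the same subnet as the VIP']
```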

If all the IP addresses are in the same subnet, IP looping can occur, NLB can fail to be effective, and traffic across your network can increase. A second thing to note is that if the NLB adapter is connected directly to a layer 2 switch, the switch should have a VLAN implemented that covers only the NLB adapters. The reason is that by default NLB masks its source MAC address, which causes switch flooding and increased traffic. A VLAN confines this flooding and thus reduces unnecessary traffic to non-NLB hosts.

Now that we have that squared away, on to the questions:
Here is the flow that we want to achieve:
1. User logs on to our portal. (served by the active host)
2. Active host goes down
3. User session is lost since our portal does not support sticky sessions.
4. User logs on to portal again (now served by the passive host)
5. We bring up the active host again.
6. New users are now served by the active host
7. Existing users on the passive host should remain there until they log off

You mention that part 7 is the only problem; does this mean you want the users to move back to the primary node? By default, any clients connected to node 2 should stay there until they terminate their session, and any new session should follow the rules that NLB implements given the available hosts.

Second post:
Our questions are:
1. Are there any known bugs in IE 6.0 that might cause this variation in the WLBS clustering behaviour?
2. Can we manually set how long WLBS should keep connections alive? We would like to set this time ourselves. What is the default value? By connection we mean the mapping of client IP addresses to cluster host IDs.

1. No, there are no known bugs in IE 6.0 related to the behavior you report.
2. There is no way to tell WLBS how long to keep a connection alive; this is a TCP/IP setting, and it is set by the client. If memory serves, the default timeout for a TCP session is 17 days. Making this any shorter would require implementing TCP/IP keep-alive messages in the application and in the server's registry. Doing so could severely harm other TCP/IP traffic, because this setting is global to the box and not specific to a service or type of connection.
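On the application side, keep-alives can be enabled on an individual socket rather than machine-wide. The sketch below uses Python's socket module; the function name and the idle, interval and probe-count values are hypothetical. The Windows branch uses SIO_KEEPALIVE_VALS, and the other branch sets the TCP_KEEP* options only where the platform provides them. It does not touch the machine-wide registry setting mentioned above, and it only helps detect peers that have gone away; it does not give NLB a connection time limit.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keep-alive for this one socket (hypothetical values)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms);
        # the probe count is not configurable through this call.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
        return
    # Other platforms: set whichever per-socket options are available.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```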

Part of the problem here is that client computers maintain part of the connection state. If the client does not completely tear down the session when it is done (i.e., send a FIN for the session), it will try to reconnect to the same host, and that host will also still consider the session active. Thus the burden of the session is on the client and not NLB, and unfortunately NLB has no way of configuring a default connection time limit.
--
Curtis Koenig
Support Professional
Microsoft Clustering Technologies Support
MCSA, MCSAS,MCSE, MCSES

 
Hi Daniel,
If you have some hosts that work and some that do not, the next thing I think you should do is take network traces from the working and non-working systems and compare them. The only thing I can think of that could cause a client to change servers in an NLB cluster would be if the original connection were broken or FINed and a new connection established to the virtual adapter. The NLB algorithm uses the IP address and host port as the identifier for affinity, so if some clients work and some do not, the problem could be in your network between the non-working machines and the cluster. Beyond reading the traces yourself, you may want to consider engaging our support organization to help you diagnose the problem or read the network traces to find the root cause.
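To illustrate what the affinity identifier means in practice, here is a toy model. It assumes the usual description of NLB's affinity settings: with affinity None the client IP address and client port select the host, and with affinity Single only the client IP address does. The SHA-256 hash is a stand-in chosen for the example, not NLB's actual hashing function, and the function names and addresses are hypothetical.

```python
import hashlib


def affinity_key(client_ip, client_port, affinity):
    """Fields that select a host under each affinity setting (simplified)."""
    if affinity == "none":
        return (client_ip, client_port)
    if affinity == "single":
        return (client_ip,)
    raise ValueError(f"unsupported affinity: {affinity!r}")


def pick_host(key, hosts):
    # Stand-in hash: a deterministic mapping from the key to one of the hosts.
    digest = hashlib.sha256(repr(key).encode()).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]


hosts = ["M0", "M1"]
single = {pick_host(affinity_key("10.0.0.5", port, "single"), hosts)
          for port in range(1024, 1100)}
no_aff = {pick_host(affinity_key("10.0.0.5", port, "none"), hosts)
          for port in range(1024, 1100)}
print(len(single))     # 1: every connection from this IP lands on one host
print(sorted(no_aff))  # the port is part of the key, so connections can spread
```

In this model, if something between the client and the cluster changes the client's source address, the Single-affinity key changes too, and the client can be sent to a different host.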


Unfortunately, NLB does not have any logging or monitoring properties built in. The closest you will get is using Performance Monitor to watch network counters, but that will not tell you which host is connecting. The only other tool that may help you is TCPView from http://www.sysinternals.com; this tool can monitor network connections and tell you who is connecting, by IP address. I do not think this is a programming error but an anomaly in the network itself that you will have to hunt down.
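Where TCPView is not available, the built-in netstat -an command lists connections in a form that a short script can summarise per client IP, which makes it easier to compare hosts. The parser below assumes the usual four-column Windows layout (Proto, Local Address, Foreign Address, State); the function name, the sample output and the address 192.168.10.50 are hypothetical.

```python
from collections import Counter

# Hypothetical excerpt of "netstat -an" output captured on one cluster host.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    192.168.10.50:80       192.168.10.77:1034     ESTABLISHED
  TCP    192.168.10.50:80       192.168.10.77:1035     ESTABLISHED
  TCP    192.168.10.50:80       192.168.10.91:2211     TIME_WAIT
  TCP    192.168.10.50:80       192.168.10.92:3001     ESTABLISHED
"""


def clients_by_ip(netstat_text, local_port=80):
    """Count ESTABLISHED TCP connections per client IP on local_port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0] != "TCP" or parts[3] != "ESTABLISHED":
            continue  # skip headers, UDP lines and non-established states
        local, remote = parts[1], parts[2]
        if local.rsplit(":", 1)[1] == str(local_port):
            counts[remote.rsplit(":", 1)[0]] += 1
    return counts


print(clients_by_ip(SAMPLE))
# Counter({'192.168.10.77': 2, '192.168.10.92': 1})
```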
--
Curtis Koenig
Support Professional
Microsoft Clustering Technologies Support
MCSA, MCSAS,MCSE, MCSES

 