Cluster Resource 'Cluster IP Address' failed, Error 1069

Dennis · Jun 16, 2004

Every Sunday night, during our backups, the cluster detects a failure
of the public network (error ID 1069). This causes the shared disks
to fail over to the passive node, which in turn causes the backups to
fail. It usually occurs about an hour into the backups. MS KB article
242600, "Network Failure Detection and Recovery in a Two-Node
Cluster", describes the mechanism in detail, and I can track exactly
what happens in cluster.log. What I don't know is why the NIC on the
active node loses its connection to the network. In cluster.log, I
can see a record of the node no longer being able to ping the default
gateway. The passive node still can, which is why the resources fail
over to it. After about an hour, the resources fail back over again,
because the same thing happens to the second node. At that point the
backups quit trying, and everything is fine until the next Sunday.
Note that a full backup runs for 16 hours on the preceding Friday
with no problems at all. It's the Sunday incremental backup that's
killing us.
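
One way to pin down when each node loses the gateway, independently of
cluster.log, would be to record gateway reachability from both nodes
during the backup window and then line the outages up against the
backup schedule. Below is a minimal sketch in Python. The function
names (ping_once, find_outages) are hypothetical and not part of any
cluster or backup tooling, and it assumes the stock ping command is on
the path. Note that ping's exit code is only a rough signal of
reachability on some Windows versions.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one ICMP echo to host; return True if ping exits with 0.

    Assumption: Windows ping flags (-n, -w in milliseconds) on Windows,
    Linux-style flags (-c, -W in seconds) elsewhere.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def find_outages(samples):
    """Collapse time-ordered (timestamp, reachable) samples into outages.

    Returns a list of (start, end) pairs. 'end' is the timestamp of the
    first successful sample after the outage, or None if the outage was
    still in progress at the last sample.
    """
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # outage begins
        elif ok and start is not None:
            outages.append((start, ts))   # outage ends
            start = None
    if start is not None:
        outages.append((start, None))     # still down at end of data
    return outages
```

In use, each node would call ping_once against its default gateway
every few seconds, append (timestamp, result) to a list or file, and
run find_outages over the samples afterwards. Comparing the outage
start times from the two nodes with the backup start time should show
whether each node really drops about an hour after it takes over the
resources.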

We are using Windows 2000 Advanced Server SP4 on two nodes, with a
SAN connected via Fibre HBA. The public connection uses a 3Com
Gigabit 3C985B-SX NIC. We have Veritas Volume Manager and the Veritas
NetBackup client running.

The System Event log typically has the following errors or warnings:
1123
1077
1077
1069
1122
1069
1069
1069

Dennis