Hello,
We have two HP DL servers running Windows 2000 Advanced Server, with storage on an HP XP 512 array. The cluster hosts only file shares.

The problem: at irregular, unpredictable intervals, both cluster nodes stop responding to client file-service requests, and the resources do not fail over to the other node. Nothing visible happens on the cluster. According to the Services snap-in, the Cluster Service keeps running the whole time, and typing "cluster res" at the command line returns normal output: all resources are online and owned by the correct node. We have to restart both nodes to get the cluster up and serving files again.

Here is the part of cluster.log covering the last "crash":
00000874.0000106c::2004/04/02-05:27:37.923 [GUM] GumSendUpdateOnVote: Type=0 Context=12
00000874.0000106c::2004/04/02-05:27:37.923 [GUM] GumSendUpdateOnVote: Collect Vote at Sequence=6119
00000874.0000106c::2004/04/02-05:27:37.923 [GUM] GumVoteUpdate: Dispatching vote type 0 context 12 to node 1
00000874.0000106c::2004/04/02-05:27:37.923 [GUM] GumSendUpdateOnVote: Decision Routine returns=183
00000874.0000106c::2004/04/02-05:27:37.923 [GUM] GumSendUpdateOnVote: Returning status=0
00000874.00000a64::2004/04/02-08:50:17.501 [DM]DmpCheckpointTimerCb- taking a checkpoint
00000874.00000a64::2004/04/02-08:50:17.501 [LM] LogReset entry...
00000874.00000a64::2004/04/02-08:50:17.501 [LM] LogpReset entry...
00000874.00000a64::2004/04/02-08:50:17.517 [LM] LogpCreate : Entry
00000874.00000a64::2004/04/02-08:50:17.517 [LM] LogpMountLog : Entry pLog=0x04511198
00000874.00000a64::2004/04/02-08:50:17.517 [LM] LogpMountLog::Quorumlog File size=0x00000000
00000874.00000a64::2004/04/02-08:50:17.517 [LM] LogpInitLog : Entry pLog=0x04511198
00000874.00000a64::2004/04/02-08:50:17.532 [LM] LogpAppendPage : Writing 1024 bytes to disk at offset 0x00000000
00000874.00000a64::2004/04/02-08:50:17.548 [LM] LogpInitLog : NextLsn=0x00000408 FileAlloc=0x00000800 ActivePageOffset=0x00000400
00000874.00000a64::2004/04/02-08:50:17.548 [LM] LogpCreate : Exit with success
00000874.00000a64::2004/04/02-08:50:17.564 [LM] LogGetLastChkPoint:: Entry
00000874.00000a64::2004/04/02-08:50:17.595 [LM] LogGetLastChkPoint: ChkPt File Q:\MSCS\chk17DC.tmp ChkPtSeq=6108 ChkPtLsn=0x00000408 Checksum=104661
00000874.00000a64::2004/04/02-08:50:17.595 [LM] LogGetLastChkPoint exit, returning 0x00000000
00000874.00000a64::2004/04/02-08:50:17.595 [LM] LogCheckPoint entry
00000874.00000a64::2004/04/02-08:50:17.610 [DM] DmpGetSnapShotCb: DmpGetDatabase returned 0x00000000
00000874.00000a64::2004/04/02-08:50:17.610 [LM] DmpGetSnapshotCb: Checkpoint file name=Q:\MSCS\chk17DC.tmp Seq#=6108
00000874.00000a64::2004/04/02-08:50:17.642 [LM] LogCheckPoint: ChkPtFile=Q:\MSCS\chk17DC.tmp Chkpt Trid=6108 CheckSum=105158
00000874.00000a64::2004/04/02-08:50:17.642 [LM] LogFlush : pLog=0x04511198 writing the 1024 bytes for active page at offset 0x00000400
00000874.00000a64::2004/04/02-08:50:17.642 [LM] LogCheckPoint: EndChkpt written. EndChkPtLsn =0x00000438 ChkPt Seq=6108 ChkPt FileName=Q:\MSCS\chk17DC.tmp
00000874.00000a64::2004/04/02-08:50:17.642 [LM] LogpCheckpoint : Writing 1024 bytes to disk at offset 0x00000000
00000874.00000a64::2004/04/02-08:50:17.657 [LM] LogCheckPoint Exit
00000874.00000a64::2004/04/02-08:50:17.657 [LM] LogGetLastChkPoint:: Entry
00000874.00000a64::2004/04/02-08:50:17.657 [LM] LogGetLastChkPoint: ChkPt File Q:\MSCS\chk17DC.tmp ChkPtSeq=6108 ChkPtLsn=0x00000408 Checksum=105158
00000874.00000a64::2004/04/02-08:50:17.657 [LM] LogGetLastChkPoint exit, returning 0x00000000
00000874.00000a64::2004/04/02-08:50:17.673 [LM] LogpReset exit, returning 0x00000000
00000874.00000a64::2004/04/02-08:50:17.673 [LM] LogReset exit, returning 0x00000000
00000870.0000086c::2004/04/02-10:20:07.734
That's all the log contains for that period. Q: is the quorum disk.
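To make the timeline in a longer cluster.log easier to read, here is a small Python sketch (a hypothetical helper, not a Microsoft tool) that rejoins entries wrapped by a mail/news client and flags two things visible in the excerpt: long silences, and a change of the numeric prefix before the dot (00000874 becomes 00000870 at 10:20:07). It assumes every entry starts with `<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm`, as in the lines above, and that the first field is a process ID, so a change may mean the service process was restarted. The 30-minute threshold and the script name `scan_cluster_log.py` are arbitrary choices.

```python
import re
import sys
from datetime import datetime, timedelta

# Entry header as it appears in the excerpt (assumed to be the general format):
#   <8-hex pid>.<8-hex tid>::YYYY/MM/DD-HH:MM:SS.mmm <message>
HEADER = re.compile(
    r"^(?P<pid>[0-9A-Fa-f]{8})\.(?P<tid>[0-9A-Fa-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s*(?P<msg>.*)$"
)


def parse_entries(lines):
    """Turn raw lines into entries, re-joining lines wrapped by mail/news clients."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        m = HEADER.match(line)
        if m:
            entries.append({
                "pid": m.group("pid"),
                "tid": m.group("tid"),
                "ts": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
                "msg": m.group("msg"),
            })
        elif entries and line.strip():
            # Continuation of the previous entry: append it after a single space.
            entries[-1]["msg"] = (entries[-1]["msg"] + " " + line.strip()).strip()
    return entries


def find_anomalies(entries, max_gap=timedelta(minutes=30)):
    """Return (kind, previous_ts, current_ts) tuples for suspicious transitions.

    kind is "pid-change" when the process-ID prefix changes between entries,
    and "gap" when consecutive entries of the same process are more than
    max_gap apart (an arbitrary threshold).
    """
    findings = []
    for prev, cur in zip(entries, entries[1:]):
        if cur["pid"] != prev["pid"]:
            findings.append(("pid-change", prev["ts"], cur["ts"]))
        elif cur["ts"] - prev["ts"] > max_gap:
            findings.append(("gap", prev["ts"], cur["ts"]))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_cluster_log.py cluster.log
    with open(sys.argv[1], encoding="latin-1") as fh:
        for kind, before, after in find_anomalies(parse_entries(fh)):
            print(f"{kind}: {before} -> {after} ({after - before})")
```

Run against the excerpt above, it would report a gap between 05:27:37 and 08:50:17 and a prefix change between 08:50:17 and 10:20:07.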
Has anybody seen the same problem, perhaps on different hardware?
rgds
R.J.