PDC crashed, now what?

Windows Server 2003 Standard SP1

A few weeks ago we had our PDC crash due to a hardware failure. It took 3
days to get it back online. We had to rebuild the RAID array and re-install
Windows. The server was promoted using the Manage Your Server -> Domain
Controller utility. The server, when rebuilt, was given the same name,
DOGPOUND.

Everything seemed fine until we recently tried to add another computer to
the domain. It failed with the error message "The directory service was
unable to allocate a relative identifier".

Dcdiag shows some errors. No time servers are advertising themselves on the
domain. The "KnowsOfRoleHolders" test failed, as did "Services" and
"Advertising" on the backup DC, ROVER. DOGPOUND failed tests "Advertising",
"KnowsOfRoleHolders", "RIDManager", "Services", "kccevent" and "systemlog".
The FsmoCheck tests all failed: PDC could not be located, time server could
not be located, both saying the server with the PDC role could not be located.
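When a dcdiag run fails this many tests, it can help to boil the output down to a per-server list of failures before comparing runs. The following is only a rough sketch: the sample text is illustrative (assembled from the test names mentioned above, not captured from these servers), and the function name `failed_tests` is made up for this example. It assumes the usual dcdiag summary line of the form `<server> passed|failed test <name>` after a row of dots.

```python
import re
from collections import defaultdict

# Illustrative sample only: built from test names mentioned in the post,
# not real output captured from DOGPOUND or ROVER.
SAMPLE = """\
      Starting test: Connectivity
         ......................... DOGPOUND passed test Connectivity
      Starting test: Advertising
         ......................... DOGPOUND failed test Advertising
      Starting test: RidManager
         ......................... DOGPOUND failed test RidManager
      Starting test: Services
         ......................... ROVER failed test Services
"""

# Matches the summary line dcdiag prints after each test (assumed format).
RESULT = re.compile(r"\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(text):
    """Return {server: [names of failed tests]} from dcdiag-style output."""
    failures = defaultdict(list)
    for line in text.splitlines():
        match = RESULT.search(line)
        if match and match.group(2) == "failed":
            failures[match.group(1)].append(match.group(3))
    return dict(failures)


if __name__ == "__main__":
    for server, tests in failed_tests(SAMPLE).items():
        print(server, "->", ", ".join(tests))
```

Saving the real output to a file (for example by redirecting dcdiag to a text file) and feeding it to `failed_tests` before and after each change makes it easier to see whether a role move or a cleanup actually fixed anything.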

I've tried doing role transfers and seizures, but nothing has seemed to help
at all. Dcdiag still fails on the same tests every time. Now I am totally
lost. What do I need to do now?

When the crashed server was rebuilt and promoted, apparently we were given
the option for repairing Active Directory or something (I didn't do this so
I'm going by what an employee tells me). I've never seen that before so I
have no idea what this guy is talking about. It seems like these problems
are all pointing to the previous PDC not being found. Getting AD services
pointing away from that PDC would seem to be the answer, but I can't seem to
make it happen.

Other than not being able to add computers, everything else on the domain
seems to be working fine. For now. I'm sure it's all building up like a
volcano ready to come crashing down. Please help?

TIA
 
Does this mean that you've rebuilt the domain (= a new domain with the same
name), or did you use a System State backup? (That is what you should do, if
you still have a System State backup available.)
 
Actually, this means neither. There were other DCs in the domain. They
remained active during the 3 days it took to rebuild this DC. Once the RAID
array was rebuilt and Windows Server 2003 SP1 was loaded back onto the
crashed DC, it was renamed using the same name it used before the crash and
was then promo'd to a DC using the Domain Controller utility in Manage Your
Server. No domain had to be rebuilt.

I'm sure we do have a System State backup from before the crash. Is that
the best thing to do at this point, go ahead and restore the System State, 3
weeks into using the rebuilt DC, or do we just need to clean the AD metadata
that uses the previous DC's RID?

Since moving the roles around earlier and allowing replication to complete
successfully, we can now add computers to the domain, but dcdiag still fails
most tests and indicates no time server on the network. I expect we'll still
run into other problems further down the line.

TIA
 
If you have other DC's running, you should indeed rebuild your DC using a
regular dcpromo.

If a DC fails, the first thing you should do is move the FSMO roles that
were assigned to that DC to another server. If the failed DC can't be
brought back online, that means seizing the roles rather than transferring
them. (http://support.microsoft.com/kb/255504)
In case it was the PDC emulator, and if you have a trust with an NT4 domain,
you'll have to verify that trust as well (after seizing the PDC emulator
role first, of course).
The NT4 DCs would probably be looking at the DC that no longer
exists - at least, it might take a while before the old NT4 DCs know that the
PDC of the AD was moved to another server.

When rebuilding a DC, you have 2 options :
1. remove the old DC from AD and install the DC using the same name
2. install AD on the new DC but use a different hostname

You've used scenario 1, right? Now, when removing a DC the proper
way (dcpromo), everything should be cleaned up. However, it might still take
a couple of days (especially on larger domains) before all data is removed
from all DCs. Since your DC died, you'll have to clean up yourself
(metadata cleanup) before putting back a DC with the same name:
http://www.petri.co.il/delete_failed_dcs_from_ad.htm. A new DC with the
same hostname will still have a different ID.

Since you've already loaded your DC back, using the same hostname, (but
without cleaning up the references to the old machine, with its old id), I
would :

1. remove the new DC from the domain again (dcpromo it out)
2. clean up metadata
3. wait a little until all references are removed from all DC's
4. dcpromo it back in
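For step 2, here is a rough sketch of the interactive ntdsutil sequence as it is commonly documented for metadata cleanup, wrapped in a small Python helper that just prints the commands as a checklist (it doesn't run anything). The helper name `metadata_cleanup_steps` is made up for this example, ROVER is used only because it's the surviving DC named in this thread, and the `<n>` placeholders are assumptions: the real numbers have to be read from the output of each `list` command.

```python
def metadata_cleanup_steps(healthy_dc, domain_idx="<n>", site_idx="<n>",
                           server_idx="<n>"):
    """Return the ntdsutil commands to type, in order, as plain strings.

    healthy_dc is a working DC to connect to; the *_idx values are the
    numbers shown by the preceding 'list ...' command (placeholders here).
    """
    return [
        "ntdsutil",
        "metadata cleanup",
        "connections",
        f"connect to server {healthy_dc}",
        "quit",
        "select operation target",
        "list domains",
        f"select domain {domain_idx}",
        "list sites",
        f"select site {site_idx}",
        "list servers in site",
        f"select server {server_idx}",  # the dead DC, not the healthy one
        "quit",
        "remove selected server",
        "quit",
        "quit",
    ]


if __name__ == "__main__":
    # Print a numbered checklist to follow at the ntdsutil prompt.
    for number, step in enumerate(metadata_cleanup_steps("ROVER"), start=1):
        print(f"{number:2d}. {step}")
```

The helper deliberately only prints the steps, because the selection numbers differ per environment and removing the wrong server object is not something to script blindly.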


c

https://petersblog.dyndns.org:8899
 
Those 4 steps are exactly what I ended up doing. Add to that removing a
crusty old entry in FRS pointing to a DC that was removed from the domain
years ago and configuring Windows Time Service in the Default Domain
Controllers Policy, and voilà (!) everything is working perfectly. dcdiag
tests all pass now, AD reports no errors on any DCs and it seems we're back
in business!

Thanks again for the assistance :)
 
kickballmvp2006 said:
Did you do the metadata cleanup on all the domain controllers or do you
just have to do it on one of them?



Once I demoted the one, there was only one DC left in the domain. I did the
cleanup on that one after waiting an hour or so after demotion. Then I
waited overnight and promo'd the other DC the next morning.
 
Assuming that your DCs still replicate, you should only need to do it on one
of them.

Don't forget to clean up all DNS entries (in all zones) as well...
 