DC Disaster recovery

Joao Paulo

Hello all,

I have 2 Windows 2000 DCs and the disk in one of them crashed. My backup is
4 days old. I swapped the disk, installed a fresh copy of Windows 2000 and the
Backup Exec Remote Agent, and gave the server the same name as before.
I then restarted in Directory Services Restore Mode and ran a restore of the
System State and C:\.

After the restore I could not log on with my domain administrator account,
nor could I get it to replicate with the running server.

Is there some trick to restoring a Windows 2000 DC? Wasn't it supposed to
receive the updates from the running server after the restore?

I hope somebody has a clue.

Thank you
John
 
Joao,

I would have simply installed Windows 2000, run dcpromo on it and let it
receive everything from the existing Windows 2000 Domain Controller. That
would have been the wiser choice. But hindsight is 20/20! ;-)

Are you saying that the 'new' DC did not replicate with the existing Domain
Controller? Did you give it a minute ( or 15 )? What did you see that gave
you the impression that you needed to go into Directory Services Restore Mode?

If the crashed DC was not properly removed from Active Directory then you
would have needed to do a Metadata Cleanup. Did you do this? I am assuming
not!

And reusing the crashed DC's name will usually give you problems. I think
that is what you are seeing right now.

I would run dcpromo on the 'new' Domain Controller to demote it ( if that
fails, try the /forceremoval switch ) and then do a metadata cleanup. Make sure
that 'DC02' is completely gone: there must be no references to 'DC02' left in
DNS, and you should use ADSIEdit to get rid of any remaining references. You
might also have to go into Active Directory Sites and Services and make sure
that the server object is no longer there.
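The metadata cleanup described above is driven interactively from ntdsutil. The Python below is a minimal sketch that only assembles the command sequence (assumed to follow the usual Windows 2000 ntdsutil metadata-cleanup procedure; verify it against the MSKB article before use) so it can be reviewed before you type it in. The healthy DC name and the domain, site and server index numbers are hypothetical placeholders that you would read from the `list ...` output.

```python
"""Assemble an ntdsutil metadata-cleanup command sequence for review.

Sketch only: it runs nothing. The DC name and index numbers are
hypothetical placeholders taken from ntdsutil's 'list ...' output.
"""
from typing import List


def metadata_cleanup_commands(healthy_dc: str, domain_index: int,
                              site_index: int, server_index: int) -> List[str]:
    """Return ntdsutil commands, in order, to remove a dead DC's metadata."""
    return [
        "metadata cleanup",
        "connections",
        f"connect to server {healthy_dc}",  # bind to a DC that is still working
        "quit",                              # back to the 'metadata cleanup' prompt
        "select operation target",
        "list domains",
        f"select domain {domain_index}",
        "list sites",
        f"select site {site_index}",
        "list servers in site",
        f"select server {server_index}",    # the crashed DC, e.g. DC02
        "quit",                              # back to the 'metadata cleanup' prompt
        "remove selected server",            # ntdsutil asks for confirmation here
        "quit",
        "quit",
    ]


if __name__ == "__main__":
    for step in metadata_cleanup_commands("DC1", 0, 0, 1):
        print(step)
```

Each line is typed at the ntdsutil prompt in turn; the index numbers must come from the real `list` output on your network, not from this example.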

--
Cary W. Shultz
Roanoke, VA 24012
Microsoft Active Directory MVP

http://www.activedirectory-win2000.com
http://www.grouppolicy-win2000.com
 
Why did you feel the need to do a restore if you have one DC still standing?
At this point, I'd recommend that you call PSS.

--

Sincerely,
Dèjì Akómöláfé, MCSE+M MCSA+M MCP+I
Microsoft MVP - Directory Services
www.readymaids.com - we know IT
www.akomolafe.com
Do you now realize that Today is the Tomorrow you were worried about
Yesterday? -anon
 
Thank you for the reply Cary,

The replication issue happened one day ago. Today I was able to log on to the
failed DC (let's call it DC2) and left it on the network to see if it received
the updates from DC1. The reason I didn't simply install a fresh server and
run dcpromo is that we have an internal procedure for building servers
(including DCs) that takes a lot of time, and I couldn't run with just one DC
because of the lack of redundancy: these DCs provide access for our core
business customers (students). We simply couldn't afford to wait any
longer.

Well, right now I can replicate DC2 (which is the RID, PDC and
Infrastructure master) with any DC on the network (we have 3 domains).
However, both DC2 and DC1 are showing up as RID and Infrastructure masters
now. Just as I was writing this message, the helpdesk told me that nobody
could log on to the network. I unplugged the network cable from DC2.
According to the event log, DC1 could not contact the GC. I restarted DC1 and
everything seems to be "fine" now.

I wonder what the next step would be.

Thank you
John




Right now I can log on.
 
Cary,
If I run dcpromo on DC2 to demote it and then run it again to promote it, what
are the possible issues, given that DC2 is the RID and Infrastructure master? I
can't transfer the roles because it is off the network now. Even if I
connect it, I can't transfer them because it says "the current DC is the
operations master. To transfer the OM role to another computer, you must
first connect to it". But this message appears on both DC1 and DC2, so I
can't transfer anything from either DC.

Any suggestions?

Thank you
Joao Paulo
 
It would seem that you have created a nice mess for yourself.

Some thoughts and suggestions inline.....

--
Cary W. Shultz
Roanoke, VA 24012
Microsoft Active Directory MVP

http://www.activedirectory-win2000.com
http://www.grouppolicy-win2000.com



Joao Paulo said:
> Thank you for the reply Cary,
>
> The replication issue happened one day ago. Today I could log on to the
> failed DC (let's call it DC2) and left it on the network to see if it
> received the updates from DC1.

So, it is able to replicate all three Naming Contexts: the Schema NC and the
Configuration NC with all Domain Controllers in the Forest, and the Domain NC
with the Domain Controllers in that specific domain. That would be a step in
the right direction.

> The reason I didn't simply install a fresh DC and run dcpromo is that we
> have an internal procedure for building servers (including DCs) that takes
> a lot of time, and I couldn't run with just one DC because of the lack of
> redundancy: these DCs provide access for our core business customers
> (students). We simply couldn't afford to wait any longer.

At the risk of sounding pretentious, it might have been worth the wait.
While it is not a Best Practice by any means, sometimes it is okay to have
only one Domain Controller ( for a short period of time ) in the right
situation, naturally. And this sounds like one of those situations.

> Well, right now I can replicate DC2 (which is the RID, PDC and
> Infrastructure master) with any DC on the network (we have 3 domains).
> However, both DC2 and DC1 are showing up as RID and Infrastructure masters
> now.

This is a problem. You will have to seize the RID Master and Infrastructure
Master roles so that only one of those two DCs holds them. And when you seize
one of the FSMO Roles from a DC ( using ntdsutil ), the DC from which you
seized that role must not come back onto the network.

What happens if you run 'netdom query fsmo' from any of the clients or
Servers ( you will have to have installed the Support Tools on the system in
question - or just the netdom utility... )?

What happens if you run replmon? It also has a way to show which DC holds
each of the five roles.
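To make the "which DC holds which role" check concrete: if the same FSMO query is run on each DC and a role comes back with two different holders, that is exactly the RID/Infrastructure split described above. The Python below is a minimal sketch that compares those reports. The `netdom query fsmo` output layout it parses (role label, two or more spaces, holder name) and all host names are assumptions, so adjust the labels to whatever your netdom version actually prints.

```python
"""Spot FSMO roles that more than one DC claims to hold.

Sketch only: the 'netdom query fsmo' output layout below is assumed
(label, two or more spaces, holder). Host names are hypothetical.
"""
import re
from collections import defaultdict
from typing import Dict, Set

# Map the (assumed) netdom labels to short role names.
ROLE_LABELS = {
    "schema master": "Schema",
    "domain naming master": "Domain naming",
    "pdc": "PDC",
    "rid pool manager": "RID",
    "infrastructure master": "Infrastructure",
}


def parse_fsmo_output(text: str) -> Dict[str, str]:
    """Return {role: holder} from one DC's 'netdom query fsmo' output."""
    roles: Dict[str, str] = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2 and parts[0].lower() in ROLE_LABELS:
            roles[ROLE_LABELS[parts[0].lower()]] = parts[1].lower()
    return roles


def find_conflicts(views: Dict[str, str]) -> Dict[str, Set[str]]:
    """Given {queried DC: its output}, return roles with more than one claimed holder."""
    claimed: Dict[str, Set[str]] = defaultdict(set)
    for output in views.values():
        for role, holder in parse_fsmo_output(output).items():
            claimed[role].add(holder)
    return {role: holders for role, holders in claimed.items() if len(holders) > 1}


# Hypothetical output as seen from DC1 and from DC2.
DC1_VIEW = """\
Schema master               root1.example.edu
Domain naming master        root1.example.edu
PDC                         dc1.example.edu
RID pool manager            dc1.example.edu
Infrastructure master       dc1.example.edu
The command completed successfully.
"""
DC2_VIEW = DC1_VIEW.replace("dc1.example.edu", "dc2.example.edu")

if __name__ == "__main__":
    conflicts = find_conflicts({"DC1": DC1_VIEW, "DC2": DC2_VIEW})
    for role, holders in sorted(conflicts.items()):
        print(f"{role}: claimed by {sorted(holders)}")
```

An empty result means every DC agrees on every role holder; any entry is a role that has to be settled (by seizing it onto one DC) before the other cleanup steps.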

> Just as I was writing this message, the helpdesk told me that nobody
> could log on to the network. I unplugged the network cable from DC2.
> According to the event log, DC1 could not contact the GC. I restarted DC1
> and everything seems to be "fine" now.

Things are not really all that fine! There is a little bit to clean up. If
you plug in DC2 you will notice that there will be a problem again. And, by
unplugging DC2 are you not in a situation whereby you have only one DC?

> I wonder what the next step would be.
 
Joao,

I would concentrate on cleaning up Active Directory first and foremost. You
have one Domain Controller that is fine ( DC1 ) and one that is causing you
problems ( DC2 ).

I would establish, beyond any doubt, which DC holds which FSMO Roles. There
seems to be a problem with the RID Master and the Infrastructure Master. Fix
this. If you have to seize them from DC2 to DC1 using ntdsutil, then do so.
However, once you do this, DC2 cannot go back on the network.

Typically, if you run dcpromo on a Domain Controller that holds any of the
FSMO Roles, those roles are transferred to another Domain Controller during
the dcpromo process. I would bet that this will not work in this case. You
are probably stuck with using ntdsutil to seize them.
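Seizing roles with ntdsutil follows the same menu-driven pattern as the metadata cleanup. The Python below is a minimal sketch that builds the command sequence for seizing chosen roles onto a surviving DC. The seize commands are assumed to match the Windows 2000 ntdsutil `roles` menu, and the role and DC names are hypothetical inputs. It also encodes the rule stated in this thread: refuse unless the old role holder is gone for good.

```python
"""Assemble ntdsutil commands to seize FSMO roles onto a surviving DC.

Sketch only: nothing is executed. The role names and DC name are
hypothetical inputs; the seize commands follow the Windows 2000
ntdsutil 'roles' menu as assumed here.
"""
from typing import Iterable, List

SEIZE_COMMANDS = {
    "Schema": "seize schema master",
    "Domain naming": "seize domain naming master",
    "PDC": "seize PDC",
    "RID": "seize RID master",
    "Infrastructure": "seize infrastructure master",
}


def seize_commands(target_dc: str, roles: Iterable[str],
                   old_holder_gone_for_good: bool) -> List[str]:
    """Return the ntdsutil commands that seize `roles` onto `target_dc`."""
    # A DC whose roles were seized must never return to the network.
    if not old_holder_gone_for_good:
        raise RuntimeError("Seize only if the old holder will never come back online")
    commands = ["roles", "connections", f"connect to server {target_dc}", "quit"]
    for role in roles:
        if role not in SEIZE_COMMANDS:
            raise ValueError(f"unknown FSMO role: {role!r}")
        commands.append(SEIZE_COMMANDS[role])
    commands += ["quit", "quit"]  # leave the 'roles' menu, then ntdsutil
    return commands


if __name__ == "__main__":
    for step in seize_commands("DC1", ["RID", "Infrastructure"], True):
        print(step)
```

The boolean guard is deliberately blunt: in this scenario the whole point is that DC2 gets wiped and rebuilt after the seizure rather than reconnected.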

Maybe someone else has another idea?

Once you have done this, I would do a Metadata Cleanup to get rid of all
references to any Domain Controller that is not actually there. I do not
remember whether I provided a link to the MSKB Article on how to use ntdsutil
for this; it is readily available.

I would then completely wipe out 'DC2' and start over. I would not simply
run dcpromo to demote it and then run dcpromo again to promote it; I would
wipe everything and start fresh. You can follow your internal procedures at
this point. Your concern about not having two Domain Controllers ( a very,
very good concern to have ) still applies: unless I have misread or missed
something, you currently have only one Domain Controller.

Make sure to install DNS on the 'new' Domain Controller and to configure it for
dynamic updates ( assuming that you are running DDNS ). I would also *typically*
suggest that you make this DC a Global Catalog Server. But, since you stated
that you have three domains, there is a potential problem. The DC that holds
the FSMO Role of Infrastructure Master ( right now a problem..... ) should not
be a Global Catalog Server -UNLESS- all Domain Controllers in its domain are
Global Catalog Servers. I would guess that not all of your Domain Controllers
are Global Catalog Servers, based on your other post about not being able to
log on. Which Domain Controllers are Global Catalog Servers?
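The Infrastructure Master / Global Catalog placement rule above can be written down as a simple check. The Python below is a minimal sketch of it, assuming you list every DC in the forest with its domain, GC status and whether it holds the Infrastructure Master role. It also exempts a forest with a single domain, which is Microsoft's documented exception to the rule. The example forest is hypothetical and deliberately puts a GC on the domain-3 Infrastructure Master to trigger a warning.

```python
"""Flag Infrastructure Masters that sit on a Global Catalog server.

Sketch of the placement rule: the Infrastructure Master should not be
a GC unless every DC in its domain is a GC (a single-domain forest is
also exempt). Names are hypothetical, and `dcs` is assumed to list
every DC in the forest.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class DomainController:
    name: str
    domain: str
    is_gc: bool
    is_infrastructure_master: bool = False


def infrastructure_master_warnings(dcs: List[DomainController]) -> List[str]:
    """Return one warning per Infrastructure Master placed in breach of the rule."""
    domains = sorted({dc.domain for dc in dcs})
    if len(domains) == 1:
        return []  # single-domain forest: the IM has no cross-domain references to fix
    warnings = []
    for domain in domains:
        members = [dc for dc in dcs if dc.domain == domain]
        every_dc_is_gc = all(dc.is_gc for dc in members)
        for dc in members:
            if dc.is_infrastructure_master and dc.is_gc and not every_dc_is_gc:
                warnings.append(f"{domain}: Infrastructure Master {dc.name} is also a GC")
    return warnings


if __name__ == "__main__":
    forest = [
        DomainController("GC1", "domain1", is_gc=True, is_infrastructure_master=True),
        DomainController("STAFF1", "domain2", is_gc=False, is_infrastructure_master=True),
        DomainController("DC-A", "domain3", is_gc=True, is_infrastructure_master=True),
        DomainController("DC-B", "domain3", is_gc=False),
    ]
    print(infrastructure_master_warnings(forest))
```

In the example, domain1 passes because its only DC is a GC, domain2 passes because its Infrastructure Master is not a GC, and domain3 is flagged because DC-A is a GC while DC-B is not.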

--
Cary W. Shultz
Roanoke, VA 24012
Microsoft Active Directory MVP

http://www.activedirectory-win2000.com
http://www.grouppolicy-win2000.com
 
This is the reason I suggested PSS. Unless you are familiar with metadata
cleanup, I still recommend that you get some assistance before you proceed
further. At this point, you are still in recoverable shape. This
(http://www.readymaids.com/Portals/1/Docs/xferfsmos.htm) can help you a
little, although it doesn't cover the other post-cleanup tasks you will
have to do (ADSIEdit cleanup, DNS cleanup, etc.).
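Part of the DNS cleanup is simply finding every record that still points at the removed DC. The Python below is a minimal sketch of that search, assuming you have the zone's records as (owner, type, data) tuples, for example transcribed from the DNS console. It does not talk to a DNS server, and the zone contents, host names and addresses are hypothetical. It matches on the record owner, the target host of SRV, CNAME and NS records, and the address of A records.

```python
"""Find DNS records that still point at a domain controller you removed.

Sketch only: records are hypothetical (owner, type, data) tuples, for
example transcribed from the DNS console. It does not talk to a DNS server.
"""
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (owner name, record type, record data)


def _norm(name: str) -> str:
    """Compare host names case-insensitively and ignore a trailing dot."""
    return name.strip().lower().rstrip(".")


def stale_records(records: Iterable[Record], dead_host: str,
                  dead_ips: Set[str]) -> List[Record]:
    """Return records whose owner, target host, or A-record address belongs to the dead DC."""
    host = _norm(dead_host)
    hits = []
    for owner, rtype, data in records:
        fields = data.split()
        target = _norm(fields[-1]) if fields else ""  # SRV/CNAME/NS target is the last field
        if (_norm(owner) == host
                or (rtype.upper() in {"SRV", "CNAME", "NS"} and target == host)
                or (rtype.upper() == "A" and data.strip() in dead_ips)):
            hits.append((owner, rtype, data))
    return hits


ZONE: List[Record] = [
    ("dc02.example.edu", "A", "10.0.0.2"),
    ("example.edu", "A", "10.0.0.2"),
    ("example.edu", "A", "10.0.0.1"),
    ("_ldap._tcp.dc._msdcs.example.edu", "SRV", "0 100 389 dc02.example.edu."),
    ("_ldap._tcp.dc._msdcs.example.edu", "SRV", "0 100 389 dc01.example.edu."),
    ("example.edu", "NS", "dc02.example.edu."),
    ("<dsa-guid>._msdcs.example.edu", "CNAME", "dc02.example.edu."),
]

if __name__ == "__main__":
    for record in stale_records(ZONE, "DC02.example.edu", {"10.0.0.2"}):
        print(record)
```

Matching whole names rather than substrings keeps a host such as `dc021` from being caught by a search for `dc02`.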

--

Sincerely,
Dèjì Akómöláfé, MCSE+M MCSA+M MCP+I
Microsoft MVP - Directory Services
www.readymaids.com - we know IT
www.akomolafe.com
Do you now realize that Today is the Tomorrow you were worried about
Yesterday? -anon
 
Guys,

Surprisingly, DC2 is back online again, and the only step taken was restoring
the System State from a previous backup (3 or 4 days old). It was just a matter
of waiting for replication to finish, BUT the massive logon problem that
happened at the same time worried me enough to take the box offline
again. I then decided to turn it on after hours and see what happened.
Everything ran just fine: I added new accounts and computers and saw them
being replicated as usual.

Because I had forced the PDC and Infrastructure master roles over to DC1,
when I put DC2 back it "thought" it still held those FSMO roles, and that's why
both of them were showing as PDC and Infrastructure master (very concerning at
that stage). Perhaps people could not log on simply because DC2 was receiving
the updates and, for some reason, DC1 wasn't able to handle logons (?). Then
I restarted DC1 and everyone could log on again.

We have 3 domains (we will consolidate them in the future, hopefully): 1, 2 and 3.
The GC is located in domain 1. DC1 and DC2 are in domain 3. Domain 2 is for staff.

Indeed, your tips will be very useful when we consolidate the domains (in
case we have issues demoting DCs).

I do appreciate your help on this.

Thank you
Joao Paulo
 