AD's in virtual environments... rolling back

Gabrie · Oct 31, 2005

Hi

In our test lab I have a fairly good copy of the production environment
running within virtual machines. I would like to take some kind of
snapshot before someone starts playing in the test lab, so that after
they leave (having messed up the test lab) I can revert to those
snapshots.

My main concerns are:
- How do I make sure that all replication traffic is complete before
taking a snapshot?
- Should the snapshots be taken at exactly the same time, or is it OK to
snapshot the first DC and then the second DC later?
- How do I revive those snapshots? Will just bringing them online be
enough, or do I have to force replication? Will the DCs cope with
having been down for some time?

With kind regards,
Gabrie

Paul Bergson · Oct 31, 2005

If you have multiple DCs in your test lab and you restore only one of
them, the others will attempt to bring the reset DC up to date via time
stamps. The problem is that tombstoning will bite you if the snapshot you
revive is more than 60 days old. So if you are going to do a restore, this
DC has to be authorities, and the other DCs should be rebuilt if you want
the reset DC to be the authoritative one.

--

Paul Bergson MCT, MCSE, MCSA, CNE, CNA, CCA

This posting is provided "AS IS" with no warranties, and confers no rights.

Paul Bergson · Oct 31, 2005

authorities = authoritative (Stupid spell check)


Gabrie · Nov 1, 2005

Hi

Well, I have to restore ALL DCs at once. For example, someone has been
testing a tool that changes the schema or adds new OUs or policies or
whatever. I would then like to go back to the moment before the test
started.

Gabrie

Paul Bergson · Nov 1, 2005

You could take all DCs offline except the one you restore, or you will
have to restore them all. AD is a multimaster database and the other DCs
have been updated via replication. You can't restore one of the DCs and
expect the others to roll their updates backwards; it doesn't work that way.


Jorge_de_Almeida_Pinto · Nov 1, 2005

When thinking about a test lab using a virtual environment, you might be
bitten by 2 different issues:
(1) USN rollbacks
(2) Time between the replication cycle before the snapshot and the
replication cycle after starting the VMs again

(1) USN rollbacks....
If you create a snapshot of one DC while the others are up and
running, the state of the other DCs might change in the meantime. This
happens when you snapshot each DC in turn while the others keep
running. When will trouble hit the fan? The moment you revert all DCs
to the snapshots you created: your DCs will then suffer from USN
rollback!
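
To make the mechanism concrete, here is a toy Python model of two DCs snapshotted at different moments. It is only a sketch under simplifying assumptions: the `DC` class, `write` and `pull_from` are hypothetical names invented for illustration, and real AD replication tracks far more state than a single counter per partner.

```python
from dataclasses import dataclass, field
import copy


@dataclass
class DC:
    """Toy domain controller: a local USN counter plus per-partner high-water marks."""
    name: str
    usn: int = 0                                  # highest USN assigned to a local change
    changes: dict = field(default_factory=dict)   # local USN -> description of the change
    seen: dict = field(default_factory=dict)      # partner name -> highest USN already pulled

    def write(self, what):
        # Every originating change gets the next local USN.
        self.usn += 1
        self.changes[self.usn] = what

    def pull_from(self, partner):
        # Ask the partner only for changes above the remembered high-water mark.
        mark = self.seen.get(partner.name, 0)
        new = [what for usn, what in sorted(partner.changes.items()) if usn > mark]
        self.seen[partner.name] = max(mark, partner.usn)
        return new


dc1, dc2 = DC("DC1"), DC("DC2")
dc1.write("user A")
snap_dc1 = copy.deepcopy(dc1)      # snapshot of DC1 while DC2 keeps running

dc1.write("user B")                # lab activity between the two snapshots
dc2.pull_from(dc1)                 # DC2 has now seen DC1 up to USN 2
snap_dc2 = copy.deepcopy(dc2)      # snapshot of DC2, taken later

# Revert both DCs to their snapshots.
dc1, dc2 = copy.deepcopy(snap_dc1), copy.deepcopy(snap_dc2)
dc1.write("user C")                # reuses USN 2, which DC2 believes it already has
missed = dc2.pull_from(dc1)
print(missed)                      # [] : "user C" never reaches DC2
```

In this model the reverted DC1 hands out a USN that DC2 has already recorded, so DC2 never asks for the new change. That is the inconsistency the snapshots must avoid.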

For more information on the USN rollback issue see:
MS-KBQ885875_How to detect and recover from a USN rollback in Windows
2000 Server
MS-KBQ875495_How to detect and recover from a USN rollback in Windows
Server 2003

POSSIBLE SOLUTIONS:
(A) Power Off ALL DCs FIRST and THEN create the snapshots for each
DC
(B) Suspend ALL DCs FIRST and THEN create the snapshots for each DC

When the test environment is screwed up, you can revert to the snapshots
without a problem.
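
The ordering in solution (A) can be sketched in Python as below. This is a minimal sketch, not a script for any particular product: `power_off`, `take_snapshot` and `power_on` are hypothetical callables standing in for whatever your virtualization software provides, and the dry run only records the order of operations.

```python
def snapshot_lab(dcs, power_off, take_snapshot, power_on):
    """Stop every DC first, then snapshot each one, then start them again.

    power_off, take_snapshot and power_on are hypothetical callables that
    wrap whatever the virtualization product actually offers.
    """
    for dc in dcs:
        power_off(dc)        # phase 1: after this loop no DC can change state
    for dc in dcs:
        take_snapshot(dc)    # phase 2: nothing changes between one snapshot and the next
    for dc in dcs:
        power_on(dc)         # phase 3: bring the lab back up


# Dry run that only records the order of operations.
log = []
snapshot_lab(
    ["DC1", "DC2"],
    power_off=lambda dc: log.append(("off", dc)),
    take_snapshot=lambda dc: log.append(("snapshot", dc)),
    power_on=lambda dc: log.append(("on", dc)),
)
print(log)
```

The point of the three separate loops is that no snapshot is taken while any DC is still running.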

(2) Time between the replication cycle before the snapshot and the
replication cycle after starting the VMs again...
DCs keep track of the last time they successfully replicated with
each other. If the time between two replication cycles exceeds
the tombstone lifetime, the DCs no longer trust each other, because
lingering objects MAY exist on the disconnected DC/replica. To protect
themselves, the DCs refuse to replicate and replication is halted. This
is reported through event ID 2042. In this case we are talking about W2K3
DCs. If my memory serves me right, this protection does not exist for
W2K (and I'm not talking about "Strict Replication Consistency",
which is supported by W2K DCs with a certain SP and by W2K3 DCs).
By default the value of "Allow Replication With Divergent and Corrupt
Partner" is 0 (zero), which is also what applies when the value is not
specified. To allow replication between those DCs the value must be set
to 1, but before doing that any existing lingering objects MUST BE
removed/cleaned (e.g. with repadmin).
DCs will also report event ID 1864 if they have not replicated with a
certain DC for a certain time!
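
As a rough illustration of the check described above, the following Python sketch compares the gap since the last successful replication with the tombstone lifetime. The function name `replication_blocked` and the dates are made up for the example, and the default of 60 days is just one of the values mentioned in this thread.

```python
from datetime import datetime, timedelta


def replication_blocked(last_success, now, tombstone_lifetime_days=60):
    """Return True when the gap since the last successful replication
    exceeds the tombstone lifetime, the condition described for event ID 2042."""
    return now - last_success > timedelta(days=tombstone_lifetime_days)


snapshot_time = datetime(2005, 10, 31)   # last replication before the snapshot
print(replication_blocked(snapshot_time, datetime(2005, 12, 15)))  # 45 days later: False
print(replication_blocked(snapshot_time, datetime(2006, 1, 15)))   # 76 days later: True
```

After a revert, the DC's recorded "last success" is the one stored in the snapshot, so the age of the snapshot is what counts.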

For more information on exceeding the tombstone lifetime or lingering
objects see:
* Fixing Replication Lingering Object Problems (Event IDs 1388, 1988,
2042)
-->
http://www.microsoft.com/technet/pr...ons/4a1f420d-25d6-417c-9d8b-6e22f472ef3c.mspx
* Event ID 1388 or 1988: A lingering object is detected
-->
http://www.microsoft.com/technet/pr...ons/77dbd146-f265-4d64-bdac-605ecbf1035f.mspx
* A deleted account remains in the Address Book, e-mail is not
received, or a duplicate account exists
-->
http://www.microsoft.com/technet/pr...ons/9b1c2595-4fe2-457a-8868-a9025a307c63.mspx
* Event ID 2042: It has been too long since this machine replicated
-->
http://www.microsoft.com/technet/pr...ons/34c15446-b47f-4d51-8e4a-c14527060f90.mspx

POSSIBLE SOLUTIONS:
(A) If the DCs are W2K3, configure them with "Allow Replication With
Divergent and Corrupt Partner" set to 1. Although the DCs are then able
to replicate with each other, you still might end up with lingering
objects. And if "Strict Replication Consistency" is set to 1,
replication with a DC for a naming context that contains lingering
objects will still be halted. In that case you first need to clean up
the lingering objects BEFORE setting "Allow Replication With Divergent
and Corrupt Partner" to 1. Although this is possible, I do not recommend
it, because if it is not done correctly you might end up with problems.
(B) Set the "Tombstone Lifetime" so that it covers the longest period
the snapshots will be in use. For this to work correctly you need to
estimate the maximum time the virtual machines will stay disconnected
before they are refreshed with new updates to represent the production
environment. As you may know, the "Tombstone Lifetime" defaults to 60
days for a freshly installed W2K AD, a freshly installed W2K3 AD, and
any existing AD upgraded to W2K3 (any SP). For a freshly installed
W2K3 SP1 AD the value is 180 days. So if you think you will update
the virtual machines within 180 days, use 180 days as the "Tombstone
Lifetime". If you expect to update the virtual machines within 360
days, use 360 days as the "Tombstone Lifetime". Whenever you implement
new updates, create new snapshots as described earlier; that moment
becomes the new starting point from which the "Tombstone Lifetime"
period is counted. The "Tombstone Lifetime" value in the test
environment may then differ from the value in the production
environment. For me that is an acceptable trade-off against the risk
of lingering objects and the need to clean them up!
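
The reasoning in (B) can be written down as a small Python lookup. This is only a sketch: the dictionary keys and the function `lifetime_covers` are hypothetical labels, and the day counts are the defaults quoted in this post.

```python
# Default "Tombstone Lifetime" in days, as quoted in the post above.
DEFAULT_TOMBSTONE_DAYS = {
    "fresh W2K": 60,
    "fresh W2K3": 60,
    "upgraded to W2K3": 60,
    "fresh W2K3 SP1": 180,
}


def lifetime_covers(forest_kind, days_between_refreshes, configured_days=None):
    """True if the snapshots can stay in use for the planned interval."""
    if configured_days is not None:
        lifetime = configured_days            # value set explicitly in the test lab
    else:
        lifetime = DEFAULT_TOMBSTONE_DAYS[forest_kind]
    return days_between_refreshes <= lifetime


print(lifetime_covers("fresh W2K3", 90))                       # False: default is 60 days
print(lifetime_covers("fresh W2K3", 90, configured_days=180))  # True
```

If the planned interval does not fit, either refresh the snapshots more often or raise the value in the test lab, as described in (B).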

Hope this helps and you understand what I'm trying to say!
Good luck!
