Hi,
In our testlab I have a fairly good copy of the production environment
running within virtual machines. What I would like is to make some kind
of snapshot before someone starts playing in the testlab, so that after
they leave (having messed up the testlab) I can revert to those
snapshots.
My main concerns are:
- How do I make sure that all replication traffic is complete before
making a snapshot?
- Should the snapshots be made at exactly the same time, or is it OK to
first make a snapshot of the first DC and later one of the second DC?
- How do I revive those snapshots? Will just bringing them online be
enough, or do I have to force replication? Will the DCs have no
problems after being down for some time?
With kind regards,
Gabrie
When thinking about a testlab using a virtual environment you might be
bitten by 2 different issues:
(1) USN rollbacks
(2) The time between the last replication cycle before the snapshot and
the first replication cycle after starting the VMs again
(1) USN rollbacks....
If you create a snapshot of one DC while the others are still up and
running, the state of the other DCs can change before they are
snapshotted in turn, so the snapshots do not form one consistent
picture of the directory. When does the trouble start? The moment you
revert all DCs to those snapshots: a DC restored to an older state
reuses USNs its partners believe they have already seen, and your DCs
will suffer from USN rollback!
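To make the mechanism concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: the `DC` class, its fields and the `simulate` function are made up for illustration, each DC only tracks a local USN counter and a per-partner high-watermark, and real AD safeguards such as invocation IDs are ignored.

```python
import copy


class DC:
    """Toy domain controller (hypothetical model, not a real AD API)."""

    def __init__(self, name):
        self.name = name
        self.usn = 0          # highest USN assigned locally
        self.changes = {}     # USN -> originating change
        self.watermark = {}   # partner name -> highest partner USN already pulled

    def write(self, what):
        """Record an originating change under the next local USN."""
        self.usn += 1
        self.changes[self.usn] = what

    def pull_from(self, partner):
        """Pull only the partner changes above the stored watermark."""
        seen = self.watermark.get(partner.name, 0)
        new = [partner.changes[u] for u in sorted(partner.changes) if u > seen]
        self.watermark[partner.name] = max(seen, partner.usn)
        return new


def simulate(staggered):
    """Return what DC2 receives from DC1 after both are reverted."""
    dc1, dc2 = DC("DC1"), DC("DC2")
    dc1.write("user A")
    dc2.pull_from(dc1)
    snap1 = copy.deepcopy(dc1)            # snapshot of DC1
    if staggered:
        # DC1 keeps running and replicating before DC2 is snapshotted.
        dc1.write("user B")
        dc1.write("user C")
        dc2.pull_from(dc1)
    snap2 = copy.deepcopy(dc2)            # snapshot of DC2
    # Revert both DCs to their snapshots.
    dc1, dc2 = copy.deepcopy(snap1), copy.deepcopy(snap2)
    dc1.write("user D")                   # new change made after the revert
    return dc2.pull_from(dc1)


print(simulate(staggered=False))  # ['user D']
print(simulate(staggered=True))   # []
```

In the staggered case DC2 already believes it has seen DC1's USNs up to 3, so the new change that DC1 writes under the reused USN 2 is silently skipped. That is the rollback symptom in miniature.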
For more information on the USN rollback issue see:
MS-KBQ885875_How to detect and recover from a USN rollback in Windows
2000 Server
MS-KBQ875495_How to detect and recover from a USN rollback in Windows
Server 2003
POSSIBLE SOLUTIONS:
(A) Power Off ALL DCs FIRST and THEN create the snapshots for each
DC
(B) Suspend ALL DCs FIRST and THEN create the snapshots for each DC
When the test environment has been messed up, you can revert all DCs to
these snapshots without a problem.
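The ordering in (A) can be sketched as a small Python routine. This is only an illustration: `FakeLab` is a made-up in-memory stand-in, not a real hypervisor API, and `snapshot_all_dcs` is a hypothetical helper name. With a real product you would replace the three methods with calls to its own tooling.

```python
class FakeLab:
    """In-memory stand-in for a hypervisor (hypothetical, for illustration)."""

    def __init__(self, names):
        self.running = {name: True for name in names}
        self.snapshots = []

    def power_off(self, name):
        self.running[name] = False

    def is_running(self, name):
        return self.running[name]

    def snapshot(self, name, label):
        self.snapshots.append((name, label))


def snapshot_all_dcs(lab, dc_names, label):
    """Power off ALL DCs first, and only THEN snapshot each of them."""
    for name in dc_names:
        lab.power_off(name)
    still_running = [name for name in dc_names if lab.is_running(name)]
    if still_running:
        # Refuse to snapshot anything while any DC can still change state.
        raise RuntimeError("still running: " + ", ".join(still_running))
    for name in dc_names:
        lab.snapshot(name, label)


lab = FakeLab(["DC1", "DC2"])
snapshot_all_dcs(lab, ["DC1", "DC2"], "baseline")
print(lab.snapshots)
```

The point of the two separate loops is that no snapshot is taken until every DC has stopped, so no DC can replicate or change between the individual snapshots.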
(2) The time between the last replication cycle before the snapshot and
the first replication cycle after starting the VMs again...
DCs keep track of the last time they successfully replicated with
each other. If the time since the last successful replication with a
partner exceeds the tombstone lifetime, the DCs do not trust each other
anymore, as lingering objects MAY exist on the disconnected DC/replica.
To protect themselves, the DCs do not allow replication and it is
therefore halted. This is reported through event ID 2042. In this case
we are talking about W2K3 DCs. If my memory serves me right, this
protection does not exist for W2K (and I’m not talking about "Strict
Replication Consistency", which is supported by W2K DCs with a certain
SP and by W2K3 DCs).
By default the value of "Allow Replication With Divergent and Corrupt
Partner" is 0 (zero); when the value is not specified, that default
applies. To allow replication between those DCs the value must be set to
1, but before doing that any existing lingering objects MUST BE
removed/cleaned (e.g. with repadmin).
DCs will also report event ID 1864 if they have not replicated with a
certain DC for a certain time!
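The rule described above boils down to a date comparison. The following Python sketch shows it; the function name and parameters are my own for illustration, and it is a simplified model of the decision, not how the DC implements it internally.

```python
from datetime import datetime, timedelta


def replication_allowed(last_success, now, tombstone_lifetime_days,
                        allow_divergent_partner=False):
    """Simplified model of the check that leads to event ID 2042.

    allow_divergent_partner stands for "Allow Replication With Divergent
    and Corrupt Partner" being set to 1 (assumed to be 0 otherwise).
    """
    gap = now - last_success
    if gap <= timedelta(days=tombstone_lifetime_days):
        return True                      # partner is still trusted
    return allow_divergent_partner       # otherwise only with the override


last = datetime(2006, 1, 1)
now = datetime(2006, 4, 1)   # 90 days after the last successful replication
print(replication_allowed(last, now, 60))         # False
print(replication_allowed(last, now, 180))        # True
print(replication_allowed(last, now, 60, True))   # True
```

So a snapshot that is reverted 90 days after it was taken is refused with a 60-day tombstone lifetime, but accepted with a 180-day one.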
For more information on exceeding the tombstone lifetime or lingering
objects see:
* Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)
  --> http://www.microsoft.com/technet/pr...ons/4a1f420d-25d6-417c-9d8b-6e22f472ef3c.mspx
* Event ID 1388 or 1988: A lingering object is detected
  --> http://www.microsoft.com/technet/pr...ons/77dbd146-f265-4d64-bdac-605ecbf1035f.mspx
* A deleted account remains in the Address Book, e-mail is not received, or a duplicate account exists
  --> http://www.microsoft.com/technet/pr...ons/9b1c2595-4fe2-457a-8868-a9025a307c63.mspx
* Event ID 2042: It has been too long since this machine replicated
  --> http://www.microsoft.com/technet/pr...ons/34c15446-b47f-4d51-8e4a-c14527060f90.mspx
POSSIBLE SOLUTIONS:
(A) If the DCs are W2K3, configure them with "Allow Replication With
Divergent and Corrupt Partner" set to 1. Although the DCs are then able
to replicate with each other, you still might end up with lingering
objects. And if "Strict Replication Consistency" is set to 1,
replication of any naming context that contains lingering objects will
still be halted for that DC. In that case you first need to clean the
lingering objects BEFORE setting "Allow Replication With Divergent and
Corrupt Partner" to 1. Although this is possible, I do not recommend it,
because if it is not done correctly you might end up with problems.
(B) Set the "Tombstone Lifetime" to a value that covers the longest time
the DCs will be disconnected. For this to work correctly you need to
determine (by guessing) what the maximum disconnection time will be
until the virtual machines are refreshed with new updates to represent
the production environment. As you may know, the "Tombstone Lifetime" of
a freshly installed W2K AD, of a freshly installed W2K3 AD, and of any
existing AD upgraded to W2K3 (any SP) defaults to 60 days. For a
freshly installed W2K3 SP1 AD the value is 180 days. So if you think you
will update the virtual machines within 180 days, use 180 days as the
"Tombstone Lifetime". If you expect to update them within 360 days, use
360 days. Whenever you implement new updates you should create new
snapshots as mentioned earlier, and that moment becomes the new starting
point from which the "Tombstone Lifetime" period starts counting. It is
possible, though, that the "Tombstone Lifetime" value in the test
environment then does not match the value in the production
environment. For me that would be acceptable compared with the risk of
lingering objects and the need to clean them!
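As a small planning aid for (B), the date by which a set of snapshots must be refreshed can be worked out as below. This is only a sketch: the function name is hypothetical, and the `safety_margin_days` parameter is my own addition, not something AD defines.

```python
from datetime import date, timedelta


def refresh_deadline(snapshot_date, tombstone_lifetime_days,
                     safety_margin_days=0):
    """Latest date on which the snapshots should be replaced by new ones.

    The counting starts at the moment the snapshots were created; the
    optional margin (an assumption, not an AD setting) keeps some slack.
    """
    usable_days = tombstone_lifetime_days - safety_margin_days
    if usable_days <= 0:
        raise ValueError("margin must be smaller than the tombstone lifetime")
    return snapshot_date + timedelta(days=usable_days)


taken = date(2006, 1, 1)
print(refresh_deadline(taken, 60))        # with the 60-day default
print(refresh_deadline(taken, 180))       # with a 180-day lifetime
print(refresh_deadline(taken, 180, 14))   # same, keeping two weeks of slack
```

If the deadline comes too soon for how often you expect to refresh the testlab, that is the signal to raise the "Tombstone Lifetime" in the test environment before taking the snapshots.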
Hope this helps and you understand what I’m trying to say!
Good luck!