Active Directory in a mess

Steve

(Sorry this is long...)

Basically I've inherited a network at short notice, with several
problems and several people making problems... and as I'm no Windows
expert I wonder if I could ask some advice?

The domain (Windows 2003) caters for around 150-200 computers max, of
which maybe half will be being used at any one time. It is provided
for one school, but two departments. Therefore, it is divided
logically into 2, and each half is being looked after by a different
admin. There is little communication between them (the admins that
is!). As far as I can see it, the only reason they share the same
domain is because the users need to be able to log on in either
department, so it looks like they don't want to maintain two lists of
users.

The above situation scares me somewhat, and today my worries were
justified when one of the admins re-connected a previous DC after it
had been disconnected for a few months. The first I noticed was all my
GPO-installed software uninstalled itself at boot-time. I then noticed
errors in the event log saying that more than one object had the same
UPN, which was because 12 users suddenly had a blank login name! I
fixed that pretty quickly though. After booting the rogue DC off the
network and fixing the remaining replication issues I now find that my
machine, although it can re-join the domain, cannot authenticate/locate
a domain controller, hence no GPOs are being applied. GPResult shows
"Access Denied" for each of them. Incidentally, GPResult lists my
machine as being a member of security group "Null SID". The error in
my event log is "Kerberos - PAC authentication failure". Before I left
work I noticed the same error on another machine for which I'm not
responsible... I fear the worst come Monday morning. There is also a
DNS error on the main DC, "DNS received a critical failure from the
Active Directory" (or words to that effect).

I guess the old DC, although it shouldn't replicate old objects,
somehow screwed up AD and DNS in a way that I can't seem to repair. If
I were to remove the domain forward lookup zone, and use the DNS
facility to re-create the AD stuff in DNS, will that work, or will it
completely mess up everybody?

I have a backup of AD from 2 days ago, but since then I've transferred
a dozen or so machines from another (NT) domain to this one and it's a
real pain in the ass. I don't want to lose that. How easy/hard is it
to do an authoritative restore, and is that the only/best way to fix
this?

What is also of interest to me is, why we're not running 2 separate
domains? If the admins don't really know what the other is doing,
because they work in different sites (yes, the domain is in two
physically separate locations) this kind of event I'm sure will happen
all too often.

Given the requirement that the same set of users will need to log in at
either site and access their information, does it make sense to remain
as one domain, or have two separate domains with trusts? Would
accounts have to be synchronized between both domains? If there was a
specially created top-level domain, and these two departments each had
a child domain, is that the better way to go?

Any help would be most appreciated.

Kind regards,
Steve :)
 
Steve,
Yep, it is a mess. I keep one offline DC that is connected ONLY to its
own network; connecting that DC to my production network would be
fatal for my Windows 2000 AD.

Anyway, you must first find out which roles each domain controller
holds, and start from there!!
I don't know which domain controller your backup is from. I hope you have
the first domain controller's backup in hand: an authoritative restore of
the first domain controller will cause the other domain controllers to
replicate from the restored one, so it might be fixed that fast.

BTW, the admin who connected that old DC should be getting an official
warning!!

Hope it helps,
JPTH
 
For me it is difficult to say what you should do in technical terms: from
what I read a lot is wrong, but it is scattered all over the place.

Can you just summarize what happened, the issues that arose because of
that, what you have done until now, which DC is reporting which errors, etc.,
etc.?

What is also of interest to me is, why we're not running 2 separate
domains? If the admins don't really know what the other is doing,
because they work in different sites (yes, the domain is in two
physically separate locations) this kind of event I'm sure will happen
all too often.

Two separate domains within the same forest make NO difference in terms of
the issues that can arise.

Given the requirement that the same set of users will need to log in at
either site and access their information, does it make sense to remain
as one domain, or have two separate domains with trusts? Would
accounts have to be synchronized between both domains? If there was a
specially created top-level domain, and these two departments each had
a child domain, is that the better way to go?

Either those admins should start working together and live within the same
forest/domain, or you should kick one of them out, or you should create a
separate domain for each, each within its own forest (with trusts). The issue
with the latter setup is that you need to sync the GAL between the forests
(GALsync with IIFP), if needed also sync free/busy data, and you need to ACL
the resources in a way that users from either forest can access what is
needed.

--

Cheers,
(HOPEFULLY THIS INFORMATION HELPS YOU!)

# Jorge de Almeida Pinto # MVP Windows Server - Directory Services

BLOG (WEB-BASED)--> http://blogs.dirteam.com/blogs/jorge/default.aspx
BLOG (RSS-FEEDS)--> http://blogs.dirteam.com/blogs/jorge/rss.aspx
 
Steve said:
(Sorry this is long...)

Basically I've inherited a network at short notice, with several
problems and several people making problems... and as I'm no Windows
expert I wonder if I could ask some advice?

Sure, but if you have a real mess as it seems we may
need to start with a few basics, clean those up, and
then see where you are.

The domain (Windows 2003) caters for around 150-200 computers max, of
which maybe half will be being used at any one time. It is provided
for one school, but two departments. Therefore, it is divided
logically into 2, and each half is being looked after by a different
admin. There is little communication between them (the admins that
is!). As far as I can see it, the only reason they share the same
domain is because the users need to be able to log on in either
department, so it looks like they don't want to maintain two lists of
users.

Login from two departments is a form of sharing resources
so that having them in the same domain, or at least trusted
domains within one forest, makes excellent sense.

The above situation scares me somewhat, and today my worries were
justified when one of the admins re-connected a previous DC after it
had been disconnected for a few months. The first I noticed was all my
GPO-installed software uninstalled itself at boot-time.

Bad idea, and this COULD account for your GPO problems.
Chances are the 'old-returned' DC was not synchronized and
unable to synchronize due to either being old or out of time
sync -- if you have ANY good DCs, remove this one.

I then noticed
errors in the event log saying that more than one object had the same
UPN, which was because 12 users suddenly had a blank login name! I
fixed that pretty quickly though. After booting the rogue DC off the
network and fixing the remaining replication issues I now find that my
machine, although it can re-join the domain, cannot authenticate/locate
a domain controller, hence no GPOs are being applied.

Chances are it authenticated -- not quite sure how -- with the
old DC and perhaps changed its account password, so now it
can't authenticate with the others. (Once the others work,
DCPromo this to NON-DC, then add it back to the domain
and optionally DCPromo it again to DC -- I call this double
DCPromo a "DCPromo cycle".)


RESET any such non-DC machines (AD Users and Computers,
right-click the computer account, Reset Account.)

Also check TIME on every machine, and even if they look
the same check TIME ZONE (having time zone wrong and
then setting the machines to "look right" will cause time to
be off by one or more hours.)
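To illustrate the time zone point, here is a minimal Python sketch. It assumes the default Kerberos clock-skew tolerance of five minutes (an adjustable policy setting); the two machines, their zones, and the displayed time are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed default Kerberos tolerance for clock skew (adjustable by policy).
MAX_SKEW = timedelta(minutes=5)

def utc_instant(wall_clock: datetime, utc_offset_hours: float) -> datetime:
    """Combine what the clock displays with the configured zone to get UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return wall_clock.replace(tzinfo=tz).astimezone(timezone.utc)

def skew(a: datetime, b: datetime) -> timedelta:
    """Absolute difference between two instants."""
    return abs(a - b)

# Hypothetical machines: both clocks *display* 09:00, but the workstation
# has the wrong time zone and was adjusted by hand to "look right".
shown = datetime(2006, 3, 6, 9, 0, 0)
dc_utc = utc_instant(shown, 0)    # DC: zone set to UTC+0
ws_utc = utc_instant(shown, -5)   # workstation: zone set to UTC-5

difference = skew(dc_utc, ws_utc)
verdict = "accept" if difference <= MAX_SKEW else "reject"
print(difference, "-> Kerberos would", verdict)
```

The displayed times match, yet the machines disagree about UTC by five hours, which is far outside the tolerance. That is why the zone has to be checked as well as the clock.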
GPResult shows
"Access Denied" for each of them. Incidentally, GPResult lists my
machine as being a member of security group "Null SID". The error in
my event log is "Kerberos - PAC authentication failure". Before I left
work I noticed the same error on another machine for which I'm not
responsible... I fear the worst come Monday morning. There is also a
DNS error on the main DC, "DNS received a critical failure from the
Active Directory" (or words to that effect).

You need DNS to work for AD to work. Both replication and
authentication.

What type of DNS do you use? It might be easier to disable all
but one of them (the best) and set EVERY MACHINE (DC,
Server, AND workstation) to use strictly this DNS server until
you get everything working. (It won't be fault tolerant since this
becomes a single point of failure but you only do this for a few
hours to a few days at most until everything works and is
replicated.)

Set the (single) DNS server in each computer's NIC->IP
properties. (Remember: set the DCs AND the other computers.)
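If clicking through every NIC is too slow, the same setting can be made from the command line. The Python sketch below only builds the netsh command text. The interface name and IP address are placeholders, and the syntax is the XP/2003-era form as I understand it, so check it with "netsh interface ip set dns /?" before using it.

```python
def set_dns_command(interface: str, dns_server: str) -> str:
    """Build the netsh command that points one NIC at a single DNS server."""
    return (f'netsh interface ip set dns name="{interface}" '
            f'source=static addr={dns_server}')

# Hypothetical values: substitute your own interface name and DNS server IP.
print(set_dns_command("Local Area Connection", "192.168.0.10"))
```

Run the printed command on each DC, server, and workstation, or put it in a startup script while you are troubleshooting.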

(And you might wish to do your Restore below if this is
truly screwed up.)

If you get even ONE DC that seems reliable then you can
DCPromo cycle all of the rest (but do them one at a time
unless you are sure you have at least one working one.)

Also, locate your five ("FSMO") single masters and make
sure your "main DC" holds these (transfer the roles if necessary)
and is a GC -- in fact, at some point make every DC a GC, but
be certain that no single master role ends up with more than one
holder.
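One way to locate the role holders is "netdom query fsmo" from the Support Tools. Below is a rough Python sketch that groups such output by DC. The sample text only imitates the general shape of that output: the server names are hypothetical and the exact role labels differ between Windows versions, so treat the parsing rule (a label, a wide gap, a server name) as an assumption.

```python
import re
from collections import defaultdict

# Sample in the general shape of `netdom query fsmo` output.
# Server names are hypothetical; role labels vary between versions.
SAMPLE = """\
Schema owner                dc1.school.example
Domain role owner           dc1.school.example
PDC role                    dc1.school.example
RID pool manager            dc2.school.example
Infrastructure owner        dc1.school.example
The command completed successfully.
"""

def parse_fsmo(text: str) -> dict:
    """Map each role label to its holder; lines without a wide gap are skipped."""
    roles = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:
            roles[parts[0]] = parts[1]
    return roles

def holders(roles: dict) -> dict:
    """Group role labels by the DC that holds them."""
    by_dc = defaultdict(list)
    for role, dc in roles.items():
        by_dc[dc].append(role)
    return dict(by_dc)

roles = parse_fsmo(SAMPLE)
for dc, held in holders(roles).items():
    print(dc, "holds", len(held), "role(s):", ", ".join(held))
```

In this made-up sample one role sits on a second DC, which is the kind of thing to spot before transferring the roles to the "main DC".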
I guess the old DC, although it shouldn't replicate old objects,
somehow screwed up AD and DNS in a way that I can't seem to repair.

Why? How do you know it is "screwed up"?

Run DCDiag on each and every DC, send the output
to a file (>filename.txt) and search for FAIL and WARN,
then fix all of these.
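Searching the saved output can be scripted. This is a minimal Python sketch: the sample lines are made up to stand in for the contents of filename.txt, and the pattern is simply "any word starting with fail, warn or error", not an official list of DCDiag result strings.

```python
import re

# Words worth flagging; \w* also catches "failed", "warning", "errors", etc.
PROBLEM = re.compile(r"\b(?:fail|warn|error)\w*", re.IGNORECASE)

def find_problems(lines):
    """Return (line number, text) for every line mentioning a failure, warning or error."""
    return [(n, line.strip()) for n, line in enumerate(lines, start=1)
            if PROBLEM.search(line)]

# Made-up sample standing in for the contents of filename.txt.
SAMPLE = """\
Testing server: Default-First-Site-Name\\DC1
   Starting test: Connectivity
      ......................... DC1 passed test Connectivity
   Starting test: Replications
      [Replications Check,DC1] A recent replication attempt failed:
      ......................... DC1 failed test Replications
   Starting test: Services
      ......................... DC1 passed test Services
"""

for number, text in find_problems(SAMPLE.splitlines()):
    print(f"{number}: {text}")
```

For a real run, pass the open file instead of the sample, for example find_problems(open("filename.txt")), once per DC.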
If
I were to remove the domain forward lookup zone, and use the DNS
facility to re-create the AD stuff in DNS, will that work, or will it
completely mess up everybody?

Generally, a DC will re-register itself if you set all of
them to use the DNS Server(s) with the Dynamic Zone.

DCDiag /fix (or restarting NetLogon) should do the trick
once you have the machines set to use the working DNS
server.

I have a backup of AD from 2 days ago, but since then I've transferred
a dozen or so machines from another (NT) domain to this one and it's a
real pain in the ass. I don't want to lose that. How easy/hard is it
to do an authoritative restore, and is that the only/best way to fix
this?

It should NOT be a pain, so this increases the odds you
have your CLIENT DNS settings messed up.

What is also of interest to me is, why we're not running 2 separate
domains? If the admins don't really know what the other is doing,
because they work in different sites (yes, the domain is in two
physically separate locations) this kind of event I'm sure will happen
all too often.

Probably not. Nothing definitely wrong with this unless
they are both going to modify the Domain GPOs or something
else domain wide rather than work in their own OUs mostly.

Also, "Site" is a technical term and (eventually) you will
want to set up true Sites, define them by creating Subnets,
and adding Site Links, but you can wait until the DCs are
working and replicating before you do this in Sites and
Services.

Given the requirement that the same set of users will need to log in at
either site and access their information, dos it make sense to remain
as one domain, or have two separate domains with trusts? Would
accounts have to be synchronized between both domains? If there was a
specially created top-level domain, and these two departments each had
a child domain, is that the better way to go?


Consider DNS for AD
1) Dynamic for the zone supporting AD
2) All internal DNS clients NIC\IP properties must specify SOLELY
that internal, dynamic DNS server (set.)
3) DCs and even DNS servers are DNS clients too -- see #2
4) If you have more than one Domain, every DNS server must
be able to resolve ALL domains (either directly or indirectly)

netdiag /fix

....or maybe:

dcdiag /fix

(Win2003 can do this from Support tools):
nltest /dsregdns /server:DC-ServerNameGoesHere
http://support.microsoft.com/kb/q260371/

Ensure that DNS zones/domains are fully replicated to all DNS
servers for that (internal) zone/domain.
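To check whether the DCs have registered themselves, you can query a few of the well-known DC locator SRV records against the one DNS server you settled on. The Python sketch below only builds the record names and matching nslookup commands. The domain name and server IP are placeholders, and the list is a common subset, not everything NetLogon registers.

```python
def dc_locator_records(domain: str) -> list:
    """Return a few well-known SRV record names that DCs register for a domain."""
    domain = domain.rstrip(".")
    return [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
    ]

def nslookup_commands(domain: str, dns_server: str) -> list:
    """Build one nslookup command per record, aimed at a specific DNS server."""
    return [f"nslookup -type=SRV {name} {dns_server}"
            for name in dc_locator_records(domain)]

# Hypothetical domain and DNS server address.
for command in nslookup_commands("school.example", "192.168.0.10"):
    print(command)
```

If a record is missing on the chosen server, restarting NetLogon on the DC (as mentioned above) is the usual way to get it registered again.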

Also useful may be running DCDiag on each DC, sending the
output to a text file, and searching for FAIL, ERROR, WARN.

Single-label domain zone names are a problem. Google:
[ "SINGLE LABEL" domain names DNS 2000 | 2003 microsoft: ]
 
The best thing to do (my personal opinion) is to call MS Tech Support on the
phone to resolve the problem. It costs $$, though not that much, but it will
definitely resolve it... and document the troubleshooting steps in case a
problem like this happens in the future.

Hope you get it all resolved!!
JPTH

 
J.H said:
The best thing to do (my personal opinion) is to call MS Tech Support on the
phone to resolve the problem. It costs $$, though not that much, but it will
definitely resolve it... and document the troubleshooting steps in case a
problem like this happens in the future.

If I had that big a mess, then I would call a consultant
before I would try to work through such a dispersed
set of issues with phone support.

My experience indicates this is NOT where phone
support shines (i.e., when there are multiple problems
rather than one or two clear ones.)

--
Herb Martin, MCSE, MVP
Accelerated MCSE
http://www.LearnQuick.Com
[phone number on web site]
 
The following steps are to be followed:

a) Run MPSReports on all DCs. You can get it from support.microsoft.com; the
file is mpsrpt_dirsvc.

b) Check netdiag and dcdiag for errors.

c) The old DC that was newly rejoined to the domain had lingering
objects. It replicated with your good DCs (not quite sure how, but
somehow), and now you have a lot of lingering objects in AD.

d) Now the rogue DC has changed its machine account password, and it will get
access denied when trying to replicate with other DCs, as the other DCs'
krbtgt will not recognize the old Kerberos ticket of this DC and they will not
communicate with it. This is generally called, in short, a broken secure
channel issue.

e) You have to take that rogue DC down. Since its secure channel is
broken, you won't be able to dcpromo it down gracefully. You need to run the
command dcpromo /forceremoval. If even this doesn't work, do this:

open regedit on that bad DC and go to
HKLM\SYSTEM\CurrentControlSet\Control\ProductOptions. The ProductType value
should say LanmanNT; change it to ServerNT and reboot. It will come
back up as a member server.

f) Perform metadata cleanup from the good PDC emulator and remove all the
rogue DC's entries from AD.

g) Once it is gone, start investigating DNS. The DCs should point to each
other for preferred and alternate DNS.

h) Open up the AD-integrated zones and check that the DCs' GUID (CNAME)
records are registered under the _msdcs folder.

i) Try pinging the GUID of each DC, to and fro, and see if the ping is
successful. If it is, you know DNS for AD replication is working fine, as the
DCs use each other's GUIDs to replicate.

j) If DNS is fine, then please check the dcdiag output of the DCs in order to
troubleshoot further. Three tools are sufficient to troubleshoot AD
replication issues: dcdiag / netdiag in verbose mode, the repadmin command,
and replmon.
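For steps h) and i), "pinging the GUID" means pinging the CNAME that each DC registers as <GUID>._msdcs.<forest root domain>. A small Python sketch that builds that name and the ping command follows. The GUID and domain shown are invented; use the values you see under _msdcs in the DNS console.

```python
import uuid

def dsa_guid_cname(dsa_guid: str, forest_root: str) -> str:
    """Build the replication CNAME for a DC: <guid>._msdcs.<forest root domain>."""
    guid = str(uuid.UUID(dsa_guid))  # validates the GUID and normalises its form
    return f"{guid}._msdcs.{forest_root.rstrip('.')}"

def ping_command(dsa_guid: str, forest_root: str) -> str:
    """Command to run from each of the other DCs."""
    return "ping " + dsa_guid_cname(dsa_guid, forest_root)

# Hypothetical GUID and forest root domain.
print(ping_command("1A2B3C4D-0000-4000-8000-123456789ABC", "school.example"))
```

If the name resolves to the right DC's address from every other DC, the DNS side of replication is in reasonable shape. If it does not resolve, fix registration first (step h) before chasing replication errors.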



Herb Martin said:
Steve said:
(Sorry this is long...)

Basically I've inherited a network at short notice, with several
problems and several people making problems... and as I'm no Windows
expert I wonder if I could ask some advice?

Sure, but if you have a real mess as it seems we may
need to start with a few basics, clean those up, and
then see where you are.
The domain (Windows 2003) caters for around 150-200 computers max, of
which maybe half will be being used at any one time. It is provided
for one school, but two departments. Therefore, it is divided
logically into 2, and each half is being looked after by a different
admin. There is little communication between them (the admins that
is!). As far as I can see it, the only reason they share the same
domain is because the users need to be able to log on in either
department, so it looks like they don't want to maintain two lists of
users.

Login from two departments is a form of sharing resources
so that having them in the same domain, or at least trusted
domains within one forest, makes excellent sense.

The above situation scares me somewhat, and today my worries were
justified when one of the admins re-connected a previous DC after it
had been disconnected for a few months. The first I noticed was all my
GPO-installed software uninstalled itself at boot-time.

Bad idea, and this COULD account for your GPO problems.
Chances are the 'old-returned' DC was not synchronized and
unable to synchronize due to either being old or out of time
sync -- if you have ANY good DCs, remove this one.

I then noticed
errors in some event log, that more than one object had the same
upn, which was because 12 users suddenly had a blank login name!. I
fixed that pretty quickly though. After booting the rogue DC off the
network and fixing the remaining replication issues I now find that my
machine, although it can re-join the domain, cannot authenticate/locate
a domain controller, hence no GPOs are being applied.

Chances are it authenticated -- not quite sure how -- with the
old DC and perhaps changed its account password, now it
can't authenticate with the others. (Once the others work,
DCPromo this to NON-DC, then add it back to the domain
and optionally DCPromo it again to DC -- I call this double
DCPromo a "DCPromo cycle".)


RESET any such non-DC machines (AD Users and Computers,
right click on Computer, RESET.)

Also check TIME on every machine, and even if they look
the same check TIME ZONE (having time zone wrong and
then setting the machines to "look right" will cause time to
be off by one or more hours.)
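
To see why the time zone matters even when the clocks "look right", here is a small Python sketch. It assumes two machines showing the same wall-clock time with different time zone settings, and compares the result with the default Kerberos tolerance of five minutes (adjustable by policy). The function name and the offsets are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos clock-skew tolerance; can be changed by policy.
MAX_SKEW = timedelta(minutes=5)


def utc_skew(wall_clock, dc_utc_offset_hours, client_utc_offset_hours):
    """Real (UTC) time difference between two machines showing the same wall-clock time."""
    dc = wall_clock.replace(tzinfo=timezone(timedelta(hours=dc_utc_offset_hours)))
    client = wall_clock.replace(tzinfo=timezone(timedelta(hours=client_utc_offset_hours)))
    return abs(dc - client)  # aware datetimes are compared in UTC
```

A DC at UTC+0 and a client mistakenly set to UTC-5, both showing 09:00, are really five hours apart, far beyond MAX_SKEW, so authentication fails even though the clocks agree on screen.
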
GPResult shows
"Access Denied" for each of them. Incidentally, GPResult lists my
machine as being a member of security group "Null SID". The error in
my event log is "Kerberos - PAC authentication failure". Before I left
work I noticed the same error on another machine for which I'm not
responsible... I fear the worst come Monday morning. There is also a
DNS error on the main DC, "DNS received a critical failure from the
Active Directory" (or words to that effect).

You need DNS to work for AD to work. Both replication and
authentication.

What type of DNS do you use? It might be easier to disable all
but one of them (the best) and set EVERY MACHINE (DC,
Server, AND workstation) to use strictly this DNS server until
you get everything working. (It won't be fault tolerant since this
becomes a single point of failure but you only do this for a few
hours to a few days at most until everything works and is
replicated.)

Set the (single) DNS server in each computer's NIC->IP
properties. (Remember: set the DCs AND the other computers.)

(And you might wish to do your Restore below if this is
truly screwed up.)

If you get even ONE DC that seems reliable then you can
DCPromo cycle all of the rest (but do them one at a time
unless you are sure you have at least one working one.)

Also, locate your five single-master ("FSMO") roles and make
sure your "main DC" holds them (transfer the roles if necessary)
and is a GC -- in fact, at some point make every DC a GC, but
be certain that no single-master role is claimed by more than
one DC.
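
Once you have written down which DC claims which role, a quick sanity check can be sketched in Python as below. The role names are the five standard ones. The input mapping is something you would fill in by hand, and the DC names in any example are hypothetical.

```python
from collections import defaultdict

FSMO_ROLES = (
    "Schema Master",
    "Domain Naming Master",
    "PDC Emulator",
    "RID Master",
    "Infrastructure Master",
)


def role_problems(claims):
    """claims maps DC name -> roles that DC believes it holds.

    Returns {role: [holders]} for every role not held by exactly one DC,
    so an empty dict means all five roles have a single owner.
    """
    holders = defaultdict(list)
    for dc, roles in claims.items():
        for role in roles:
            holders[role].append(dc)
    return {role: sorted(holders[role]) for role in FSMO_ROLES if len(holders[role]) != 1}
```

A role listed with two holders is what you would expect if a stale DC came back still believing it owned that role. A role listed with no holder needs to be transferred or seized.
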
I guess the old DC, although it shouldn't replicate old objects,
somehow screwed up AD and DNS in a way that I can't seem to repair.

Why? How do you know it is "screwed up"?

Run DCDiag on each and every DC, send the output
to a file (>filename.txt) and search for FAIL and WARN,
then fix all of these.
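
If you have many DCs, searching the saved files can be automated. This is a minimal Python sketch that assumes only that each report is a plain text file. It does not rely on any particular DCDiag output layout, and the keyword list is an assumption you can adjust.

```python
import re
from pathlib import Path

KEYWORDS = ("FAIL", "WARN", "ERROR")


def flagged_lines(text, keywords=KEYWORDS):
    """Return (line_number, line) pairs containing any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def scan_file(path):
    """Read one saved report and return its flagged lines."""
    return flagged_lines(Path(path).read_text(errors="replace"))
```

Calling scan_file("filename.txt") on each DC's report gives a short list of lines to work through. The match is deliberately loose, so "failed" and "Warning" are caught as well.
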
If
I were to remove the domain forward lookup zone, and use the DNS
facility to re-create the AD stuff in DNS, will that work, or will it
completely mess up everybody?

Generally, a DC will re-register itself if you set all of
them to use the DNS Server(s) with the Dynamic Zone.

DCDiag /fix (or restarting NetLogon) should do the trick
once you have the machines set to use the working DNS
server.
I have a backup of AD from 2 days ago, but since then I've transferred
a dozen or so machines from another (NT) domain to this one and it's a
real pain in the ass. I don't want to lose that. How easy/hard is it
to do an authoritative restore, and is that the only/best way to fix
this?

It should NOT be a pain, so this increases the odds you
have your CLIENT DNS settings messed up.
What is also of interest to me is, why we're not running 2 separate
domains? If the admins don't really know what the other is doing,
because they work in different sites (yes, the domain is in two
physically separate locations) this kind of event I'm sure will happen
all too often.

Probably not. Nothing definitely wrong with this unless
they are both going to modify the Domain GPOs or something
else domain wide rather than work in their own OUs mostly.

Also, "Site" is a technical term and (eventually) you will
want to setup true Sites, define them by creating Subnets,
and adding Site Links, but you can wait until the DCs are
working and replicating before you do this in Sites and
Services.
Given the requirement that the same set of users will need to log in at
either site and access their information, does it make sense to remain
as one domain, or have two separate domains with trusts? Would
accounts have to be synchronized between both domains? If there was a
specially created top-level domain, and these two departments each had
a child domain, is that the better way to go?


Consider DNS for AD
1) Dynamic for the zone supporting AD
2) The NIC\IP properties of all internal DNS clients must specify SOLELY
that internal, dynamic DNS server (or set of servers).
3) DCs and even DNS servers are DNS clients too -- see #2
4) If you have more than one Domain, every DNS server must
be able to resolve ALL domains (either directly or indirectly)

netdiag /fix

....or maybe:

dcdiag /fix

(Win2003 can do this from Support tools):
nltest /dsregdns /server:DC-ServerNameGoesHere
http://support.microsoft.com/kb/q260371/

Ensure that DNS zones/domains are fully replicated to all DNS
servers for that (internal) zone/domain.

Also useful may be running DCDiag on each DC, sending the
output to a text file, and searching for FAIL, ERROR, WARN.

Single-label domain zone names are a problem. Google:
[ "SINGLE LABEL" domain names DNS 2000 | 2003 microsoft: ]


--
Herb Martin, MCSE, MVP
Accelerated MCSE
http://www.LearnQuick.Com
[phone number on web site]
Any help would be most appreciated.

Kind regards,
Steve :)
 