Guest
Hi,
One UK secondary school, 9 servers of which 3 are domain controllers. One DC
runs Exchange 2003, one runs SQL 2000 & SMS 2003. All servers are Windows
Server 2003 Enterprise running at 2003 domain and 2003 forest level. All DCs
are GCs. Single domain, two buildings linked by fibre, DCs in each building.
Domain created November 2003.
We have 650 clients all running Windows XP PRO with a mix of SP1 & SP2. Our
network is behind ISA 2000 and we run Symantec AV 8.1.
At the start of the new school year in September we had about two weeks of
network outages. Sporadic in nature, they lasted for a couple of hours at a
time and caused replication problems. We then had about a week of stable
network operation, yet the replication errors became more frequent.
Our network monitoring at that time consisted of pinging servers; all the
pings were fine but the replication errors continued.
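In hindsight, a handful of successful pings says little about loss on a link. As a minimal sketch of what I mean, assuming probe results have already been collected as a list of booleans (the function names and the 5% threshold are illustrative assumptions, not taken from any tool we used):

```python
def loss_percent(results):
    """Return the percentage of failed probes (True = reply received)."""
    if not results:
        raise ValueError("no probe results")
    failed = sum(1 for ok in results if not ok)
    return 100.0 * failed / len(results)


def is_degraded(results, threshold=5.0):
    """Flag a link whose loss exceeds the (assumed) threshold in percent."""
    return loss_percent(results) > threshold
```

Tracking loss over many probes, rather than a single up/down check, is the kind of measurement that would show a failing switch.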
I ran Netdiag and DCdiag, which failed on FSMO: our PDC was down. I seized
the role and a couple of days later DCdiag passed the FSMO test. Replication
issues continued, so I installed a trial copy of SolarWinds, and after a couple
of days one of our core switches was showing 60% packet loss. The switch was
replaced and replication errors diminished, only to be replaced by GP 1038
and 1050 errors (unable to find gpt.ini).
Under guidance from MS PSS we uninstalled DNS on all but one DC, deleted the
zone and restarted the remaining DC. Our zone was recreated and we
reinstalled DNS on another server. We reduced our DCs from 5 to 2 and left
AD to replicate and synchronize over this last weekend.
On Monday morning we arrived at work to discover a fibre module had failed
sometime on Saturday. We replaced the module, the network was operational
again and GP seemed to be applying to our clients once more. On checking the
event logs there was an occasional USERENV error.
Next we discovered our Exchange 2003 server had failed: the Information
Store wouldn't mount and the event logs showed topology errors. Checking in
System Manager, the Exchange server didn't know about a GC server (we had
one, and DNS had the appropriate server records), so I made the Exchange
server a GC server; the IS then started and the stores were mounted. Now we
have a MAPI external RPC client error in the logs. MS PSS says don't worry,
but whilst we can receive mail we are unable to send mail. Exchange reports
that it is unable to bind to DNS.
However, DNS is working correctly from the Exchange server.
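For anyone wanting to check the same thing: the records a server uses to find a DC or GC are SRV records under well-known names. A minimal sketch that lists the names worth querying (for example with nslookup and the query type set to SRV). The domain school.example is a placeholder, not our real domain, and the function name is hypothetical:

```python
def ad_srv_names(domain, forest_root=None):
    """Build the standard AD locator SRV record names for a domain.

    In a single-domain forest the forest root is the domain itself.
    """
    forest_root = forest_root or domain
    return {
        "domain controllers": f"_ldap._tcp.dc._msdcs.{domain}",
        "PDC emulator": f"_ldap._tcp.pdc._msdcs.{domain}",
        "global catalog": f"_gc._tcp.{forest_root}",
        "kerberos KDC": f"_kerberos._tcp.{domain}",
    }


if __name__ == "__main__":
    # Placeholder domain name, purely for illustration.
    for role, name in ad_srv_names("school.example").items():
        print(f"{role}: {name}")
```

If the global catalog name returns no records when queried from the Exchange server itself, that would fit the topology errors even when ordinary host lookups work.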
I am fed up. We have half term next week and I am considering our options,
which appear to be:
1. Continue to fire fight errors and hope that it all comes under control.
Cons: can't guarantee that this will solve our problems.
2. Take down the domain, rebuild the servers and domain, rejoin the clients
(all 650!). Cons: time taken. Advantages: we will have a working domain again
that we know is fine. No more fire fighting for a while.
or
Could we bring up a new domain, move the clients and users to the new
domain, migrate the group policies and be back online or would our GC be
suspect?
How about a new forest with a new domain (new GC), could we migrate (or
move) our clients & users to the new domain in a new forest? Or would we
have to rejoin? I can script user creation so the biggest problem would be
the computer accounts.
Sorry about the long post. Hope someone has comments & advice on this.
I am at my wits' end.
Andy.