DHCP: strangest problem I have ever seen in my life

  • Thread starter: Emiliano G. Estevez

Emiliano G. Estevez

Hi,



I have three domain controllers: two in one site and the third in another.
All three run DHCP, WINS, and DNS, all have the latest hotfixes and service
packs, and all three DHCP servers are authorized in AD. The problem is that,
suddenly, computers on my network that need to renew their IP address do not
get one. Worse, when a computer is restarted with the patch cord plugged in,
it takes a long time to log on, and once it is up the IP address shows as
0.0.0.0. This is weird, because when no DHCP server is found the system
should keep its existing address if it can reach the default gateway, and
fall back to an APIPA address if it cannot. Neither happens. If we give the
workstation a fixed IP address the problem goes away, but that is not viable
because I have almost 300 workstations. The other weird thing is that if I
unplug the patch cord, the computer boots normally; I log on to the
workstation with cached credentials, plug the patch cord back in, run
ipconfig /renew from a command prompt, and I get an IP address.

I put a sniffer on my switches. When the network cards are set for DHCP,
they do not send any DHCPInform packets while the machine is restarting; in
fact, they do not send any packets at all. If I disable the DHCP service on
my DCs and set up my Catalyst 3550 as the DHCP server, the problem is exactly
the same, so I concluded that the Microsoft DHCP service is not involved, but
maybe I am missing something. I am very frustrated; I have been unable to
solve this for a week now. Please, I need a hand with this.



Best Regards.
 
I have three domain controllers: two in one site and the third in another.
All three run DHCP, WINS, and DNS, all have the latest hotfixes and service
packs, and all three DHCP servers are authorized in AD. The problem is that,
suddenly, computers on my network that need to renew their IP address do not
get one. Worse, when a computer is restarted with the patch cord plugged in,
it takes a long time to log on, and once it is up the IP address shows as
0.0.0.0. This is weird, because when no DHCP server is found the system
should keep its existing address if it can reach the default gateway, and
fall back to an APIPA address if it cannot. Neither happens. If

Yes, that should happen -- unless the machines have disabled APIPA
through the registry or a policy setting.

General method when you have a problem "this weird" -- put a network
monitor (NetMon, Ethereal, WinDump, Sniffer) on the line and watch
the exchange.

DHCP traffic is easy to filter and isolate.
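
As a rough illustration of why DHCP is easy to isolate in a capture: it runs over UDP ports 67 (server) and 68 (client), and the message type is carried in option 53 of the payload. The Python sketch below is not from this thread. `dhcp_message_type` is a hypothetical helper, and it assumes you have already extracted the raw UDP payload of one packet from your capture tool.

```python
# DHCP message type codes, as defined for option 53 in RFC 2132.
DHCP_TYPES = {
    1: "DHCPDISCOVER", 2: "DHCPOFFER", 3: "DHCPREQUEST", 4: "DHCPDECLINE",
    5: "DHCPACK", 6: "DHCPNAK", 7: "DHCPRELEASE", 8: "DHCPINFORM",
}

BOOTP_HEADER_LEN = 236              # fixed BOOTP header that precedes the options
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def dhcp_message_type(payload: bytes):
    """Return the DHCP message type name from a raw UDP payload, or None."""
    options_start = BOOTP_HEADER_LEN + len(MAGIC_COOKIE)
    if len(payload) < options_start:
        return None
    if payload[BOOTP_HEADER_LEN:options_start] != MAGIC_COOKIE:
        return None  # plain BOOTP, or not DHCP at all

    i = options_start
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option: a single byte with no length field
            i += 1
            continue
        if code == 255:      # end option
            break
        if i + 1 >= len(payload):
            break            # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return DHCP_TYPES.get(value[0])
        i += 2 + length
    return None
```

Running every captured payload through a helper like this gives a plain list of message types, which is usually enough to see where the exchange stops.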
we give the workstation a fixed IP address the problem goes away, but that is
not viable because I have almost 300 workstations. The other weird thing is
that if I unplug the patch cord, the computer boots normally; I log on to the
workstation with cached credentials, plug the patch cord back in, run
ipconfig /renew from a command prompt, and I get an IP address.

My guess would be that you have some sort of "hub/switch" hardware
problem where the port is being shut down, thus convincing the machines
that they are not plugged into a cable (with link detect enabled).
I put a sniffer on my switches. When the network cards are set for DHCP,
they do not send any DHCPInform packets while the machine is restarting; in
fact

That would probably be DHCPDiscover (inform is mostly used between
DHCP servers for things like "authorization" info.)

Expect this:
DHCPDiscover (from client)
DHCPOffer (from server)
DHCPRequest (from client)
DHCPAck or DHCPNak (from server)
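
The exchange above can be turned into a quick check against a capture. This is only a sketch in Python, not something used in this thread: `first_missing_step` is a hypothetical helper that assumes you already have the DHCP message type names in the order they appeared on the wire. Note that in the standard exchange the final acknowledgement (or negative acknowledgement) is sent by the server.

```python
# Expected order of a successful lease negotiation, with the sender of each step.
EXPECTED = [
    ("DHCPDISCOVER", "client"),
    ("DHCPOFFER", "server"),
    ("DHCPREQUEST", "client"),
    ("DHCPACK", "server"),
]


def first_missing_step(observed):
    """Return (message, sender) for the first expected step that never shows
    up in order in `observed`, or None if all four steps are present."""
    position = 0
    for name, sender in EXPECTED:
        try:
            # Search only after the previous step so ordering is respected.
            position = observed.index(name, position) + 1
        except ValueError:
            return name, sender
    return None
```

With an empty capture, as described in this thread, the helper reports that the very first step (the client's DHCPDISCOVER) is missing, which points at the workstation or its switch port instead of the DHCP server.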
they do not send any packets at all. If I disable the DHCP service on my DCs
and set up my Catalyst 3550 as the DHCP server, the problem is exactly the
same, so I concluded that the Microsoft DHCP service is not involved, but
maybe I am missing something. I am very frustrated; I have been unable to
solve this for a week now. Please, I need a hand with this.

You have pretty much isolated it to the CLIENTS or net hardware.
Either the clients are not making the request, or it isn't getting through.

What happens if you monitor FROM the client (to see if they think they
are sending requests)?

IF the clients are not sending, then the question arises as to whether
the switch is confusing them or turning off their port so they think they
are disconnected, OR whether they are just in error on their own.

I would tend to suspect the former, since most people in the world
are not having such problems with Windows clients.

Are you familiar with "link detect" and how a switch might interact
with that setting to confuse the client or to just block the request even
if the client made it?
 
Do you have any routers between your DHCP server and your clients?
Routers don't forward broadcast traffic, and DHCP discovery relies on
broadcasts. You can configure a Cisco router to forward DHCP requests to a
particular DHCP server in its interface configuration:

Router(config-if)# ip helper-address <ip address of dhcp server>

Give that a go on all of your routers that have client segments
attached to them.
 
I have already set up my routers with IP helper addresses, and that is not
the problem. Something I forgot to mention is that if I remove one of the
affected computers from the domain and then restart it, it works
flawlessly. If I add the computer to the domain again, the problem comes
back. It is not related to group policies, because I have applied an
anti-policy rule for every GPO I have set up.

Best regards.
 
All my switches are Cisco Systems, and one of them, which is the core, is a
Catalyst 3550. All the ports on all the switches are set to full duplex,
100 Mbps, with PortFast (which means the port does not wait through the
spanning tree states); spanning tree is set up only on the trunk ports. I
debugged the switches and did not find anything unusual. I used Ethereal and
EtherPeek to sniff the network, and the clients do not send a single packet
to the DHCP server. One of the tests I ran was to isolate one of my DCs on a
hub together with two of the affected workstations, and the problem
persists. I really don't know what else to do.

Best regards.
 
What you describe is classic behavior of spanning tree being enabled on the
switches. Every last time I've seen this behavior it was a switch issue,
either because someone enabled spanning tree, DTP, PAgP, had outdated
firmware, etc. In fact you've kind of already proven that it's a switch
issue, we just don't know what exactly. Here are some articles that talk
about some of these same types of issues:

202840 - A Client Connected to an Ethernet Switch May Receive Several
Logon-Related Error Messages During Startup
(http://support.microsoft.com/?id=202840).

168455 - DHCP Renewal Failures on Switched Networks
(http://support.microsoft.com/?id=168455).

--
J.C. Hornbeck, MCSE
Microsoft Product Support

 
Disabling spanning tree is a quick fix, but there is something else. There
is always something when it happens out of the blue, and mostly it's a
software issue like a security patch or a bad driver.

Here's the story: a 2-year-old network, Catalyst switches, all NICs Compaq
Intel Pro (some hundreds), and not a single DHCP error during that time. A
dozen brand new HP boxes (Compaq, where are you?) were introduced last month
into this up-to-date network, with Realtek 8139 and Broadcom 44x NICs. Guess
what? On every reboot these bastards whined about DHCP timeouts, DNSAPI
errors, redirector errors, APIPA etc., and the Admin was prepared to update
his resume... :) The switch configuration was rechecked as a last resort.
Bam! Spanning tree was enabled! Port Fast was disabled! (Both settings are
the defaults, but IIRC a couple of multi-Cisco-titled engineers spent a
month (!) configuring their stuff at the time.) Now tell me, who could have
imagined that, and why?
 
spanning tree is set up only on the trunk ports. I debugged the switches and
did not find anything unusual. I used Ethereal and EtherPeek to sniff the
network, and the clients do not send a single packet to the DHCP server. One
of the tests I ran was to isolate one of my DCs on a hub together with two
of the affected workstations, and the problem persists. I really don't know
what else to do.

Did you actually sniff FROM the workstation to see if it is
transmitting the DHCP discover etc?

If not, you have (obviously) a pure workstation problem as long as
the wire is plugged in and active -- put them on a cheap multiport
repeater hub with NEW drop cables and check this.
 
I can't believe it: the ****ing problem was the Microsoft ISA Server
Firewall Client installed on all the workstations in my network. If I
uninstall the Firewall Client and boot the computer, it gets an IP address
successfully; if I turn the Firewall Client on, the computer doesn't get
one. I don't know what the hell is happening. With the Firewall Client
removed, the workstation works fine. I need to know what the Firewall Client
modifies at the registry level or in the file system (that is, whether it
replaces files or not). Any help will be great, because nobody I know
(people at Microsoft and people at Avanade, to name a few who might have
some idea of what I am talking about) has a clue about this.

Best Regards.
 
Jetro, I don't get it: are you still having this problem, or was your
solution to enable spanning tree on the switches and disable Port Fast, or
to enable both of them? If I enable Port Fast on a spanning tree port I
could get a loop, and that makes no sense to me, because spanning tree has
four states to control looping, and Port Fast skips two of those states to
get the port ready much faster.
 
The problem is between the ISA Server Firewall Client and the DHCP client.
If I turn on the Firewall Client and boot the computer, the computer doesn't
get an IP address. If, after the computer starts, I turn off the Firewall
Client and then boot the computer any number of times, it gets an IP address
every single time. I tried removing and reinstalling the Firewall Client; it
doesn't work. I tried every single patch for MS ISA Server; it doesn't work.
I tried looking in the registry for a value that is messing things up, so if
anybody has a clue about this I will be very grateful.

Best Regards
 
Emiliano,
Temporarily, that network was cured by disabling the spanning tree algorithm
on all ports with directly connected workstations and enabling Port Fast on
the same ports. They haven't finished reconfiguring the VLANs, so the final
configuration isn't clear yet.
I hope you are using
http://www.cisco.com/en/US/products/hw/switches/ps646/prod_configuration_guides_list.html,
the Cisco Catalyst 3550 Series Switches Configuration Guides (choose your
release and drill down to 'Configuring STP' and 'Configuring Optional
Spanning-Tree Features').
 