Strange New Problem - System Shuts Down By Itself

  • Thread starter: Newscene

Newscene

We have developed the strangest new problem with one of our Win2000AS
servers. This is one of 4 virtually identical systems; in this case its
primary role is SMTP/POP email. This server runs Merak Mail Server ver 8, but
the problem doesn't APPEAR to be related to the email server functions.

About three weeks ago, the day before DST went into effect for the US, the
system simply shut itself down about 3 minutes after midnight EST. The ONLY
indication in the Event Log is that the eventlog service was stopped. The
following night the shutdown occurred at about 1AM EDT, but on the night
after that it returned to a few minutes after midnight EDT. It continued like
this for several days and then stopped, only to return after an absence of
two nights. Two days ago the shutdown time changed to around 7:00 PM EDT.
Today, Monday 4/18, the shutdown occurred at 09:45 EDT and again at 14:00 EDT.
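The times above are easier to compare on a single clock. EST is UTC-5 and EDT is UTC-4, so the second night's shutdown at about 1AM EDT falls at roughly midnight EST, close to the same wall-clock time as the first one. A minimal Python check using fixed offsets (the calendar date is an arbitrary placeholder and does not affect the result):

```python
from datetime import datetime, timedelta, timezone

# US Eastern fixed offsets: EST = UTC-5, EDT = UTC-4.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def to_est(local_edt: datetime) -> datetime:
    """Re-express a naive EDT wall-clock time as the equivalent EST time."""
    return local_edt.replace(tzinfo=EDT).astimezone(EST)


# Placeholder date; only the time of day matters with fixed offsets.
shutdown = to_est(datetime(2005, 4, 4, 1, 0))
print(shutdown.strftime("%H:%M %Z"))  # 00:00 EST
```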

We have examined every possible source of a scheduled event that might even
remotely be related and have found nothing. There are currently no system
actions in Scheduled Tasks. The mailserver is set to update the anti-Spam
functions at 04:00 EDT. Further, we have examined every task and service
running in Task Manager and in the Services console, and all of them are
legitimate.

Thinking that the system might have been compromised somehow, we have run
Spybot Search and Destroy as well as Microsoft's AntiSpyware, and everything
comes up negative. The firewall is very tight and there is only limited
external access to the network --- the only ports open on the system
firewall are for WWW, SMTP, POP and VPN access --- so we do not think that
the cause is external.

In every case there is absolutely no indication of the source or the cause
of the shutdown. We have put together a small Perl program to send a
Wake-On-LAN packet to the machine when the shutdown is detected, and it has
worked flawlessly since we implemented it --- so clearly it is an orderly
shutdown.
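The Perl watchdog itself is not shown in the thread. Below is a minimal Python sketch of the same idea, under assumptions that are ours rather than the poster's: the server counts as down when its SMTP port (25) stops accepting TCP connections, and the magic packet (6 bytes of 0xFF followed by the target MAC address repeated 16 times) is broadcast over UDP to port 9. The function names are hypothetical.

```python
import socket
import time


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-On-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # also raises ValueError on bad hex
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is an assumed, common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


def service_is_up(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Assume the server is up if its SMTP port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog(host: str, mac: str, interval: float = 60.0) -> None:
    """Poll the server forever; send a wake-up packet whenever it stops answering."""
    while True:
        if not service_is_up(host):
            send_wol(mac)
        time.sleep(interval)
```

Something like `watchdog("mailserver.example", "00:11:22:33:44:55")` would start it; the host name and MAC address are placeholders. Resending the packet while the server is still booting should be harmless, since a machine that is already running simply ignores it.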
 
If no BSODs occur (Event ID: 1001, Source: Save Dump), then it may be a
power problem. I would look at a possibly faulty (or undersized) PC power
supply, the UPS, a bad battery, or the circuit feeding the outlet that the PC
is plugged into.

--
Regards,
Dave

-------------
Dave Patrick ....Please no email replies - reply in newsgroup.
Microsoft Certified Professional
Microsoft MVP [Windows]
http://www.microsoft.com/protect
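Dave's rule of thumb can be written as a small triage function. This is a sketch, not a Microsoft tool: it assumes the (source, event ID) pairs logged around one restart have already been collected from Event Viewer, and it adds two EventLog IDs the thread does not name: 6006 ("The Event log service was stopped", written during a normal shutdown and matching what the original poster sees) and 6008 (written at the next boot when the previous shutdown was unexpected, e.g. after a power cut). The function and constant names are hypothetical.

```python
from typing import Iterable, Tuple

# (source, event ID) pairs as they appear in the Windows 2000 System log.
SAVE_DUMP_BUGCHECK = ("Save Dump", 1001)  # rebooted after a bugcheck (BSOD)
EVENTLOG_UNEXPECTED = ("EventLog", 6008)  # previous shutdown was unexpected
EVENTLOG_STOPPED = ("EventLog", 6006)     # Event log service stopped (normal shutdown)


def classify_shutdown(events: Iterable[Tuple[str, int]]) -> str:
    """Rough triage of one shutdown/restart cycle from the events logged around it."""
    seen = set(events)
    # A bugcheck reboot can also log 6008, so check for the dump first.
    if SAVE_DUMP_BUGCHECK in seen:
        return "bugcheck"
    if EVENTLOG_UNEXPECTED in seen:
        return "unexpected"
    if EVENTLOG_STOPPED in seen:
        return "orderly"
    return "unknown"
```

With only the 6006 entry present, as reported above, this returns "orderly".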

 
I suppose it's possible, but all four machines are physically identical and
the other three are not exhibiting this problem. There is one other machine
on the same UPS and it is fine.

I'll have to see how we can test this possibility.


 
Is the machine getting hot at all? Is it placed higher in the rack, maybe?

For starters, try replacing the power lead to this box (yes, they can stuff
up sometimes, especially if they've ever got hot or been cable-tied too
tightly) and try a different power outlet as well. Failing this, the power
supply is the most likely culprit. They don't make them like they used to.

Cameron:-)

 
This morning we swapped two systems to see if the problem stays with the
hardware or follows the software. The four machines are all identical
hardware, and all the systems run on removable disk modules, so swapping is
fairly easy. The hardware is actually 5 identical machines, with one as a
service spare --- all the software systems can live on any of the hardware
platforms.

We'll see what happens sometime in the next 24 hours.

Thanks
John



 
Episode 15 of "The Mystery of the Self-Closing Server"

In our last episode we had swapped two systems onto each other's hardware
platforms. Thus, the failing server software is now running on entirely
different hardware, power supply, etc., the only unchanged element being the
hard drives.

The server ran for approximately 32 hours, until about 18:00 EDT, at which
time it proceeded to shut down as before. The recovery worked as intended and
the system was back online in about 6 minutes. It then ran for approximately
2 hours until 20:00 EDT, when it shut down and recovered yet again.
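Run lengths like these are easier to compare when computed from the log rather than estimated. A hypothetical helper, assuming the start and stop times have already been pulled out of the System log (for example from the EventLog 6005 "Event log service was started" and 6006 "Event log service was stopped" entries) by hand or from an export:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def uptimes(events: List[Tuple[datetime, str]]) -> List[timedelta]:
    """Given (time, kind) pairs with kind "start" or "stop", return how long
    the server stayed up between each start and the stop that follows it."""
    result: List[timedelta] = []
    last_start: Optional[datetime] = None
    for when, kind in sorted(events):
        if kind == "start":
            last_start = when
        elif kind == "stop" and last_start is not None:
            result.append(when - last_start)
            last_start = None  # ignore a second stop without a new start
    return result
```

For the second cycle described above (back online at about 18:06, down again at 20:00) it would report roughly 1 h 54 min.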

As in every case thus far, the only indication in the logs is the entry
noting that the Event Log service has stopped. We have just about the maximum
logging turned on and are logging everything we can think of, all to no avail.

This movie has become un-funny.


