Directory.Exists occasionally locks up when checking network share

Keith Langer · Sep 1, 2006

Hi,

I'm running the .Net 1.1 framework on XP Pro. I am finding some
occasions where a call to Directory.Exists never returns when passing
in a network share. Most of the time the call returns properly, but if
the machine is really busy or if there is a network problem (or if the
server's Secondary Logon or Server services have failed), then this
problem can surface. This causes my application to hang indefinitely.
Is there a framework patch to fix this or some other way to
prevent it?

thanks,
Keith Langer

Ben Voigt · Sep 1, 2006

Keith Langer said:
Hi,

I'm running the .Net 1.1 framework on XP Pro. I am finding some
occasions where a call to Directory.Exists never returns when passing
in a network share. Most of the time the call returns properly, but if
the machine is really busy or if there is a network problem (or if the
server's Secondary Logon or Server services have failed), then this
problem can surface. This causes my application to hang indefinitely.
Is there a framework patch to fix this or some other way to
prevent it?

Start the I/O on a new thread and then wait on the thread with a timeout...
In .Net 2.0 you could use BackgroundWorker, in 1.1 you have to write a
little more code yourself.
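
A minimal sketch of the thread-plus-timeout idea, written to stay within what .NET 1.1 offers (no generics or anonymous delegates). The class name `DirectoryProbe`, its `Exists` method, and the `timedOut` out parameter are hypothetical names chosen for illustration, not framework APIs. It only stops the caller from waiting; it does not cancel the underlying I/O, so a worker that never returns stays blocked.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of the framework: runs Directory.Exists on a
// worker thread so the caller can give up waiting after a timeout.
public sealed class DirectoryProbe
{
    private readonly string path;
    private volatile bool exists;

    private DirectoryProbe(string path)
    {
        this.path = path;
    }

    private void Run()
    {
        // This is the call that may block for a long time on a bad share.
        exists = Directory.Exists(path);
    }

    // Returns true only if the check finished within timeoutMs and the
    // directory exists. timedOut is set when the worker did not finish.
    public static bool Exists(string path, int timeoutMs, out bool timedOut)
    {
        DirectoryProbe probe = new DirectoryProbe(path);
        Thread worker = new Thread(new ThreadStart(probe.Run));
        worker.IsBackground = true; // a stuck worker should not block a normal exit
        worker.Start();

        // Join returns false if the thread is still running after the timeout.
        timedOut = !worker.Join(timeoutMs);
        return !timedOut && probe.exists;
    }
}
```

A caller would invoke something like `DirectoryProbe.Exists(sharePath, 10000, out timedOut)` and treat a timeout as "share unavailable". If the check is repeated on a timer, it is probably wise not to start a new probe while an earlier one is still outstanding, otherwise blocked threads can pile up.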

Keith Langer · Sep 1, 2006

Ben,

One thing that concerns me is that when this problem occurred, the
process became so locked up that it couldn't be killed and we couldn't
get the machine to reboot. We finally had to kill the winlogon service
which forced a reboot. So I don't know if the background thread is
going to allow itself to be aborted so easily.

But your suggestion is something that I actually implemented a short
while ago. Since I can't reproduce the problem locally, I'll have to
wait until next week to test it.

Keith

Keith Langer · Sep 2, 2006

Does anyone from Microsoft read this newsgroup? It would be nice if
someone could tell me why this is happening.

Keith

Carl Daniel [VC++ MVP] · Sep 2, 2006

Keith said:
Does anyone from Microsoft read this newsgroup? It would be nice if
someone could tell me why this is happening.

Various MSFT people do read this group. Telling you why it's happening is
not necessarily something they'll readily be able to do.

From what you describe, your application is waiting for a synchronous I/O
request to complete, and that request is blocked in the kernel. That's what
results in a hung, un-killable application.

The first things I'd check:

1. Do the machines where it fails have different network hardware than the
machines where it works?
2. Do the machines where it fails have different network driver versions
than the machines where it works?
3. Are the machines where it fails up to date on patches for the OS and all
installed hardware?

Bottom line - it's most likely a defective network driver or hardware that's
at the root of it. Network issues do legitimately occur, but worst case,
your application should hang for a couple minutes before a timeout occurs
and everything gets unstuck.

-cd

Keith Langer · Sep 3, 2006

Carl,

From what I've been told, the machines that have had the failure may
have a different type of NIC than the machines that don't have the
failure. The call to this function can return successfully hundreds or
thousands of times before the call doesn't return at all. The OS is
identical on all machines. As to whether the NIC drivers are up to
date, I don't know. Shouldn't Windows be able to deal with even a bad
NIC driver by causing a timeout?

Some more background on how this function is used: This application
will check for the server share every 30 seconds until it finds it. It
also attempts to connect to the share with a secondary login since the
primary login has a password which conflicts with the server. Due to a
virus, the Server and Secondary logon services had failed on the
server, so the application would check for the share every 30 seconds
and never find it.

A few questions:
1) Any idea how I can force this situation to be reproduced?
2) Do you think that if I call this method from a different thread that
I'm going to still have problems? I'm guessing that the thread will
never be successfully aborted and the system performance will degrade
as a result.
3) Is there another way to check for the directory's existence while
avoiding the potential for a lockup?

thanks,
Keith

Carl Daniel [VC++ MVP] · Sep 3, 2006

Keith said:
Carl,

From what I've been told, the machines that have had the failure may
have a different type of NIC than the machines that don't have the
failure. The call to this function can return successfully hundreds
or thousands of times before the call doesn't return at all. The OS
is identical on all machines. As to whether the NIC drivers are up to
date, I don't know. Shouldn't Windows be able to deal with even a bad
NIC driver by causing a timeout?

Unfortunately, no. Unless the driver correctly implements IO cancellation
and timeouts, there's nothing the IO manager in the OS can do to forcibly
stop it (only the driver can know the actions required to reliably cancel a
request).

Some more background on how this function is used: This application
will check for the server share every 30 seconds until it finds it. It
also attempts to connect to the share with a secondary login since the
primary login has a password which conflicts with the server. Due to
a virus, the Server and Secondary logon services had failed on the
server, so the application would check for the share every 30 seconds
and never find it.

A few questions:
1) Any idea how I can force this situation to be reproduced?

No. From what you describe, I'd guess that there's a good chance that it's
a driver bug.

2) Do you think that if I call this method from a different thread that
I'm going to still have problems? I'm guessing that the thread will
never be successfully aborted and the system performance will degrade
as a result.

I wouldn't expect it to make any difference at all.

3) Is there another way to check for the directory's existence while
avoiding the potential for a lockup?

Nothing comes to mind, sorry.

-cd

Keith Langer · Sep 3, 2006

Carl,

Do you think I would still get the lock up if I tried to retrieve the
directory info or file info instead of calling Exists? These calls
would normally throw an error if the directory doesn't exist.

Keith

Carl Daniel [VC++ MVP] · Sep 3, 2006

Keith said:
Carl,

Do you think I would still get the lock up if I tried to retrieve the
directory info or file info instead of calling Exists? These calls
would normally throw an error if the directory doesn't exist.

Most likely, it wouldn't make any difference, but it wouldn't hurt to try.
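
For anyone who wants to try the exception-based variant, here is a rough sketch. The class and method names (`DirectoryCheck`, `ExistsByListing`) are hypothetical. It relies on `Directory.GetFileSystemEntries` throwing when the path cannot be found, and it still issues a synchronous request to the share, so it is not expected to avoid a hang on its own.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public sealed class DirectoryCheck
{
    private DirectoryCheck() { }

    // Reports existence by trying to list the directory and treating
    // "not found" style I/O failures as "does not exist".
    public static bool ExistsByListing(string path)
    {
        try
        {
            Directory.GetFileSystemEntries(path);
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            // The directory is there, but the caller may not list it.
            return true;
        }
        catch (IOException)
        {
            // Covers DirectoryNotFoundException and network path errors.
            return false;
        }
    }
}
```

Listing a large directory costs more than a plain existence check, so this is better suited to an experiment than to a 30-second polling loop.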

-cd
