Harlan said:
[...]
Block [2] is where the enumeration retrieves the next file to add to the
list. The reason I locked this code was because I thought it might
interfere with one of its enumerators looking up a file. But (a) there's
only one thread filling the list, so there's no concern about
concurrency there, and I didn't lock the code in MoveNext that reads a
file from the list. I'm not sure whether I need to lock both blocks of
code or don't need to lock either, but I'm pretty sure having just this
lock doesn't accomplish anything.
Was there a "(b)"?
You are correct…locking only at the writer accomplishes nothing. You
need to lock both the writer and reader. Otherwise, the reader may
observe the collection in an inconsistent state (adding to the
collection isn't atomic).
Block [3] in the Current method and Blocks [4] and [5]--essentially, the
whole MoveNext method--are where I might also have locks. Besides
concern for concurrency with the AddFile method, I note that I've got
boolean formulas where I'm using different properties from the
LazyFileList such as its Count, its TimeoutAction, and its Finish status
that are read at different moments and therefore could, taken together,
not represent a consistent state. [...]
Actually, your biggest issue is that because you're using automatic
properties, the backing field isn't "volatile" and so you can't count on
the value being consistently visible in one thread after an update in
some other thread.
However, given that progress for the
enumeration is unidirectional (time waited so far can go from less than
the Timeout threshold to more than it but not the reverse; the
enumeration can go from Finished = false to Finished = true but not the
reverse), I don't think this is a problem, because the worst that will
happen is that the method will wait through one more sleep cycle than
was strictly necessary before responding to the caller. Does that make
sense?
Without "volatile", you could loop through any number of sleep cycles,
even indefinitely, depending on the compiler optimizations applied to
the code.
Basically:
-- Any mutable state at a minimum needs to be marked "volatile".
-- If the mutable state depends on non-atomic operations or is
modified based on its own value or some other mutable state, you need
full synchronization. (For example, the Count property of your list
changes according to a non-atomic operation, and you can't make the data
structure the Count property depends on volatile in any case, so you
have to lock around the retrieval of the Count property value).
-- A corallary to the above is that anything you can make
immutable, you should. For example, your "Mask" and "Root" properties,
why allow these to be mutable at all? Just back them with a readonly
field, initialized at object construction only. And since your code
doesn't really handle mutation of the "Timeout" property, I'd say
there's a good argument for making that immutable as well (require it to
be initialized during construction of the object).
-- And a general rule of locking: don't lock on "this". Use an
explicit sentinel object for the specific purpose.
I would also not bother checking the "Finished" property except in
between the more time-consuming operations. That is, any time you
return from a GetDirectories() or GetFiles() method call, but not after
the AddFile() call (i.e. in the loop that calls that). Statistically,
if the flag gets set (by the client or timeout), it's nearly always
going to occur while the code's waiting for GetDirectories() or
GetFiles() to return anyway, and only checking at those points will keep
the code simpler.
From an API point of view, IMHO it makes more sense for your Timeout
property to be a TimeSpan. This is especially true that given the way
you use it is basically as a TimeSpan anyway. But in any case, given
that the value is in milliseconds as it stands now, the code would IMHO
be more readable if you used "new TimeSpan(0, 0, 0, 0, list.Timeout)"
instead of "new TimeSpan(10000 * list.Timeout)". (Yeah, I know that's
as much typing, but at least there's no "magic conversion" going on in
the visible code
![Smile :) :)](/styles/default/custom/smilies/smile.gif)
).
Finally, I find the enumerator code a bit over-complicated, and
inefficient as well. There's no need to poll, since your code knows
exactly when the timeout may expire, and you can signal to the code when
to wake up for other purposes (i.e. when the client has canceled the
enumeration, or there's new data to return).
I hope you don't mind, I went ahead and made some modifications to your
original code. For the most part, I tried to leave things as I found
them, except for fixing bugs. But two major things I modified in the
process are:
-- I moved the logic for dealing with consuming the file list and
handling timeouts back into the main class. This leaves the enumerator
itself much simpler, and allows the producer and consumer sections of
the code to exist in the same class.
-- I removed the polling aspects of the consumer code, replacing it
with what is IMHO a better solution. Specifically, using the Monitor
class to handle both synchronization and the timeout aspects. This way,
the code only wakes up if a timeout has expired or if there's work to be
done. Otherwise, the thread can remain dormant.
You will also notice some code protected by "TESTING_SUPPORT". I added
that so that I could more easily test the various delay/cancel effects
of the code. I wrote a little test console application to exercise the
class, which responds to key inputs to insert delays in processing or to
cancel the operation altogether. Obviously, none of that code would be
included in a production application.
I also (slightly) changed the semantics of your IEnumerator<T>
implementation. Note that the behavior of the Current property is
documented to be undefined if called at times when MoveNext() hasn't
been called, or has returned "false". We can take advantage of that to
just do something halfway sensible rather than doing a lot of checking
to ensure consistency. Basically, we push the question of correct usage
off onto the client code, rather than loading the enumeration code down
with that logic.
Speaking of semantics, I notice that you've implemented the
IEnumerator.Reset() method. Strictly speaking, it's perfectly fine to
just throw a NotSupportException. And if you allow that, then your
enumerator implementation could in fact be a single iterator method,
rather than the full-blown interface implementation you've got.
If you need to support Reset(), then of course you have to do that. But
otherwise, the code could be even simpler without it, well beyond the
simple removal of the method. In particular, after my other changes as
well, you could remove the entire LazyFileEnumerator class, and replace
it with this:
public IEnumerator<string> GetEnumerator()
{
int ifile = 0;
string file;
while ((file = NextFile(ifile++)) != null)
{
yield return file;
}
}
....letting the compiler write all the rest of the implementation for you.
All that said, I have copied and pasted the code below, preserving the
Reset() implementation you had before.
Which, by the way, is what you should have done with your original code.
This forum is a public forum, and one of the main benefits of the
forum is so that others can learn from the discussions found in it, even
when reading those discussions long after they have taken place. If you
refer to code that is not actually posted in the forum, then that
benefit relies entirely on the external site being hosted indefinitely,
for as long as the forum itself is ever available by archive.
Given that no matter where you host the external content, the lifetime
of the archive is likely to be MUCH longer, it's generally a bad idea to
post messages that involve a discussion about some specific code without
actually including that code in the message. If you want to do some
special formatting, I suppose you might consider posting the external
link _also_. But IMHO you might as well just include comments in the
code that highlight the specific areas you want to point out.
Anyway, I believe that the code I posted has fixed all the threading
bugs your code had, as well as cleaned up a bit the implementation. Of
course, while I believe the code to be correct, it's possible I've
introduce my own bugs.
![Smile :) :)](/styles/default/custom/smilies/smile.gif)
Please feel free to ask for any
clarifications or corrections!
Pete
#define TESTING_SUPPORT
using System;
using System.Collections.Generic;
using System.Threading;
using System.IO;
namespace Harlan.IO
{
public enum EnumerationTimeoutActions
{
NoTimeout,
ReportComplete,
ThrowException
}
public class LazyFileList : IEnumerable<string>
{
// Public method.
public void Cancel()
{
Finished = true;
}
#if TESTING_SUPPORT
public bool Delay
{
get { bool fRet = _fDelay; _fDelay = false; return fRet; }
set { _fDelay = value; }
}
private volatile bool _fDelay;
#endif // TESTING_SUPPORT
// Public modifiers
/// <summary>
/// Timeout value in milliseconds
/// </summary>
public int Timeout { get { return _timeout; } }
private readonly int _timeout;
/// <summary>
/// Action to take when timeout expires
/// </summary>
public EnumerationTimeoutActions TimeoutAction
{
get { return _action; }
set { _action = value; }
}
private volatile EnumerationTimeoutActions _action;
// Public accessors
/// <summary>
/// Directory name of root of file search
/// </summary>
public string Root { get { return _root; } }
private readonly string _root;
/// <summary>
/// Filename mask for file search
/// </summary>
public string Mask { get { return _mask; } }
private readonly string _mask;
/// <summary>
/// True if the search has completed or was cancelled
/// </summary>
public bool Finished
{
get { return _finished; }
private set
{
// Locking here ensures that a consumer will not
// get stuck waiting for something to be added to
// the list. Otherwise, it would be possible for
// the producer to just stop working without ever
// pulsing the consumer again.
lock (_objLock)
{
_finished = value;
Monitor.Pulse(_objLock);
}
}
}
// Making this volatile allows the producer to check the variable
// without having to bother locking. It can be read atomically,
// and the producer is just polling it, so additional
// synchronization isn't that useful.
private volatile bool _finished;
public int Count
{
get
{
lock (_objLock)
{
return fileList.Count;
}
}
}
// Private implementation
private List<string> fileList = new List<string>();
private readonly object _objLock = new object();
// Constructors
public LazyFileList(string root) : this(root, null) { }
public LazyFileList(string root, string mask) : this(root,
mask, 10000, EnumerationTimeoutActions.NoTimeout) { }
public LazyFileList(string root, string mask, int timeout,
EnumerationTimeoutActions action)
{
_timeout = timeout;
_action = action;
_root = root;
_mask = mask;
// This is superfluous. Finished is initialized to
// false by default.
// Finished = false;
Thread thread = new Thread(new ThreadStart(Fill));
thread.Start();
}
// File getter
private void Fill()
{
Fill(Root);
Finished = true;
}
private void Fill(string parentDirectory)
{
// Note that you could just make the default Mask value "*",
// using that if the client code passes null, and avoid the
// conditional expression here altogether.
string[] files = (Mask == null) ?
files = Directory.GetFiles(parentDirectory) :
files = Directory.GetFiles(parentDirectory, Mask);
if (Finished)
{
return;
}
foreach (string file in files)
{
#if TESTING_SUPPORT
Thread.Sleep(Delay ? 11000 : 250);
#endif // TESTING_SUPPORT
AddFile(file);
}
string[] directories =
Directory.GetDirectories(parentDirectory);
foreach (string directory in directories)
{
if (Finished)
{
return;
}
Fill(directory);
}
}
private void AddFile(string path)
{
lock (_objLock)
{
fileList.Add(path);
Monitor.Pulse(_objLock);
}
}
private string NextFile(int ifile)
{
lock (_objLock)
{
while (fileList.Count <= ifile)
{
if (Finished)
{
// Producer will never add anything more to the
list,
// so we must be done.
return null;
}
// Otherwise, we need to wait...either
indefinitely, or with
// a timeout, depending on configuration
if (TimeoutAction ==
EnumerationTimeoutActions.NoTimeout)
{
Monitor.Wait(_objLock);
}
else
{
// The Wait() can complete either due to a
Pulse() or
// the timeout. In theory, we could use the
return value
// to predict whether the queue now has
non-zero length
// or not, but IMHO the code is clearer and
easier to prove
// correct just checking the length again.
Monitor.Wait(_objLock, Timeout);
if (fileList.Count <= ifile)
{
// Finished could be set by the client code
calling
// Cancel(). Presumably, one does not
expect an
// exception to be thrown in that case.
![Smile :) :)](/styles/default/custom/smilies/smile.gif)
if (Finished || TimeoutAction ==
EnumerationTimeoutActions.ReportComplete)
{
return null;
}
throw new System.TimeoutException("File
enumeration timed out");
}
}
}
return fileList[ifile];
}
}
#region IEnumerable<string> Members
public IEnumerator<string> GetEnumerator()
{
return new LazyFileEnumerator(this);
}
#endregion
#region IEnumerable Members
System.Collections.IEnumerator
System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
public class LazyFileEnumerator : IEnumerator<string>
{
private int index = -1;
private LazyFileList list;
private bool dead = false;
private string _current;
public LazyFileEnumerator(LazyFileList list)
{
this.list = list;
}
#region IEnumerator<string> Members
public string Current
{
get
{
return _current;
}
}
#endregion
#region IDisposable Members
public void Dispose()
{
// Nothing to dispose.
}
#endregion
#region IEnumerator Members
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
if (!dead)
{
string file = list.NextFile(++index);
if (file != null)
{
_current = file;
}
else
{
dead = true;
}
}
return !dead;
}
public void Reset()
{
index = -1;
_current = null;
dead = false;
}
#endregion
}
}
}