writing an advanced filestream class

  • Thread starter: bonk

bonk

I am trying to create a stream that writes text to a file and:

- automatically creates a new file once the current file exceeds a
certain size
- makes it possible to be used by multiple threads and/or processes so
that multiple threads/processes can write to the same file (all threads
use the same instance of the stream, processes use a different instance
but still may point to the same file)

Could you point me in the right direction as to how this would ideally be
implemented? What class should I derive from? How would I solve the
problem of multiple threads accessing the same stream, or multiple
processes accessing the same file?

Additional information that might be helpful: The stream will be used
with TraceListener (passed in the ctor of the TraceListener).
 
First; note that my example only covers in-process (shared instance)
usage. For multi-process, you would presumably have to open/close the
file each time (possibly open it in shared write mode?) and use a Mutex
to sync access between all processes instead of a lock.
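
A rough sketch of the multi-process idea above, assuming a named Mutex that every process opens under the same name, and a file that is opened and closed for each write. The names SharedFileAppender, AppendLine and the mutex name passed in are made up for illustration, and this has not been exercised across real processes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

public static class SharedFileAppender
{
    // Appends one line to filePath while holding a named (cross-process) mutex.
    public static void AppendLine(string filePath, string mutexName, string line)
    {
        using (var mutex = new Mutex(false, mutexName))
        {
            bool owned = false;
            try
            {
                try
                {
                    owned = mutex.WaitOne();
                }
                catch (AbandonedMutexException)
                {
                    // A previous owner exited without releasing; we own it now.
                    owned = true;
                }

                // FileShare.ReadWrite lets other processes hold their own handles;
                // the mutex is what actually serialises the writes.
                using (var file = new FileStream(filePath, FileMode.Append,
                                                 FileAccess.Write, FileShare.ReadWrite))
                using (var writer = new StreamWriter(file, new UTF8Encoding(false)))
                {
                    writer.WriteLine(line);
                }
            }
            finally
            {
                if (owned) mutex.ReleaseMutex();
            }
        }
    }
}
```

Opening and closing per write is slow, but it means no process ever holds a stale position in the file.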

I also have "issues" with multiple processes writing binary data
(interleaved) to a single file; this could lead to partial characters
being written to a file or files. For example, a multi-byte character
(which means "most of them" unless you are limiting yourself to ASCII or
a single-byte codepage) gets its first byte written; another
thread/process then adds some data; then the rest of the character is
written, possibly (disjointed) to the same file, possibly to a
different file. Either way you just knackered the encoding
good'n'proper. You would certainly be splitting up words / sentences.

Perhaps you should be performing this function at the StreamWriter /
TextWriter level, so that character (or better: strings) are written in
their entirety. You should also probably not split a string between
files (hard to read), so I'd allow it to overflow the capped limit as
needed, and then start a new file.
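
To make the writer-level suggestion concrete, here is a minimal sketch, assuming whole lines are the unit of writing and that a file may overflow the cap by one line before the next file is started. RollingTextLog and its numbered ".log" file names are hypothetical, not anything from the framework.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class RollingTextLog : IDisposable
{
    private readonly object sync = new object();
    private readonly string directory;
    private readonly long maxFileBytes;
    private StreamWriter writer;
    private int fileCounter;

    public RollingTextLog(string directory, long maxFileBytes)
    {
        this.directory = directory;
        this.maxFileBytes = maxFileBytes;
        Directory.CreateDirectory(directory);
    }

    // Writes the whole line to the current file, then rolls over if the cap was reached.
    public void WriteLine(string text)
    {
        lock (sync)
        {
            if (writer == null) OpenNextFile();
            writer.WriteLine(text);
            writer.Flush(); // so BaseStream.Length reflects what was just written
            if (writer.BaseStream.Length >= maxFileBytes)
            {
                writer.Dispose();
                writer = null; // the next call opens a fresh file
            }
        }
    }

    private void OpenNextFile()
    {
        string name;
        do
        {
            name = Path.Combine(directory, fileCounter.ToString() + ".log");
            fileCounter++;
        }
        while (File.Exists(name));
        writer = new StreamWriter(File.Create(name), new UTF8Encoding(false));
    }

    public void Dispose()
    {
        lock (sync)
        {
            if (writer != null)
            {
                writer.Dispose();
                writer = null;
            }
        }
    }
}
```

Because the size check happens after the write, a string is never split between two files.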

Unfortunately, when writing bytes at the stream level, you can't
guarantee that the current buffer represents complete characters (they
could legitimately arrive a byte at a time), and without a *lot* of
inspection (and knowledge of encodings) it would be hard to keep
integrity at the stream level.
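
A small illustration of the partial-character problem, assuming UTF-8. The helper name SurvivesSplit is made up; it encodes a string, decodes the two byte ranges separately (as a reader would see them if something else landed in between, or if they ended up in different files), and reports whether the text came back intact.

```csharp
using System;
using System.Text;

public static class SplitCharacterDemo
{
    // True if decoding bytes [0, splitAt) and [splitAt, end) separately
    // still reproduces the original text.
    public static bool SurvivesSplit(string text, int splitAt)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        string head = Encoding.UTF8.GetString(bytes, 0, splitAt);
        string tail = Encoding.UTF8.GetString(bytes, splitAt, bytes.Length - splitAt);
        return head + tail == text;
    }
}
```

With plain ASCII such as "abc" any split point survives; with the euro sign ("\u20AC", three bytes in UTF-8) a split after the first byte does not, because neither half is a valid sequence on its own.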

Anyway, "as presented", I would derive from Stream, and encapsulate
(contain) a FileStream; something like below; note the main code is the
Write method; this syncs all access to one thread at a time, and works
in a loop, writing as much as we can from the input buffer to each
successive file until we run out of data.

I haven't tested it at all - but something along those lines may be
close.

Again: If you need strings to remain intact in files, then you may want
to write a StreamWriter instead.

Marc
using System;
using System.IO;

public class MultiFileStream : Stream
{
    private Stream current;              // file currently being written
    private long totalLength;            // bytes written across all files
    private readonly string path;        // existing directory that receives the numbered files
    private readonly int maxFileLength;  // cap, in bytes, for each file
    private int currentSpace;            // bytes left before the current file is full
    private readonly object SyncLock = new object();
    private int fileCounter;             // next file name to try

    public MultiFileStream(string path, int maxFileLength)
    {
        // A non-positive cap would make Write loop forever.
        if (maxFileLength <= 0)
            throw new ArgumentOutOfRangeException("maxFileLength");
        this.path = path;
        this.maxFileLength = maxFileLength;
        currentSpace = 0;
    }

    public override bool CanRead { get { return false; } }  // write-only stream
    public override bool CanSeek { get { return false; } }  // write to end only
    public override bool CanWrite { get { return true; } }

    public override void Flush()
    {
        if (current != null) current.Flush();
    }

    public override long Length { get { return totalLength; } }
    public override long Position
    {
        get { return totalLength; }
        set { if (value != Position) throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && current != null)
        {
            current.Dispose();
            current = null;
        }
        base.Dispose(disposing);
    }

    public override void Close()
    {
        if (current != null)
        {
            current.Close();
            current = null;
        }
        base.Close();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        lock (SyncLock)
        {
            while (count > 0)
            {
                if (current == null || currentSpace == 0)
                    GetNextFile();
                int writeThisPass = currentSpace < count ? currentSpace : count;
                current.Write(buffer, offset, writeThisPass);
                offset += writeThisPass;
                count -= writeThisPass;
                currentSpace -= writeThisPass;  // without this the file never rolls over
                totalLength += writeThisPass;
            }
        }
    }

    private void GetNextFile()
    {
        if (current != null)
        {
            current.Close();
            current = null;
        }
        // Skip any numbered files that already exist in the directory.
        while (File.Exists(Path.Combine(path, fileCounter.ToString())))
        {
            fileCounter++;
        }
        current = File.Create(Path.Combine(path, fileCounter.ToString()));
        fileCounter++;
        currentSpace = maxFileLength;
    }
}
 
Forgot to add; for multi-process you must expect the size to change
without notice... so you'd probably drop both the current and
currentSpace fields and check them on the fly within the Write method.
Lots of opening and closing :-(
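
For what checking "on the fly" might look like, a minimal sketch, assuming the same numbered-file naming as MultiFileStream. RollingFileLocator and FindWritableFile are hypothetical names. The result is only trustworthy while the caller holds whatever cross-process lock guards the files, since another process can grow the file at any moment.

```csharp
using System.IO;

public static class RollingFileLocator
{
    // Returns the first numbered file in the directory that either does not
    // exist yet or is still below maxFileLength bytes.
    public static string FindWritableFile(string directory, long maxFileLength)
    {
        for (int counter = 0; ; counter++)
        {
            string candidate = Path.Combine(directory, counter.ToString());
            var info = new FileInfo(candidate);
            if (!info.Exists || info.Length < maxFileLength)
                return candidate;
        }
    }
}
```

Each write would then call this, open the returned path in append mode, write, and close again.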

Alternatively - you could perhaps use remoting so all processes talk to
a single instance? Of course, then it has to remote the buffer...

Marc
 
protected override void Dispose(bool disposing)
{
    if (disposing && current != null)
    {
        current.Dispose();
        current = null;
    }
    base.Dispose(disposing);
}

Is a lock not needed in your Dispose method? What if this method is called
while another thread is creating a new stream? What if you dispose a stream
which is currently being written to?
 
Hi bonk,

As I understand your requirement, there are two key problems you want
to solve:
1. Write to a file from multiple threads in the same process
2. Write to a file from multiple threads from multiple processes

The 1st problem can be solved quite easily, but the second may need some
rethinking on your end to see whether the approach itself is correct. I say
so because writing to the same file from multiple processes without any
mediating process is difficult to manage. The main issue is synchronizing
access to the file across processes so that the final file content does not
end up as garbage, with data from different processes mixed together.
Synchronizing across processes will also undoubtedly and severely degrade
the performance of the application. If all you want is a common file name,
you could instead create one file per process, named in the following format:
<filename>_<process-id>.<ext>
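
A small sketch of building such a name, assuming the process id is taken from System.Diagnostics.Process. The class and method names here are made up for illustration.

```csharp
using System.Diagnostics;
using System.IO;

public static class PerProcessFileName
{
    // Turns "trace.log" plus id 42 into "trace_42.log", keeping any directory part.
    public static string For(string basePath, int processId)
    {
        string directory = Path.GetDirectoryName(basePath) ?? "";
        string name = Path.GetFileNameWithoutExtension(basePath);
        string extension = Path.GetExtension(basePath); // includes the leading dot
        return Path.Combine(directory, name + "_" + processId + extension);
    }

    public static string ForCurrentProcess(string basePath)
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return For(basePath, current.Id);
        }
    }
}
```

Process ids get reused over time, so a timestamp may be worth adding if old files are kept around.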

Anyway, coming to the 1st problem (writing to a file from multiple threads in
the same process): here too you need to synchronize the writes to your
FileStream class from multiple threads so that the data won't get mixed up.
For this you can take a lock on the FileStream object representing your file
inside your "Write" method. This technique uses locks and may still pose
performance issues. I have created some free C# libraries that can assist in
performing such operations without causing performance degradation.

Take a look at my recent library that I posted on CodeProject.com
http://www.codeproject.com/cs/library/asynchronouscodeblocks.asp

Using this library you can implement the Write method of your FileStream as
shown below (only Pseudo code).

class AdvancedFileStream
{
    int AdvancedFileStream::Write(byte[] data,....)
    {
        new async(delegate { this.InternalWrite(data,...) }, _myThreadPool);
    }

    Sonic.Net.ThreadPool _myThreadPool = new Sonic.Net.ThreadPool(1,1);
}

You can use one ThreadPool (defined in my library), configured with one
maximum and one concurrent thread, to do all the writing to the underlying
file stream represented by the AdvancedFileStream object. When multiple
threads call into Write, the method posts a delegate to _myThreadPool and
returns immediately to the calling code. Later the delegate is executed on
the ThreadPool thread dedicated to this instance of AdvancedFileStream.
Since there is only one thread in this ThreadPool, only one write request is
handled at any given time, which preserves the sequence of writes to the
file from multiple threads.

There is one issue that you need to be aware of in this approach. The byte
array supplied to the AdvancedFileStream::Write method should not be re-used
as we do not know when the ThreadPool thread gets a chance to perform the
actual write. There are ways to overcome it. I leave this to you as you can
figure out several options after reading my article on ACB.
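
For comparison, the same single-writer idea can be sketched with framework classes only. This is not the library above: it swaps in a BlockingCollection and one dedicated thread (so it assumes .NET 4.0 or later), and it handles the buffer re-use issue by copying the bytes before queueing them. QueuedStreamWriter is a made-up name.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public sealed class QueuedStreamWriter : IDisposable
{
    private readonly Stream target;
    private readonly BlockingCollection<byte[]> queue = new BlockingCollection<byte[]>();
    private readonly Thread worker;

    public QueuedStreamWriter(Stream target)
    {
        this.target = target;
        worker = new Thread(Drain);
        worker.IsBackground = true;
        worker.Start();
    }

    // Copies the caller's bytes and returns at once, so the caller may reuse its buffer.
    public void Write(byte[] buffer, int offset, int count)
    {
        byte[] copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        queue.Add(copy);
    }

    // Runs on the single worker thread; chunks are written in queue order.
    private void Drain()
    {
        foreach (byte[] chunk in queue.GetConsumingEnumerable())
        {
            target.Write(chunk, 0, chunk.Length);
        }
        target.Flush();
    }

    // Stops accepting writes and waits for the queue to empty.
    // The target stream is left open for its owner to dispose.
    public void Dispose()
    {
        queue.CompleteAdding();
        worker.Join();
        queue.Dispose();
    }
}
```

Two caveats: the copy costs an allocation per write, and an exception thrown by the target stream on the worker thread is not reported back to the caller in this sketch.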

Hope this helps.
 
Possibly so; however, I would be reasonably happy for it to go bang in
this scenario, as they *really* shouldn't be individually disposing
this stream, since they don't own it!

A better option would be for me to have included an "isDisposed" field,
and barf if *anything* happens after Dispose() [except Dispose()].
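
Roughly what that "isDisposed" guard could look like, shown on a cut-down write-only wrapper rather than the full class. GuardedWriteStream is a hypothetical name, and the flag is not synchronised, so this only catches use after Dispose(), not a Dispose() racing with a Write().

```csharp
using System;
using System.IO;

public class GuardedWriteStream : Stream
{
    private readonly Stream inner;
    private bool isDisposed;

    public GuardedWriteStream(Stream inner) { this.inner = inner; }

    private void ThrowIfDisposed()
    {
        if (isDisposed) throw new ObjectDisposedException(GetType().Name);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return !isDisposed; } }
    public override long Length { get { ThrowIfDisposed(); return inner.Length; } }
    public override long Position
    {
        get { ThrowIfDisposed(); return inner.Position; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { ThrowIfDisposed(); inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count)
    {
        ThrowIfDisposed();
        inner.Write(buffer, offset, count);
    }

    // Dispose() may be called repeatedly; everything else throws afterwards.
    protected override void Dispose(bool disposing)
    {
        if (disposing && !isDisposed)
        {
            isDisposed = true;
            inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```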

If the caller is using e.g. a Writer that rudely insists on
Dispose()ing the base stream when itself disposed, then this can be
managed; I believe Jon Skeet has a non-closing stream example in his
bag of tricks on his site.
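
A different route to the same end, not the MiscUtil class: assuming .NET 4.5 or later, StreamWriter has a constructor overload with a leaveOpen flag, so the writer can be disposed without closing the shared stream. LeaveOpenExample and WriteText are made-up names for the sketch.

```csharp
using System.IO;
using System.Text;

public static class LeaveOpenExample
{
    // Writes text through a short-lived writer and leaves the shared stream open.
    public static void WriteText(Stream shared, string text)
    {
        using (var writer = new StreamWriter(shared, new UTF8Encoding(false), 1024, true))
        {
            writer.Write(text);
        } // disposing the writer flushes it; 'shared' stays usable
    }
}
```

On older framework versions a wrapper stream that ignores Close()/Dispose() is still needed.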

Marc
 
Marc Gravell said:
Possibly so; however, I would be reasonably happy for it to go bang in
this scenario, as they *really* shouldn't be individually disposing
this stream, since they don't own it!

A better option would be for me to have included an "isDisposed" field,
and barf if *anything* happens after Dispose() [except Dispose()].

If the caller is using e.g. a Writer that rudely insists on
Dispose()ing the base stream when itself disposed, then this can be
managed; I believe Jon Skeet has a non-closing stream example in his
bag of tricks on his site.

Yup - it's in MiscUtil:
http://www.pobox.com/~skeet/csharp/miscutil
 