Brett Gerhardi
Hi all,

I am trying to go to the end of a file and read a given number of lines back from the end. For some reason I am finding that StreamReader.Read does not behave the way it is supposed to. I am simply trying to read blocks of the file from the end, using Read and Seek, and record each newline position, and this is where the code doesn't seem to be behaving itself. Here is the code to reproduce:
private const int FILE_SEEK_BUFFER_SIZE = 64;

public string LoadAndReadLinesFromFile(string aFilename, int aLinesToShow)
{
    FileStream mFile;
    if (!File.Exists(aFilename))
    {
        mFile = null;
        throw new FileNotFoundException("File not found", aFilename);
    }
    else
    {
        mFile = File.Open(aFilename, FileMode.Open);
    }

    StreamReader mFileReader = new StreamReader(mFile);

    // Seek from the back of the file in buffered amounts, count the newline
    // characters and remember where they all are.
    ArrayList newLines = new ArrayList(aLinesToShow);
    mFile.Seek(0, SeekOrigin.End);
    newLines.Insert(0, mFile.Position); // record the end of the file pos at the end of the array
    mFile.Seek(-FILE_SEEK_BUFFER_SIZE, SeekOrigin.Current); // set initial position for reading

    char[] buffer = new char[FILE_SEEK_BUFFER_SIZE];
    //byte[] buffer = new byte[FILE_SEEK_BUFFER_SIZE]; // this works fine
    int searchStartPosition = FILE_SEEK_BUFFER_SIZE;
    int test;

    // loop until we have the right amount of newlines in our array
    while (newLines.Count < aLinesToShow + 1) // +1 to include the end newline
    {
        test = mFileReader.Read(buffer, 0, FILE_SEEK_BUFFER_SIZE);
        //test = mFile.Read(buffer, 0, FILE_SEEK_BUFFER_SIZE); // this works fine
        mFile.Seek(-FILE_SEEK_BUFFER_SIZE * 2, SeekOrigin.Current); // seek back twice so we're in the correct position for the next loop

        // search buffer loop until we have found all \n's or \r\n's
    }
(This will end up throwing an exception because I've omitted the code that adds the found newlines to the array, but it should demonstrate the problem OK, since it happens early in the loop.)
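For what it's worth, the omitted search is roughly the following: walk the buffer backwards and record the absolute offset of every newline found. This is only a simplified sketch; it assumes blockStart holds the file offset the block was read from (captured just before the Read call) and that the characters are single-byte, which is true for these log files:

    // Simplified sketch of the omitted newline search. blockStart is assumed
    // to be the file offset this 64-character block was read from, and 'test'
    // is the count returned by Read. Walking backwards keeps the recorded
    // offsets in ascending order as they are inserted at the front of the list.
    for (int i = test - 1; i >= 0; i--)
    {
        if (buffer[i] == '\n')
        {
            newLines.Insert(0, blockStart + i); // absolute position of this newline
        }
    }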
I am using MS SMTP logs, which pad out the end of the files with nulls (\0\0\0\0 etc.), and I wonder whether this could be the cause of the problem.
OK. The behaviour I see with the null-padded file is that mFile.Position steps down from 65536 in 64-byte decrements, fine, until 65280; then on the next mFileReader.Read call the position jumps back to 65536 and starts going backwards again. The test return value from the Read call shows 64, although this is obviously not accurate. As the commented-out lines in the code suggest, if I use the plain FileStream.Read instead, it behaves normally.
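For reference, the FileStream-only variant I mean (the one that behaves as expected for me) looks roughly like this. It is only a sketch, reading raw bytes and checking for '\n' directly, with no guard yet for running off the start of the file:

    // Byte-based variant: read directly from the FileStream and look for
    // newline bytes, with no StreamReader involved. Sketch only; it does not
    // yet handle reaching the start of the file.
    byte[] byteBuffer = new byte[FILE_SEEK_BUFFER_SIZE];
    while (newLines.Count < aLinesToShow + 1)
    {
        long blockStart = mFile.Position;          // offset this block is read from
        int bytesRead = mFile.Read(byteBuffer, 0, FILE_SEEK_BUFFER_SIZE);

        for (int i = bytesRead - 1; i >= 0; i--)   // scan the block backwards
        {
            if (byteBuffer[i] == (byte)'\n')
            {
                newLines.Insert(0, blockStart + i);
            }
        }

        // seek back two block lengths: one to undo the read, one to move to
        // the previous block
        mFile.Seek(-FILE_SEEK_BUFFER_SIZE * 2, SeekOrigin.Current);
    }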
Is this known/expected behaviour? It doesn't seem to be documented anywhere I've seen.
Cheers, regards
-=- Brett