read the data from a text file

great stuff

Hi all,

How can I read specific lines from a text file? For example, I have a text file
with 700,000 lines and I have to read the lines from, say, 3000 to 5000 (the
range 3000 to 5000 is not fixed; it changes).


Thanks for any help
 
great said:
Hi all,

How can I read specific lines from a text file? For example, I have a text file
with 700,000 lines and I have to read the lines from, say, 3000 to 5000 (the
range 3000 to 5000 is not fixed; it changes).


Thanks for any help

Files are not line based, or even character based. You have to read the
file from the beginning to know which line you are on.
 
Hi all,

How can I read specific lines from a text file? For example, I have a text file
with 700,000 lines and I have to read the lines from, say, 3000 to 5000 (the
range 3000 to 5000 is not fixed; it changes).


Thanks for any help

With .NET 3.5 and earlier, you will have to loop through the file and
then start reading when the counter hits the first line number and break
when it hits the last line number.
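
Something like this should do it (a rough sketch; startLine and endLine stand in
for whatever 1-based line numbers you need, and it assumes a using System.IO;
directive):

using (StreamReader reader = new StreamReader(@"verylargefile.txt"))
{
    string line;
    int lineNumber = 0;

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        if (lineNumber < startLine)
            continue;              // not in the range yet, keep reading
        if (lineNumber > endLine)
            break;                 // past the range, stop

        Console.WriteLine(line);   // process the line here
    }
}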

There is, at least currently, no

file.ReadLines(3000, 2000);

There are, however, some new file reading features in .NET 4.0, if you can
wait to use them (the release will be some time in 2010?). For example, the
new File.ReadLines() method gives you back an IEnumerable<T>:

IEnumerable<string> lines = File.ReadLines(@"verylargefile.txt");

If this works as expected, you could do this:

for (int i = startNumber; i <= finishNumber; i++)
{
    // ElementAt needs a using System.Linq; directive, and it re-enumerates
    // from the start of the file on each call, so this only shows the idea.
    string line = lines.ElementAt(i - 1);
}

Peace and Grace,


--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
Files are not line based, or even character based. You have to read the
file from the beginning to know which line you are on.


Although .NET 4.0 does have a new feature that returns file lines as an
IEnumerable<T>:
http://msdn.microsoft.com/en-us/magazine/ee428166.aspx

Peace and Grace,


--
Gregory A. Beamer
 
Hi all,

How can I read specific lines from a text file? For example, I have a text file
with 700,000 lines and I have to read the lines from, say, 3000 to 5000 (the
range 3000 to 5000 is not fixed; it changes).

Thanks for any help

The Stream class does have a Seek method, but it is byte-based.

You're going to have to have a read loop that reads lines and increments a
counter until you get to the line you need to start your processing at.
 
That's neat. You could do:

foreach (string line in File.ReadLines(fileName).Skip(3000).Take(2000))

However, it still has to read the lines up to 3000, it can't actually
skip anything.

Unless all the lines are a fixed length :) Then you could open it up using a
filestream, seek to your start position, and then wrap it in a streamreader to
keep reading :)
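
Roughly like this, as a sketch (fixedwidth.txt, recordLength, startLine and
lineCount are all made-up names here, and it only works if every record really
occupies the same number of bytes, line terminator included):

using (FileStream stream = new FileStream(@"fixedwidth.txt", FileMode.Open, FileAccess.Read))
{
    // Jump straight to the byte offset of the first record we want.
    stream.Seek((long)(startLine - 1) * recordLength, SeekOrigin.Begin);

    using (StreamReader reader = new StreamReader(stream))
    {
        for (int i = 0; i < lineCount; i++)
        {
            string line = reader.ReadLine();
            if (line == null)
                break;              // ran off the end of the file
            Console.WriteLine(line);
        }
    }
}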

One of the weak areas of VB.NET is that it is so cumbersome to create a custom
iterator. In C# I can do this today - with just a few lines of code:

public static class StreamReaderExtensions
{
    public static IEnumerable<string> ReadLines(this StreamReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

Then, as you say, I can go:

using (StreamReader reader = new StreamReader(myfile))
{
    foreach (string line in reader.ReadLines().Skip(3000).Take(2000))
        Console.WriteLine(line);
}

Alas, VB10's iterator support is not that much better. Oh, well.
 
Tom said:
Unless all the lines are a fixed length :) Then you could open it up using a
filestream, seek to your start position, and then wrap it in a streamreader to
keep reading :)

That also requires that the characters have a fixed byte size. If the
encoding of the file is, for example, UTF-8, a character can be a single
byte or several bytes, so even lines with the same number of characters can
end up with different byte lengths.
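
A quick way to see that (needs a using System.Text; directive):

Console.WriteLine(Encoding.UTF8.GetByteCount("resume"));   // 6 characters, 6 bytes
Console.WriteLine(Encoding.UTF8.GetByteCount("résumé"));   // 6 characters, 8 bytes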
 
That's neat. You could do:

foreach (string line in
File.ReadLines(fileName).Skip(3000).Take(2000))

However, it still has to read the lines up to 3000, it can't actually
skip anything.

Technically, I would assume you are correct. It will still have to read
through, but it could still be more efficient, as it rips through bytes
looking for return chars rather than turning them into text as it would
with ReadLine(). From my experience, the longer you stay in the binary
world, the faster most operations are.

NOTE: I have not reflected up the .NET 4.0 base classes from IL so I am
not sure what is actually going on. It is just my assumption that the
conversion to actual strings would be deferred for perf.

From the standpoint of the OP, it would be more maintainable and easier
to read code, so there is still a benefit, even if it has to read
through the large file.

Peace and Grace,

--
Gregory A. Beamer
 
Technically, I would assume you are correct. It will still have to read
through, but it could still be more efficient, as it rips through bytes
looking for return chars rather than turning them into text as it would
with ReadLine(). From my experience, the longer you stay in the binary
world, the faster most operations are.

Oh, yeah - I have a custom stream reader implementation for a delimited file
format (records are delimited by non-printable binary chars). I basically
use a binary reader to read in chunks at a time (these files get really big)
- and then find the markers and return a string from the data in between...

Theoretically, you could do the same thing with a newline-delimited text file
- simply read it in binary chunks, locate and count newlines until you get to
the desired bit and then start returning strings. It should be very fast.
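
Something along those lines, as a very rough sketch (LineRangeReader,
ReadLineRange and the buffer size are all just made up for illustration; it
assumes '\n' line endings and UTF-8 text, and ignores a possible last line
with no trailing newline):

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineRangeReader
{
    public static IEnumerable<string> ReadLineRange(string path, int startLine, int count)
    {
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            byte[] buffer = new byte[64 * 1024];
            MemoryStream current = new MemoryStream();
            int lineNumber = 1;
            int read;

            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        if (lineNumber >= startLine)
                        {
                            // Only lines inside the range ever get turned into strings.
                            yield return Encoding.UTF8.GetString(current.ToArray()).TrimEnd('\r');
                        }
                        current.SetLength(0);
                        lineNumber++;
                        if (lineNumber >= startLine + count)
                            yield break;        // past the range, stop reading
                    }
                    else if (lineNumber >= startLine)
                    {
                        // Only buffer bytes for lines we actually want back.
                        current.WriteByte(buffer[i]);
                    }
                }
            }
        }
    }
}

Usage would be something like: foreach (string line in LineRangeReader.ReadLineRange(@"verylargefile.txt", 3000, 2000)) Console.WriteLine(line);
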
NOTE: I have not reflected up the .NET 4.0 base classes from IL so I am
not sure what is actually going on. It is just my assumption that the
conversion to actual strings would be deferred for perf.

From the standpoint of the OP, it would be more maintainable and easier
to read code, so there is still a benefit, even if it has to read
through the large file.

Well, unless the records are a fixed length, you almost have to. But you
certainly could make it faster by doing the initial reads in binary mode
(FileStream). This is somewhat easier in C# though, because of its ability
to create a custom iterator so easily... Still possible in VB.NET - just a
bit more work :)
 
Gregory said:
Technically, I would assume you are correct. It will still have to read
through, but it could still be more efficient, as it rips through bytes
looking for return chars rather than turning them into text as it would
with ReadLine(). From my experience, the longer you stay in the binary
world, the faster most operations are.

NOTE: I have not reflected up the .NET 4.0 base classes from IL so I am
not sure what is actually going on. It is just my assumption that the
conversion to actual strings would be deferred for perf.

The ReadLines method returns IEnumerable<string>; unless the Skip method
is specifically written to recognise the enumeration returned from
ReadLines so that it can call some other method on it, it will still
have to decode each line into a string.
From the standpoint of the OP, it would be more maintainable and easier
to read code, so there is still a benefit, even if it has to read
through the large file.

Yes, it's pretty clear what the result of the code is, which is often
the most desirable aspect. It could be optimised a bit by only looking
for the line breaks in the stream, but still every byte in the file up
to the desired line has to be read and processed.
 