file parsing

  • Thread starter: sali

sali

i have a huge text file in which every record spans multiple lines [always
the same number of lines per record], and i want to parse it in a loop.

thought to use:

bat1.bat -----------
:lab
set /p line1=
set /p line2=
set /p line3=
goto lab
--------

and later:

type text1.txt | bat1.bat

expecting bat1 to read in the whole of stdin, assigning successive lines to
the vars line1, line2, line3 in turn, until the file sent to stdin is
exhausted. but only line1 gets set, to the first line, and nothing else.

what's wrong? or is "set /p" not able to read stdin?

is there some other easy way to parse a multi-line text file?

thnx
 
It's quite possible using the syntax

FOR /F "delims=" %%i in (filename.txt) do something %%i

If you told us what you actually want to do, rather than listing the ways in
which some mysterious thing CAN'T be done, we might be able to help more.

....Perhaps the group alt.msdos.batch.nt might give you some insight - or try
executing
FOR /?
from the prompt

HTH

....Bill
 
billious said:
If you told us what you actually want to do, rather than listing the ways in
which some mysterious thing CAN'T be done, we might be able to help more.

ok, here is the problem:
the text file contains 3,000,000 (three million) lines, every three of them
representing the data of one record (3 fields), so there are 1,000,000
logical records in the file.
i want to read the lines in three at a time, put each of those 3 lines in a
separate var, test the vars against some criteria, and, if they are
satisfied, write them (echo var >> output.txt) to an output file.

so far, the way i see it, "for /f" reads in one whole line at a time and
calls a subroutine, and a loop counter in the called subroutine acts as a
"state machine" that detects every 3rd line and performs the action on that
event.

yes, it could work, i just hoped to understand "stdin" handling at the cmd
prompt.
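
For comparison, the "three lines per record" loop described above can be sketched outside batch. This is only a minimal Python sketch, not something posted in the thread: the names `records` and `filter_records` and the sample criterion are hypothetical. It reads lazily, one record at a time, so it should behave the same on an open file or on piped stdin.

```python
import io
from itertools import islice


def records(lines, size=3):
    """Yield tuples of `size` consecutive lines, read lazily from `lines`."""
    it = iter(lines)
    while True:
        chunk = tuple(line.rstrip("\r\n") for line in islice(it, size))
        if len(chunk) < size:
            return  # end of input (an incomplete trailing record is dropped)
        yield chunk


def filter_records(lines, out, keep, size=3):
    """Write every record accepted by `keep` to `out`; return how many were written."""
    written = 0
    for rec in records(lines, size):
        if keep(rec):
            out.write("\n".join(rec) + "\n")
            written += 1
    return written


# Small in-memory demonstration; sys.stdin or an open file can replace `sample`.
sample = io.StringIO("a1\nb1\nc1\na2\nb2\nc2\n")
result = io.StringIO()
count = filter_records(sample, result, keep=lambda rec: rec[0].endswith("2"))
```

With the text piped in (for example `type text1.txt | python script.py`, where `script.py` is an assumed file name), the call would become `filter_records(sys.stdin, sys.stdout, keep=...)` after an `import sys`.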
 
Will the text contain any characters like <>|& or is it numerical or
alphanumerical data? It makes no difference whether the file contains 30 or
30 million lines - the code will be the same.
 
foxidrive said:
Will the text contain any characters like <>|& or is it numerical or
alphanumerical data? It makes no difference whether the file contains 30 or
30 million lines - the code will be the same.


yes, it is a general text file, so any printable char should be assumed,
from space up to chr(255) [because an international codepage is possible].
the only nonprintable chars are cr/lf pairs and [maybe] a chr(26) at the end
of the file.
 
foxidrive said:
It makes no difference whether the file contains 30 or 30 million lines - the
code will be the same.

i was afraid that it might depend on the size.
if the code works on a pure data stream, the size is [of course] irrelevant,
but if the code uses any kind of data buffering [even in virtual memory],
very large files may crash the system.
 
You'll need to use WSH or GAWK or another HLL. Batch is unsuitable for
general processing of text which contains characters like those I
mentioned.
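
As one illustration of the "another HLL" route, here is a minimal Python sketch in which characters such as `&`, `%`, `<`, `>` and `|` are plain data. Two things are assumptions rather than anything stated in the thread: the bytes are decoded as latin-1 (which maps every byte 0..255 to exactly one character; the real codepage may well differ), and the helper name `clean_lines` is hypothetical. It also drops a trailing chr(26) end-of-file marker if one is present.

```python
import io


def clean_lines(binary_stream, encoding="latin-1"):
    """Yield decoded lines without CR/LF, skipping a trailing Ctrl-Z (chr(26)) marker."""
    for raw in binary_stream:
        if raw.strip(b"\x1a") == b"":
            continue  # the line holds nothing but the Ctrl-Z end-of-file marker
        # Remove a Ctrl-Z glued to the end of the last line, then the line ending.
        yield raw.decode(encoding).rstrip("\x1a").rstrip("\r\n")


# In-memory demonstration; open(path, "rb") can replace the BytesIO object.
data = b"caf\xe9 & 100% <ok>\r\nsecond|line\r\n\x1a"
lines = list(clean_lines(io.BytesIO(data)))
```

Opening the file in binary mode and decoding explicitly keeps the choice of codepage visible in one place, which seems safer than relying on the system default.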
 
It's not a size problem at all - it's a content problem.

Crudely, the following should work - just put your processing between lines
[15] and [16]

BUT this will probably NOT work if any of the elements contains characters
outside of the range [space..'~'] - and will have problems with certain
characters within that range - those with special meaning to batch, such as
[&!%<>,*?"]


[01]@echo off
[02]set ye1=&set ye2=&set ye3=
[03]for /f "delims=" %%i in (3l.dat) do (
[04]if defined ye1 (
[05] if defined ye2 (set ye3=%%i&call :process
[06] ) else (
[07] set ye2=%%i
[08] )) else (
[09] set ye1=%%i
[10] )
[11])
[12]goto :eof
[13]
[14]:process
[15]echo element 1=%ye1% element 2=%ye2% element 3=%ye3%
[16]set ye1=&set ye2=&set ye3=
[17]goto :eof


Each line begins [number]. Lines will be wrapped in transmission and need to
be rejoined. The [number] at the beginning of each line needs to be removed.

....But overall, I'd be tempted to use SED or (g)awk as suggested by foxi.

HTH

....Bill
 
thanks for the encouragement to use "for /f". here is my [tested & working]
example; special chars and the national alphabet [up to chr(255)] are *not* a
problem. it is a variation of your idea.

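-------
rem BROJAC = line counter within the current record, MAXLIN = lines per record
set BROJAC=1
set MAXLIN=6

rem hand every line of the file to :l1 (note: with an unquoted argument,
rem %1 receives only the first token if the line contains spaces)
for /f "delims=" %%i in (mojtekst.txt) do call :l1 %%i

goto :EOF

:l1
rem store the line as LIN_1 .. LIN_6, then advance the counter
set LIN_%BROJAC%=%1
set /a BROJAC=%BROJAC%+1

rem once the 6th line has been stored: reset the counter and rename the file
if %BROJAC% GTR %MAXLIN% (
set BROJAC=1
ren %LIN_2%\%LIN_5% %LIN_1%.mp3
goto :EOF
)

goto :EOF
-----------------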

this example uses a database file to rename files in a folder tree according
to a given renaming scheme.


but again, the doc explicitly says that the input file for "for /f" is
*copied* to memory before processing, so a really huge file under low-memory
conditions may crash the system.
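
The same six-lines-per-record renaming scheme can be sketched in Python for comparison. This is a hedged sketch, not the poster's method: the field meanings (line 2 = folder, line 5 = current file name, line 1 = new base name) are inferred from the `ren` command above, the name `rename_plan` and the sample paths are hypothetical, and the sketch only builds the list of renames instead of touching any files.

```python
import ntpath


def rename_plan(lines, size=6):
    """Yield (source, target) path pairs, one per complete block of `size` lines."""
    block = []
    for line in lines:
        block.append(line.rstrip("\r\n"))
        if len(block) == size:
            # Assumed layout: line 1 = new base name, line 2 = folder, line 5 = old name.
            new_stem, folder, old_name = block[0], block[1], block[4]
            yield ntpath.join(folder, old_name), ntpath.join(folder, new_stem + ".mp3")
            block = []  # start collecting the next record


# Made-up sample record; "x" marks the lines the scheme does not use.
sample = ["New Title", r"C:\music\album", "x", "x", "track 01.mp3", "x"]
plan = list(rename_plan(sample))
```

Each pair could then be passed to `os.rename(source, target)`. Because the names are handled as ordinary strings, spaces, `&` and `%` in them need no special treatment. Iterating over an open file object reads it line by line, so the whole file is not held in memory at once.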
 
Fair 'nuff. If it works for you, so be it!

I'd caution on the presence of spaces, "&" and "%" in the filenames/dirnames
you are processing (which may confuse the REN).

Perhaps you could use SED/(g)awk to pre-process the file - selecting out
lines 1, 2 and 5 of each block of 6. This would reduce the size of the file
that has to be handled by the batch-processing routine.

HTH

....Bill
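
The pre-processing step suggested above (keep lines 1, 2 and 5 of each block of 6) can be sketched as follows. The thread proposes SED or (g)awk for this; Python is swapped in here purely for illustration, and the name `select_lines` and its parameters are hypothetical.

```python
def select_lines(lines, block_size=6, wanted=(1, 2, 5)):
    """Yield only the lines whose 1-based position within each block is in `wanted`."""
    for index, line in enumerate(lines):
        if index % block_size + 1 in wanted:
            yield line


# Two blocks of six numbered lines as a demonstration.
sample = [f"line{n}\n" for n in range(1, 13)]
reduced = list(select_lines(sample))
```

The reduced file has three lines per record instead of six, so a batch routine consuming it would presumably need its record length and variable numbers adjusted to match.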
 