Text editor to open massive files needed

  • Thread starter: skyman302
Are you accepting the challenge?

If he did, he should take a different approach than he described above.
With today's typical physical and virtual memory sizes, such files should
only be edited 'disk based'. That said: even if the file has to be
changed in size, parts of it have to be reordered, and so on - one ought
to load a manageable number of sectors into memory and leave the rest
temporarily untouched. There's a lot of housekeeping needed to do this
in a clean way, though... ;-)
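
As a rough sketch of the idea in Python (untested by me; file names and
chunk size are just placeholders):

# Process a huge file in fixed-size chunks so it never has to fit in memory.
CHUNK = 64 * 1024 * 1024  # 64 MB per read

with open("in.txt", "rb") as src, open("out.txt", "wb") as dst:
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        # ... transform the block here (mind records that span chunk borders) ...
        dst.write(block)
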
The OP sounded like the file was some sort of CSV?

Actually, I don't think so. I rather suspect the ESRI *.e00 exchange
format or similar data. Apart from dedicated GIS servers, I've only seen
such amounts of GIS data when people blunder with the latest ArcGIS. :-(

To be more specific: the *.e00 files contain section-wise fixed-length
data streams with entry headers (for instance file names and column
descriptions), all possibly of different widths. And it might be packed
on top of that. Could be quite a mess to deal with. :-(

BeAr
 
Sorry for taking so long to respond.

Here are a few sample lines:

372403915 03/07/2004 00:33:11.540 69.711446 209.602310 134.482 -0.220000
372403915 03/07/2004 00:33:11.565 69.712961 209.601311 135.949 -0.225128
372403915 03/07/2004 00:33:11.590 69.714476 209.600318 138.315 -0.230256
372403915 03/07/2004 00:33:11.615 69.715991 209.599331 140.208 -0.235385

There should be 4 lines in all, each with 7 columns. All I need is
columns 4, 5, and 6; I need to delete the rest of the columns.
 
Actually, it's satellite altimetry data. It's in plain text, in this
format:

44755360 02/21/2003 19:12:16.069 50.178520 117.101336 693.497 -13.290000
44755360 02/21/2003 19:12:16.094 50.180062 117.100966 693.953 -13.290000
44755360 02/21/2003 19:12:16.119 50.181605 117.100595 695.631 -13.290000
44755360 02/21/2003 19:12:16.144 50.183148 117.100225 695.450 -13.290000
44755360 02/21/2003 19:12:16.169 50.184692 117.099856 696.098 -13.290000
44755360 02/21/2003 19:12:16.194 50.186235 117.099487 695.057 -13.290000

If it's wrapped, know that there should be 5 lines of it.
 
So the part I want starts after the 37th character in each row and ends
after the 69th character. Also, it may not show up in the data I
provided, but there are 3 spaces before the first column.
 

I've emailed you a program that will do this for you.
 

This is a task for the cut command known from Unix. I tested a 10 GB
file containing many copies of your data with the Windows port you'll
find here:

http://unxutils.sourceforge.net

Use the version found in UnxUpdates.zip. The command line is simple:

cut.exe -b37-69 in.txt > out.txt

It took nearly half an hour on a PIII-1000 (Win2k) and created 4.5 GB of
output.
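
If the UnxUtils port isn't at hand, a line-by-line Python sketch of the
same byte slicing would look roughly like this (untested; the file names
are placeholders, and cut's -b ranges are 1-based and inclusive, hence
the 36:69 slice):

# Rough stand-in for "cut -b37-69 in.txt > out.txt", processed line by
# line so the whole file never has to fit in memory.
with open("in.txt", "r") as src, open("out.txt", "w") as dst:
    for line in src:
        dst.write(line.rstrip("\r\n")[36:69] + "\n")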

HTH.
BeAr
 
Thank you all so very much for the help. I couldn't have ever had the
slightest clue what to do without it.
 
Dan Glybitz stated:
I was thinking more WTF? 8 GB has got to be either a typo, or he's got
the entire contents of his local library backed up on his hard drive.

Sometimes you get files like that as the results of a simulation; a
circuit or IC design run, say, can produce very big files - but 8 GB?
It would have to be a very, very long run or a very complex design.

Or it may be a "data dump" from a device/instrument with data output
to PC...

[]s
--
Chaos Master®, posting from Canoas, Brazil - 29.55° S / 51.11° W

"People told me I can't dress like a fairy.
I say, I'm in a rock band and I can do what the hell I want!"
-- Amy Lee
 
Okay, I have another problem now. I have all the data in three columns.
But now I need to subtract 180 from column two and then switch the
positions of columns one and two. Does anyone have any scripts they'd
like to share to do this using the Unix tools?
 

You should use the appropriate cut command line to get the columns into
the desired order. The following should do fine on your original data:

cut.exe -b47-57,37-46,62-69 in.txt > out.txt

If some of the numbers can grow larger than displayed, you need to
adjust the byte positions above. You can also use the cut command on the
already created output, of course. If you used exactly the command I
suggested in an earlier post, the following command should work:

cut.exe -b1-10,11-21,26-33 out.txt > out2.txt
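
If you're unsure whether any of the values grow wider than in the
sample, a quick scan of line lengths like the following (a rough Python
sketch; the file name is a placeholder) will show that before you commit
to fixed byte positions:

# Report the longest line seen, so unexpectedly wide fields show up
# before cutting at fixed byte positions.
longest = 0
with open("in.txt", "r") as src:
    for line in src:
        longest = max(longest, len(line.rstrip("\r\n")))
print("longest line:", longest, "characters")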

Regarding your question about subtracting 180: how should the output
look? If I use your sample there will be negative numbers. Do you want
them preceded with a '-' symbol, or do you want just the absolute value?

BeAr
 
I need them preceded by a '-' symbol.
 

So we'll change the utility. ;-) TCols from Rune Berg is a very good
tool for your needs. You get it here (inside the updates020225.zip):

http://home.online.no/~runeberg/tt/tthome.htm#download

If you want to start from scratch you can use this command line:

TCOLS -o, from in.txt to out.txt $5-180 $4 $6

The above gives you a comma separated file. Read the documentation
if you need another separator.

Of course you can also use the already cut list. You just need to
replace the $-values with the correct column numbers. That may also be
the better way if you are in a low-memory situation. TCols seems a bit
more memory-intensive than Cut, but I didn't check that thoroughly, so
I may be wrong.
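
For what it's worth, the same transformation can be sketched in Python
as well (a rough, untested sketch of my own; it assumes the original
whitespace-separated 7-column file, and the file names are placeholders):

# Column 5 minus 180, then column 4, then column 6, comma-separated,
# processed line by line so memory use stays flat.
with open("in.txt", "r") as src, open("out.txt", "w") as dst:
    for line in src:
        f = line.split()
        if len(f) < 6:
            continue                      # skip blank or short lines
        dst.write("%.6f,%s,%s\n" % (float(f[4]) - 180.0, f[3], f[5]))

Negative results come out with a leading '-' automatically.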

Anyhow, TCols should be quite as quick (or slow) as Cut. It takes
about the same time as Cut for a 10 GByte input file on my computer.
Be sure to check the result for correct completion. You can just
divide the file size by the number of characters per line plus 2
(CR+LF). Or you can use a text info tool like TFInfo from the main
package of Rune Berg's TextTools.
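
As a rough illustration of that check (a sketch; the 30 characters per
line is only an assumed value, count your actual line width):

import os

size = os.path.getsize("out.txt")
bytes_per_line = 30 + 2              # visible characters plus CR+LF
print(size / bytes_per_line)         # should be close to the expected line count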

HTH.
BeAr
 
Probably a bit too late to be useful for the OP, but if anyone's
interested, the blog post at the URL below links to three editors that
can work on multi-gigabyte files without eating crazy amounts of RAM:

http://flatfly.blogspot.com/2005/01/3-editors-that-can-handle-huge-files.html

HexEdit only loads up to the 4 GByte boundary (according to its web site).
The iHex program has to be treated with care: its (displayed) address
space only covers 4 GByte, so you won't know exactly where you are inside
a larger file... (And I can't say whether the program keeps track of the
correct position internally.)

Thought I'd mention this, in case somebody wants to use these programs. ;-)

BeAr
 
Non-functional. By the way: I can't create a TByte-sized test file at
the moment... ;-)
And if it's only for viewing/searching: there is also ListXP.

Christian Ghisler's 'Lister' opens and browses 10 GByte files with no
visible delay and has text (ANSI/ASCII/OEM) and hex modes alongside
good search capabilities.

BeAr
 