Text editor to open massive files needed

  • Thread starter: skyman302
Are you accepting the challenge?

If he did, he should take a different approach than he described above.
With today's typical physical and virtual memory sizes, such files should
only be edited 'disk based'. That said: even if the file has to be
changed in size, parts of it have to be reordered, and so on - one ought
to load a manageable number of sectors into memory and leave the rest
temporarily untouched. There's a lot of housekeeping needed to do this
in a clean way, though... ;-)
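
As a rough sketch of the idea in Python (untested by me; file names and
chunk size are just placeholders):

# Process a huge file in fixed-size chunks so it never has to fit in memory.
CHUNK = 64 * 1024 * 1024  # 64 MB per read

with open("in.txt", "rb") as src, open("out.txt", "wb") as dst:
    while True:
        block = src.read(CHUNK)
        if not block:
            break
        # ... transform the block here (mind records that span chunk borders) ...
        dst.write(block)
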
The OP sounded like the file was some sort of CSV?

Actually, I don't think so. I rather suspect the ESRI *.e00 exchange
format or similar data. Apart from dedicated GIS servers, I've only seen
such amounts of GIS data when people blunder with the latest ArcGIS. :-(

To be more specific: the *.e00 files contain section-wise fixed-length
data streams with entry headers (for instance file names and column
descriptions), all possibly of different widths. And it might be packed
on top of that. Could be quite a mess to deal with. :-(

BeAr
 
Sorry for taking so long to respond.

Here are a few sample lines:

372403915 03/07/2004 00:33:11.540 69.711446 209.602310 134.482 -0.220000
372403915 03/07/2004 00:33:11.565 69.712961 209.601311 135.949 -0.225128
372403915 03/07/2004 00:33:11.590 69.714476 209.600318 138.315 -0.230256
372403915 03/07/2004 00:33:11.615 69.715991 209.599331 140.208 -0.235385

There should be 4 lines in all, each with 7 columns. All I need is
columns 4, 5, and 6; I need to delete the rest of the columns.
 
Actually, it's satellite altimetry data. It's in plain text, in this
format:

44755360 02/21/2003 19:12:16.069 50.178520 117.101336 693.497 -13.290000
44755360 02/21/2003 19:12:16.094 50.180062 117.100966 693.953 -13.290000
44755360 02/21/2003 19:12:16.119 50.181605 117.100595 695.631 -13.290000
44755360 02/21/2003 19:12:16.144 50.183148 117.100225 695.450 -13.290000
44755360 02/21/2003 19:12:16.169 50.184692 117.099856 696.098 -13.290000
44755360 02/21/2003 19:12:16.194 50.186235 117.099487 695.057 -13.290000

If it's wrapped, know that there should be 5 lines of it.
 
So the part I want starts after the 37th character in each row and ends
after the 69th character. Also, it may not show up in the data I
provided, but there are 3 spaces before the first column.
 

I've emailed you a program that will do this for you.
 

This is a task for the cut command known from Unix. I tested a 10 GB
file containing many copies of your data with the Windows port you'll
find here:

http://unxutils.sourceforge.net

Use the version found in UnxUpdates.zip. The command line is simple:

cut.exe -b37-69 in.txt > out.txt

It took nearly half an hour on a PIII-1000 (Win2k) and created 4.5 GB of
output.
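
If the UnxUtils port isn't at hand, a line-by-line Python sketch of the
same byte slicing would look roughly like this (untested; the file names
are placeholders, and cut's -b ranges are 1-based and inclusive, hence
the 36:69 slice):

# Rough stand-in for "cut -b37-69 in.txt > out.txt", processed line by
# line so the whole file never has to fit in memory.
with open("in.txt", "r") as src, open("out.txt", "w") as dst:
    for line in src:
        dst.write(line.rstrip("\r\n")[36:69] + "\n")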

HTH.
BeAr
 
Thank you all so very much for the help. I couldn't have ever had the
slightest clue what to do without it.
 
Dan Glybitz stated:
I was thinking more WTF? 8 GB has got to be either a typo, or he's got
the entire contents of his local library backed up on his hard drive.

Sometimes you get files like that as the results of a simulation; a
circuit or IC design run, say, can produce very big files - but 8 GB?
It would have to be a very, very long run or a very complex design.

Or it may be a "data dump" from a device/instrument with data output
to PC...

[]s
--
Chaos Master®, posting from Canoas, Brazil - 29.55° S / 51.11° W

"People told me I can't dress like a fairy.
I say, I'm in a rock band and I can do what the hell I want!"
-- Amy Lee
 
Okay, I have another problem now. I have all the data in three columns.
But now I need to subtract 180 from column two and then switch the
positions of columns one and two. Does anyone have any scripts they'd
like to share to do this using the Unix tools?
 

You should use the appropriate cut command line to get the columns into
the desired order. The following should do fine on your original data:

cut.exe -b47-57,37-46,62-69 in.txt > out.txt

If some of the numbers can grow larger than displayed, you need to
adjust the byte positions above. You can also use the cut command on the
already created output, of course. If you used exactly the command I
suggested in an earlier post, the following command should work:

cut.exe -b1-10,11-21,26-33 out.txt > out2.txt
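
If you're unsure whether any of the values grow wider than in the
sample, a quick scan of line lengths like the following (a rough Python
sketch; the file name is a placeholder) will show that before you commit
to fixed byte positions:

# Report the longest line seen, so unexpectedly wide fields show up
# before cutting at fixed byte positions.
longest = 0
with open("in.txt", "r") as src:
    for line in src:
        longest = max(longest, len(line.rstrip("\r\n")))
print("longest line:", longest, "characters")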

Regarding your question about subtracting 180: how should the output
look? If I use your sample there will be negative numbers. Do you want
them preceded with a '-' symbol, or do you want just the absolute value?

BeAr
 
I need them preceded by a '-' symbol.
 

So we'll change the utility. ;-) TCols from Rune Berg is a very good
tool for your needs. You get it here (inside the updates020225.zip):

http://home.online.no/~runeberg/tt/tthome.htm#download

If you want to start from scratch you can use this command line:

TCOLS -o, from in.txt to out.txt $5-180 $4 $6

The above gives you a comma separated file. Read the documentation
if you need another separator.

Of course you can also use the already cut list. You just need to
replace the $-values with the correct column numbers. That may also be
the better way if you are in a low-memory situation. TCols seems a bit
more memory-intensive than Cut, but I didn't check that thoroughly, so
I may be wrong.
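
For what it's worth, the same transformation can be sketched in Python
as well (a rough, untested sketch of my own; it assumes the original
whitespace-separated 7-column file, and the file names are placeholders):

# Column 5 minus 180, then column 4, then column 6, comma-separated,
# processed line by line so memory use stays flat.
with open("in.txt", "r") as src, open("out.txt", "w") as dst:
    for line in src:
        f = line.split()
        if len(f) < 6:
            continue                      # skip blank or short lines
        dst.write("%.6f,%s,%s\n" % (float(f[4]) - 180.0, f[3], f[5]))

Negative results come out with a leading '-' automatically.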

Anyhow, TCols should be quite as quick (or slow) as Cut. It takes
about the same time as Cut for a 10 GByte input file on my computer.
Be sure to check the result for correct completion. You can just
divide the file size by the number of characters per line plus 2
(CR+LF). Or you can use a text info tool like TFInfo from the main
package of Rune Berg's TextTools.
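
As a rough illustration of that check (a sketch; the 30 characters per
line is only an assumed value, count your actual line width):

import os

size = os.path.getsize("out.txt")
bytes_per_line = 30 + 2              # visible characters plus CR+LF
print(size / bytes_per_line)         # should be close to the expected line count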

HTH.
BeAr
 
Probably a bit too late to be useful for the OP, but if anyone's
interested, the blog post at the URL below links to three editors that
can work on multi-gigabyte files without eating crazy amounts of RAM:

http://flatfly.blogspot.com/2005/01/3-editors-that-can-handle-huge-files.html

HexEdit only loads up to the 4 GByte boundary (according to its web site).
The iHex program has to be treated with care: its (displayed) address
space only covers 4 GByte, so you won't know exactly where you are inside
a larger file... (And I can't say whether the program keeps track of the
correct position internally.)

Thought I'd mention this, in case somebody wants to use these programs. ;-)

BeAr
 
Non-functional. By the way: I can't create a TByte-sized test file at
the moment... ;-)
And if it's only for viewing/searching: there is also ListXP.

Christian Ghisler's 'Lister' opens and browses 10 GByte files with no
visible delay and has text (ANSI/ASCII/OEM) and hex modes alongside
good search capabilities.

BeAr
 