Sorting text files - how ?

John Fitzsimons

Hi,

Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

....................1.gi
....................2.gi
....................3.gif
....................1.htm
....................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

Regards, John.
 
John Fitzsimons said:
Suppose I want to sort things so that all the .gifs, htm etc.
files are together ? Or perhaps together and sorted in each
category eg.

Perfect job for a small Python script. Run over the files, use a
simple regular expression to isolate the file extension, and you can
sort/copy/move/... the files based on the type.

Regards,
Wald
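
A minimal sketch of the kind of script described above, assuming the log is plain text with one entry per line and that the extension is whatever follows the last dot of the final path component. The names `TAIL`, `extension_key` and `sort_by_extension` are made up for illustration and do not come from the thread.

```python
import re

# Captures the file name and extension at the end of a line,
# e.g. "web15" and "jpg" from ".../~johnf/web15.jpg".
TAIL = re.compile(r"([^/]*)\.([^./\s]+)\s*$")


def extension_key(line):
    """Return (extension, name); lines without an extension sort first."""
    match = TAIL.search(line)
    if match is None:
        return ("", line.lower())
    name, ext = match.groups()
    return (ext.lower(), name.lower())


def sort_by_extension(lines):
    """Group lines by extension, then order by file name within each group."""
    return sorted(lines, key=extension_key)


if __name__ == "__main__":
    sample = [
        "9/Aug/04 02:47: /~johnf/web15.jpg",
        "9/Aug/04 02:02: /~johnf/knight.gif",
        "9/Aug/04 01:40: /~johnf/spirits.htm",
        "9/Aug/04 02:02: /~johnf/blue.jpg",
    ]
    for line in sort_by_extension(sample):
        print(line)
```

The lines themselves are never altered; only the sort key is derived from the end of each line, so the date and path columns come out exactly as they went in.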
 
John Fitzsimons wrote:
Suppose I have the following text details :
9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg
Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

You can use the dir command to compile the list:

dir /O:E > c:\johnf.txt

You might have different switches; this is for XP:

DIR [/O[[:]sortorder]]

  /O          List by files in sorted order.
  sortorder   N  By name (alphabetic)       S  By size (smallest first)
              E  By extension (alphabetic)  D  By date/time (oldest first)
              G  Group directories first    -  Prefix to reverse order

Produces this file list sorted by extension:

Volume in drive C has no label
Volume Serial Number is 70DF-1EF4
Directory of C:\BIN\TEMP

.. <DIR> 07/24/04 7:14a
... <DIR> 07/24/04 7:14a
FILE_ID DIZ 455 06/21/00 8:19p
OCSREG EXE 32768 06/18/00 8:51p
PAR148 EXE 592088 03/08/99 2:02a
SETUP EXE 77824 06/22/00 7:04p
FASTCOMP EX_ 60657 10/14/00 9:18a
SETUP EX_ 35862 06/22/00 7:04p
SFXCRE~1 EX_ 55332 10/14/00 9:23a
OCSFXPAK HL_ 41329 06/20/00 9:22p
OCSFXP50 HTM 3268 06/21/00 8:43p
SETUP INI 2249 06/22/00 8:53a
SM PAR 58062 01/27/02 8:36a
LICENSE TXT 4796 06/18/00 10:23a
ORDER TXT 2463 06/21/00 10:27p
README TXT 2221 06/21/00 11:06a
VENDOR TXT 2034 06/21/00 10:52a
WHATSNEW TXT 229 06/20/00 8:50p
OPSOFT UR_ 109 07/01/00 7:17a
OCSFXP50 XML 11306 06/21/00 8:43p
OCSFXP5A ZIP 234432 01/27/02 8:46a
21 file(s) 1217484 bytes
1023932928 bytes free

Variable width fonts make the above look unformatted. The actual result is very
neat.
 
John said:
Hi,

Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

Regards, John.

Do you mean you have a text *list* of the file names or are you trying to
sort the files themselves? If the latter, none of them are text but can be
sorted by extension by browsing to the folder and clicking on the "Type"
column in the right hand window. In order to have that column you must set
"View" to "Details".

--
dadiOH
_____________________________

dadiOH's dandies v3.0...
....a help file of info about MP3s, recording from
LP/cassette and tips & tricks on this and that.
Get it at http://mysite.verizon.net/xico
____________________________
 
Monday 09/Aug/2004 _dadiOH_ said:
Do you mean you have a text *list* of the file names or are you trying to
sort the files themselves? If the latter, none of them are text but can be
sorted by extension by browsing to the folder and clicking on the "Type"
column in the right hand window. In order to have that column you must set
"View" to "Details".

I thought the same thing, then I saw that he wants also, for example,
....2.htm after ...1.htm, and I don't know how to achieve that in the Details
View.
 
John said:
Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

It might be easier to look at the software you are using to generate
the list.
A43 will generate a text file of folder contents depending on how you
have them sorted, so if you sort by type A43 will print a txt file
sorted by type.
Free Commander will always print in alphabetical order...
 
John Fitzsimons said:
Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

TrackerV3 can sort by extension, and then by name as 2nd criterion. If
that's what you want try: http://www.trackerv3.com/

Donald
 
Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

Hm. It seems you *really* want to sort the text itself, not tell your
favorite file lister to do the sorting. There is one program I know of
which *nearly* does this kind of thing. (But in fact it fails on your
concrete problem.)

There was a PC-Mag utility in the early '90s (when they were undoubtedly
freeware) called PCSort. Look here:

http://short.stop.home.att.net/freesoft/txtfrmt1.htm

Unfortunately, it uses only spaces as word boundaries, while you need
dot and backslash. But it *does* permit sorting word-wise from the end
of the line as well as from the beginning.

If you want to invest a bit of time: it comes with assembler source.
You only need to change the word-parsing subroutine. Replace the SPACE
immediate with 0x2E and you have your first boundary. (This can even be
done by patching.) Add an additional comparison with 0x5C and you have
your backslash test. After that, your sorting only fails when you have
files with more than one dot in their names. ;-)
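
The idea of sorting word-wise from the end of the line can be sketched in a few lines of Python. This is not PCSort's code, only an illustration under the assumption that space, dot, slash and backslash all count as word boundaries; `BOUNDARY`, `words_from_end` and `sort_wordwise_from_end` are hypothetical names.

```python
import re

# Word boundaries: space, dot, forward slash and backslash.
BOUNDARY = re.compile(r"[ ./\\]+")


def words_from_end(line):
    """Split a line into words and return them last word first."""
    words = [w for w in BOUNDARY.split(line.strip()) if w]
    return words[::-1]


def sort_wordwise_from_end(lines):
    """Compare the last word first, then the one before it, and so on."""
    return sorted(lines, key=words_from_end)


if __name__ == "__main__":
    sample = [
        "9/Aug/04 02:47: /~johnf/web15.jpg",
        "9/Aug/04 02:02: /~johnf/knight.gif",
        "9/Aug/04 02:20: /~johnf/ouija.htm",
        "9/Aug/04 02:02: /~johnf/blue.jpg",
    ]
    for line in sort_wordwise_from_end(sample):
        print(line)
```

As with the patched assembler version, a name with more than one dot is split into several words, so such files are ordered by the part after the first dot before the part in front of it.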

BeAr
 
Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

My favorite solution is already built into the operating system, if
you can get comfortable using command windows.

Pop up a command window in the folder you want to sort, then type
DIR /OEN
to sort first by file extension (the type of file) then by name.
DIR /OEN >TEMP.TXT
puts the sorted listing in a file you can open with NotePad.

For an easy way to open a command window in any folder you're looking
at in Windows Explorer, see
<http://www.jsiinc.com/SUBB/tip0500/rh0594.htm>

If that's too much trouble, try Karen's Directory Printer.
<http://www.karenware.com/powertools/ptdirprn.asp>
 
John Fitzsimons wrote:
Do you mean you have a text *list* of the file names or are you trying to
sort the files themselves?

< snip >

The former. It is a web log of page accesses. There is more info prior
to the 9 but that is irrelevant to the sorting I want to do.

Like the header says. I am sorting a text file. Not creating one.

Regards, John.
 
There was a PC-Mag utility in the early '90s (when they were undoubtedly
freeware) called PCSort. Look here:

Unfortunately, it uses only spaces as word boundaries, while you need
dot and backslash.
:-(

But it *does* permit sorting word-wise from end
and from beginning of the line, likewise.

< snip >

Thanks. At least I now know I am not the first person in the world who
wanted to sort a text file list from the R.H.S. :-)

Regards, John.
 
Mark R. Blain wrote ....

| My favorite solution is already built into the operating system,
| if you can get comfortable using command windows.

| Pop up a command window in the folder you want to sort,
| then type
|
| DIR /OEN
|
| to sort first by file extension (the type of file) then by name.
|
| DIR /OEN >TEMP.TXT
| puts the sorted listing in a file you can open with NotePad.
| ....

Since I'm too senile to remember the command-line parameters
I stick the following line in the autoexec.bat file ....

SET DIRCMD=/O:GN

Then, using only dir produces a sorted listing ....
 
Thanks. At least I now know I am not the first person in the world who
wanted to sort a text file list from the R.H.S. :-)

If you just want to group by extension and don't insist on the correct
sort order, you can do a char-wise sort from the right-hand side with:

http://pweb.sophia.ac.jp/~britto/rls

Maybe you can ask the author to improve the program to your needs? At
least it could be a challenge and would make his program even more
special. ;-)

BeAr
 
It was a dark and stormy night when John Fitzsimons
<DELETEucwubqf02@sneakemail.com> said:
Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than
the beginning ? What method/program/utility would people here
suggest please ?

If all the lines in your text file have the same structure as your
example then there is one solution but only if you don't mind using
command line tools like Cygwin:
<http://www.cygwin.com/>

The steps required are:

Start Cygwin and change to the directory where your text file is,
then use the command:

sed -e "s/^\(.\+\)\/\(.\+\)\.\(.\+\)$/\3\t\2\t\1/" your-text-file.txt>foo.txt

That will change the format from:

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

To:

jpg web15 9/Aug/04 02:47: /~johnf
gif knight 9/Aug/04 02:02: /~johnf
htm spirits 9/Aug/04 01:40: /~johnf
htm welcome 9/Aug/04 02:02: /~johnf
htm chspirit 9/Aug/04 02:47: /~johnf
htm ouija 9/Aug/04 02:20: /~johnf
jpg world2 9/Aug/04 02:02: /~johnf
jpg blue 9/Aug/04 02:02: /~johnf

Then sort the file using the command:

sort foo.txt>foo-2.txt

And revert to the original format with:

sed -e "s/^\([^\t]\+\)\t\([^\t]\+\)\t\(.\+\)$/\3\/\2.\1/" foo-2.txt>foo-3.txt

And finally:

cat foo-3.txt|unix2dos>final.txt

As you can see this method is not for the weak or faint of heart ;-)
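
For comparison, the same three steps (rearrange, sort, restore) can be sketched in Python. This is only an illustration that mirrors the two sed expressions; it assumes, as above, that every line ends in `/name.ext` and contains no tab characters. The function names are invented for the example.

```python
def decorate(line):
    """'head/name.ext' -> 'ext<TAB>name<TAB>head', like the first sed command."""
    head, _, filename = line.rpartition("/")
    name, _, ext = filename.rpartition(".")
    return "\t".join((ext, name, head))


def undecorate(record):
    """Reverse of decorate(), like the second sed command."""
    ext, name, head = record.split("\t", 2)
    return f"{head}/{name}.{ext}"


def sort_log(lines):
    """Rearrange each line, sort the rearranged records, restore the format."""
    return [undecorate(record) for record in sorted(decorate(line) for line in lines)]


if __name__ == "__main__":
    sample = [
        "9/Aug/04 02:47: /~johnf/web15.jpg",
        "9/Aug/04 02:02: /~johnf/knight.gif",
        "9/Aug/04 01:40: /~johnf/spirits.htm",
        "9/Aug/04 02:02: /~johnf/world2.jpg",
    ]
    print("\n".join(sort_log(sample)))
```

Python compares strings by code point, so the order may differ slightly from what `sort` produces under a locale-aware collation.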

Regards
 
John Fitzsimons said:
What method/program/utility would people here suggest please ?

Well, now that I understood better what you want: I could write that utility
for you. But, oh, this is a freeware NG ... ;)

Donald
 
John said:
Hi,

Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

Regards, John.

Have you looked at cmsort? I haven't tried it myself, but it seems like
it could be a candidate.

http://www.chmaas.handshake.de/delphi/freeware/cmsort/cmsort.htm

HTH,
 
On Mon, 09 Aug 2004 16:07:52 +1000, John Fitzsimons asked acf:

[RHS-sort a log file word wise]

And now a solution with one of the must-have pipe utilities of my DOS-
collection. - LMod by Horst Schaeffer. You get it here:

http://home.mnet-online.de/horst.muc

The following command line should manage your sample even if there are
files without extension or multiple dots. It is a bit ;-) more complex
than it would have to be if both conditions never hit. But I'm sure you
can simplify yourself. Be careful when you copy and modify the command
line. Even a few spaces which look superfluous at first glance are
important!

lmod /l* /b/ "[$!]. " [] <test.txt | lmod /l* /b. "[$!]"[] | sort | lmod /l* [$3:$!] > result.txt

Everything should be on one line. The first character before the first
lmod is just a wrap-override character. I don't know if your Agent
supports it, so I thought it better to mention it...

Okay. I think this should do the trick. The only drawback I see at the
moment is a slightly changed appearance of the output file if your log
contains dates with both one and two digits for the day entry. If this is
unbearable for you, just modify the pipe further... ;-P
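
The two awkward cases mentioned here, files without an extension and names with several dots, can also be covered by a sort key built on Python's standard `posixpath.splitext`. This is a sketch, not a translation of the lmod pipeline; it assumes the path is the last space-separated field on the line, and `path_key` is a hypothetical name.

```python
import posixpath


def path_key(line):
    """Sort key (extension, name) taken from the last space-separated field."""
    fields = line.split()
    path = fields[-1] if fields else ""
    # splitext() splits at the last dot only and returns an empty
    # extension for names such as "README" or ".htaccess".
    name, ext = posixpath.splitext(posixpath.basename(path))
    return (ext.lower(), name.lower())


def sort_log_lines(lines):
    """Sort by extension, then by name; the lines are returned unmodified."""
    return sorted(lines, key=path_key)


if __name__ == "__main__":
    sample = [
        "10/Aug/04 02:47: /~johnf/archive.tar.gz",
        "9/Aug/04 02:02: /~johnf/README",
        "9/Aug/04 01:40: /~johnf/spirits.htm",
    ]
    for line in sort_log_lines(sample):
        print(line)
```

Because only the key is computed from the line, dates with one or two digits for the day appear in the output exactly as they were in the input.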

BeAr
 
Thanks. At least I now know I am not the first person in the world who
wanted to sort a text file list from the R.H.S. :-)

Regards, John.

You are not sorting from the R.H.S. This can be described in several
ways but you apparently want to find the last(!) period in the line
and sort on the remainder of the line (primary sort) and from the
beginning of the line to the aforementioned period (secondary sort).

Sorting from the R.H.S. yields:

3.gif
1.gi
2.gi
1.htm
2.htm

(Added leading blanks to align the R.H.S. for emphasis only.)

Fortunately your example provided the data for the counter example.
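
The difference between the two orderings can be checked with a short Python sketch. The key functions are illustrative names, and the second one assumes every entry contains at least one period.

```python
def rhs_key(line):
    """Character-wise key from the right-hand side: the reversed line."""
    return line[::-1]


def last_period_key(line):
    """Primary key: text after the last period. Secondary: text before it."""
    stem, _, ext = line.rpartition(".")
    return (ext, stem)


names = ["1.gi", "2.gi", "3.gif", "1.htm", "2.htm"]

# True right-hand-side sort: 3.gif moves in front of 1.gi and 2.gi.
print(sorted(names, key=rhs_key))

# Sort on the part after the last period, then on the part before it.
print(sorted(names, key=last_period_key))
```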

As mentioned, the OS provides that feature in the special case of
filenames. If the use of filenames was just for illustration then you
need additional software (i.e., sort utility). Many applications will
allow you to do that such as a scripting language with both character
functions and sort set (i.e., across lines), a spreadsheet, or a
database.

I'm surprised that no one has mentioned using one of the *nix toolsets --
unless I just scanned past it.

John, do you have your answer or do you need a package?

BillR
 
John said:
Hi,

Suppose I have the following text details :

9/Aug/04 02:47: /~johnf/web15.jpg
9/Aug/04 02:02: /~johnf/knight.gif
9/Aug/04 01:40: /~johnf/spirits.htm
9/Aug/04 02:02: /~johnf/welcome.htm
9/Aug/04 02:47: /~johnf/chspirit.htm
9/Aug/04 02:20: /~johnf/ouija.htm
9/Aug/04 02:02: /~johnf/world2.jpg
9/Aug/04 02:02: /~johnf/blue.jpg

Suppose I want to sort things so that all the .gifs, htm etc. files
are together ? Or perhaps together and sorted in each category eg.

...................1.gi
...................2.gi
...................3.gif
...................1.htm
...................2.htm

etc. etc.

How would I do this ? Sorting from the end of a line rather than the
beginning ? What method/program/utility would people here suggest
please ?

Regards, John.

TED will sort text any way you like. Only 86 kB and available from
http://jsimlo.sk/notepad/

HTH,
 