sort.exe

  • Thread starter: Les

Les

Sort.exe sorts flat files in a DOS type mode.

Older versions strictly counted bytes from the beginning
of each record to the sort key, producing very good
results.

My flat files contain mixed Japanese and English text. Sort is
counting each Japanese character, which takes two physical
bytes in the file, as one byte, so a key that comes after
the Japanese characters varies in its location depending
on the number of Japanese characters, spaces, English
characters, etc. prior to the key.
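
To make the offset drift concrete, here is a small sketch of how the same key lands at different positions depending on whether you count characters or bytes. It assumes the file is encoded as Shift_JIS, which the post does not actually state, and the sample record is made up for illustration.

```python
# Assumption: the data is Shift_JIS encoded; the record below is invented.
record = "東京 Tokyo 0042"
encoded = record.encode("shift_jis")

key = "0042"
# Position of the key when every character counts as one unit.
char_offset = record.index(key)
# Position of the key when counting physical bytes in the file.
byte_offset = encoded.index(key.encode("shift_jis"))

print(char_offset, byte_offset)
```

Each two-byte character ahead of the key widens the gap between the two offsets by one, which matches the drift described above.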

This is very bad news!

Does anyone have a recommendation?
 
I don't know the details of sort, but it sounds like you've discovered a
limitation of the simple sort program that's been around since the
early days of DOS. With your skills, it sounds like you could handle
installing Perl (www.perl.org) or Python (www.python.org) on your
computer and using one of these languages to write what would be a very
simple program (a couple of lines at most, though you could extend it as
far as you wish) to do what you want. They surely have the flexibility to
let you sort as you want, and you may be able to find standard
modules/libraries (especially for Perl, via CPAN) that handle Japanese
characters.
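
As a rough idea of what such a program might look like in Python, the sketch below reads the file in binary mode so that every offset is a physical byte count, the way the older sort behaved. The function names, the zero-based `start` offset, and the fixed key `length` are all hypothetical choices for illustration, and the ordering is plain byte-value order rather than anything language-aware.

```python
def sort_by_byte_key(records, start, length):
    """Return records ordered by the raw bytes in [start, start + length)."""
    return sorted(records, key=lambda rec: rec[start:start + length])


def sort_file(src_path, dst_path, start, length):
    """Sort the lines of src_path by a byte-offset key and write dst_path."""
    # Binary mode: no decoding, so a two-byte character counts as two bytes.
    with open(src_path, "rb") as src:
        records = src.read().splitlines(keepends=True)

    # Give an unterminated last record a line ending so that it does not
    # run into the next record after reordering (assumes DOS-style CRLF).
    if records and not records[-1].endswith((b"\n", b"\r")):
        records[-1] += b"\r\n"

    with open(dst_path, "wb") as dst:
        dst.writelines(sort_by_byte_key(records, start, length))
```

For example, `sort_file("in.txt", "out.txt", 8, 2)` would sort on the two bytes starting at the ninth byte of each record; the file names here are placeholders.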
 
Using the /l locale command-line option:
Currently, the only alternative to the default locale is the "C" locale, which is faster than natural language sorting and sorts characters according to their binary encodings.
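
To illustrate what sorting "according to binary encodings" means, here is a short Python sketch. It does not call sort.exe; the word list is made up, and the case-insensitive comparison is only a stand-in for natural language ordering, not a description of what sort actually does.

```python
words = ["banana", "apple", "Cherry"]

# Binary ("C"-style) ordering: compare the encoded byte values directly.
# Uppercase ASCII letters have lower byte values than lowercase ones.
binary_order = sorted(words, key=lambda w: w.encode("ascii"))

# Stand-in for language-aware ordering: ignore case when comparing.
folded_order = sorted(words, key=str.casefold)

print(binary_order)
print(folded_order)
```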
 