code page

  • Thread starter: Carol

Carol

How can I force the import wizard to use the ASCII code page rather than
Unicode?

When I import a text file, the program freezes for a long time trying to
parse the file as Unicode.
It may happen because there is a Unicode character in there somewhere.
Still, I want to ignore that and treat the file as pure ASCII.
 
Hi Carol,

Click the Advanced... button in the text import wizard. You can set the
code page there.
 
John Nurick

Umm. I don't understand exactly what "have a unicode character in there
somewhere" means. Typically there are two ways for a program to identify
a text file as Unicode (apart from being told by the user):

1) A byte order mark at the beginning of the file (e.g. if the first two
bytes are FF FE the file should be interpreted as little-endian UTF-16,
if they are FE FF as big-endian UTF-16, and if the first three bytes are
EF BB BF it should be interpreted as UTF-8).

2) Analysing the contents of the file to see if they look more like
Unicode than some other encoding.
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_42jv.asp)

So: what's in your file that could be triggering one of these? Or does
the problem lie somewhere else? (By the way: are you using a far-eastern
version of Windows or Office?)
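As a rough illustration of the two checks above, here is a minimal Python sketch. It is not what Access or Windows actually does: the byte order marks are standard, but the null-byte test in step 2, its 4 KB sample and its 10% threshold are assumptions made up for illustration, standing in for whatever heuristic the real import code uses.

```python
# Byte order marks and the encodings they announce.
# (UTF-32 BOMs are not handled; FF FE 00 00 would be reported as UTF-16 LE.)
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16 little-endian"),
    (b"\xfe\xff", "UTF-16 big-endian"),
]


def sniff_encoding(data: bytes, sample_size: int = 4096, null_ratio: float = 0.1) -> str:
    """Rough guess at whether a file's bytes look like Unicode text."""
    # Check 1: a byte order mark at the very start of the file.
    for bom, name in BOMS:
        if data.startswith(bom):
            return f"{name} (byte order mark)"
    # Check 2 (illustrative heuristic only): ASCII/ANSI text rarely contains
    # 0x00 bytes, while UTF-16 text in a Western language has one in roughly
    # every other byte, so count the nulls in the first few KB.
    sample = data[:sample_size]
    if sample and sample.count(0) / len(sample) > null_ratio:
        return "possibly UTF-16 (many null bytes)"
    return "no sign of Unicode; treat as 8-bit text"
```

To try it on a file (the name is a placeholder): `with open("import.txt", "rb") as f: print(sniff_encoding(f.read()))`.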



Yes, but the thing locks up before I can get to that.
That is why I need to force it.
 
No, I'm not using Far Eastern (oops) characters.

I am getting those tiny boxes and strange characters.

 
John Nurick

One possibility might be to use a SCHEMA.INI file whose file
specification includes the character set. This is documented very
sketchily in Help and somewhat better at the following links:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcjetschema_ini_file.asp

Create a Schema.ini file based on an existing table in your database:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;155512

http://support.microsoft.com/default.aspx?scid=kb;EN-US;149090
http://www.devx.com/tips/Tip/12566
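For illustration, here is a hedged Python sketch that builds such an entry. The file name checks.txt and the column list are hypothetical; Format, ColNameHeader, CharacterSet and ColN are keys described in the SCHEMA.INI documentation linked above, but check them against your version of Jet before relying on them. Note that write_schema_ini replaces any existing schema.ini in that folder.

```python
from pathlib import Path


def schema_ini_text(text_file, columns, charset="ANSI"):
    """Build one SCHEMA.INI section for a comma-delimited text file.

    columns is a list of (name, spec) pairs, e.g. ("CHKDT", "Text Width 10").
    """
    lines = [
        f"[{text_file}]",           # each section is named after its text file
        "Format=CSVDelimited",
        "ColNameHeader=True",       # assumes the first row holds field names
        f"CharacterSet={charset}",  # tell the driver which character set to use
    ]
    for number, (name, spec) in enumerate(columns, start=1):
        lines.append(f"Col{number}={name} {spec}")
    return "\r\n".join(lines) + "\r\n"


def write_schema_ini(folder, text_file, columns, charset="ANSI"):
    """Write schema.ini next to the text file, overwriting any existing one."""
    path = Path(folder) / "schema.ini"
    path.write_bytes(schema_ini_text(text_file, columns, charset).encode("ascii"))
    return path
```

For example, `write_schema_ini(r"C:\Import", "checks.txt", [("CHKDT", "Text Width 10"), ("CLEARDT", "Text Width 10")])`, where the folder and column specs are placeholders for your own layout.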

Otherwise, if you email me a textfile that exhibits the problem I can
examine it for unusual features. Just remove the reversed spam trap from
my address.



 
Hey, thanks!

 
John Nurick

Hi Carol,

I've had a look at the file. A terminal CRLF normally isn't a problem;
I reckon the trouble is being caused by all the null bytes in the
fields CHKDT and CLEARDT. They are the only thing in the file that could
cause it to be mistaken for Unicode.
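One way to confirm which fields hold the nulls is a short Python sketch like this one. It assumes a comma-delimited file with a header row of field names, decoded as Windows-1252; both are assumptions, so adjust them to match your file. Nulls are swapped for a visible marker before parsing because some versions of Python's csv module reject NUL characters.

```python
import csv
import io

NULL_MARK = "\u2400"  # visible stand-in for NUL, so csv never sees a raw null


def fields_with_nulls(data: bytes, encoding: str = "cp1252", has_header: bool = True) -> dict:
    """Count, per field, how many records contain at least one null byte."""
    # Decode as 8-bit text; errors="replace" avoids crashing on the few bytes
    # that Windows-1252 leaves undefined.
    text = data.decode(encoding, errors="replace").replace("\x00", NULL_MARK)
    rows = csv.reader(io.StringIO(text, newline=""))
    header = next(rows, None) if has_header else None
    counts = {}
    for row in rows:
        for i, value in enumerate(row):
            if NULL_MARK in value:
                # Use the header name if there is one, else the field number.
                name = header[i] if header and i < len(header) else f"field {i + 1}"
                counts[name] = counts.get(name, 0) + 1
    return counts
```

Run it as `with open("import.txt", "rb") as f: print(fields_with_nulls(f.read()))` (placeholder file name); a result like `{'CHKDT': 120, 'CLEARDT': 120}` would point at the same two fields.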

Access 2002 couldn't understand your original file at all, but when I
got rid of the nulls with a find-and-replace in a text editor it
imported just fine.

I guess the nulls are the result of a problem in the EBCDIC->ANSI
conversion. If that can't be fixed it's not difficult to strip them out
before you import the file, using a little script or program in your
language of choice. If Perl is installed on your machine, here's the one
I used:

use strict;
use warnings;

while (<>) {
#dispose of null bytes and trailing spaces in text fields
s/\x00+| +(?=")//g;

#remove quote marks and spaces from numeric fields
#so they import as numeric ([\d.]* also catches one-digit numbers)
s/" *(\d[\d.]*) *"/$1/g;

print;
}
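If Perl isn't available, the same two substitutions can be done in Python. This is a sketch of an equivalent, not a tested drop-in replacement for the script above; it works on raw bytes so nulls and CRLF line endings pass through unchanged. The script and file names in the usage line are placeholders.

```python
import re
import sys

# Same patterns as the Perl script, applied to raw bytes.
NULLS_OR_SPACES_BEFORE_QUOTE = re.compile(rb'\x00+| +(?=")')
QUOTED_NUMBER = re.compile(rb'" *(\d[\d.]*) *"')


def clean_line(line: bytes) -> bytes:
    # Drop null bytes, and any spaces that sit just before a quote mark.
    line = NULLS_OR_SPACES_BEFORE_QUOTE.sub(b"", line)
    # Unquote fields that hold only a number so they import as numeric.
    return QUOTED_NUMBER.sub(rb"\1", line)


def clean_file(src_path: str, dst_path: str) -> None:
    # Binary mode keeps CRLF line endings and every other byte as it is.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(clean_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    clean_file(sys.argv[1], sys.argv[2])
```

Usage: `python strip_nulls.py original.txt cleaned.txt`. Like the Perl version, it also unquotes text fields that happen to be all digits (codes with leading zeros, for instance), so check for those before importing.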



 
Carol

John, thanks for your help!!
I believe you are right about the whole thing.
Thank you for the Perl script.
I think CHKDT and CLEARDT are the problem.
Could this be a case where I need to offset the begin points, as I've heard
in this thread? It is strange that I am not getting any values for CLEARDT.
Maybe my start point is wrong.
How would I possibly solve that? Trial and error? I guess that the starting
point depends on the field preceding CHKDT and CLEARDT.

Second, are nulls the same thing as spaces?
I thought I was going from EBCDIC->ASCII, not EBCDIC->ANSI. I obviously don't
know the difference, and don't expect you to explain it all to me.
Thank you for your time, John





 
I guess I mean 'slack bytes' when I say offset points.


 
Carol,

I'm glad we seem to have worked it out.

Null bytes aren't the same as spaces. A space is ASCII hex 20 (decimal
32), and a null is ASCII 0. (A Null value in a database field is
something else again, a sort of non-value that represents the fact that
the actual value of the thingy that the field represents is unknown.)
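To see the difference in a file it helps to look at the bytes rather than the characters, since a run of spaces and a run of nulls can look alike on screen. A tiny hypothetical Python helper:

```python
def hex_bytes(field: bytes) -> str:
    """Show each byte as two hex digits so spaces (20) and nulls (00) stand out."""
    return " ".join(f"{b:02X}" for b in field)


# Both of these may look like a blank-padded field in a text editor.
print(hex_bytes(b"AB  "))        # 41 42 20 20
print(hex_bytes(b"AB\x00\x00"))  # 41 42 00 00
```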

As for ASCII and ANSI: if the file contains no accented characters,
line-drawing characters or the like, and the computer was made in the
last 20 years or so, the difference is academic.
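A small Python sketch makes the point. It assumes "ANSI" means Windows-1252 (the usual Western European Windows code page) and, for comparison, EBCDIC code page 037; which EBCDIC code page the source data actually uses is an assumption here. Plain ASCII bytes read the same in ASCII and ANSI, a byte such as E9 has no meaning in 7-bit ASCII, and a null byte stays a null under all three.

```python
CODECS = [("ASCII", "ascii"), ("ANSI", "cp1252"), ("EBCDIC", "cp037")]


def decode_all(data: bytes) -> dict:
    """Decode the same bytes as ASCII, ANSI (Windows-1252) and EBCDIC (cp037)."""
    results = {}
    for label, codec in CODECS:
        try:
            results[label] = data.decode(codec)
        except UnicodeDecodeError:
            results[label] = None  # a byte value this code page doesn't define
    return results


print(decode_all(b"Check 2004"))  # ASCII and ANSI agree; EBCDIC gives gibberish
print(decode_all(b"caf\xe9"))     # ASCII can't decode E9; ANSI gives 'café'
```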

I've never used the software you mentioned (was it Novastor?). Your
theory may be right: I can imagine a program inserting null bytes to
fill in gaps left by incorrect specification of the starting positions
and sizes of the COBOL fields. One place to check that is CHKDT,
which seems to contain a yyyy-mm-dd date string followed by a null byte,
while the other date field came across fine. So maybe you've done
something that sets the width of the output field to 11 rather than 10,
so the program is padding it out with a null.
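The question about start points (and "slack bytes") can be illustrated with a hedged Python sketch that cuts a fixed-width record by start position and width. The record is made-up data, assuming one unused byte, here a null, between the two dates: a layout that misses that byte shifts CLEARDT by one, while giving CHKDT a width of 11 reproduces the trailing null described above.

```python
def slice_record(record: bytes, layout) -> dict:
    """Cut a fixed-width record into fields.

    layout is a list of (name, start, width) triples; start is 1-based,
    counting the first byte of the record as position 1.
    """
    return {name: record[start - 1:start - 1 + width] for name, start, width in layout}


# Hypothetical record: CHKDT (10 bytes), one null slack byte, CLEARDT (10 bytes).
record = b"2004-01-05\x002004-01-07"

right = slice_record(record, [("CHKDT", 1, 10), ("CLEARDT", 12, 10)])
missed_slack = slice_record(record, [("CHKDT", 1, 10), ("CLEARDT", 11, 10)])
too_wide = slice_record(record, [("CHKDT", 1, 11)])

print(right)         # {'CHKDT': b'2004-01-05', 'CLEARDT': b'2004-01-07'}
print(missed_slack)  # CLEARDT starts with the null and loses its last digit
print(too_wide)      # {'CHKDT': b'2004-01-05\x00'}
```

Because every later start position depends on the widths before it, a one-byte error early in the layout moves every following field, which is why checking each start against the sum of the preceding widths beats trial and error.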

 