Unicode characters in identifiers

  • Thread starter: R.Kaiser

R.Kaiser

Where can I find which Unicode characters are valid for identifiers in
Visual C++ 2005?

Thanks
Richard
 
R.Kaiser said:
Where can I find which Unicode characters are valid for identifiers in
Visual C++ 2005?

According to the VC++ documentation:

http://msdn2.microsoft.com/en-us/library/565w213d.aspx

The following characters are valid as the first character of a name:

_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

And the following characters, in addition to the above, are valid as the
second or subsequent character:

0 1 2 3 4 5 6 7 8 9

In addition, the $ character is valid as a Microsoft extension.
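
The documented rule above can be sketched as a small checker. This is only an illustration of the character lists quoted from the documentation, not of what the compiler actually accepts. The function names are made up for this example, and treating $ as valid only after the first character is an assumption, since the documentation excerpt does not say.

```cpp
#include <string>

// Characters the documentation lists as valid in the first position:
// underscore and the 52 unaccented Latin letters.
bool isDocumentedFirstChar(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Characters valid in the second or later position: the above plus digits.
// ASSUMPTION: '$' (the Microsoft extension) is accepted only here, not as
// the first character.
bool isDocumentedLaterChar(char c, bool allowDollar) {
    return isDocumentedFirstChar(c) || (c >= '0' && c <= '9') ||
           (allowDollar && c == '$');
}

// Hypothetical helper: true if `name` matches the documented character lists.
// It does not check for keywords or reserved names.
bool isDocumentedIdentifier(const std::string& name, bool allowDollar = true) {
    if (name.empty() || !isDocumentedFirstChar(name[0])) {
        return false;
    }
    for (std::string::size_type i = 1; i < name.size(); ++i) {
        if (!isDocumentedLaterChar(name[i], allowDollar)) {
            return false;
        }
    }
    return true;
}
```

Under these lists a name such as Z$hler passes, while any name containing a non-ASCII byte is rejected.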

-cd
 
According to the VC++ documentation:
...

A more accurate definition, according to the spec, is:

2.10 Identifiers [lex.name]
identifier:
nondigit
identifier nondigit
identifier digit

nondigit: one of
universal-character-name
_ a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

digit: one of
0 1 2 3 4 5 6 7 8 9

1 An identifier is an arbitrarily long sequence of letters and digits. Each
universal-character-name in an identifier shall designate a character whose
encoding in ISO 10646 falls into one of the ranges specified in Annex E.
Upper- and lower-case letters are different. All characters are significant.

2 In addition, some identifiers are reserved for use by C++ implementations
and standard libraries (17.4.3.1.2) and shall not be used otherwise; no
diagnostic is required.

This means that national characters can be part of a name. I tested
Russian and Hebrew characters, and they worked.
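
As a minimal sketch of the universal-character-name alternative in the grammar above, the identifier below spells the letter ä as \u00E4 (U+00E4 is its ISO 10646 code point). The variable and function names are made up for illustration, and whether a given compiler version accepts this spelling is something to verify rather than assume.

```cpp
// "Z\u00E4hler" is the universal-character-name spelling of the
// identifier Zähler, independent of the source file's encoding.
int Z\u00E4hler = 0;

// Hypothetical helper: increments the counter and returns the new value.
int incrementCounter() {
    return ++Z\u00E4hler;
}
```

Because the name is written with ASCII characters only, this form does not depend on how the editor saves the file.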
 
Thanks Carl and Vladimir, what you are listing are the valid characters
for Standard C++ identifiers.

But in Windows Forms applications, Visual C++ 2005 also accepts
extensions, such as German umlauts in ordinary identifiers:

int Zähler;

or names like

àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ

for names of Windows controls. I could use this expression as a value
for the Name property of e.g. a Button by setting it in the Properties
window.

The rules seem to be quite complicated:

int Zähler; // valid
int Z$hler; // valid
int Zä$hler; // error
int xàÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // valid
int àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // error

Despite an intensive search, I could find no reference for the valid
characters of identifiers in VS2005. I would expect that certain Arabic
and Asian characters are also valid.

Richard Kaiser
 
Vladimir Nesterovsky said:
This means that national characters can be part of a name. I tested
Russian and Hebrew characters, and they worked.

How do you save the file? As UTF-8 or UTF-16, and with a Unicode marker (BOM)?

--PA
 
Vladimir Nesterovsky said:
According to the VC++ documentation:
...

A more accurate definition, according to the spec, is:

2.10 Identifiers [lex.name]
...

Yes, I know what the C++ standard says; I was simply quoting what the VC++
2005 documentation says. I rather suspected that the documentation was
wrong, since I know a lot of work went into the 7.0 compiler to
support Unicode source files.
This means that national characters can be part of a name. I tested
Russian and Hebrew characters, and they worked.

Good to know, thanks.

-cd
 
R.Kaiser said:
Thanks Carl and Vladimir, what you are listing are the valid characters
for Standard C++ identifiers.

But in Windows Forms Applications, Visual C++ 2005 also accepts
extensions, like German Umlauts in ordinary identifiers

Those are also valid according to the C++ standard, and apparently according
to VC++ (but not according to the VC++ documentation - go figure).
int Zähler;

or names like

àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ

for names of Windows controls. I could use this expression as a value for
the Name property of e.g. a Button by setting it in the Properties window.

The rules seem to be quite complicated:

int Zähler; // valid
int Z$hler; // valid
int Zä$hler; // error
int xàÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // valid
int àÀáÁãÃçÇéÉêÊÍíóÓúÚüÜ; // error

Despite an intensive search, I could find no reference for the valid
characters of identifiers in VS2005. I would expect that certain Arabic
and Asian characters are also valid.

The cases you cite that don't work are interesting. What source file
encoding were you using? I can imagine that if the source file was, for
example, Latin-8 but the compiler concluded (incorrectly) that it was UTF-8,
that certain pairs of characters would end up being illegal.

I would expect that only a Unicode encoding would be safe for files with
non-ASCII characters, but I haven't done any experimenting in that area
myself.
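
To illustrate the encoding mix-up I have in mind, here is a rough sketch of a structural UTF-8 check. In a single-byte Western code page such as Windows-1252, ä is the one byte 0xE4; in UTF-8 it is the two bytes 0xC3 0xA4. If a decoder treats single-byte text as UTF-8, 0xE4 looks like the lead byte of a three-byte sequence and the ASCII letters after it are not valid continuation bytes. The function name is made up, the check is deliberately simplified (it ignores overlong forms and surrogates), and it says nothing about how the VC++ compiler actually detects encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `bytes` has the lead/continuation byte
// structure of UTF-8. Simplified: overlong encodings and surrogate code
// points are not rejected.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;          // plain ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;          // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;          // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;          // 11110xxx
        } else {
            return false;       // stray continuation byte or invalid lead
        }
        if (extra > bytes.size() - i - 1) {
            return false;       // sequence is cut off at the end of input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) {
                return false;   // expected a 10xxxxxx continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, the bytes of "Zähler" saved in a single-byte code page fail, while the UTF-8 bytes of the same name pass.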

-cd
 
Thanks Vladimir,

I have overlooked the reference to Annex E in the standard.

Richard


 
Carl said:
Those are also valid according to the C++ standard, and apparently according
to VC++ (but not according to the VC++ documentation - go figure).


The cases you cite that don't work are interesting. What source file
encoding were you using?

I used

Western European (Windows) - Codepage 1252

Richard


 