Size of int (once again, sorry)

  • Thread starter: Agoston Bejo

Agoston Bejo

Hi, sorry about the multiple posting, technical difficulties....

-----


What exactly does the size of the int datatype depend on in C++?
Recently I've heard that it depends on the machine's type, i.e. on 16-bit
machines it's 16 bits, on 32-bit machines it's 32, etc.
Is this true? Is this true for _all_ C++ compilers?
 
Agoston said:
What exactly does the size of the int datatype depend on in C++?
Recently I've heard that it depends on the machine's type, i.e. on
16-bit machines it's 16 bits, on 32-bit machines it's 32, etc.
Is this true? Is this true for _all_ C++ compilers?

No, that's wrong. If you run a 16-bit compiler on a 32-bit computer, int
will typically be 16 bits.

In other words, it's up to the compiler to decide the size of an int.
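
So portable code that needs a particular width can't just assume it. Here is a
minimal sketch of a compile-time guard against that assumption, using the
<climits> macros:

// Sketch: refuse to compile on any compiler whose int is narrower than 32 bits.
// The check reads INT_MAX from <climits>, i.e. whatever this compiler decided.
#include <climits>

#if INT_MAX < 2147483647
#error "this code assumes int is at least 32 bits"
#endif

int main()
{
    return 0;
}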
 
Agoston Bejo said:
Hi, sorry about the multiple posting, technical difficulties....

-----


What exactly does the size of the int datatype depend on in C++?
Recently I've heard that it depends on the machine's type, i.e. on 16-bit
machines it's 16 bits, on 32-bit machines it's 32, etc.
Is this true? Is this true for _all_ C++ compilers?

An int is defined as having the "natural size" for the target machine,
so a 32-bit compiler would produce 32-bit ints, and so on.

Arnaud
MVP - VC
 
Agoston said:
Hi, sorry about the multiple posting, technical difficulties....

-----


What exactly does the size of the int datatype depend on in C++?
Recently I've heard that it depends on the machine's type, i.e. on 16-bit
machines it's 16 bits, on 32-bit machines it's 32, etc.
Is this true? Is this true for _all_ C++ compilers?

The following relationship holds for the four basic integer types:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

There are also minimum ranges:

char >= 8 bits
short >= 16 bits
int >= 16 bits
long >= 32 bits

Certain parts of the standard library become impossible to implement
correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) <
sizeof(int). See <limits.h> for lots of interesting macros which describe
the integer types for a given system.
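
For instance, a small sketch (assuming a hosted implementation) that prints
those sizes and a few of the <climits> macros for whatever compiler builds it:

#include <climits>
#include <iostream>

int main()
{
    // Everything printed here is decided by the compiler/target, not fixed by C++.
    std::cout << "CHAR_BIT      = " << CHAR_BIT      << '\n';
    std::cout << "sizeof(char)  = " << sizeof(char)  << '\n';
    std::cout << "sizeof(short) = " << sizeof(short) << '\n';
    std::cout << "sizeof(int)   = " << sizeof(int)   << '\n';
    std::cout << "sizeof(long)  = " << sizeof(long)  << '\n';
    std::cout << "INT_MAX       = " << INT_MAX       << '\n';
    std::cout << "LONG_MAX      = " << LONG_MAX      << '\n';
    return 0;
}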

Now, to answer your question, int is intended to map to the "natural" word
size for the target architecture, and it's intended to be the first type you
reach for when you need an integer type. So, a compiler targeting a 16-bit
machine will typically define int as 16 bits, a compiler targeting a 32-bit
machine will define int as 32 bits, and so on. Well, almost. A 16-bit int
really is too small in many cases, but a 32-bit int is usually large enough.
So to avoid wasting space with a 64-bit int, not to mention breaking
programs that assume 32-bit int, compilers for 64-bit Windows break with
tradition and keep int 32 bits. I think that's a reasonable decision, but
one has to hope 64-bit CPU designers keep 32-bit ints nice and efficient.
(It would certainly be in their interest to do so. :)
 
In message <[email protected]> of Thu, 30 Sep
Certain parts of the standard library become impossible to implement
correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) <
sizeof(int). See <limits.h> for lots of interesting macros which describe
the integer types for a given system.

What parts and why?
 
Walter said:
In message <[email protected]> of Thu, 30 Sep


What parts and why?

Off the top of my head, fgetc and the <ctype.h> functions. They need to be
able to distinguish EOF from char values, when stored in int. IOW, there
needs to be an int value that doesn't correspond to any char value. (I think
that might also be true for the C++ char_traits<char> specialization, but
there you probably aren't restricted to just int; any standard integer type
larger than char would do. I'd have to double-check the standard to be sure
about that, though.)
 
Note: Below I've replaced "char" with "unsigned char", which makes it right.
Off the top of my head, fgetc and the <ctype.h> functions. They need to be
able to distinguish EOF from unsigned char values, when stored in int. IOW, there
needs to be an int value that doesn't correspond to any unsigned char value.

An even better way to put it is this. In order for fgetc and <ctype.h> to
work right, int has to be able to faithfully hold all values of unsigned
char, plus EOF. Since all bit patterns of unsigned char are valid unsigned
char values, this means sizeof(int) has to be greater than sizeof(char).
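
That's also why the classic read loop keeps the result of fgetc in an int and
compares it against EOF before treating it as a character; a minimal sketch
(the file name is just an example):

#include <cstdio>

int main()
{
    std::FILE* fp = std::fopen("input.txt", "rb");
    if (!fp)
        return 1;

    int c;                                // int, not char, so EOF stays distinct
    while ((c = std::fgetc(fp)) != EOF)   // c is EOF or an unsigned char value
        std::putchar(c);

    std::fclose(fp);
    return 0;
}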

It's even messier with char_traits<char>, so I didn't try to fix that part,
but the same consideration applies to its char_type and int_type types. See
the subthread starting here for more on the aforementioned messiness:

http://groups.google.com/[email protected]

Someone has recently proposed at least a partial way to clean this up:

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#467
 
Hi,

Doug Harrison said:
(...) Since all bit patterns of unsigned char are valid unsigned
char values, this means sizeof(int) has to be greater than sizeof(char).

I believe that is somehow wrong, because C++ does not specify that every
value you can put into an unsigned char must correspond to a character.
In fact, I used to work with a C++ compiler where sizeof(char) == sizeof(long)
and char was 64 bits long.

Lucas/
 
Lucas said:
Hi,



I believe that is somehow wrong, because C++ does not specify that every
value you can put into an unsigned char must correspond to a character.

I'm talking about unsigned char, not "characters", in the context of fgetc
and <ctype.h>.
Lucas said:
In fact, I used to work with a C++ compiler where sizeof(char) == sizeof(long)
and char was 64 bits long.

Let me give you an example. For that compiler, we can use the relationship I
posted earlier to deduce that sizeof(int) is the same as sizeof(char) and
sizeof(long). The macro EOF is some integer constant expression, typically
an int equal to -1. For the sake of argument, let's assume the usual two's
complement signed integer representation. Then EOF has the representation
with all bits set. Now consider the definition of fgetc. It reads unsigned
chars and returns them as ints. All bit patterns of unsigned char are valid,
so unless int can represent all of them, we've got a problem. But two's
complement signed ints can do this, so that's not an issue. The problem is
that fgetc returns EOF when it reaches end of file or encounters an error,
and EOF is -1, which has all bits set in two's complement. Thus, there's no
way to distinguish an EOF return value from an unsigned char in the file
that had all bits set. So, like I said, a compiler that wants to implement
fgetc needs sizeof(int) > sizeof(char).
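
You can see the same collision on an ordinary implementation by throwing away
int's extra width yourself; a sketch, assuming EOF is -1 and plain char is
signed and 8 bits wide:

#include <cstdio>

int main()
{
    // Suppose the next byte in the stream is 0xFF. fgetc would return it as
    // the int value 255, which stays distinct from EOF because int is wider.
    int from_fgetc = 0xFF;

    // Narrowing it to char discards that extra width, just as if int and
    // char were the same size. On a signed 8-bit char this typically wraps
    // to -1, which then compares equal to EOF.
    char narrowed = static_cast<char>(from_fgetc);
    if (narrowed == EOF)
        std::puts("the 0xFF byte is now indistinguishable from EOF");

    if (from_fgetc != EOF)
        std::puts("with a full-width int there is no collision");

    return 0;
}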

I'm curious how the compiler you described dealt with this. Perhaps it was a
freestanding implementation, one that doesn't supply all the standard
library, in particular the problematic <ctype.h> and <stdio.h>.
 
Doug Harrison said:
I'm talking about unsigned char, not "characters", in the context of fgetc
and <ctype.h>.


Let me give you an example. For that compiler, we can use the relationship I
posted earlier to deduce that sizeof(int) is the same as sizeof(char) and
sizeof(long). The macro EOF is some integer constant expression, typically
an int equal to -1. For the sake of argument, let's assume the usual two's
complement signed integer representation. Then EOF has the representation
with all bits set. Now consider the definition of fgetc. It reads unsigned
chars and returns them as ints. All bit patterns of unsigned char are valid,
so unless int can represent all of them, we've got a problem. But two's
complement signed ints can do this, so that's not an issue. The problem is
that fgetc returns EOF when it reaches end of file or encounters an error,
and EOF is -1, which has all bits set in two's complement. Thus, there's no
way to distinguish an EOF return value from an unsigned char in the file
that had all bits set. So, like I said, a compiler that wants to implement
fgetc needs sizeof(int) > sizeof(char).

I'm curious how the compiler you described dealt with this. Perhaps it was a
freestanding implementation, one that doesn't supply all the standard
library, in particular the problematic <ctype.h> and <stdio.h>.

I am not 100% sure (this was 5+ years ago), but I think we did not have fgetc,
just the C++ libs. Anyway, the lack of this library does not make the
compiler non-standard.

 
Lucas said:
I am not 100% sure (this was 5+ years ago), but I think we did not have fgetc,
just the C++ libs. Anyway, the lack of this library does not make the
compiler non-standard.

True. Like I said:

<q>
Certain parts of the standard library become impossible to implement
correctly if sizeof(char) == sizeof(int), so on most systems, sizeof(char) <
sizeof(int).
</q>

<q>
So, like I said, a compiler that wants to implement
fgetc needs sizeof(int) > sizeof(char).

I'm curious how the compiler you described dealt with this. Perhaps it was a
freestanding implementation, one that doesn't supply all the standard
library, in particular the problematic <ctype.h> and <stdio.h>.
</q>

Those headers are required by a hosted C++ implementation, i.e. what people
normally think of as Standard C++, but not by freestanding implementations,
which you might encounter in embedded environments.
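
On C99 compilers, and in later C++ standards as well, there is even a
predefined macro for that distinction; a sketch:

// Sketch: __STDC_HOSTED__ is predefined as 1 on hosted implementations and
// 0 on freestanding ones.
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
#include <cstdio>   // the full library, including <stdio.h>, is available

int main()
{
    std::puts("hosted implementation");
    return 0;
}
#else
int main()          // freestanding: only a minimal set of headers is guaranteed
{
    return 0;
}
#endif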
 
Doug Harrison said:
I'm talking about unsigned char, not "characters", in the context of fgetc
and <ctype.h>.


Let me give you an example. For that compiler, we can use the relationship I
posted earlier to deduce that sizeof(int) is the same as sizeof(char) and
sizeof(long). The macro EOF is some integer constant expression, typically
an int equal to -1. For the sake of argument, let's assume the usual two's
complement signed integer representation. Then EOF has the representation
with all bits set. Now consider the definition of fgetc. It reads unsigned
chars and returns them as ints. All bit patterns of unsigned char are valid,
so unless int can represent all of them, we've got a problem. But two's
complement signed ints can do this, so that's not an issue. The problem is
that fgetc returns EOF when it reaches end of file or encounters an error,
and EOF is -1, which has all bits set in two's complement. Thus, there's no
way to distinguish an EOF return value from an unsigned char in the file
that had all bits set. So, like I said, a compiler that wants to implement
fgetc needs sizeof(int) > sizeof(char).

No, that is not a requirement. If the char type is large enough (like
CHAR_BIT == 32), it can't possibly use all bit patterns for valid
characters. One pattern can then be reserved for EOF.

That is exactly what we generally do for wchar_t and WEOF.


Bo Persson
 
Bo said:
No, that is not a requirement.

Yes, it is.
If the char type is large enough (like
CHAR_BIT == 32), it can't possibly use all bit patterns for valid
characters. One pattern can then be reserved for EOF.

Please, review the documentation for fgetc and <ctype.h>. They deal in
unsigned char cast to int. (And why would plain char be restricted as you
describe? What if plain char is unsigned?)
That is exactly what we generally do for wchar_t and WEOF.

I'm not familiar with those details so won't comment.
 
Doug Harrison said:
Yes, it is.


Please, review the documentation for fgetc and <ctype.h>. They deal in
unsigned char cast to int. (And why would plain char be restricted as you
describe? What if plain char is unsigned?)

But if char, unsigned char, and int are all the same size (like 32
bits), not all values will be used for characters so there is room for
reserving one value for EOF.

I'm not familiar with those details so won't comment.

Ok, that is C++, where the wide character type wchar_t uses the value
wchar_t(-1) as the end-of-file signal WEOF for wide streams.


Bo Persson
 
Bo said:
But if char, unsigned char, and int are all the same size (like 32
bits)

Note that char and unsigned char are always the same size. For any integer
type X, signed X and unsigned X are the same size, and plain char, though a
distinct type, is implemented the same as either signed char or unsigned
char.
not all values will be used for characters so there is room for
reserving one value for EOF.

There's no basis for saying that.

At any rate, it's irrelevant for fgetc and <ctype.h>. Did you review their
documentation? Here's a proof of what I've been saying:

1. fgetc reads unsigned chars and returns them as ints.
2. For character types, all bits in the object representation participate in
the value representation.
3. All bit patterns of unsigned char are valid numbers.
4. Thus, if unsigned char and int are the same size, unsigned char will
necessarily use up all the values of int.
5. Therefore, there is no int value left over to represent EOF.
6. One cannot implement fgetc, <ctype.h>, and a handful of other library
functions if char and int are the same size, because the int value EOF
cannot be distinguished from a valid unsigned char represented by int, and
those functions require that distinction.
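
The same distinction is why a plain char should be converted to unsigned char
before being handed to the <ctype.h> functions; a minimal sketch:

#include <cctype>
#include <cstdio>

int main()
{
    char c = 'A';   // in real code this might come from user input

    // The <ctype.h>/<cctype> functions accept an int whose value is either
    // EOF or representable as unsigned char (exactly the set argued for
    // above), so plain char is cast through unsigned char first.
    if (std::isalpha(static_cast<unsigned char>(c)))
        std::puts("alphabetic");

    return 0;
}
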
Ok, that is C++, where the wide character type wchar_t uses the value
wchar_t(-1) as the end-of-file signal WEOF for wide streams.

Actually, WEOF appears in the Standard C header <wchar.h>, and it has the
type wint_t, which is not necessarily wchar_t:

http://www.lysator.liu.se/c/na1.html
<q>
typedef ... wint_t;
An integral type unchanged by integral promotion. It must be capable of
holding every valid wide character, and also the value WEOF (described
below). It can be the same type as wchar_t.

WEOF
is an object-like macro which evaluates to a constant expression of type
wint_t. It need not be negative nor equal EOF, but it serves the same
purpose: the value, which must not be a valid wide character, is used to
represent an end of file or as an error indication.
</q>

This is spelled out explicitly, unlike the situation with unsigned char,
int, EOF, fgetc, and <ctype.h>, where you have to synthesize the constraints
from more fundamental properties.
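
For comparison, here is the wide-stream counterpart of the fgetc idiom, where
wint_t and WEOF play the roles of int and EOF; a minimal sketch (again, the
file name is just an example):

#include <cstdio>
#include <cwchar>

int main()
{
    std::FILE* fp = std::fopen("input.txt", "r");
    if (!fp)
        return 1;

    std::wint_t wc;                          // wint_t, not wchar_t
    while ((wc = std::fgetwc(fp)) != WEOF)   // WEOF need not be negative
        std::putwchar(static_cast<wchar_t>(wc));

    std::fclose(fp);
    return 0;
}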
 