beginwithl
Hi
I sincerely apologize for asking so many questions in one thread, but
at least this way, if you decide not to answer, there won’t be 10
useless threads floating around. Anyway, I started learning about
file I/O, code tables, encodings, etc., and man, this stuff is a bit
overwhelming.
1)
a) With "Encoding.Default" you retrieve the system’s default code page.
But if Windows has numerous code pages, what exactly is the
default one? That is, where (or in what apps) does Windows use
this default page over the other code pages?
b) Can a code page support the Unicode coded character set but use a
different encoding than Unicode does (Unicode uses three
encodings: UTF-8, UTF-16 and UTF-32)?
c)
* Are there also 8-bit code pages which use Unicode character
encoding, and thus have only 256 code points matched to characters?
* Can these code pages also use UTF-16 or UTF-32 encoding?
* Are there also code pages that support more than 256 but fewer than
2^16 code points?
2)
From MSDN site:
“StreamWriter defaults to using an instance of UTF8Encoding unless
specified otherwise. This instance of UTF8Encoding is constructed such
that the Encoding.GetPreamble method returns the Unicode byte order
mark written in UTF-8. The preamble of the encoding is added to a
stream when you are not appending to an existing stream. This means
any text file you create with StreamWriter will have three byte order
marks at its beginning.”
As far as I understand, the above text suggests that the preamble is
added by default, but I’d say that’s not true?
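To make the preamble in that quote concrete, here is a minimal sketch of what the UTF-8 byte order mark looks like on disk. It is Python rather than .NET, on the assumption that the bytes are the same whichever library writes them; `with_bom` and `without_bom` are illustrative names only, and this says nothing about what StreamWriter itself does by default.

```python
import codecs

# The UTF-8 encoding of U+FEFF is the three-byte sequence EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

text = "abc"

# "utf-8-sig" prepends the preamble when encoding; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

print(with_bom.hex(" "))     # ef bb bf 61 62 63
print(without_bom.hex(" "))  # 61 62 63
```

Dumping the first three bytes of a file in the same way is one way to check whether a given writer actually emitted the preamble.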
3) I noticed there are only four classes derived from the Encoding class
(ASCIIEncoding, UTF8Encoding, UnicodeEncoding and UTF7Encoding). What
if you want to use some other, non-Unicode encoding?
4)
a)
From MSDN site:
“StreamWriter Constructor (Stream, Encoding)
If you specify something other than Encoding.Default, the byte order
mark (BOM) is written to the file.”
But a BOM should only be added when using one of the Unicode encodings, so
why would a BOM be added if you specify a non-Unicode encoding?
b) “Since the Unicode byte order mark character is not found in any
code page, it disappears if data is converted to ANSI. Unlike other
Unicode characters, it is not replaced by a default character when it
is converted. If a byte order mark is found in the middle of a file,
it is not interpreted as a Unicode character and has no effect on text
output.”
Well, since at least some (ANSI) code pages do have glyphs for
characters at code points FF and FE, I assume the above text implies that
apps (using non-Unicode code pages) reading such a file would
understand that the FE FF sequence represents a BOM and thus should ignore
it?
In other words, is it up to the app (using a non-Unicode code page)
reading such a file to realize that the FE FF sequence should be ignored?
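The ambiguity behind question 4b can be shown with a small sketch. It is Python rather than .NET, and Windows-1252 is picked only as an example ANSI code page; the variable names are illustrative.

```python
raw = b"\xfe\xff"  # the UTF-16 big-endian byte order mark

# Read as Windows-1252, the two bytes are ordinary letters, not a BOM.
as_ansi = raw.decode("cp1252")
print(as_ansi)  # þÿ

# Read with a BOM-aware UTF-16 decoder, the mark is consumed and no text remains.
as_utf16 = raw.decode("utf-16")
print(repr(as_utf16))  # ''

# Going the other way, U+FEFF has no slot in Windows-1252 at all.
try:
    "\ufeff".encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```

So the same two bytes are either a mark or two visible characters, depending entirely on which decoder the reading app chooses.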
5)
a) “Internally, the .NET Framework stores text as Unicode UTF-16.”
I assume that the above quote is only referring to String objects and
char variables using UTF-16 encoding, or is there some other text
which is also stored as UTF-16?
b) Ignoring the fact that the FE FF sequence identifies the type of
encoding, does U+FEFF also represent a character (outside the context
of encoding)?
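For reference on 5b, the Unicode character database can be queried for U+FEFF directly. This is a Python sketch using the standard `unicodedata` module, not a .NET call:

```python
import unicodedata

bom = "\ufeff"

# The code point has a character name and a general category of its own.
name = unicodedata.name(bom)
category = unicodedata.category(bom)
print(name)      # ZERO WIDTH NO-BREAK SPACE
print(category)  # Cf

# Its UTF-16 serialization is what shows up as FE FF or FF FE in a file.
print(bom.encode("utf-16-be").hex(" "))  # fe ff
print(bom.encode("utf-16-le").hex(" "))  # ff fe
```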
6)
Say app1 (running on PC1) and app2 (running on PC2) communicate
over a network using TCP/IP. PC1 uses little-endian order, while
PC2 uses big-endian order. Now, I know we send information over TCP/IP
(and networks in general) using big-endian order, but:
a) Does only the data in the packet’s header use this byte order,
while application data is sent just as it is, without reversing its
byte order (assuming this data is sent over the network by PC1)?
b) If so, then if PC1 sends some .exe file to PC2, how will PC2
know whether it came from a little-endian machine and thus whether it should
reverse the bytes before trying to load this .exe file?
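The distinction question 6 is asking about can be sketched as follows: byte order matters when a multi-byte number is turned into bytes or back, whereas a sequence of single bytes has no byte order of its own. This is a Python sketch; the `port` value and the four `payload` bytes are made-up examples, not taken from any real packet or file.

```python
import struct

port = 8080  # hypothetical 16-bit header field

# The same number serialized in the two byte orders.
big = struct.pack(">H", port)     # big-endian ("network order")
little = struct.pack("<H", port)  # little-endian
print(big.hex(" "))     # 1f 90
print(little.hex(" "))  # 90 1f

# A run of single bytes is just a sequence; it only acquires a byte
# order when some of those bytes are interpreted as one number.
payload = bytes([0x4D, 0x5A, 0x90, 0x00])
as_little = int.from_bytes(payload[:2], "little")
as_big = int.from_bytes(payload[:2], "big")
print(hex(as_little))  # 0x5a4d
print(hex(as_big))     # 0x4d5a
```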
Thank you