In brief: I think there is simply something I don't comprehend about
the way .NET does it, even after reading the documentation and the
like, and there is probably a large difference in how we view doing
things. I'd like to correct my own apparent deficiency here, and it is
to your credit that I've learned about it.
Apologies in advance for the length of this post, and further
apologies if my apparent inability to "get it" is frustrating. I am
trying to understand what the major differences are, as well as what
it is I don't get.
Quite the contrary: a non-breaking space always shows as a space.
It only prevents a line break there.
You use it (for instance) between Mac OS and X so that you
don't end up with Mac OS
X
Oops. I meant to say zero-width non-breaking space. I missed the
most critical part.
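To make the distinction concrete, here's a minimal C# sketch (the
example strings are my own invention). U+00A0 renders with the width of
an ordinary space but forbids a break; U+FEFF forbids a break while
taking up no width at all:

using System;

class NoBreakDemo
{
    static void Main()
    {
        // U+00A0: shows as a space, but a renderer must not wrap here.
        string visible = "Mac OS\u00A0X";
        // U+FEFF: still forbids a break, but has no visible glyph.
        string invisible = "Mac OS\uFEFFX";

        Console.WriteLine(visible);   // "Mac OS X"
        Console.WriteLine(invisible); // typically shows as "Mac OSX"
    }
}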
http://www.gnu.org/software/gettext/FAQ.html#nonascii_strings
"Short answer: If you want your program to be useful to other people,
then don't use accented characters (or other non-ASCII characters) in
string literals in the source code."
...
"So, in summary, there is no way to make accented characters in string
literals work in C/C++."
I'd wonder how long it's been since that was written, since no current
system that I am aware of is restricted to ASCII-only or
ISO-8859-xx-only character sets---even embedded devices these days can
render Unicode characters pretty easily.
And it shows, sorry to say.
Anyone can be expected to be ignorant in a field in which they're not a
full-time-plus participant---I'm no exception to that rule. That said,
I continually attempt to learn as much as possible to make future
modifications easier, whether that means internationalizing an
application I write, or coming back later to modify it myself. This
includes a11y, i18n, UI design, development techniques, etc.
Exactly!
And gettext was created in the open source world,
where a geek writes the software, another geek translates it,
and another geek will use it.
I have to disagree with the idea that F/OSS is by geeks and for geeks...
I know many non-geeks who are quite happy to use it. That's really
neither here nor there, though. gettext was designed for the "small
tools" way of thinking---using a suite of small tools that are designed
to interoperate with other tools (known/unknown, past/present/future)
and exceed the original developers' intended design. There are nearly
40 years of utilities out there designed in this way. Some people do
think that the tools are too narrowly aimed, but that narrow aim is
precisely the point: it keeps the overlap very small. This means that
functionality provided by older, already-existing tools (e.g.,
mass-editing multiple text files, or generating multiple text files
that fit a pattern) won't be reimplemented in a newer tool that
follows that philosophy. It's a way of thinking, to be sure, but it's
not restricted to geeks.
Once you get out of that world, things start breaking.
A geek writes the software, a linguist (professional translator)
will localize it, and a total non-geek will use it.
Then small things like gender, case, number, etc. start to look
unprofessional (think "all your base are belong to us").
It'd seem that perhaps---among other things---I don't quite see how
using an arbitrary key to look up a string helps productivity at all.
If you're working on the code for a project, and what you're looking at
is:
string windowTitle = stringsResourceMgr.GetString("dialog.print");
Then it'd seem you have to go and look that up in whatever language
you're working in. Most programmers work in English, so they'd have to
go search the English language resource file and look for the string.
It'd seem that having something like:
string windowTitle = "Document Print Settings";
... would eliminate that right away.
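For contrast, here's a minimal sketch of the gettext-style equivalent
in C#, assuming Mono's Mono.Unix.Catalog binding (the package name and
locale path here are hypothetical):

using System;
using Mono.Unix;

class L10nDemo
{
    static void Main()
    {
        // Bind the (hypothetical) "myapp" message catalog.
        Catalog.Init("myapp", "/usr/share/locale");
        // The English text is the key itself; a locale without a
        // translation simply falls back to this string.
        string windowTitle = Catalog.GetString("Document Print Settings");
        Console.WriteLine(windowTitle);
    }
}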
The other solution would be to put a copy of the string in a comment
near where it's loaded from the resource, but that can very easily
fall out of date, since comments are often maintained less carefully
than code. It seems to me that there is more room for error that way,
because it's less straightforward.
However, I'm not sure that I could come up with anything better than
gettext, or the way .NET handles it natively, or the way Java handles
it natively, without a very significant amount of thought. It's not an
easy problem to solve, and there may never be an ideal or perfect
solution---it seems to me that there are tradeoffs no matter which one
you pick. I suppose what it comes down to, then, is deciding which
trade-offs are acceptable for a given project.
I agree here.
But for things to go right, it means that developers would also have
to be well-practiced in thinking about how something will behave in 30
languages, most of them totally unfamiliar.
Impossible.
This is why you need good libraries/tools.
Well, maybe just the 5 most widely varied languages. It'd be somewhat
redundant to know both French and Spanish for this purpose, but it'd
help to know, say, Latin (limited word roots available), Esperanto
(same thing, but more adaptable, and often used in computerized
translation software as an intermediate language because of its
"meeting in the middle" between many languages and its constant
regularity), French or Spanish, and one or two other languages that
work entirely differently from the above. As a fallback, having people
close by who speak English and one or more of those other languages
would be a decent help, too. But there will _always_ be quirks, and
there's never going to be a day and age when translations are always
literal between languages.
Have you ever been involved in translating a medium-size piece of
software into more than 10 languages?
No, but I have been involved in medium-to-large software where
anywhere from hundreds to thousands of files had to be processed in a
similar fashion. The example here was fixing the English keys, in
gettext's case. In that case, you're making the exact same
transformation on every single file, and that's a simple for loop at
the shell, using sed or awk to do the substitution (sketched below).
To make it easier, the three or so commands can be wrapped into a
shell script.
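A sketch of the kind of loop I mean (the msgid being corrected and the
po/ directory layout are invented for illustration):

# Fix a typo in an English msgid across every catalog file.
for f in po/*.po; do
    sed 's/msgid "Document Print Setings"/msgid "Document Print Settings"/' \
        "$f" > "$f.new" && mv "$f.new" "$f"
done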
Speaking a foreign language (or 2, or 4, or even 10) does not make one
a g11n/i18n/l10n expert. It helps, but it is not enough.
If I made that implication, I didn't mean to; sorry. Most of the work
of globalizing an application can be done by a reasonably seasoned,
non-arrogant programmer who is willing to learn where he or she must
not make assumptions---e.g., not assuming that numbers always use
commas for thousands separators and a decimal point between the whole
and fractional parts, not assuming that dates are printed the way he
or she is used to, and not assuming that layout issues will be static
(a small illustration follows below). It takes some learning to gain
the ability to do it, and a great deal of practice to stop making the
assumptions we are inclined to make based on our past (usually
somewhat narrow) life experience.
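As an illustration of the number and date assumptions, here's a sketch
using .NET's System.Globalization (the culture names chosen are just
examples):

using System;
using System.Globalization;

class CultureDemo
{
    static void Main()
    {
        double amount = 1234567.89;
        DateTime date = new DateTime(2007, 3, 14);

        // The same value and date, formatted per culture:
        //   en-US: 1,234,567.89   3/14/2007
        //   de-DE: 1.234.567,89   14.03.2007
        foreach (string name in new string[] { "en-US", "de-DE", "fr-FR" })
        {
            CultureInfo culture = CultureInfo.GetCultureInfo(name);
            Console.WriteLine("{0}: {1}  {2}", name,
                amount.ToString("N2", culture),
                date.ToString("d", culture));
        }
    }
}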
Is my writing so good that I sound like a native English speaker?
I know this is not the case.
You are correct, but I think it's somewhat beside the point. You are
able to communicate in English, which at the very least gives us some
common ground; you've provoked a lot of thought on my end as well, and
for that I am truly appreciative.
I have friends that are doctors (the medical kind).
And they don't complain to me about crappy drugs, or procedures.
So try asking one of your expert friends about it.
See what they say. Maybe even tell them about some of my reasons.
While I've not queried everyone I know, I can say that the only thing
I've heard some of my friends gripe about is having to learn multiple
systems for the same task between different jobs.
Also, while I don't *personally* work in the field of i18n, I've
learned quite a lot from those friends about writing programs that are
easy to internationalize---they do gripe about software they can't
internationalize because the program is too rigidly structured and
makes too many culture-specific assumptions.
That's OK; it comes and goes.
But you seem not to want to listen.
No, I do want to listen. I want to do more than listen: I want to
comprehend. Admittedly, I still don't fully understand why using an
arbitrary key is better, or how this saves work overall. I see other
issues with the way .NET does it, from what I understand of the system
so far---though to be sure, I need to read more about it, despite
having read a lot on it already. One such example: using assemblies to
hold the information, instead of a plain text file or a hash table
generated from one (a sketch of the .NET approach follows below).
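My understanding of that approach, roughly (the base name
"MyApp.Strings" and the key "dialog.print" are hypothetical, echoing
the earlier example):

using System;
using System.Reflection;
using System.Resources;

class ResourceDemo
{
    static void Main()
    {
        // ResourceManager probes satellite assemblies (e.g.,
        // de/MyApp.resources.dll) for the current UI culture, and falls
        // back to the neutral resources compiled into the main assembly
        // when no translation exists.
        ResourceManager resources = new ResourceManager(
            "MyApp.Strings", Assembly.GetExecutingAssembly());
        string windowTitle = resources.GetString("dialog.print");
        Console.WriteLine(windowTitle);
    }
}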
An application front-end that is written in multiple language
environments but exposes a similar or identical interface can then no
longer share the same data, or must have a central repository of that
data which is fed to multiple code/file generators, one per i18n
resource-management system. Some of the largest companies I've done
work for have such software, but _do_ use gettext because of its
ubiquity. While I can't name specifics, I can say that one employer of
150,000+ employees uses gettext in the internal software used by sales
agents around the world for this very reason---they've adopted it for
(nearly) all of their software, including software that wasn't
previously internationalized and has existed since before gettext was
written.
Do you really think that all these guys using the key-value system
are all idiots? Between the MS guys that created the .rc files,
and the C# localization model, Sun with the Java properties files,
all of them?
No; I think that it's a very difficult problem domain in which an
ideal solution hasn't yet been discovered. I think that there are many
tradeoffs among the available systems, such as how tightly you tie
yourself to a particular system, or portability. Now, C#'s and .NET's
way of handling it is more portable than Java's, since a fully
JIT-capable CLR exists for more platforms (currently, though this may
change now that the only fully-compliant Java VM can be ported by the
community), though still not quite as portable as gettext, which works
just about everywhere imaginable. That having been said, I do know
that Java---at least as of the last time I did any programming in
it---didn't have support for easy pluralization in its own
internationalization system; I'd call that a significant oversight,
myself (see the sketch of gettext's plural handling below).
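By "easy pluralization" I mean something like gettext's two-form API,
where the catalog applies the locale's plural rules (some languages
have more than two forms); a sketch, again assuming Mono's
Mono.Unix.Catalog binding:

using System;
using Mono.Unix;

class PluralDemo
{
    static void Main()
    {
        int count = 3;
        // The catalog picks the correct plural form for the locale;
        // English has two forms, but other languages may have more.
        string msg = String.Format(
            Catalog.GetPluralString("{0} file copied",
                                    "{0} files copied", count),
            count);
        Console.WriteLine(msg);
    }
}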
Do you really think they are not aware of the "great invention"
that is gettext?
I'm not in this for a religious debate; my initial recommendation was
based on the idea that gettext is more portable between languages,
runtime environments, and operating systems than the .NET standard
means of implementing internationalization. It'd seem that really,
none of the existing systems is perfect. Then again, I know of nothing
that is perfect in every way.
'gettext' was primarily designed for the "small-tools" way of doing
things, much like UNIX was designed, and much like most well-written
modular software is designed. You'd mentioned that gettext is a bad
tool because its English keys need to be "fixed," but I don't think
that revising strings means the tool is broken. On the contrary, I'd
think using arbitrary resource identifiers can potentially lead to
translated strings diverging from each other in meaning, unless there
is something major that I am missing about how the .NET way of doing
it works. I'll wholeheartedly admit that I have a great deal more
hands-on experience with gettext than with anything else, since I use
software that uses it and sometimes make updates in gettext's resource
files for people who don't have the technical know-how to do it (say,
they're contributing a single translation, and I happen to be around
to help them get it into a resource file).
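For reference, a gettext catalog entry looks like this (the strings
and source reference are invented for illustration); the msgid is the
actual English text, so a translator sees real words rather than an
opaque key:

#: PrintDialog.cs:42
msgid "Document Print Settings"
msgstr "Paramètres d'impression du document"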
gettext works in the "3 x geek" world. You move out of there,
you need quality. And gettext is not enough anymore.
I'm not entirely sure what you mean by the "'3 x geek' world".
I think we'll have to agree to disagree there, however; I'd argue that
the systems I use are typically of much higher quality than most
proprietary commercial systems I've used over the years---it's one of
the reasons that I, by and large, don't use proprietary commercial
systems unless a client specifically requires one. I don't have time
to deal with minutiae like software crashes, required reboots, BSODs,
etc.---I need to be able to sit down, get my stuff done, and go. I
think we at least agree on that, insofar as it probably applies to
both of us.
Given that what I use employs the small-tools philosophy (meaning lots
of small, well-tested components as opposed to large, monolithic
time-bombs), it works well for me. Yes, there is the rare time that I
run into an extremely strange issue and need to think of a new way to
do something, but 99% of the time I can do it without writing any new
software whatsoever---I'd define that as quality. The system I use
today is built for end-users, power users, and developers alike, and
is very flexible in that fashion---it was designed with the idea of
being used beyond its design, if that statement makes any sense. Being
based on nearly 40 years of time-tested principles of small,
extensible tools has its advantages. There will eventually be a major
fundamental shift in how things are done---possibly even including
computers being able to learn human languages the way we can today
teach them new programming languages. Ah, that would be ideal. But
we're not there yet.
What I do think is that somehow, for the moment, there's something
that I am not getting about the way .NET handles it. Comparing it with
gettext, it seems that either something fundamental is missing from
the documentation, or I am overlooking something pretty major.
I can at least say with a fair bit of certainty that there is a great
deal of room for growth still in giving application software the
ability to be easily used among a broad spectrum of users around the
globe. I think that an ideal system would be
language/environment/platform agnostic, easy-to-use, and provide the
ability to keep translations in sync with each other.
Incidentally, if you don't mind my asking, what do you think of the
interface for mass translations that Ubuntu uses
(https://translations.launchpad.net/)? The application software that
uses it (which is by and large still only a minimal amount, since the
front-end is relatively new) is already able to handle locale-specific
things like the formatting of dates, times, currency, etc., and in
most cases requires translation into at least a handful of the 268
languages available. I am curious as to your thoughts on it, though;
it appears to provide a combination of methods to attempt to provide
enough information for translation (far more than Google does for its
effort to internationalize its services, for example), and it also
appears to support multiple translation libraries that application
software may use.
--- Mike