Thai text rendering issue in browser (line breaks)

Brett

Hello All,

I am having an issue whereby some online content which is being
translated into Thai, is having line-breaks incorrectly placed in the
middle of some words when rendered. I can't read Thai, but a
translator is telling me they are having to put <BR>'s all through the
content in order for it to render correctly. They are using IE only
(v6.0).

I have tried changing the charset to utf-8 in the <meta> tag, and
tried playing around with the browser's Languages and Encoding
settings via the menu.

I know for a fact that Firefox and IE render the same Thai text
differently (I have a hunch Firefox does it properly) - I can see this
by resizing down the same page in both browsers and watching
when/where it breaks the line of text.

Has anyone heard of this/seen it before? Is there a solution that
doesn't involve having to insert forced line breaks (<br>)?

Cheers,

Brett.
 
Jukka K. Korpela

I am having an issue whereby some online content which is being
translated into Thai, is having line-breaks incorrectly placed in
the middle of some words when rendered.

To make you realistically tuned, I'll mention that line breaking issues
(regarding Unicode in general, and even more so for HTML) are a
horrendous mess.

I can't read Thai, but a
translator is telling me they are having to put <BR>'s all through
the content in order for it to render correctly.

Sounds like they have a fairly interesting definition for "correctly".
I think I see the point though: if you cannot affect automatic line
breaking, you may feel forced to use forced line breaks. But then
you're moving fast towards the dark side of the Force.

If you had specified a sample URL, things would be much easier to
analyze.

They are using IE only (v6.0).

Either you have a problem in using Microsoft Internet Explorer
version 6, or you have a problem in authoring for the World Wide Web.
So crossposting was pointless. I have trimmed followups to the
c.i.w.a.h. group, on the grounds that you should never restrict
yourself to a particular browser unless you really have to.
(What happens when Company Policy decides that IE be banned due to
security holes, effective yesterday noon? Pointy-haired bosses do make
such decisions, and this is far from the silliest thing they decide.)

I have tried changing the charset to utf-8 in the <meta> tag, and
tried playing around with the browser's Languages and Encoding
settings via the menu.

Why don't you stop playing and post the URL? Yes, the URL. Do not bother
posting snippets of code as people so often do. They _don't_ tell what
the real HTTP headers are. Besides, you should not play with charset
unless you know what your charset (character encoding) _is_. Do you?
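
To illustrate the point about real HTTP headers: the encoding a server declares can be read from the Content-Type response header rather than guessed from the <meta> tag. The following is only a minimal sketch in Python; the function names are hypothetical, and any URL passed to served_charset is a placeholder for the actual page, which was never posted in this thread.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


def served_charset(url):
    """Fetch url and report the charset the server declares, if any."""
    with urlopen(url) as response:
        header = response.headers.get("Content-Type", "text/html")
        return charset_from_content_type(header)


# Parsing a header value directly, without any network access:
print(charset_from_content_type("text/html; charset=TIS-620"))
```

If the value reported here disagrees with the <meta> tag, or with the encoding the file was actually saved in, that mismatch is worth resolving before looking at line breaking at all.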
I know for a fact that Firefox and IE render the same Thai text
differently (I have a hunch Firefox does it properly)

It would be no big surprise that IE gets character issues wrong.

I have a very limited understanding of Thai writing, but I think it is
basically a script where you don't use spaces between words. Instead,
rendering software is supposed to do line breaking according to
syllable boundaries. Recognizing those boundaries is nontrivial, so
maybe Firefox can do it (to some extent at least) and IE cannot.

This might be a case where you might use nonstandard <wbr> markup to
suggest line breaking opportunities to simplistic browsers. It's better
than <br> since <wbr> does not _force_ line break, it simply _allows_
line break.
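
As a rough sketch of the <wbr> idea, assuming the translator (or some segmentation tool) can supply the text already split into words, the markup could be generated instead of typed by hand. The function name and the sample word list below are purely illustrative, and the code does not attempt Thai word segmentation itself, which is the hard part. The zero width space (U+200B) is shown as an alternative marker, with no promise about how any particular browser treats it.

```python
from html import escape

ZWSP = "\u200b"  # ZERO WIDTH SPACE, an alternative break opportunity


def join_with_break_opportunities(words, marker="<wbr>"):
    """Escape each pre-segmented word and join the words with `marker`.

    The marker only *allows* a line break between words; it never
    forces one, unlike <br>.
    """
    return marker.join(escape(word) for word in words)


# Hypothetical input: two Thai words already separated by a human.
words = ["ภาษา", "ไทย"]
print(join_with_break_opportunities(words))
print(join_with_break_opportunities(words, marker=ZWSP))
```

Either way the break points live in the data as hints, so the text still reflows when the window is resized, which hard-coded <br> elements prevent.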

However I would expect that people who author in Thai have addressed
the problem somehow. The pages I've found about HTML authoring in Thai
have looked rather limited and old (last update in 1998 or so). Good
luck.
 
Alan J. Flavell

I am having an issue whereby some online content which is being
translated into Thai, is having line-breaks incorrectly placed in the
middle of some words when rendered. I can't read Thai,

I can't either, but I'm pretty sure this topic came up before,

Yup, look for the subject line "Zero width space still unsafe?"
from December 2004.

but a translator is telling me they are having to put <BR>'s all
through the content in order for it to render correctly.

Urk. A very fragile move.

They are using IE only (v6.0).

Even if we go along with that, we'll also need to ask which OS it is
and whether they installed the relevant "Regional" options. Since, as
far as I can see, browser and OS work together to get this to work (if
it's going to work at all).

I have tried changing the charset to utf-8 in the <meta> tag,

Please - this is a sensitive area. Even if you *know* what you're
doing, it takes quite a bit of study to get things right. Merely
playing around changing things at random isn't likely to be
productive.

tried playing around with the browser's Languages and Encoding
settings via the menu.

No, the languages setting in the browser menu only tells the web
server what languages you are willing to accept (in the event that
it implements language negotiation). It does /not/ affect the
rendering.

I know for a fact that Firefox and IE render the same Thai text
differently (I have a hunch Firefox does it properly)

No big surprises there, then...

Has anyone heard of this/seen it before? Is there a solution that
doesn't involve having to insert forced line breaks (<br>)?

Can we see an actual URL of one of these problematical documents?
 
