Check string language

  • Thread starter: Cylix

Cylix

Is there an existing method in VB.NET, or a third-party function, that
can detect the language of a string?
Let's say, isChinese? isFrench?
 
I don't know of a library that does this directly, but there may be an
indirect method: create a simple text file for each language you care
about, and fill it with the equivalent of your string in that language,
obtained via Babelfish.altavista.com or Google Translation.

Then, in your project, you can compare any string against these
language-specific text files to find out which language it matches.

Hope this helps.
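
The comparison described above could be sketched roughly as follows. This is only an illustration in Python, not VB.NET, and it rests on assumptions: the names `SAMPLE_WORDS` and `guess_language` are hypothetical, and the small in-memory word sets stand in for the per-language text files. It also relies on words being separated by spaces, so it would not work for Chinese text.

```python
import re

# Hypothetical sample word lists. In practice each set would be loaded from
# one of the per-language text files described above.
SAMPLE_WORDS = {
    "english": {"the", "this", "that", "it", "is", "and", "of"},
    "french": {"le", "la", "les", "est", "et", "un", "une", "des"},
}


def guess_language(text, word_lists=SAMPLE_WORDS):
    """Return the language whose word list matches the most words in text.

    Returns None when no word of the text appears in any list.
    """
    # Split into runs of letters (digits, punctuation and underscores are skipped).
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Score each language by how many words of the text appear in its list.
    scores = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in word_lists.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With such tiny lists the guess is only reliable for strings that happen to contain those common words; larger lists would be needed for real input.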
 
Oops, it looks like Kimiraikkonen understood the question much better than
I did (you don't want to put the computer's current language setting into a
string, but to check which language a particular string of text is written in?)...

I don't know of a third-party library, but:

- You could first do a quick check based on the characters, i.e. the
Unicode range a character falls in can give a first indication (for example,
whether the string contains Chinese, Cyrillic, or Latin characters).

- If the characters are ones used by a fair number of languages, you could
then:
  - either check the frequency of basic words (for example "this",
  "that", "the", "it", etc. are likely to be frequent in English),
  - or look at letter frequency, which could perhaps also be an indication.
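
As a rough sketch of the first step (the character-based check), here is a Python illustration, not VB.NET. The names `SCRIPT_RANGES`, `script_of`, `script_counts`, and `dominant_script` are hypothetical, and the code point ranges are an assumed, deliberately incomplete selection of Unicode blocks. The word-frequency and letter-frequency steps are not shown.

```python
from collections import Counter

# Assumed, incomplete code point ranges (inclusive) for a few Unicode blocks.
SCRIPT_RANGES = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "cyrillic": [(0x0400, 0x04FF)],
    "cjk": [(0x4E00, 0x9FFF)],  # CJK Unified Ideographs (basic block only)
}


def script_of(ch):
    """Classify one character as 'latin', 'cyrillic', 'cjk' or 'other'."""
    if not ch.isalpha():
        return "other"  # digits, punctuation, symbols
    cp = ord(ch)
    for name, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return "other"


def script_counts(text):
    """Count the characters of text per script, ignoring whitespace."""
    return Counter(script_of(ch) for ch in text if not ch.isspace())


def dominant_script(text):
    """Return the most frequent known script in text, or None if there is none."""
    counts = script_counts(text)
    counts.pop("other", None)
    return counts.most_common(1)[0][0] if counts else None
```

A result of "cjk" only says that the text uses Chinese characters, which Japanese also uses, so this is a first clue rather than a language decision.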

You may want to give some more details. For example, on a web site you could
also have lang attributes that are supposed to indicate the language of a
particular page or section (for use by screen readers, for example). If you get
some text from there, you could also check this information...
 
Thanks anyway, Patrice.

Actually, the method mentioned above is a large project that I cannot
afford.
My problem is quite simple: I have a string taken from an email subject.
It may include three types of characters: English letters, Chinese
characters, and others.
I would like to trim the string so that it contains English letters and
Chinese characters only.
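
For this narrower problem, a regular expression that deletes everything outside the allowed ranges may be enough. The sketch below is in Python for illustration, and it rests on assumptions: "English letters" is taken to mean ASCII A-Z and a-z, "Chinese" to mean the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the function name `keep_english_and_chinese` and its `keep_spaces` option are hypothetical.

```python
import re

# Match anything that is NOT an ASCII letter or a CJK Unified Ideograph.
# The first pattern additionally lets plain spaces through.
_NOT_ALLOWED = re.compile(r"[^A-Za-z\u4e00-\u9fff ]")
_NOT_ALLOWED_NO_SPACE = re.compile(r"[^A-Za-z\u4e00-\u9fff]")


def keep_english_and_chinese(text, keep_spaces=True):
    """Remove every character that is not an English letter or Chinese character."""
    pattern = _NOT_ALLOWED if keep_spaces else _NOT_ALLOWED_NO_SPACE
    return pattern.sub("", text)
```

The same character class should carry over to VB.NET through `System.Text.RegularExpressions.Regex.Replace`, although that has not been tested here. Chinese characters outside the basic block (rare extension characters) would be removed by this pattern.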
 