Extracting words from a sentence.

  • Thread starter: Guest

Given a string that represents a sentence, is there a quick way of extracting
the words that make up the sentence? For example, given the string "See Dick
run." it would return the array of strings {"See","Dick","run"}. One catch is
that it has to be locale sensitive.

I thought about just removing all the punctuation, but that leaves problems
like the word "don't". I don't want to end up with two words, "don" and "t".
 
Hi David,

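Use regular expressions:

using System;
using System.Text.RegularExpressions;

// \w+ rather than \w* so the pattern never produces zero-length matches.
Regex re = new Regex(@"\b\w+\b");
MatchCollection mc = re.Matches("See Dick Run");
foreach (Match m in mc)
{
    Console.WriteLine(m.Value);
}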

(I didn't test the code, but it *should* be mostly correct.)

HTH,
Bill Priess MCP
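
One caveat with \w-based patterns is that the apostrophe is not a word character, so "don't" comes back as "don" and "t". The following is a minimal sketch of one way around that, not a tested or definitive solution: it matches runs of Unicode letters, marks, and digits, and allows a single straight or curly apostrophe between two such runs. The class name WordExtractor and the method name Extract are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Runs of letters, combining marks, and digits, optionally joined by a
    // straight (') or curly (U+2019) apostrophe so "don't" stays one word.
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{M}\p{Nd}]+(?:['\u2019][\p{L}\p{M}\p{Nd}]+)*");

    public static List<string> Extract(string sentence)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(sentence))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

With this sketch, WordExtractor.Extract("I don't run.") yields "I", "don't", and "run". Treating the apostrophe as part of a word is an assumption that suits English contractions; other languages may need different rules.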
 
You could also use System.String.Split(), passing the ' ' char as the
argument, something like this:
string s = "See Dick Run";
string[] a = s.Split(' ');
 
Thanks for the help, Bill and Nick.

I tried Split(), but it had some issues. In particular, given "See Dick
run." the third word that it returned was "run.", where what I wanted
was "run". (The period isn't part of the word.)
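
For what it's worth, the trailing-period problem with Split() can be worked around by trimming punctuation from both ends of each token. This is only a rough sketch under that assumption; the names SplitAndTrim, Words, and TrimPunctuation are invented for the example. It relies on char.IsPunctuation, which is based on Unicode categories rather than on any particular culture, so it is not locale sensitive.

```csharp
using System;
using System.Collections.Generic;

public static class SplitAndTrim
{
    public static List<string> Words(string sentence)
    {
        var words = new List<string>();
        // A null separator array tells Split to break on any whitespace.
        string[] tokens = sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            // Trim only the ends, so the apostrophe inside "don't" survives.
            string word = TrimPunctuation(token);
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words;
    }

    private static string TrimPunctuation(string token)
    {
        int start = 0;
        int end = token.Length;
        while (start < end && char.IsPunctuation(token[start])) start++;
        while (end > start && char.IsPunctuation(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }
}
```

SplitAndTrim.Words("See Dick run.") returns "See", "Dick", and "run". A token made up entirely of punctuation, such as a stand-alone dash, is dropped.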

The function I'm looking for would actually have to be quite sophisticated,
because the application allows the user to select a language, and so the
function would have to work with whatever rules were appropriate for the
culture of the string. I can come up with the correct rules for English and
French, but after that, I get a bit lost.

I was hoping that maybe the wizards who did all this .NET Framework
globalization stuff had already tackled the problem. It was a long shot, but
sometimes I am amazed at what is buried in there, so I was hoping.

I think I'll have to use the regular expression route, which I hadn't
thought of before, and count on user feedback to improve the situation for
languages that don't seem to work.
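
A possible shape for that regular expression route is a small table of patterns keyed by language, with a generic fallback for languages that have no rule yet. Everything here is a sketch under stated assumptions: the class CultureWordExtractor, the table itself, and the English and French rules are illustrative guesses, not rules taken from the .NET Framework. The only framework members used are Regex, Dictionary, and CultureInfo.TwoLetterISOLanguageName.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureWordExtractor
{
    // Fallback for languages without a specific rule: letters, marks, digits.
    private static readonly Regex DefaultPattern =
        new Regex(@"[\p{L}\p{M}\p{Nd}]+");

    // Hypothetical per-language rules, keyed by two-letter ISO language code.
    private static readonly Dictionary<string, Regex> Patterns =
        new Dictionary<string, Regex>
        {
            // English (assumed rule): an internal apostrophe stays in the word.
            { "en", new Regex(@"[\p{L}\p{M}\p{Nd}]+(?:['\u2019][\p{L}\p{M}\p{Nd}]+)*") },
            // French (assumed rule): an apostrophe ends the elided word,
            // so "l'homme" becomes "l'" and "homme".
            { "fr", new Regex(@"[\p{L}\p{M}\p{Nd}]+['\u2019]?") },
        };

    public static List<string> Extract(string sentence, CultureInfo culture)
    {
        Regex pattern;
        if (!Patterns.TryGetValue(culture.TwoLetterISOLanguageName, out pattern))
        {
            pattern = DefaultPattern;
        }

        var words = new List<string>();
        foreach (Match m in pattern.Matches(sentence))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

New entries can be added to the table as user feedback comes in. Note that this approach depends on words being separated by spaces or punctuation, so languages written without spaces between words would need a different technique.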
 