[OT] Pluralize() Method

  • Thread starter: Mythran

Mythran

This is off-topic slightly and not language dependent. But still, I would
like some pointers or suggestions regarding a pluralize method. I have a
method which accepts a string and I have it returning the plural form of the
specified string (where the specified string is a single word or word
combination). I know these types of methods won't work 100% of the time,
but a good majority of the words should work.

Any suggestions or pointers on whether I'm missing anything in the
following logic would be appreciated:

If the word is null or empty, return the word (null or empty).
If the length of the word is 1 character, return the character + "s".
If the entire word is upper-case:
    If the word ends in an "s", return the word + "S".
If the word ends in "is", remove the "is" and return the remaining portion
of the word + "es".
If the word ends with "s", "z", "x", "ch", or "sh", return the word + "es".
If the word ends in "y":
    If the word ends with an "ay", "ey", "iy", "oy", or "uy", return the
    word + "s".
    Otherwise, remove the last character and return the remaining word +
    "ies".

Last case scenario, if the above conditions do NOT match the specified word,
return the word + "s".

Thanks,
Mythran
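
For anyone who wants to poke at the rules in the post above, here is a rough Python sketch of them. It is only one reading of the description, not Mythran's actual code: the name pluralize is made up, the upper-case rule is taken literally (only an all-caps word ending in "S" is special-cased), and the remaining suffix checks are assumed to ignore case.

```python
def pluralize(word):
    """Naive English pluralizer following the rules listed in the post."""
    # Null or empty: return the input unchanged.
    if not word:
        return word
    # Single character: just append "s".
    if len(word) == 1:
        return word + "s"
    # Literal reading of the upper-case rule (an assumption):
    # an all-caps word ending in "S" gets another "S".
    if word.isupper() and word.endswith("S"):
        return word + "S"
    # Assumption: the remaining checks are case-insensitive.
    lower = word.lower()
    if lower.endswith("is"):
        return word[:-2] + "es"
    if lower.endswith(("s", "z", "x", "ch", "sh")):
        return word + "es"
    if lower.endswith("y"):
        if lower.endswith(("ay", "ey", "iy", "oy", "uy")):
            return word + "s"
        return word[:-1] + "ies"
    # Last case: plain "s".
    return word + "s"
```

With this reading, pluralize("city") gives "cities" and pluralize("day") gives "days", while pluralize("IRS") gives "IRSS", which is probably not what anyone wants.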
 
This is off-topic slightly and not language dependent.

Sounds like it's pretty dependent on English to me....


(Couldn't resist.)


What you described looks pretty good. Like you said, it won't catch
everything, but it's a good start.

Will you have a table for irregular words (mouse, goose, hyphenated stuff
like "brother-in-law")?
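
A table like that could be as small as a dictionary checked before the suffix rules run. The sketch below is illustrative only: IRREGULAR and pluralize_with_table are hypothetical names, the three entries are just the examples from this post, and the default fallback simply appends "s" as a stand-in for whatever rule-based method is already in place.

```python
# Hypothetical lookup table; the entries are examples, not an exhaustive list.
IRREGULAR = {
    "mouse": "mice",
    "goose": "geese",
    "brother-in-law": "brothers-in-law",
}


def pluralize_with_table(word, fallback=lambda w: w + "s"):
    """Check the irregular table first, then defer to a rule-based fallback."""
    if not word:
        return word
    plural = IRREGULAR.get(word.lower())
    if plural is None:
        # Not a known irregular: let the suffix rules handle it.
        return fallback(word)
    # Preserve a leading capital ("Goose" -> "Geese"); other casing is not handled.
    if word[0].isupper():
        plural = plural[0].upper() + plural[1:]
    return plural
```

The table wins whenever it has an entry, so adding a newly discovered exception is a one-line change rather than a new rule.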
 
Aren't there some words that are unchanged in plural form?
 
Mythran said:
This is off-topic slightly and not language dependent. But still, I
would like some pointers or suggestions regarding a pluralize method. I
have a method which accepts a string and I have it returning the plural
form of the specified string (where the specified string is a single
word or word combination). I know these types of methods won't work
100% of the time, but a good majority of the words should work.

Well, it all depends on what you consider "a good majority". As long as you
always give someone an opportunity to correct the output, you can always
tweak your algorithms. If it's intended to produce fully automated output,
you should definitely be more thorough and pick a well-tested library or
list. Getting a list also allows you to test your algorithm for accuracy, so
you can make a trade-off between speed and accuracy.

Any suggestions or pointers on if I'm missing anything to the following
logic would be appreciated:

If the word is null or empty, return the word (null or empty).
If the length of the word is 1 character, return the character + "s".

Always remember to dot your i's and cross your t's.

If the entire word is upper-case:
If the word ends in an "s", return the word + "S".

The United States have the Inland Revenue Service, and other nations have
their own IRSes.

If the word ends in "is", remove the "is" and return the remaining
portion of the word + "es".

I was looking for the guy who cleared out the debris. My obvious suspect
(the one leaning against the stripped chassis) had many alibis, but if
you've got more than one you're obviously already in trouble.

If the word ends with "s", "z", "x", "ch", or "sh", return the word + "es".

I watched my favorite miniseries yesterday, and after that a quiz. I learned
(and later scribbled down in my codices) that Scotland has many lochs, and
that there aren't many Bachs to be found among the Dutch or Welsh.

If the word ends in "y":
If the word ends with an "ay", "ey", "iy", "oy", or "uy", return the
word + "s".

"To be or not to be", begins one of the most famous soliloquies.

Otherwise, remove the last character and return the remaining word +
"ies".

Many passersby were stumped by this one, but I got it in the end.

Last case scenario, if the above conditions do NOT match the specified
word, return the word + "s".

Aside from a few sheep, mice and goose, your flock seems to be in order.
Always keep an eye on the money, though.
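
The "get a list and test against it" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: naive_pluralize is a condensed, lower-case-only version of the rules from the original post, REFERENCE is a tiny hand-made list built from the counter-examples in this post plus five regular words, and score is a made-up helper name. A real check would use a proper word list.

```python
def naive_pluralize(word):
    """Condensed version of the posted rules (lower-case input only)."""
    if not word:
        return word
    if len(word) == 1:
        return word + "s"
    if word.endswith("is"):
        return word[:-2] + "es"
    if word.endswith(("s", "z", "x", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and not word.endswith(("ay", "ey", "iy", "oy", "uy")):
        return word[:-1] + "ies"
    return word + "s"


# Hypothetical reference list: singular -> expected plural.
REFERENCE = {
    "cat": "cats", "box": "boxes", "city": "cities", "day": "days",
    "thesis": "theses", "debris": "debris", "chassis": "chassis",
    "miniseries": "miniseries", "quiz": "quizzes", "codex": "codices",
    "loch": "lochs", "soliloquy": "soliloquies", "passerby": "passersby",
    "sheep": "sheep", "mouse": "mice", "goose": "geese",
}


def score(pluralizer, reference):
    """Return (accuracy, misses), where misses maps singular -> wrong output."""
    misses = {}
    for singular, expected in reference.items():
        actual = pluralizer(singular)
        if actual != expected:
            misses[singular] = actual
    return 1 - len(misses) / len(reference), misses


accuracy, misses = score(naive_pluralize, REFERENCE)
print(f"accuracy: {accuracy:.0%}")
for singular, wrong in sorted(misses.items()):
    print(f"  {singular} -> {wrong} (expected {REFERENCE[singular]})")
```

On this deliberately hostile list only the five regular words come out right, which says more about the list than about everyday vocabulary; the point is that the misses dictionary tells you which rule to tweak next.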
 
Jeroen said:
Aside from a few sheep, mice and goose, your flock seems to be in order.

Well, just goes to show you that even humans make mistakes sometimes.
 
The United States have the Inland Revenue Service, and other nations have
their own IRSes.

Pssstt...INTERNAL Revenue Service.

Clearly it was declared that way so the rest of us couldn't see what's going
on....
 
Jeff said:
Pssstt...INTERNAL Revenue Service.

No clue where I got "inland" from. But if they ever want to sound less
secretive, I'd use that.

Clearly it was declared that way so the rest of us couldn't see what's going
on....

Somehow I don't think transparency would make it less painful. Or more just.
 
Test cases:

This -> Thes
Deer -> Deers
Air -> Airs
Compile -> Compiles

:-)
 
Family said:
This -> Thes
Deer -> Deers
Air -> Airs
Compile -> Compiles

"This" and "compile" aren't nouns; they'd presumably be excluded through
some other mechanism. "Deer" is a genuine problem. "Airs" is the right
plural of "air", though -- for those meanings of "air" that allow a plural.
 
"This" and "compile" aren't nouns; they'd presumably be excluded through
some other mechanism. "Deer" is a genuine problem. "Airs" is the right
plural of "air", though -- for those meanings of "air" that allow a
plural.

Ignore him Jeroen: he's just putting on airs....
 
Jeff Johnson said:
Sounds like it's pretty dependent on English to me....


(Couldn't resist.)


What you described looks pretty good. Like you said, it won't catch
everything, but it's a good start.

Will you have a table for irregular words (mouse, goose, hyphenated stuff
like "brother-in-law")?

This isn't supposed to be a catch-all library. Just something for us to
use when we need a quick fix for singular->plural words in our custom
applications. Nothing major, so when it does pluralize the wrong way, it
won't cause all the nukes in the world to launch at the same time
(maybe a few seconds in-between launches).

No, I don't have a table for irregular words. Didn't want to go through all
the hassle of that. Too many irregular words in American English :P

Mythran
 
Göran Andersson said:
Aren't there some words that are unchanged in plural form?

A plethora!

Here are a few words that really break a lot of the rules:

pants (I believe this one derives from 2 pantaloons sewn together, but have
no O.S.D. for it to refer to...)
scissors (actually, scissor is the proper singular form, but we don't use a
single scissor, we use 2 scissors that normally form a single tool)
glasses (as in eye-wear)
data (already plural - plural for datum)
media (already plural - plural for medium)

When a noun ends with an 'f', remove 'f' and append ves...doesn't work so
well with roof or dwarf (rooves, dwarves).

(a lot of my information comes from multiple sources...the above comes from
http://grammar.ccc.commnet.edu/GRAMMAR/plurals.htm).

:)

Mythran
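
A rough Python sketch of how those two special cases might be bolted on in front of the other rules. The names UNCHANGED, KEEP_F and pluralize_special are invented for illustration, "deer" and "sheep" are borrowed from elsewhere in this thread, skipping words that end in "ff" is an extra assumption, and whether "dwarf" belongs in the exception set is a judgment call since dictionaries list both "dwarfs" and "dwarves".

```python
# Words returned as-is (already plural, or identical in the plural).
UNCHANGED = {"pants", "scissors", "glasses", "data", "media", "deer", "sheep"}

# Words ending in "f" that just take "s" instead of "ves".
KEEP_F = {"roof", "dwarf"}


def pluralize_special(word):
    """Return a plural for the special cases, or None to fall through."""
    lower = word.lower()
    if lower in UNCHANGED:
        return word
    # Assumption: words ending in "ff" are left to the ordinary rules.
    if lower.endswith("f") and not lower.endswith("ff"):
        if lower in KEEP_F:
            return word + "s"
        return word[:-1] + "ves"
    # Not a special case; the caller should apply the regular suffix rules.
    return None
```

Returning None for everything else keeps this piece independent of whatever suffix rules run afterwards.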
 
Jeff Johnson said:
Ignore him Jeroen: he's just putting on airs....

Actually I was putting on my errs... but that gives me another test: The
algorithm returns Airs -> Aires.

The point is, the original post did not state the hard part, which is
telling whether the string is a noun and can be pluralized. I've got to
think an easy and more accurate answer is to have a lookup table or dataset.
 
Actually I was putting on my errs...

Well, you're only human....

but that gives me another test: The algorithm returns Airs -> Aires.

....why? The only time he removes the end of a word is when it ends in "is."
"Airs" doesn't fit that bill. But then again, putting an already-plural word
through this algorithm will almost assuredly generate bad output.

As I see it, it would be Airs --> Airses. (Perhaps that's what it has in its
pocketses....)

The point is, the original post did not state the hard part, which is
telling whether the string is a noun and can be pluralized. I've got to
think an easy and more accurate answer is to have a lookup table or
dataset.

I think to really tell you've got to program an AI. I think I'll go sit in
my comfy chair while I wait for Mythran to cook that up.
 
Jeff Johnson said:
Y'know, it's not like they say "gooses" and "mices" in the UK....

"...geeses....I want a goose that lays gold eggs for easter..." :P

I think that's the phrase....quick, which movie?

:)

Mythran
 
Jeff Johnson said:
I think to really tell you've got to program an AI. I think I'll go sit in
my comfy chair while I wait for Mythran to cook that up.

Done...this is Mythraneseses' perfect pluralization artificial intelligence
application, I get all nounses perfect every times'.

btw, there really isn't a surefire way of doing pluralization, and I do
understand this. But something every post here fails to mention is that
some words are spelled the same in the singular and the plural, yet can
have their spelling changed when they are used in a different context.
Context can actually change some plural noun forms :D, for instance when
a noun can be either a "count" or a "noncount" noun. Taken from
http://owl.english.purdue.edu/handouts/esl/eslcount.html

Mythran
 
When a noun ends with an 'f', remove 'f' and append ves...doesn't work so
well with roof or dwarf (rooves, dwarves).

According to my dictionary, dwarves is an acceptable plural for
dwarf.

Chris
 
"...geeses....I want a goose that lays gold eggs for easter..." :P

I think that's the phrase....quick, which movie?

:)

Straying waaaay off-topic here, but the only thing that even comes close in
my mind is Willy Wonka (the original). Was that what you were shooting for?
(That's not the exact line, by the way.)
 