Regular Expressions

I'm having trouble creating a regular expression to parse bits of data from a string and was hoping someone could point me in the right direction. Consider the following string:

423456 Victor Frankenstein, M.D. 04/04/2004

I want to construct a new string that looks like this:

04/04/2004-423456-Frankenstein

My biggest problem is getting the last name, because there may or may not be something between the first and last name (e.g. a middle name, a middle initial with or without a period, or even multiple names).

If anyone has any ideas, I would really welcome your insight.

Thanks
Mark
 
Mark,
I don't have a specific pattern to solve your problem.

I would start at the ends and work toward the middle. Parsing names always
seems to be problematic; what about names like Cher and Prince?

Can you safely take the last name before the date as the last name, except
when there is a degree?
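
As a rough illustration of working from the ends, here is a minimal Python sketch (Python is only used for illustration; the thread does not say which language Mark is using). It assumes the record always starts with a run of digits and ends with an MM/DD/YYYY date, and it simply peels those off, leaving the name and credential in the middle. The names ENDS and split_ends are made up for this example.

```python
import re

# Assumption: the record starts with a run of digits and ends with an
# MM/DD/YYYY date; everything in between is the name plus credentials.
ENDS = re.compile(r"^\s*(\d+)\s+(.*?)\s+(\d{2}/\d{2}/\d{4})\s*$")

def split_ends(record):
    """Return (number, middle, date), or None if the ends do not match."""
    m = ENDS.match(record)
    return m.groups() if m else None

print(split_ends("423456 Victor Frankenstein, M.D. 04/04/2004"))
# → ('423456', 'Victor Frankenstein, M.D.', '04/04/2004')
```

What remains in the middle still has to be split into names and credential, which is the hard part.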

I find both of the following sites invaluable when working with regular
expressions.

A tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN documentation on regular expressions:
http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

Hope this helps
Jay


Mark said:
I'm having trouble creating a regular expression to parse bits of data
from a string and was hoping someone could lead me in the right direction.
Consider the following string:
423456 Victor Frankenstein, M.D. 04/04/2004

I want to construct a new string to look like this:

04/04/2004-423456-Frankenstein

My biggest problem is getting the last name, this is because there may or
may not be something between the first and last name (ie. middle name,
middle initial(with or without a period), and even multiple names).
 
Thanks for your reply.

To be more specific: there will always be a first and last name, possibly with middle names or initials between them, and there will always be some sort of credential (e.g. M.D.). There will always be a comma after the last name, and there will always be a date after the credentials.

Thanks again,
Mark
 
Mark,
"There will always be a comma after the last name"

There is your key!

There will always be a comma after the last name, so your last name is a
name followed by a comma. This also fits my suggestion of starting at the
ends: you know it is the last name because it is the name furthest to the
right...

Your "middle names" are simply zero or more names.

Thinking about it, you may not even want the comma as part of the last name
itself: it comes after the last name, yes, but it is not part of it.

Your pattern will need something like (not really reg ex):

number
first name
middle name*
last name
credential
date

Each line above stands for the sub-pattern that matches that section of your
input string. I would probably use named groups so that each section can be
pulled out of the match object by name. Between sections you can include
whatever delimiters are expected, such as white space and commas (in
particular the comma after the last name). This helps convince the parser
that the last name is the last name and not a trailing middle name...
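
The outline above could be turned into a real pattern along these lines. This is only a sketch in Python, under the assumptions Mark stated (a comma always follows the last name, a credential always follows the comma, and the record ends with an MM/DD/YYYY date); the group names and the function name reformat are made up for the example. Python writes a named group as (?P<name>...), while .NET, which the MSDN link covers, writes it as (?<name>...).

```python
import re

# One named group per section of the outline; the delimiters between
# sections (white space, the comma after the last name) sit outside the groups.
RECORD = re.compile(
    r"""^\s*
    (?P<number>\d+)\s+                  # leading number
    (?P<first>[^\s,]+)\s+               # first name
    (?P<middle>(?:[^\s,]+\s+)*)         # zero or more middle names/initials
    (?P<last>[^\s,]+),\s*               # last name, then the comma
    (?P<credential>.+?)\s+              # credential, e.g. M.D.
    (?P<date>\d{2}/\d{2}/\d{4})\s*$     # trailing date
    """,
    re.VERBOSE,
)

def reformat(record):
    """Build 'date-number-lastname', or return None if the record does not match."""
    m = RECORD.match(record)
    if m is None:
        return None
    return "-".join([m.group("date"), m.group("number"), m.group("last")])

print(reformat("423456 Victor Frankenstein, M.D. 04/04/2004"))
# → 04/04/2004-423456-Frankenstein
```

Because the comma is required right after the last-name group, a middle name can never be mistaken for the last name: only the name touching the comma qualifies.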

Hope this helps
Jay

Mark said:
Thanks for your reply.

To be more specific. There will always be a first and last name as well
as the possibility of middle names or initials, and there will always be
some sort of credential (ie, M.D.). There will always be a comma after the
last name and there will always be a date after the credentials.
 