Roshawn Dawson
Hi,
I'm a complete newbie when it comes to regular expressions, so forgive me if my dilemma sounds stupid.
I have an ASP.NET app that uses URL rewriting. It's a simple bookstore that lets users view books
by author, and the author's name is simply URL-encoded in the link. In my experience, author names
can be quite tricky, so I've written a regular expression for each format an author name could
take. Below are some example author names, each followed by its regular expression:
J.K. Rowling -- (\w\.\w\.\+\w+)
Dr. Martin Luther King Jr. -- (\w{2}\.\+\w+\+\w+\+\w+\+\w{2})
Nora Roberts -- (\w+\+\w+)
Thomas L. Friedman -- (\w+\+\w\.\+\w+)
Mark Victor Hansen -- (\w+\+\w+\+\w+)
T. Harv Eker -- (\w\.\+\w+\+\w+)
George R. R. Martin -- (\w+\+\w\.\+\w\.\+\w+)
Nigel Da Costa Lewis -- (\w+\+\w+\+\w+\+\w+)
Amiira Ruotola-Behrendt -- (\w+\+\w+-\w+)
Kristie J. Nelson-Neuhaus -- (\w+\+\w\.\+\w+-\w+)
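One quick way to check each pattern against its own name is to URL-encode the name the way the
patterns expect (spaces become '+', which is why they use \+) and require the pattern to match the
whole encoded string. The sketch below does this with Python's re module as a stand-in for .NET's
Regex; the check() helper and the CASES list are hypothetical names for illustration only.

```python
import re
from urllib.parse import quote_plus

# Each author name paired with its pattern from the list above.
CASES = [
    ("J.K. Rowling", r"\w\.\w\.\+\w+"),
    ("Dr. Martin Luther King Jr.", r"\w{2}\.\+\w+\+\w+\+\w+\+\w{2}"),
    ("Nora Roberts", r"\w+\+\w+"),
    ("Thomas L. Friedman", r"\w+\+\w\.\+\w+"),
    ("Mark Victor Hansen", r"\w+\+\w+\+\w+"),
    ("T. Harv Eker", r"\w\.\+\w+\+\w+"),
    ("George R. R. Martin", r"\w+\+\w\.\+\w\.\+\w+"),
    ("Nigel Da Costa Lewis", r"\w+\+\w+\+\w+\+\w+"),
    ("Amiira Ruotola-Behrendt", r"\w+\+\w+-\w+"),
    ("Kristie J. Nelson-Neuhaus", r"\w+\+\w\.\+\w+-\w+"),
]

def check(name, pattern):
    # quote_plus() encodes spaces as '+' and leaves letters, digits, '.' and '-' alone.
    encoded = quote_plus(name)
    # fullmatch() only succeeds if the pattern covers the entire string,
    # unlike search(), which accepts a match anywhere inside it.
    return encoded, re.fullmatch(pattern, encoded) is not None

for name, pattern in CASES:
    encoded, ok = check(name, pattern)
    print(f"{encoded:30} {ok}")
```

Run as written, nine of the ten pairs match. The exception is Dr. Martin Luther King Jr.: that
pattern ends with \w{2}, so the final period of "Jr." is left unmatched; adding \. to the end of
that pattern would cover it.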
I've tested some of them and they work, but others (the last four) are difficult. For instance,
George R. R. Martin could be written without a space between the two initials, and dashes (-) in
names present a challenge as well.
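A different technique from writing one alternative per name shape is to describe a single name
part and repeat it. In this sketch a part is a word, optionally joined to further words by '.' or
'-' (which covers "R.R.", "J.K." and "Nelson-Neuhaus"), with an optional trailing period (which
covers "R.", "Dr." and "Jr."); parts are separated by '+'. It assumes, as the \+ in the patterns
above suggests, that spaces are encoded as '+', and that names contain only word characters,
periods and hyphens. PART, AUTHOR_RE and is_author_slug() are hypothetical names, and Python's re
module again stands in for .NET's Regex.

```python
import re
from urllib.parse import quote_plus

# One name part: a word, optionally followed by more words joined with '.' or '-',
# then an optional trailing period.
PART = r"\w+(?:[.-]\w+)*\.?"

# A whole encoded name: one or more parts separated by '+' (an encoded space).
AUTHOR_RE = re.compile(rf"{PART}(?:\+{PART})*")

def is_author_slug(encoded):
    # fullmatch() plays the role of wrapping the pattern in ^...$
    return AUTHOR_RE.fullmatch(encoded) is not None

for name in ["George R. R. Martin", "George R.R. Martin",
             "Amiira Ruotola-Behrendt", "Kristie J. Nelson-Neuhaus"]:
    encoded = quote_plus(name)
    print(encoded, is_author_slug(encoded))
```

The trade-off is that this accepts any name built from such parts, including shapes that are not
in the list above, which may or may not be what the app wants.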
Rather than keeping each regular expression separate, I combined them with OR logic using the pipe
symbol (|), so the whole thing looks like this:
(\w\.\w\.\+\w+|\w{2}\.\+\w+\+\w+\+\w+\+\w{2}|\w+\+\w+|\w+\+\w\.\+\w+)
and so forth.
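One thing to watch with a combined pattern: alternatives are tried left to right, and the first
one that succeeds wins. Without anchors, an early, shorter alternative can therefore match only
part of a name. The sketch below (Python's re as a stand-in for .NET's Regex, which also tries
alternatives in order; PATTERNS and COMBINED are hypothetical names) shows this with
"Mark+Victor+Hansen", and shows that forcing a whole-string match makes the engine backtrack to
the right alternative.

```python
import re

# The individual patterns from the list above, joined with '|'.
PATTERNS = [
    r"\w\.\w\.\+\w+",                   # J.K. Rowling
    r"\w{2}\.\+\w+\+\w+\+\w+\+\w{2}",   # Dr. Martin Luther King Jr.
    r"\w+\+\w+",                        # Nora Roberts
    r"\w+\+\w\.\+\w+",                  # Thomas L. Friedman
    r"\w+\+\w+\+\w+",                   # Mark Victor Hansen
    r"\w\.\+\w+\+\w+",                  # T. Harv Eker
    r"\w+\+\w\.\+\w\.\+\w+",            # George R. R. Martin
    r"\w+\+\w+\+\w+\+\w+",              # Nigel Da Costa Lewis
    r"\w+\+\w+-\w+",                    # Amiira Ruotola-Behrendt
    r"\w+\+\w\.\+\w+-\w+",              # Kristie J. Nelson-Neuhaus
]
COMBINED = "|".join(PATTERNS)

slug = "Mark+Victor+Hansen"

# Unanchored: the two-word (Nora Roberts) alternative is tried first and
# succeeds on a prefix, so only part of the name is matched.
partial = re.search(COMBINED, slug).group(0)

# Anchored: fullmatch requires the whole string, so the engine rejects the
# two-word alternative and moves on to the three-word one.
whole = re.fullmatch(f"(?:{COMBINED})", slug).group(0)

print(partial)  # Mark+Victor
print(whole)    # Mark+Victor+Hansen
```

In .NET the same effect comes from wrapping the alternation as ^(?:...)$.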
Since I'm a newbie, I'm bound to be doing something inefficiently. Can the regular expressions I've
shown be improved? I'd like to know how.
Thanks
Roshawn