name parser

  • Thread starter: mcnews
I suggest describing in more detail what exactly you want to achieve.
Input                Output     Output  Output Output      Output
-------------------- ---------- ------- ------ ----------- ------
Unparsed name        Prefix     First   Middle Last        Suffix
==================== ========== ======= ====== =========== ======
Smith                                          Smith
Smith Sr.                                      Smith       Sr
Mrs. Smith           Mrs                       Smith
Rev. Smith Jr.       Rev                       Smith       Jr
Mr. and Mrs. E.Jones Mr and Mrs E              Jones
Mr. & Mrs. Bix, CPA  Mr & Mrs                  Bix         CPA
Wilson, Mr & Mrs Jim Mr & Mrs   Jim            Wilson
J. J.Johnson V                  J       J      Johnson     V
Sir T. S. Eliot      Sir        T       S      Eliot
e e cummings, IV                e       e      cummings    IV
ee cummings                     ee             cummings
Lt. Gen. C James Phd Lt Gen     C              James       Phd
W.E.B. DuBois                   W       E B    DuBois
Du Pont, Jackie                 Jackie         Du Pont
Clyde Smith-Jones               Clyde          Smith-Jones
Mike O'Donnell                  Mike           O'Donnell
O Donnell, Mike                 Mike           O Donnell
Jimmy Mac Donald                Jimmy          Mac Donald
Mr. A. E. Von Sturm  Mr         A       E      Von Sturm
Ms. Beverly D'Angelo Ms         Beverly        D'Angelo
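A minimal rule-based sketch of a parser that reproduces the table above, written in Python only for brevity (the thread mentions VB/VB.NET but shows no code). The word lists PREFIXES, SUFFIXES, CONNECTORS and PARTICLES, and the names ParsedName and parse_name, are hypothetical and only long enough to cover these rows; real data would need far larger lists, and names outside them (for example a bare "Mr. Du Pont") will be mis-split.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately short word lists; real data needs much longer ones.
PREFIXES = {"mr", "mrs", "ms", "miss", "dr", "rev", "sir", "lt", "gen"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "v", "phd", "cpa", "esq", "md"}
CONNECTORS = {"and", "&"}                          # joins titles, as in "Mr and Mrs"
PARTICLES = {"du", "de", "van", "von", "mac", "o"}  # glued onto the surname


@dataclass
class ParsedName:
    prefix: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""


def _tokens(text: str) -> List[str]:
    # Periods act as separators: "E.Jones" -> ["E", "Jones"], "W.E.B." -> ["W", "E", "B"].
    return text.replace(".", " ").split()


def _starts_prefix(words: List[str], have_prefix: bool) -> bool:
    w = words[0].lower()
    if w in PREFIXES:
        return True
    # "and"/"&" counts as part of the prefix only when it joins two titles.
    return have_prefix and w in CONNECTORS and len(words) > 1 and words[1].lower() in PREFIXES


def parse_name(raw: str) -> ParsedName:
    head, _, tail = raw.partition(",")
    head_words, tail_words = _tokens(head), _tokens(tail)
    suffix: List[str] = []
    last: Optional[str] = None

    if tail_words and all(w.lower() in SUFFIXES for w in tail_words):
        # "Bix, CPA" or "cummings, IV": the text after the comma is a suffix.
        words, suffix = head_words, tail_words
    elif tail_words:
        # "Wilson, Mr & Mrs Jim": surname first, everything else after the comma.
        words, last = tail_words, " ".join(head_words)
    else:
        words = head_words

    # Peel titles off the front.
    prefix: List[str] = []
    while words and _starts_prefix(words, bool(prefix)):
        prefix.append(words.pop(0))

    # Peel trailing suffixes, always leaving at least one word behind.
    while len(words) > 1 and words[-1].lower() in SUFFIXES:
        suffix.insert(0, words.pop())

    if last is None and words:
        last = words.pop()
        # Pull particles such as "Von" or "Mac" into the surname, but keep a first name.
        while len(words) > 1 and words[-1].lower() in PARTICLES:
            last = words.pop() + " " + last

    return ParsedName(
        prefix=" ".join(prefix),
        first=words[0] if words else "",
        middle=" ".join(words[1:]),
        last=last or "",
        suffix=" ".join(suffix),
    )
```

For example, parse_name("Wilson, Mr & Mrs Jim") gives prefix "Mr & Mrs", first "Jim" and last "Wilson". The main design choice is that a comma decides between the "Last, First" order and a trailing suffix; anything else falls back to "the last remaining word is the surname".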
 
parse names

As in, turn johnsmith into "John Smith", or into "John" and "Smith"
separately?

What exactly is the input? First name and last name in a list like this:

johnsmith
johndoe
ericsmith
jackjackson

or just one long string like
"johnjacksonericsmithjohndoephilliprosstaylor"? And what exactly would
the correct output be? Are there any delimiters? Do some records have
middle names as well? Is it for parsing names from a document like
"dear john b. smith", and if so do you want just "john", or "john b"?
Do you want the full stop, as in "John B."? Is it for the purpose of
case correction, like this:

i heard that john smith isn't very well

into

i heard that John Smith isn't very well
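If the input really is run-together lowercase strings like "johnsmith", one common approach is dictionary-based word segmentation: try to split the string into pieces that all appear in a list of known names. The sketch below assumes a hypothetical KNOWN_NAMES vocabulary (a real job would load a large names list) and resolves ambiguous splits by preferring the longest known name first, which is an assumption rather than anything the thread settles.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical vocabulary for illustration only; a real job would load a
# large list of first names and surnames.
KNOWN_NAMES: FrozenSet[str] = frozenset(
    {"john", "smith", "doe", "eric", "jack", "jackson", "phillip", "ross", "taylor"}
)


def segment(text: str, vocab: FrozenSet[str] = KNOWN_NAMES) -> Optional[List[str]]:
    """Split run-together text into known names, or return None if impossible."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def split_from(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(text):
            return ()
        # Try the longest piece first, so "jackson" is preferred over "jack".
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                rest = split_from(j)
                if rest is not None:
                    return (text[i:j],) + rest
        return None  # no known name fits here (memoised, so each i is solved once)

    parts = split_from(0)
    return list(parts) if parts is not None else None


def to_display(text: str) -> Optional[str]:
    """'johnsmith' -> 'John Smith' (None if the text can't be segmented)."""
    parts = segment(text)
    return " ".join(p.capitalize() for p in parts) if parts is not None else None
```

Memoising split_from keeps the search linear in the number of (position, piece) pairs rather than exponential, which matters for long strings like the "johnjackson..." example.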

--

The way you write "parse names" as if it's so obvious makes you look
obnoxious.

Phill
 

There's a program called "MatchIT" which has a built-in names database
as well as fuzzy matching for increased accuracy. Fuzzy matching is
generally considered better than standard logic for this job, so if
anyone suggested an algorithm it would probably call for a fuzzy
language (so not VB).

http://www.printsoft.co.uk/web/products_matchit.htm

It has an API accessible from VB.NET that you can use if you want to
adopt the solution. Or, if you only need a one-off clean, you might
find it easier to forward your records to a data-cleaning company like
CCR Data (http://www.ccr.co.uk/), who can run these types of tools over
your data for a one-off charge. Just email them a sample of the records
and the total number, and they'll email back with a quote.

The fuzzy algorithms are largely in the public domain, although
implementing and tuning them isn't exactly an easy task.
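Levenshtein edit distance is one of the well-known public-domain fuzzy-matching algorithms. The sketch below (not necessarily what MatchIT uses) compares a possibly misspelled token against a list of known names; closest_name and its max_distance threshold of 2 are illustrative assumptions.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))        # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,              # delete ca
                cur[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb), # substitute (free if the letters match)
            ))
        prev = cur
    return prev[-1]


def closest_name(token: str, known: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the known name nearest to token, or None if none is within max_distance."""
    best, best_dist = None, max_distance + 1
    for name in known:
        d = levenshtein(token.lower(), name.lower())
        if d < best_dist:                 # strict '<' keeps the first name on ties
            best, best_dist = name, d
    return best
```

For a quick alternative, Python's standard library also has difflib.get_close_matches, which ranks candidates by a similarity ratio rather than by edit distance.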

Phill
 
i know.
i am obnoxious.

anyway, the names will always be lastname, firstname, middle init (no
period), unless they don't have a middle init.
i am not sure about titles such as dr. or esq. just yet.
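For the format described here (last name, first name, optional middle initial without a period), plain splitting is probably enough. This is a minimal sketch: parse_record is a hypothetical name, and because the post doesn't say whether a comma or a space separates the first name from the middle initial, it accepts both. Titles such as "dr." or "esq." are deliberately not handled.

```python
from typing import Tuple


def parse_record(record: str) -> Tuple[str, str, str]:
    """Split 'Last, First, M' (or 'Last, First M' / 'Last, First') into parts."""
    last, sep, rest = record.partition(",")
    if not sep or not last.strip():
        raise ValueError(f"expected 'last, first[, middle]': {record!r}")
    # Treat commas and spaces after the surname alike, since the format
    # between the first name and the middle initial is not pinned down.
    given = rest.replace(",", " ").split()
    if not given:
        raise ValueError(f"missing first name: {record!r}")
    first = given[0]
    middle = " ".join(given[1:])  # "" when there is no middle initial
    return last.strip(), first, middle
```

Splitting only on the first comma keeps multi-word surnames such as "Van Buren" intact.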
 