Normalizing street addresses

tshad

We are doing a project that needs to normalize addresses. Is there anything
out there that does that already?

For example,

We need to look at addresses, such as:

1025 Fourth Street
1025 Fouth St
1025 Forth ST
1025 Forth Strt

or

1015 Marcy Lane
1015 Marcy Ln

Where we can determine if the address is the same.

Thanks,

Tom
 
We are doing a project that needs to normalize addresses. Is there
anything out there that does that already?

You mean for free? I doubt it. Look into a commercial solution. CASS
standardizing isn't easy.
 
We went with ArcGIS, but try this site:
http://www.usps.com/webtools/address.htm.

Mike
 
Family Tree Mike said:
We went with ArcGIS, but try this site:
http://www.usps.com/webtools/address.htm.

I am going to look at that, but it may be more than we are looking for. I
am mainly just trying to get a converter that would take the part of an
address that has street, lane, place, suite, etc. and find the correct word
for the possible abbreviations.

I can do it myself using a couple of tables and a generic list. But I would
need to manually put in all the possible combinations and was hoping there
was an easier way that I could just add to my code.

Thanks,

Tom
 
I am going to look at that, but it may be more than we are looking
for. I am mainly just trying to get a converter that would take the
part of an address that has street, lane, place, suite, etc. and find
the correct word for the possible abbreviations.

I can do it myself using a couple of tables and a generic list. But
I would need to manually put in all the possible combinations and was
hoping there was an easier way that I could just add to my code.

You can get lists of data from the USPS and other US government
organizations. These lists of data are suitable for most uses, and can
easily be parsed into a format usable by your application. You can use
the data in one of two ways:

* Translate them into a partial class containing only the data
in the tables, or
* Translate them into an embedded database format that you deploy
with your application (probably the best) and interface with that
data file with a single class.

The latter is probably best, especially if you're likely to do things
like update the ZIP code database with a commercial one from a vendor
like zip-codes.com. You can use SQLite[1] or the built-in Windows
VFP/FoxPro database driver, though you'll probably have to generate
the database table files yourself that way (I don't think Windows'
ODBC drivers for xBase/DBF let you create new database tables using
their API). You could even use the data files themselves, converted to
CSV, but that may lend itself to modification by end-users depending on
how technical they are(n't).
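
As a rough illustration of the embedded-database option, here is a minimal sketch in Python using the standard sqlite3 module. The table name, column names, and helper functions are made up for the example, and SAMPLE_ROWS is only a handful of hand-typed rows in the same three-column shape as the suffix file, not the real data.

```python
import sqlite3

# Hypothetical sample rows: (primary name, commonly used form, standard abbreviation).
# The real USPS suffix list is much longer; these are for illustration only.
SAMPLE_ROWS = [
    ("ALLEY", "ALLEE", "ALY"),
    ("ALLEY", "ALLEY", "ALY"),
    ("STREET", "STREET", "ST"),
    ("STREET", "STRT", "ST"),
    ("STREET", "ST", "ST"),
    ("LANE", "LANE", "LN"),
    ("LANE", "LN", "LN"),
]


def build_suffix_db(rows, path=":memory:"):
    """Create (or open) a SQLite file and load the suffix rows into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS street_suffix ("
        " common TEXT PRIMARY KEY,"
        " primary_name TEXT NOT NULL,"
        " standard_abbr TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO street_suffix"
        " (primary_name, common, standard_abbr) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def lookup_suffix(conn, word):
    """Return (primary_name, standard_abbr) for a suffix, or None if unknown."""
    key = word.strip().upper().rstrip(".")
    return conn.execute(
        "SELECT primary_name, standard_abbr FROM street_suffix WHERE common = ?",
        (key,),
    ).fetchone()


conn = build_suffix_db(SAMPLE_ROWS)
print(lookup_suffix(conn, "Strt"))
```

Passing a file path instead of ":memory:" would give you a single data file to ship with the application, which is the point of this option.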

Here are some of the files that would be relevant to your effort:

Street Type Common and Standard Abbreviations
http://www.usps.com/ncsc/lookups/abbr_suffix.txt

State Abbreviations, including "Military States"
http://www.usps.com/ncsc/lookups/abbr_state.txt

ZIP code data from the US Census (CSV, w/ lat+long data),
contains ~29,500 ZIP codes, which is most of them. No
military states, either, so use with validation/confirmation
in the user interface for data retrieved from it.
http://www.census.gov/tiger/tms/gazetteer/zips.txt

This is a good start.
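
To show how a suffix list like this could be applied to the addresses in the original question, here is a minimal Python sketch. SUFFIX_TO_STANDARD is a hand-typed excerpt rather than the full file, and normalize_street_line is a hypothetical helper that only looks at the last word of the line. Note that a lookup table like this does nothing about misspelled street names ("Fouth", "Forth"); that needs a separate approach.

```python
# Hypothetical excerpt: commonly used suffix form -> standard abbreviation.
SUFFIX_TO_STANDARD = {
    "STREET": "ST", "STRT": "ST", "STR": "ST", "ST": "ST",
    "LANE": "LN", "LN": "LN",
    "ALLEY": "ALY", "ALLEE": "ALY", "ALY": "ALY",
}


def normalize_street_line(line):
    """Uppercase the line, drop punctuation, and standardize a trailing suffix."""
    tokens = line.upper().replace(".", "").replace(",", " ").split()
    if tokens and tokens[-1] in SUFFIX_TO_STANDARD:
        tokens[-1] = SUFFIX_TO_STANDARD[tokens[-1]]
    return " ".join(tokens)


# Both spellings of the suffix collapse to the same normalized string.
print(normalize_street_line("1015 Marcy Lane") == normalize_street_line("1015 Marcy Ln"))
```

Comparing the normalized strings then tells you whether two entries differ only in how the suffix was written.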

Note also that you can find all sorts of similar data tables and the
like on the Internet by searching Google with some uncommon
parameters. For example, I found these files with two different
queries:

USPS abbreviations filetype:txt
zip code list filetype:txt

HTH,
Mike
 
Michael B. Trausch said:
Street Type Common and Standard Abbreviations
http://www.usps.com/ncsc/lookups/abbr_suffix.txt

These are exactly what I was looking for. I can move these into a couple of
tables easily.

I am not sure what the difference is between the Primary Street Suffix Name
and the Postal Service Standard Suffix Abbreviation, since you can use either.

Primary Street    Commonly Used Street      Postal Service Standard
Suffix Name       Suffix or Abbreviation    Suffix Abbreviation

ALLEY             ALLEE                     ALY
ALLEY             ALLEY                     ALY

But it does what I need.

USPS abbreviations filetype:txt
zip code list filetype:txt

What does filetype.txt do?

Thanks,

Tom
 
These are exactly what I was looking for. I can move these into a
couple of tables easily.

I am not sure what the difference is between the Primary Street Suffix Name
and the Postal Service Standard Suffix Abbreviation, since you can use either.

Primary Street    Commonly Used Street      Postal Service Standard
Suffix Name       Suffix or Abbreviation    Suffix Abbreviation

ALLEY             ALLEE                     ALY
ALLEY             ALLEY                     ALY

But it does what I need.

The table is intended to be human-readable and can be printed out by
people that like to put such materials on the sides of their
cubicles. :-) When you parse it and convert it into a format for your
application to use, of course, you'll normalize that data a little bit
so that you've one Primary Street Suffix and one Postal Service
Standard Suffix to many Commonly Used Street Suffixes.
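
One way to picture that one-to-many shape is the Python sketch below. The SuffixRecord class and both helper functions are hypothetical names invented for the example; the input is assumed to be rows already split into (primary name, commonly used form, standard abbreviation).

```python
from dataclasses import dataclass, field


@dataclass
class SuffixRecord:
    """One primary name and one standard abbreviation, with many common forms."""
    primary_name: str
    standard_abbr: str
    common_forms: set = field(default_factory=set)


def group_suffix_rows(rows):
    """Collapse the repeated rows of the printed table into one record per suffix."""
    records = {}
    for primary, common, standard in rows:
        record = records.setdefault(primary, SuffixRecord(primary, standard))
        record.common_forms.add(common)
    return records


def index_by_common_form(records):
    """Build the reverse lookup: any commonly used form -> its record."""
    return {form: rec for rec in records.values() for form in rec.common_forms}


rows = [("ALLEY", "ALLEE", "ALY"), ("ALLEY", "ALLEY", "ALY")]
records = group_suffix_rows(rows)
print(sorted(records["ALLEY"].common_forms))
```

The reverse index is what you would query at run time; the grouped records are just a convenient intermediate form when loading the file.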

If your application prints addresses for mailing, you should use the
standard suffix abbreviation, whereas if you are displaying the address
to be read by a human, you'd use the primary street suffix name.

What does filetype.txt do?

filetype:NNN tells Google that you only want files that have the NNN
extension, where NNN can be txt, pdf, csv, c, cs, py, or any other
plain text or document-oriented format that Google can index.

--- Mike
 
If you have the house number and postal code, you can define uniqueness by
removing all spaces from both items, converting them to uppercase, and
checking for all addresses with HouseNumber==convertedHouseNumber &&
PostalCode==convertedPostalCode.
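
A minimal Python sketch of that comparison, assuming each address is a dict with "house_number" and "postal_code" keys (those key names and both function names are made up for the example). It only helps where house number plus postal code really does identify a single address.

```python
from collections import defaultdict


def address_key(house_number, postal_code):
    """Strip all whitespace and uppercase both parts to form a comparison key."""
    def squash(value):
        return "".join(str(value).split()).upper()
    return (squash(house_number), squash(postal_code))


def group_duplicates(addresses):
    """Return lists of addresses that share the same normalized key."""
    groups = defaultdict(list)
    for addr in addresses:
        key = address_key(addr["house_number"], addr["postal_code"])
        groups[key].append(addr)
    return [group for group in groups.values() if len(group) > 1]


sample = [
    {"house_number": "1025", "postal_code": "ab1 2cd", "street": "Fourth Street"},
    {"house_number": " 1025 ", "postal_code": "AB12CD", "street": "Forth St"},
    {"house_number": "1015", "postal_code": "AB1 2CD", "street": "Marcy Lane"},
]
print(len(group_duplicates(sample)))
```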
 