Process Address in String Variable

  • Thread starter: Slonocode

Slonocode

Hello, I am trying to display a US address properly in a multiline TextBox.
Unfortunately, the address I must process is contained in a string variable
and the format is not uniform.

Examples:

John Smith, 12345 Main Street, Anytown, NY 98776
John Smith, 12345 Main Street, Anytown, NY, 98876
John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY 98876
John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY, 98876

As you may notice, some have a comma after the state and some don't.
Also, there is a variable number of commas before the comma between the city
and state.

The problem is that the original program that stores the address puts a
comma wherever there is a newline character in the address. Unfortunately, I
have no control over how the addresses are saved by the original program, so
I have to deal with this mess.

My problem is that I can't seem to isolate the comma between the city and
state. That comma obviously needs to remain, while the rest need to be
replaced with newline characters.
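To see why a plain search-and-replace is not enough, here is a minimal VB.NET sketch (the module and function names are hypothetical) that turns every comma-plus-space into a line break:

```vb
Imports System

Module NaiveReplaceDemo
    ' Naive approach: replace every ", " with a line break.
    ' This also breaks "Anytown, NY" onto two lines, which is exactly the problem.
    Function NaiveFormat(address As String) As String
        Return address.Replace(", ", Environment.NewLine)
    End Function
End Module
```

For the first example this yields four lines, with "Anytown" and "NY 98776" separated, so the city/state comma has to be told apart from the others somehow.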

Any ideas how I might attack this?
 
Oops, I should probably show the result I'm looking for.

John Smith
12345 Main Street
Anytown, NY 98776


or

John Smith
c/o Jane Smith
12345 Main Street
Anytown, NY 98776
 
Hi Slonocode,
I am not from the USA, so I can only look at the problem as a pattern in the
data.
John Smith, 12345 Main Street, Anytown, NY 98776
John Smith, 12345 Main Street, Anytown, NY, 98876
John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY 98876
John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY, 98876
As I see it, every address with 3 commas is already correct.

Then we have the problem of the 4-comma type and the 5-comma type.

With 5 commas we can replace the last comma with "" (we know its position
from LastIndexOf(",")).

Then, after a split, we have addresses that give an array of length 4 or of
length 3, and from there it would be rather easy, I think.

I did not put it in code because I don't think coding is the problem; if it
is, message me, but I think this is quite easy to do.
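Cor's comma-counting idea can be sketched in VB.NET as follows. This is an illustration only: the module and function names are hypothetical, and the code just counts commas and drops the last one using LastIndexOf, which Cor proposes for the 5-comma case.

```vb
Imports System

Module CommaCountSketch
    ' Counts the commas in an address string.
    Function CountCommas(address As String) As Integer
        Return address.Split(","c).Length - 1
    End Function

    ' Removes the last comma (e.g. the one in "NY, 98876").
    ' Returns the string unchanged if it contains no comma.
    Function RemoveLastComma(address As String) As String
        Dim pos As Integer = address.LastIndexOf(","c)
        If pos < 0 Then Return address
        Return address.Remove(pos, 1)
    End Function
End Module
```

The weak spot is that the count alone does not say whether the last comma sits before the ZIP code: both "Anytown, NY, 98876" without a c/o line and "Anytown, NY 98876" with one give 4 commas.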

Cor
 
Thanks for the reply Cor.

I should have given more examples:

1. John Smith, 12345 Main Street, Anytown, NY 98776
2. John Smith, 12345 Main Street, Anytown, NY, 98876
3. John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY 98876
4. John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY, 98876
5. John Smith, 12345 Main Street, Apt 8B, Anytown, NY 98876
6. John Smith, 12345 Main Street, Apt 8B, Anytown, NY, 98876
7. John Smith, c/o Jane Smith, 12345 Main Street, Apt 8B, Anytown, NY 98876
8. John Smith, c/o Jane Smith, 12345 Main Street, Apt 8B, Anytown, NY, 98876

I hope this is enough examples to show that I can't rely on counting the
commas.

Yes, when there are only 3 commas I can process it easily, but when there
are more I can't seem to isolate the comma between the city and state just
by counting them.
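To make the point concrete, a small hypothetical VB.NET helper can count the commas in the eight examples above (the strings are copied from the list; the names are made up for illustration):

```vb
Imports System

Module CommaCountCheck
    ' The eight sample addresses from the list above.
    Private ReadOnly Samples As String() = {
        "John Smith, 12345 Main Street, Anytown, NY 98776",
        "John Smith, 12345 Main Street, Anytown, NY, 98876",
        "John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY 98876",
        "John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY, 98876",
        "John Smith, 12345 Main Street, Apt 8B, Anytown, NY 98876",
        "John Smith, 12345 Main Street, Apt 8B, Anytown, NY, 98876",
        "John Smith, c/o Jane Smith, 12345 Main Street, Apt 8B, Anytown, NY 98876",
        "John Smith, c/o Jane Smith, 12345 Main Street, Apt 8B, Anytown, NY, 98876"
    }

    ' Returns the number of commas in each sample, in list order.
    Function CommaCounts() As Integer()
        Dim counts(Samples.Length - 1) As Integer
        For i As Integer = 0 To Samples.Length - 1
            counts(i) = Samples(i).Split(","c).Length - 1
        Next
        Return counts
    End Function
End Module
```

The counts come out as 3, 4, 4, 5, 4, 5, 5, 6: examples 2, 3 and 5 all have four commas, and examples 4, 6 and 7 all have five, so a comma count cannot tell these layouts apart.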
 
I think it's not easy to get the right result because I am not sure the
representation is unique. How would the application know whether it should
read

...
Anytown, NY
98876

or

...
Anytown
NY, 98876

Converting the data automatically may cause invalid results. Do you
know if the application that writes the file is able to restore all the
data correctly?
 
Hi Slonocode,

Working from the right hand side seems to be useful:

Every address ends with a space and a zip code.
Whip it off with LastIndexOf (" ").
Then Trim the end.
Then, if it EndsWith (","), remove that.
Then every address ends with a state code.
Whip it off and add it to the zip code.

All other commas become new lines.
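Fergus's right-to-left steps translate fairly directly into VB.NET. The following is a minimal sketch, not his code: FormatAddress and StripTrailingComma are hypothetical names, and it assumes every string ends with a state code followed (with or without a comma) by a ZIP code that contains no spaces.

```vb
Imports System

Module RightToLeftSketch
    ' Formats a comma-separated US address for a multiline TextBox by working
    ' from the right-hand end. Assumes "<city>, <state> <zip>" or
    ' "<city>, <state>, <zip>" at the end of the string.
    Function FormatAddress(address As String) As String
        Dim rest As String = address.Trim()

        ' 1. Whip off the ZIP code: everything after the last space.
        Dim pos As Integer = rest.LastIndexOf(" "c)
        If pos < 0 Then Return address
        Dim zip As String = rest.Substring(pos + 1)
        rest = StripTrailingComma(rest.Substring(0, pos))

        ' 2. Whip off the state code the same way.
        pos = rest.LastIndexOf(" "c)
        If pos < 0 Then Return address
        Dim state As String = rest.Substring(pos + 1)
        rest = StripTrailingComma(rest.Substring(0, pos))

        ' 3. Every remaining comma becomes a line break; the last field is the city,
        '    which gets the state and ZIP appended.
        Dim lines As String() = rest.Split(","c)
        For i As Integer = 0 To lines.Length - 1
            lines(i) = lines(i).Trim()
        Next
        lines(lines.Length - 1) &= ", " & state & " " & zip

        Return String.Join(Environment.NewLine, lines)
    End Function

    ' Trims trailing whitespace and removes one trailing comma, if present.
    Private Function StripTrailingComma(s As String) As String
        s = s.TrimEnd()
        If s.EndsWith(",") Then s = s.Substring(0, s.Length - 1).TrimEnd()
        Return s
    End Function
End Module
```

Because the state is always taken as the last token before the ZIP code, this settles the ambiguity raised in the previous reply by assumption: "NY, 98876" is always read as state followed by ZIP.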

Regards,
Fergus
 
Herfried said:
...
Do you know if the application that writes the file is able to restore all
the data correctly?

Yes, the original application will restore the data correctly. The original
application stores its info in an MS Access database (if that matters). The
original application allows me to export the data to a text file, which is
what I have to read from. The database is locked, so I can't read from it
directly.

The original application seems to put a comma wherever there was a newline
character. Right now I'm baffled as to how it restores the correct
formatting, unless having the data in an MS Access DB makes that easier.
 
Hi Slonocode,
I saw your second message when I had almost finished a solution using
another approach.
My starting point for that approach was that, in your examples, the first 3
lines always fall between the first 3 commas (or the end); what they contain
does not matter. Then come the problems.
For that, though, I need to see how that last address is written.
(I think the extra "Apt 8B" is the problem in almost every address system,
but maybe not in yours.)
Cor
 
Hope you can make sense of this :/

Pseudocode here; I think it could work (if someone would be so kind as to
check it :)

declare strings for all the possible fields (you can do whatever you want
with them later)
and declare a commacount and a backcommacount as int

strName = split text till first comma
if split text from first comma starts with c/o then
strCO = split text from first till second comma
strStreet = split text from 2nd till 3rd comma
commacount = 3
else
strStreet = split text from first till second comma
commacount = 2
end if

if the apt field always starts with "apt" you can go on like that; otherwise
start from the back now (commas below are counted from the end)
temptext = split text from last comma till end
if temptext is numeric
strPostcode = temptext
strState = split text from 2nd last comma till last comma
backcommacount = 2
else
strState = split temptext on space (1st part)
strPostcode = split temptext on space (2nd part)
backcommacount = 1
end if

strTown = split text from (backcommacount + 1)th last comma till backcommacount-th last comma
backcommacount += 1

if split text from (backcommacount + 1)th last comma till backcommacount-th last comma <> strStreet then
strApt = split text from (backcommacount + 1)th last comma till backcommacount-th last comma
end if
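For comparison, EricJ's pseudocode might be rendered in VB.NET roughly as follows. This is a sketch under stated assumptions: the ParsedAddress class and the function names are hypothetical, the c/o line is recognised only by its "c/o" prefix, the ZIP code is assumed to be all digits, and well-formed input is assumed (there are no bounds checks).

```vb
Imports System

Module FrontAndBackSketch
    ' Hypothetical container for the parsed fields.
    Public Class ParsedAddress
        Public Name As String
        Public CareOf As String = ""
        Public Street As String
        Public Apartment As String = ""
        Public Town As String
        Public State As String
        Public PostCode As String
    End Class

    Function ParseAddress(text As String) As ParsedAddress
        Dim fields As String() = text.Split(","c)
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Trim()
        Next

        Dim result As New ParsedAddress()
        Dim front As Integer = 0                ' next unread field from the front
        Dim back As Integer = fields.Length - 1 ' next unread field from the back

        ' From the front: name, optional c/o line, street.
        result.Name = fields(front) : front += 1
        If fields(front).StartsWith("c/o", StringComparison.OrdinalIgnoreCase) Then
            result.CareOf = fields(front) : front += 1
        End If
        result.Street = fields(front) : front += 1

        ' From the back: state and postcode, as two fields or as one "NY 98876" field.
        Dim last As String = fields(back)
        If IsAllDigits(last) Then
            result.PostCode = last
            result.State = fields(back - 1)
            back -= 2
        Else
            Dim parts As String() = last.Split(" "c)
            result.State = parts(0)
            result.PostCode = parts(parts.Length - 1)
            back -= 1
        End If
        result.Town = fields(back) : back -= 1

        ' Anything left between the street and the town is treated as the apartment line.
        If back >= front Then
            result.Apartment = String.Join(", ", fields, front, back - front + 1)
        End If
        Return result
    End Function

    ' True if the string is non-empty and contains only digits.
    Private Function IsAllDigits(s As String) As Boolean
        If s.Length = 0 Then Return False
        For Each c As Char In s
            If Not Char.IsDigit(c) Then Return False
        Next
        Return True
    End Function
End Module
```

The "c/o" prefix test is the fragile part: if that line can start with anything, the front-to-back half cannot recognise it, whereas the back-to-front half does not depend on field contents.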
 
Thanks Fergus

All I can say is "DOH" why didn't I think of that.
 
Hi Slonocode,

Because you've got a fever and have had too much cowbell?? ;-)

Regards,
Fergus
 
Thanks for the reply EricJ

I started working with Fergus's suggestion and came up with a solution.

I think your strategy probably has possibilities but it's a little too
specialized for the examples that I gave.

The part with c/o and apt could have things that start with anything.

Anyway, after I came up with something that worked, I was too lazy to try
another.

Thanks though
 
I would probably use Split to determine which of the cases below you are
dealing with, and handle each appropriately (using regex or array
indexing).

E.g.:

Dim str As String = "John Smith, c/o Jane Smith, 12345 Main Street, Anytown, NY, 98876"
Dim strArray As String() = Split(str, ",")
Select Case strArray.Length
    Case 4
        'Add code here :)
    Case 5
        'Add code here :)
    Case 6
        ' For this sample, elements 3 and 4 are the city and the state
        Debug.WriteLine(strArray(3).Trim() & ", " & strArray(4).Trim())
End Select

It's a start... the rest is up to you!
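The reply mentions regex as one way to handle the cases. A hedged sketch of that route using System.Text.RegularExpressions follows; the pattern and the names are assumptions, matching a two-letter uppercase state and a 5-digit ZIP code (optionally ZIP+4) at the end of the string and treating every comma-separated field before the city as its own line.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' From the end of the string: city, two-letter state, optional comma, ZIP (or ZIP+4).
    ' Everything before the city is captured as "lines" and split on commas afterwards.
    Private ReadOnly Tail As New Regex(
        "^(?<lines>.*?),\s*(?<city>[^,]+?),\s*(?<state>[A-Z]{2})\s*,?\s*(?<zip>\d{5}(?:-\d{4})?)\s*$")

    Function FormatWithRegex(address As String) As String
        Dim m As Match = Tail.Match(address)
        If Not m.Success Then Return address ' leave unrecognised input untouched

        Dim lines As String() = m.Groups("lines").Value.Split(","c)
        For i As Integer = 0 To lines.Length - 1
            lines(i) = lines(i).Trim()
        Next
        Dim lastLine As String = m.Groups("city").Value & ", " &
            m.Groups("state").Value & " " & m.Groups("zip").Value
        Return String.Join(Environment.NewLine, lines) & Environment.NewLine & lastLine
    End Function
End Module
```

Strings that do not match the pattern are returned unchanged, which makes it easy to spot addresses that need manual attention.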
 