Regular Expressions

Guest

I have a text file that contains records that may or may not have a line for every field. Here is an example where record #2 is missing the ADDRESS2 line:

NAME: JOHN SMITH
ADDRESS1: 123 MAIN ST
ADDRESS2: APT 2
STATE: OH
NAME: PAT DOE
ADDRESS1: 678 MARKET ST
STATE: CA

Below is a pattern that matches only the first record. I have tweaked it every way that I can, but I cannot capture the second record and have the groups contain the field values.

NAME: ([^\r]*)\r\nADDRESS1: ([^\r]*)\r\nADDRESS2: ([^\r]*)\r\nSTATE: ([^\r]*)\r\n

When I execute this pattern, I get:
group(1) = 'JOHN SMITH'
group(2) = '123 MAIN ST'
group(3) = 'APT 2'
group(4) = 'OH'

Great answer for record 1 but no record 2.

Is there a way to do this with a single pattern or do I need to create a pattern for each record combination? I thought this would be easy.
 
Use this:

NAME: (?<name>[^\r]*)\r\nADDRESS1: (?<address1>[^\r]*)\r\n(?:ADDRESS2: (?<address2>[^\r]*)\r\n|(?<address2>))?STATE: (?<state>[^\r]*)(?:\r\n)?

Each match will be the entire record, and the values for name, address1,
address2, and state will be in the named groups of the same names. I
included an alternation in the address2 section so that it will accept a
record with no address2. When you get the value of the named group for
address2, it will return an empty string if the record does not have a
value. You can easily duplicate this logic for any of the other fields that
are optional.
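
For anyone trying this outside .NET, here is a rough sketch of the same idea in Python. It is an adaptation, not the pattern above verbatim: Python's re module spells named groups (?P<name>...) and does not allow two groups with the same name, so the empty (?<address2>) alternative is dropped and the whole ADDRESS2 line is simply made optional. The names RECORD, SAMPLE, and parse_records are made up for this illustration, and the sample text assumes CRLF line endings as in the original pattern.

```python
import re

# Assumed adaptation of the .NET pattern: (?P<name>...) syntax, and the
# ADDRESS2 line wrapped in an optional non-capturing group instead of
# using a duplicate empty named group.
RECORD = re.compile(
    r"NAME: (?P<name>[^\r]*)\r\n"
    r"ADDRESS1: (?P<address1>[^\r]*)\r\n"
    r"(?:ADDRESS2: (?P<address2>[^\r]*)\r\n)?"   # optional field
    r"STATE: (?P<state>[^\r]*)(?:\r\n)?"
)

# Sample data from the question, with CRLF line endings.
SAMPLE = (
    "NAME: JOHN SMITH\r\n"
    "ADDRESS1: 123 MAIN ST\r\n"
    "ADDRESS2: APT 2\r\n"
    "STATE: OH\r\n"
    "NAME: PAT DOE\r\n"
    "ADDRESS1: 678 MARKET ST\r\n"
    "STATE: CA\r\n"
)


def parse_records(text):
    """Return one dict per record; a missing optional field becomes ''."""
    # groupdict(default="") fills groups that did not participate in the match.
    return [m.groupdict(default="") for m in RECORD.finditer(text)]


for record in parse_records(SAMPLE):
    print(record)
```

Any other optional field can be handled the same way by wrapping its line in its own (?: ... )? group.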

Brian Davis
www.knowdotnet.com



genojoe said:
I have a text file that contains records that may or may not have a line
for every field. Here is an example where record #2 is missing the ADDRESS2
line:
NAME: JOHN SMITH
ADDRESS1: 123 MAIN ST
ADDRESS2: APT 2
STATE: OH
NAME: PAT DOE
ADDRESS1: 678 MARKET ST
STATE: CA

Below is a pattern that matches only the first record. I have tweaked it
every way that I can but I cannot capture the second record and have the
groups contain the field value.
NAME: ([^\r]*)\r\nADDRESS1: ([^\r]*)\r\nADDRESS2: ([^\r]*)\r\nSTATE: ([^\r]*)\r\n

When I execute this pattern, I get:
group(1) = 'JOHN SMITH'
group(2) = '123 MAIN ST'
group(3) = 'APT 2'
group(4) = 'OH'

Great answer for record 1 but no record 2.

Is there a way to do this with a single pattern or do I need to create a
pattern for each record combination? I thought this would be easy.
 