pull dates out of a text file

  • Thread starter: Smokey Grindle

Smokey Grindle

OK, I must admit I stink at regular expressions... I've been trying to learn
them for a while now and it's not sticking the way I wish it would. I am
trying to take a very long string (about 30 KB), pull out all the dates in it
that are in mm/dd/yyyy format, and put them into a collection. How would you
go about writing this regex? And then how would you move the matches into a
collection? Thanks a lot!
 
Smokey said:
OK, I must admit I stink at regular expressions... I've been trying to learn
them for a while now and it's not sticking the way I wish it would. I am
trying to take a very long string (about 30 KB), pull out all the dates in it
that are in mm/dd/yyyy format, and put them into a collection. How would you
go about writing this regex? And then how would you move the matches into a
collection? Thanks a lot!

I'm no Regex-pert by any means, but this should get you close:

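Imports System.Diagnostics
Imports System.Text.RegularExpressions

Dim sInput As String = ...
' Digit 0-3, any digit, slash, digit 0-1, any digit, slash, then a
' four-digit year starting with 1 or 2.
Dim sPattern As String _
    = "([0123][0-9]\/[01][0-9]\/[12][0-9]{3})"
Dim mc As MatchCollection _
    = Regex.Matches(sInput, sPattern)

' Group 1 is the whole date, because the parentheses wrap the entire
' pattern. An empty MatchCollection simply skips the loop.
For Each m As Match In mc
    Debug.WriteLine(m.Groups(1).Value)
Next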

HTH,
Phill W.
 
I think I understand what's going on there. Is that in British date format,
i.e. dd/mm/yyyy? I am assuming it is, because it looks like you are saying:
first digit 0 to 3, second digit 0 through 9, then a divider, then 0 or 1,
second digit 0-9, divider, then a century digit of 1 or 2 followed by 000
through 999. What if the date doesn't have the 01/01/2007 format, i.e. it is
missing the leading zeros, so it's 1/1/2007? How could you extend it to
match both variations? Thanks a lot!


 
Oops - 'tis indeed British format.

To cater for (a) US dates and (b) dates without leading zeroes, this
might do the trick:

"([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})"

Which would spot
1 or 2 digits (day or month),
slash,
1 or 2 digits (month or day),
slash,
4 digits (year).
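
A minimal sketch of that relaxed pattern in use, as a small console module. The sample text and the variable names loose and strict are made up for illustration, and the lookarounds in the second pattern are an addition that is not in the original post: they stop a match from starting or ending in the middle of a longer run of digits.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RelaxedDatePattern
    Sub Main()
        ' Sample text is made up for illustration.
        Dim sInput As String = "Due 1/1/2007, shipped 01/15/2007, ref 123/45/67890."

        ' The relaxed pattern from the post, unchanged.
        Dim loose As String = "([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})"

        ' Assumption: same pattern wrapped in lookarounds so that a date
        ' cannot begin or end inside a longer run of digits.
        Dim strict As String = _
            "(?<![0-9])([0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4})(?![0-9])"

        For Each m As Match In Regex.Matches(sInput, loose)
            Console.WriteLine("loose:  " & m.Groups(1).Value)
        Next

        For Each m As Match In Regex.Matches(sInput, strict)
            Console.WriteLine("strict: " & m.Groups(1).Value)
        Next
    End Sub
End Module
```

On this sample the loose pattern also picks 23/45/6789 out of the reference number, while the strict one keeps only the two real dates. Neither checks that the month and day are in range.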

I'm afraid I /don't/ know how to spot 2-digit or 4-digit years without
allowing some strange 3-digit variety as well... :-(

HTH,
Phill W.

 
Phill said:
I'm afraid I /don't/ know how to spot 2-digit or 4-digit years without
allowing some strange 3-digit variety as well... :-(

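Welcome to alternation:

(?:4digitYearMatch|2digitYearMatch)

The | is a logical OR. The (?:...) wrapper groups like (matchRegex) does,
but does not capture to \1, \2, $1, $2, etc... Put the 4-digit alternative
first: the engine takes the first branch that works, so with the 2-digit
one first, 01/01/2007 would stop at 01/01/20. To reject the 3-digit case as
well, follow the group with \b or (?![0-9]).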

Very handy for simple spam filtering.
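
Here is a rough sketch of how that alternation might slot into the date pattern from earlier in the thread, collecting the matches into a List(Of String). The sample text, the module name, and the use of List(Of String) (which needs .NET 2.0 or later) are assumptions for illustration. The lookarounds are also an addition, used to rule out the 3-digit year.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module YearAlternation
    Sub Main()
        ' Sample text is made up for illustration.
        Dim sInput As String = "Seen 1/1/07 and 01/15/2007, but not 3/4/123."

        ' (?:[0-9]{4}|[0-9]{2}) tries a four-digit year first, then a
        ' two-digit one. The trailing (?![0-9]) rejects a three-digit year,
        ' because the two-digit branch would leave a digit behind.
        Dim sPattern As String = _
            "(?<![0-9])([0-9]{1,2}\/[0-9]{1,2}\/(?:[0-9]{4}|[0-9]{2}))(?![0-9])"

        ' The non-capturing group does not get a number, so group 1 is
        ' still the whole date.
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(sInput, sPattern)
            found.Add(m.Groups(1).Value)
        Next

        Console.WriteLine(String.Join(", ", found.ToArray()))
    End Sub
End Module
```

With this sample the list should end up holding 1/1/07 and 01/15/2007, and 3/4/123 is skipped.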

~Jason

 
Smokey Grindle said:
OK, I must admit I stink at regular expressions... I've been trying to learn
them for a while now and it's not sticking the way I wish it would. I am
trying to take a very long string (about 30 KB), pull out all the dates in it
that are in mm/dd/yyyy format, and put them into a collection. How would you
go about writing this regex? And then how would you move the matches into a
collection? Thanks a lot!

Here ya go. I'm moving the results into a DateTime array instead, but I made
a function out of it, so all you have to do is pass in the string that you
want to search and the function will return all of the VALID dates as a
DateTime array. Note: there is no exception handling.


Imports System.Collections
Imports System.Text.RegularExpressions

Private Function ParseDates(ByVal Text As String) As DateTime()
    ' The following pattern matches strings in the format ##/##/####. It
    ' does not check for invalid dates, it just matches. We perform the
    ' valid-date checks when converting them to dates anyway, so why do
    ' it here?
    Dim pattern As String = _
        "(?<Date>\d{2}/\d{2}/\d{4})"

    ' Create the ArrayList that stores the results.
    Dim validDates As ArrayList = New ArrayList()

    ' Search the Text for the first match.
    Dim match As Match = Regex.Match(Text, pattern)

    ' Loop through each match. End the loop when the first unsuccessful
    ' match occurs.
    While match.Success
        ' Store the match for easier reference.
        Dim matchText As String = match.Groups("Date").Value

        ' Ensure the match is a valid date. In .Net v2.0+, I'm pretty sure
        ' there is a DateTime.TryParse that you *should* use instead, but
        ' since I'm writing this using .Net v1.1, I'm using IsDate.
        ' Both IsDate and DateTime.Parse follow the current culture, so
        ' mm/dd versus dd/mm depends on the machine's regional settings.
        If IsDate(matchText) Then
            ' Parse the text as a DateTime and add the resulting DateTime
            ' to our ArrayList.
            validDates.Add(DateTime.Parse(matchText))
        End If

        ' Search the Text for the next match.
        match = match.NextMatch()
    End While

    ' Convert the ArrayList to an array, cast the array as a DateTime
    ' array, and return the resulting DateTime array.
    Return DirectCast(validDates.ToArray(GetType(DateTime)), DateTime())
End Function
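
For anyone on .NET 2.0 or later, the function above could be sketched roughly as follows. This is not the code from the post: the name ParseDatesExact, the List(Of DateTime) return type, the looser 1-or-2-digit pattern, and the choice of DateTime.TryParseExact with the invariant culture (rather than the DateTime.TryParse mentioned in the comment) are all assumptions. They are picked so that mm/dd/yyyy is enforced regardless of the machine's regional settings.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Module ParseDatesSketch
    ' Hypothetical variant of ParseDates for .NET 2.0 or later.
    Function ParseDatesExact(ByVal text As String) As List(Of DateTime)
        Dim validDates As New List(Of DateTime)()

        ' Candidate dates: 1-2 digits, slash, 1-2 digits, slash, 4 digits,
        ' not embedded in a longer run of digits.
        Dim pattern As String = _
            "(?<![0-9])[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}(?![0-9])"

        For Each m As Match In Regex.Matches(text, pattern)
            Dim parsed As DateTime
            ' "M/d/yyyy" accepts one or two digits for month and day.
            ' TryParseExact returns False instead of throwing on bad dates.
            If DateTime.TryParseExact(m.Value, "M/d/yyyy", _
                                      CultureInfo.InvariantCulture, _
                                      DateTimeStyles.None, parsed) Then
                validDates.Add(parsed)
            End If
        Next

        Return validDates
    End Function

    Sub Main()
        ' Sample text is made up; 13/45/2007 matches the regex but is not
        ' a real date, so it should be dropped.
        For Each d As DateTime In ParseDatesExact("Good: 2/28/2007. Bad: 13/45/2007.")
            Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
        Next
    End Sub
End Module
```

The split of work is the same as in the original: the regex only finds candidates, and the date parser decides which ones are real.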

HTH,
Mythran
 