best design for parse

  • Thread starter: gs

Let's say I have to deal with various date formats and I am given a format
string, which is one of the following:
dd/mm/yyyy
mm/dd/yyyy
dd/mmm/yyyy
mmm/dd/yyyy
dd/mm/yy
mm/dd/yy
dd/mmm/yy
mmm/dd/yy
dd/mm
What is the best way to come up with a relevant regex for the incoming format
string?
a) use two arrays and match statically
b) use a regex to find the order
 
Maybe you are looking for 'DateTime.ParseExact'.
 
Thank you, I'll give that a shot. Hopefully it will take care of most of what
I need, or at least make the rest easier. Except for one thing:

I am dealing with lines of string data (up to 300 lines), and the position of
the date fields may not be known beforehand, although for a given set of lines
they stay in the same place 99.999% of the time, except for the odd comment
line, which is not that critical.
 
GS,

Maybe in 2007 you can avoid this, and things like DateTime.ParseExact, and
instead have a look at the globalization support that Microsoft has nicely
built in, and then at the related ToString options.

Cor
 
Thank you, Cor.

However, I must be thick. I don't quite get the drift with regard to 2007.
Are we talking about a new release of Visual Studio or the .NET Framework, or
just a release or patch due out in 2007?

How would that handle date strings mixed with other data?

Actually, the original source of the data is a displayed HTML table placed on
the clipboard. The objective is to standardize the date strings to yyyy-mm-dd
and then pass them on to other components for processing and storage.
 
GS,

I was thinking about writing that this was not the case with web pages.
However, Windows Forms is the default in this newsgroup, so please mention
this next time.

Cor
 
The target is actually part of a Windows .NET application with a WinForm that
embeds a WebBrowser control.

Although the clipboard source may well be an HTML table, I can get the text.
The resulting text will have columns delimited by a couple of space-like
characters.

I am just at the design stage, trying to find an easy-to-maintain approach
that will yield adequate performance on the target PCs.
 
Thanks to all who have pitched in so far.

Let me give it another shot.

It looks like an easier way out would be:
1. Copy the date format string into a regex string holder, and then derive
the regex to be used for date normalization in step 2:
- replace yyyy with a year regex expression tagged with the year identifier
- look for yy, treat it as 20yy, and repeat the step above
- replace mmm with the month-name regex expression tagged with the month
identifier
- replace mm with the 2-digit month regex expression tagged with the month
identifier
- replace dd with the 2-digit day regex expression tagged with the day
identifier

2. Use the resulting regex in Regex.Replace to normalize to yyyy-mm-dd.

Any problem with the above approach?
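
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names MaskToPattern, NormalizeDates and ToIso are hypothetical, a 2-digit year is read as 20yy as proposed above, month names are assumed to be three-letter English abbreviations, and a caller-supplied default year is used when the mask has no year.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MaskToRegexSketch

    ' Assumption: month names are three-letter English abbreviations.
    Private ReadOnly MonthNames As String() = New String() { _
        "jan", "feb", "mar", "apr", "may", "jun", _
        "jul", "aug", "sep", "oct", "nov", "dec"}

    ' Maps one mask token (yyyy, yy, mmm, mm, dd) to a named-group sub-pattern.
    Private Function TokenToGroup(ByVal token As Match) As String
        Select Case token.Value.ToLowerInvariant()
            Case "yyyy"
                Return "(?<year>\d{4})"
            Case "yy"
                Return "(?<year>\d{2})"
            Case "mmm"
                Return "(?<month>[A-Za-z]{3})"
            Case "mm"
                Return "(?<month>\d{2})"
            Case Else
                Return "(?<day>\d{2})"
        End Select
    End Function

    ' Step 1: derive a regex from the mask. The mask is escaped first so that
    ' separators such as "." or a space are treated as literal characters.
    Function MaskToPattern(ByVal mask As String) As String
        Dim escaped As String = Regex.Escape(mask)
        Return "\b" & Regex.Replace(escaped, "yyyy|yy|mmm|mm|dd", _
            AddressOf TokenToGroup, RegexOptions.IgnoreCase) & "\b"
    End Function

    ' Rebuilds one match as yyyy-MM-dd; text that is not a valid month is left alone.
    Private Function ToIso(ByVal m As Match, ByVal defaultYear As Integer) As String
        Dim yearValue As Integer = defaultYear
        If m.Groups("year").Success Then
            yearValue = Integer.Parse(m.Groups("year").Value)
            ' Assumption from the post: a 2-digit year means 20yy.
            If m.Groups("year").Value.Length = 2 Then yearValue += 2000
        End If

        Dim monthText As String = m.Groups("month").Value
        Dim monthValue As Integer
        If Char.IsDigit(monthText(0)) Then
            monthValue = Integer.Parse(monthText)
        Else
            monthValue = Array.IndexOf(MonthNames, monthText.ToLowerInvariant()) + 1
        End If
        If monthValue < 1 OrElse monthValue > 12 Then Return m.Value

        Return String.Format("{0:0000}-{1:00}-{2}", yearValue, monthValue, m.Groups("day").Value)
    End Function

    ' Step 2: use the derived regex in Regex.Replace to normalize the dates.
    Function NormalizeDates(ByVal text As String, ByVal mask As String, _
                            ByVal defaultYear As Integer) As String
        Return Regex.Replace(text, MaskToPattern(mask), _
            Function(m As Match) ToIso(m, defaultYear))
    End Function

    Sub Main()
        Console.WriteLine(NormalizeDates("11/dec/06  A1234987  Sample", "dd/mmm/yy", 2007))
        Console.WriteLine(NormalizeDates("11 Dec  A1234987  Sample", "dd MMM", 2006))
    End Sub

End Module
```

Note that this sketch does not validate the day against the month, so a candidate such as 31/02/06 would pass through; handing each match to DateTime.ParseExact instead would catch that.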
 
Stephany,

You will have seen (you are not a newbie) how much time it took, especially
for me, before it was accepted here that the VB.NET language used in ASP.NET
is also part of the language and not of the framework, and is therefore a
subject for this newsgroup. Maybe you even saw that last week I wrote that
again in the C# newsgroup.

I only asked the OP to say so if the question is specific to a web page (which
seems not to be the case). Most of the people answering here take Windows
Forms as the default, and in the case of date/times I seldom ask, because
there is unfortunately no DateTime value equivalent in HTML.

Cor
 
I think that you are missing the whole point.

Regular Expressions (Regex) are about pattern matching, not format matching.

It does not matter whether the source data comes from a HTML page, a Windows
Forms TextBox or a disk file. The source data is the source data and that is
all there is to it.

If the source data only contained one instance of a 'date' in dd/MM/yyyy
format then, to find it by your methodology, you would need to test for up to
3,719,628 permutations from 01/01/0001 all the way up to 31/12/9999, i.e.,
31 (days) * 12 (months) * 9999 (years). Couple this with the other 8
'formats' and you can see how such a task will quickly become unmanageable.

But ... what you really are looking for is a sequence of 2 digits followed
by a slash followed by 2 digits followed by a slash followed by 4 digits.
That immediately takes care of 2 of your 'formats'. Off the top of my head
the regex for that is "[0-9]{2}/[0-9]{2}/[0-9]{4}".

The next pattern you are looking for is 2 digits followed by a slash
followed by 3 alphas followed by a slash followed by 4 digits.
"[0-9]{2}/[A-Za-z]{3}/[0-9]{4}".

The next pattern you are looking for is 3 alphas followed by a slash
followed by 2 digits followed by a slash followed by 4 digits.
"[A-Za-z]{3}/[0-9]{2}/[0-9]{4}".

The next 4 formats are taken care of by varying the above to use a 2-digit
year: "[0-9]{2}/[0-9]{2}/[0-9]{2}" (which covers both dd/mm/yy and mm/dd/yy),
"[0-9]{2}/[A-Za-z]{3}/[0-9]{2}" and "[A-Za-z]{3}/[0-9]{2}/[0-9]{2}".

The last format is simply the pattern "[0-9]{2}/[0-9]{2}".

Now, the real secret is what directly precedes and follows your 'dates'. For
instance, are your 'dates' ALWAYS 'wrapped' in a tag? E.g.,
<td>07/01/2007</td>. It might be that there is always a space character
directly before the 'date' and another directly after it. Any such
information will allow you to 'tune' your pattern so that it doesn't pick up
false positives. The pattern [0-9]{2}/[0-9]{2} would pick up the 01/02 out
of 01MyQuote01/02YourQuote02.
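
As one possible way of tuning, a pattern can be wrapped in lookarounds so that a candidate is rejected when it is glued to a letter, digit or slash. The sketch below is only an illustration; the test string and the choice of boundary characters are assumptions, and the right boundary depends on what really surrounds the dates in the source data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BoundarySketch

    Sub Main()
        Dim source As String = "01MyQuote01/02YourQuote02 and 15/01 here"

        ' Untuned: matches anywhere, including inside other tokens.
        Dim loose As New Regex("[0-9]{2}/[0-9]{2}")

        ' Tuned: the match must not be preceded or followed by a letter,
        ' a digit or a slash.
        Dim tight As New Regex("(?<![0-9A-Za-z/])[0-9]{2}/[0-9]{2}(?![0-9A-Za-z/])")

        For Each m As Match In loose.Matches(source)
            Console.WriteLine("loose: " & m.Value)   ' prints 01/02, then 15/01
        Next

        For Each m As Match In tight.Matches(source)
            Console.WriteLine("tight: " & m.Value)   ' prints 15/01 only
        Next
    End Sub

End Module
```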

All the patterns need to be put together in one regular expression, joined
with ORs (|), so that you can find all the candidate dates in one operation.

"\d{2}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/\d{2}"

Please feel free to jump in here if I've got that wrong because I'm by no
means a regex expert.


Once you have your candidate dates (matches) you need to deal with each one
in turn.

As Herfried said earlier you need to use DateTime.ParseExact.

For that you need an array of strings to hold all your formats.

Dim _formats As String() = New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
    "dd/MMM/yyyy", "MMM/dd/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
    "MMM/dd/yy", "dd/MM"}

For each candidate, call DateTime.ParseExact and trap the exception if one
occurs:

Dim _d As DateTime

Try
    ' DateTimeStyles requires Imports System.Globalization
    _d = DateTime.ParseExact(_candidate, _formats, Nothing, DateTimeStyles.None)
    ' DateTime.ParseExact succeeded so we can deal with it
    ...
Catch _ex As FormatException
    ' Because we know that _candidate is not an empty string and none of the
    ' elements of _formats is an empty string, _candidate does not contain a
    ' date and time that corresponds to any element of _formats
    ...
End Try
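
If exceptions for every non-date candidate are unwelcome, DateTime.TryParseExact (available since .NET 2.0) reports failure through its return value instead. The following is a minimal sketch rather than a drop-in replacement: the helper name TryNormalize is hypothetical, and CultureInfo.InvariantCulture is an assumption made here so that "/" and English month abbreviations are interpreted the same way on every machine.

```vbnet
Imports System
Imports System.Globalization

Module TryParseSketch

    Private ReadOnly Formats As String() = New String() { _
        "dd/MM/yyyy", "MM/dd/yyyy", "dd/MMM/yyyy", "MMM/dd/yyyy", _
        "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", "MMM/dd/yy", "dd/MM"}

    ' Returns True and the yyyy-MM-dd form when the candidate matches one of
    ' the formats; returns False (no exception) when it does not.
    Function TryNormalize(ByVal candidate As String, ByRef normalized As String) As Boolean
        Dim parsed As DateTime
        If DateTime.TryParseExact(candidate, Formats, CultureInfo.InvariantCulture, _
                                  DateTimeStyles.None, parsed) Then
            normalized = parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            Return True
        End If
        normalized = Nothing
        Return False
    End Function

    Sub Main()
        Dim result As String = Nothing
        Dim ok As Boolean = TryNormalize("10/Jan/2007", result)
        Console.WriteLine("{0} {1}", ok, result)

        ok = TryNormalize("XYZ/72/84", result)
        Console.WriteLine("{0} {1}", ok, result)
    End Sub

End Module
```

Because the formats are tried in array order, an ambiguous candidate such as 07/01/2007 is read with the first format that fits, here dd/MM/yyyy.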
 
You are sort of on the same track as I am.


I must first apologize; I did not tell you the complete story.

Although the application does not know beforehand exactly what format the
data may come in, part of the application allows the user to define and
record a favourite for a website:
- whether to extract by text or HTML
- header content and format
- record format and date format (that is where the date format mask
comes in)
- optionally, an ordinal number for each column, or a re-ordering
- trailer content and format

For a given batch, at least for the body, the date format is uniform.

Furthermore, given the need to make the extract process generic and adaptable
to the front end that takes the user definitions, I believe it would be
easier to "normalize" the date strings to "yyyy-mm-dd".

Also, the end target may not necessarily be a SQL database; it may be text,
pasted into a Word report or into Excel by the user.


Therefore, I can transform the date format mask into a regex with the
appropriate format and identifiers, and then use Regex.Replace to normalize
the date. As a matter of fact, the date separator does not have to be /; it
can be a space, as long as there are identifiable delimiters around the date
string.

I already have code for dealing with regexes for dates from a prior project.
All I have to do is adapt it to the present need.

Who knows, maybe I have taken a totally offbeat track.
 
Again you're missing the point.

I think the best thing you can do is post a relatively small sample of the
text you are attempting to parse.

While you're doing that, execute the following and observe the results. It
demonstrates what I am talking about:

' Requires Imports System.Globalization and Imports System.Text.RegularExpressions
Dim _source As String = _
    "On 07/01/2007 the quick brown fox jumps over the lazy dog." & Environment.NewLine & _
    "On 08/01/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/09/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 10/Jan/2007 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 11/01/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 01/12/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On Jan/13/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 14/Jan/07 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "On 15/01 the quick brown fox again jumps over the lazy dog." & Environment.NewLine & _
    "The part number XYZ/72/84 is now discontinued."

Dim _regex As New Regex("\d{2}/\d{2}/\d{4}|[A-Za-z]{3}/\d{2}/\d{4}|\d{2}/[A-Za-z]{3}/\d{4}|\d{2}/\d{2}/\d{2}|[A-Za-z]{3}/\d{2}/\d{2}|\d{2}/[A-Za-z]{3}/\d{2}|\d{2}/\d{2}")

Dim _formats As String() = New String() {"dd/MM/yyyy", "MM/dd/yyyy", _
    "MMM/dd/yyyy", "dd/MMM/yyyy", "dd/MM/yy", "MM/dd/yy", "dd/MMM/yy", _
    "MMM/dd/yy", "dd/MM"}

Dim _candidates As Integer = 0
Dim _matches As Integer = 0

Dim _match As Match = _regex.Match(_source)

While _match.Success
    _candidates += 1
    Console.WriteLine("{0} found at index {1}", _match.Value, _match.Index)
    Try
        ' Throws FormatException when the candidate fits none of the formats
        Console.WriteLine("Converted value = {0:yyyy-MM-dd}", _
            DateTime.ParseExact(_match.Value, _formats, Nothing, DateTimeStyles.None))
        _matches += 1
    Catch _ex As Exception
        Console.WriteLine(_ex.Message)
    End Try
    _match = _match.NextMatch()
End While

Console.WriteLine("{0} candidates found", _candidates)

Console.WriteLine("{0} matches found", _matches)
 
GS,

As long as you don't know the date format, there is probably nothing you can
do. As soon as you know the date format, you can try DateTime.ParseExact with
the given pattern.
(Don't forget to put the month specifier in uppercase as MM, and don't leave
that to the user; lowercase mm means minutes.)

Cor
 
It looks like I am not expressing myself clearly. Although the application
does not know in advance which format is used, it does know, for a given set,
which date format it is dealing with, and it can expect the same format
throughout that set of input. I should not have used the term batch; I meant
a set of records. The only possible variation is that some records in certain
sets may be split into 2 lines, but that is not critical, as the conditions
can be described beforehand and normalized by another parse component.

Sample data:

Set1: date format mask is "dd MMM"
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11 Dec A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 Dec A1234988 Sample Parts description 1 10.00 20
18 Dec A1234988 Sample Parts description 1 10.00 20
19 Dec A1234988 Sample Parts description 1 10.00 20
12 Dec A1234988 Sample Parts description 1 10.00 20


Set 2 date format Mask is "dd MM yy"
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11 12 06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15 12 06 A1234988 Sample Parts description 1 10.00 20
18 12 06 A1234988 Sample Parts description 1 10.00 20
19 12 06 A1234988 Sample Parts description 1 10.00 20
12 12 06 A1234988 Sample Parts description 1 10.00 20

Set 3 date format mask "dd/MMM/06"
Parts Description location Quantitiy Unit Cost Total Cost
11/12/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/12/06 A1234988 Sample Parts description 1 10.00
2018/12/06 A1234988 Sample Parts description 1 10.00
2019/12/06 A1234988 Sample Parts description 1 10.00
2012/12/06 A1234988 Sample Parts description 1 10.00 20

Set 4 date format mask ""
Date Parts ID Parts Description location Quantitiy Unit Cost Total
Cost
11/dec/06 A1234987 Sample Parts description W1I1R4S1 2 10.00 20.00
15/dec/06 A1234988 Sample Parts description 1 10.00 20
18/dec/06 A1234988 Sample Parts description 1 10.00 20
19/dec/06 A1234988 Sample Parts description 1 10.00 20
12/dec/06 A1234988 Sample Parts description 1 10.00 20

How do I deal with a format without a year? I do have clues from other parts
of the originating website, and an optional default set by the user.

The sample data show variation in date format from set to set, but the date
format that I need to deal with within a given set is consistent, and the user
has influence over the date format mask used.

As for Cor's suggestion: don't let the user enter the format, but let the user
pick from a list. That will likely be the case, at least in version 0.
 
Thank you.

You do have a point, but the application I have in mind is meant to take most
of the easy but boring and repetitive tasks off the users' hands quickly, to
get their buy-in for the next phase. The application is not going to be
perfect in version 0, but it must be flexible enough to adapt as needs change.

Furthermore, I chose to normalize the date format to yyyy-mm-dd because that
is the standard date string format accepted by almost all standard Windows
applications used by the users I deal with, regardless of locale or default
display format.

As a side note, right now this application, at version zero, is not meant to
automate everything but to help users do their jobs and help us gain an
understanding of what they do, while at the same time validating the
transform process that will be used later for automation. Version 1 will
automate a lot more and may actually drive some Excel and Word processes.


You could say version zero is closer to a Mickey Mouse utility, if you wish.

 
Now we're cooking with gas. I think that regex is overkill for this
'problem'. Sure, you can use it if you wish but I think you will be making a
rod for your own back.

Here is a solution that works for your sample data. Create a Windows Forms
project, plonk a button on the form and paste the following into the form:

' Fields are separated by a pair of spaces, as ProcessData expects.
Private m_source1 As String = _
    "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 Dec  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 Dec  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 Dec  A1234988  Sample Parts description  1  10.00  20"

Private m_source2 As String = _
    "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
    "11 12 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19 12 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private m_source3 As String = _
    "Parts  Parts ID  Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/12/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/12/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/12/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source4 As String = _
    "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
    "11/dec/06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "15/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "18/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "19/dec/06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12/dec/06  A1234988  Sample Parts description  1  10.00  20"

Private m_source5 As String = _
    "Date  Parts ID  Parts Description  location  Quantitiy  Unit Cost  Total Cost" & Environment.NewLine & _
    "12 13 06  A1234987  Sample Parts description  W1I1R4S1  2  10.00  20.00" & Environment.NewLine & _
    "12 15 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 18 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 19 06  A1234988  Sample Parts description  1  10.00  20" & Environment.NewLine & _
    "12 12 06  A1234988  Sample Parts description  1  10.00  20"

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

Console.WriteLine()

Console.WriteLine("Sample 1")

ProcessData(m_source1)

Console.WriteLine()

Console.WriteLine("Sample 2")

ProcessData(m_source2)

Console.WriteLine()

Console.WriteLine("Sample 3")

ProcessData(m_source3)

Console.WriteLine()

Console.WriteLine("Sample 4")

ProcessData(m_source4)

Console.WriteLine()

Console.WriteLine("Sample 5")

ProcessData(m_source5)

Console.WriteLine()

End Sub

Private Sub ProcessData(ByVal source As String)

    ' Assumption: Lines of data are separated by a carriage return/line feed pair
    Dim _lines As String() = source.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)

    ' Determined by eyeballing data: All 'fields' are delimited by a pair of spaces
    Dim _ss As String() = _lines(0).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determine which line is the first line of actual data
    ' If the first line is a heading line then all characters of the first field will be letters
    Dim _lettercount As Integer = 0
    For Each _c As Char In _ss(0)
        If Char.IsLetter(_c) Then _lettercount += 1
    Next
    Dim _firstline As Integer = 0
    If _lettercount = _ss(0).Length Then _firstline = 1

    ' Split the first actual line on the field delimiter
    _ss = _lines(_firstline).Split(New String() {"  "}, StringSplitOptions.None)

    ' Determined by eyeballing data: The date field is always the first field in the line

    ' Determine the delimiter to be used for the date format
    Dim _delimiter As String = ""
    If _ss(0).IndexOf(" ") > 0 Then
        _delimiter = " "
    ElseIf _ss(0).IndexOf("/") > 0 Then
        _delimiter = "/"
    ElseIf _ss(0).IndexOf("-") > 0 Then
        _delimiter = "-"
    Else
        Console.WriteLine("Unable to determine delimiter out of " & _ss(0))
        Return
    End If
    Console.WriteLine("Determined delimiter as '" & _delimiter & "'")

    ' Construct the date format to be used
    Dim _format As String = String.Empty
    ' Split the first field on the date format delimiter
    Dim _parts As String() = _ss(0).Split(New String() {_delimiter}, StringSplitOptions.None)
    If _parts.Length = 2 Then
        ' If there are 2 parts then we only have day and month components
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 1st part must be the month and the 2nd part must be the day
                ' (there is no year part here, so _parts(2) must not be referenced)
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length)
            End If
            ' There is a big gotcha here if both parts are <= 12 and are different
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
        End If
    ElseIf _parts.Length = 3 Then
        ' If there are 3 parts then we have day, month and year components
        ' Assume that the year is always the 3rd part
        If Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsLetter(_parts(1).Chars(0)) Then
            ' The 1st part starts with a digit and the 2nd part starts with a letter
            ' so we can assume that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & "MMM"
            If _parts(1).Length > 3 Then _format &= "M"
            _format &= _delimiter & New String("y"c, _parts(2).Length)
        ElseIf Char.IsDigit(_parts(0).Chars(0)) AndAlso Char.IsDigit(_parts(1).Chars(0)) Then
            ' Both parts start with a digit
            ' Start with the assumption that the 1st part is the day and the 2nd part is the month
            _format = New String("d"c, _parts(0).Length) & _delimiter & New String("M"c, _parts(1).Length) & _
                _delimiter & New String("y"c, _parts(2).Length)
            If Integer.Parse(_parts(1)) > 12 Then
                ' The 1st part must be the month and the 2nd part must be the day
                _format = New String("M"c, _parts(0).Length) & _delimiter & New String("d"c, _parts(1).Length) & _
                    _delimiter & New String("y"c, _parts(2).Length)
            End If
            ' There is a big gotcha here if the first two parts are <= 12 and are different
            ' E.g. we don't really know if 01 02 (01/02, 01-02, etc.) should be 1 February or January 2
        End If
    End If
    If _format.Length = 0 Then
        ' We were unable to determine the date format from the available information
        Console.WriteLine("Unable to determine format from " & _ss(0))
        Return
    End If

    ' We were able to determine the date format so we can continue and parse the dates
    Console.WriteLine("Determined format as " & _format)

    ' Start from our actual first line of data
    For _i As Integer = _firstline To _lines.Length - 1
        _ss = _lines(_i).Split(New String() {"  "}, StringSplitOptions.None)
        Dim _date As DateTime = DateTime.ParseExact(_ss(0), _format, Nothing)
        Console.WriteLine("Read from input: " & _ss(0) & " - Interpreted date: " & _date.ToString("yyyy-MM-dd"))
    Next

End Sub

Note, from the results, that if there is no year part then
DateTime.ParseExact will interpret the date as being in the current year, as
determined from the system date at the time the code is executed.
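
Where the year is known from elsewhere (for example a default chosen by the user), one option is to append it to both the text and the mask before parsing, so that ParseExact never falls back on the system date. This is a sketch under assumptions: the helper name ParseWithDefaultYear is hypothetical, and the invariant culture is assumed for month names and separators.

```vbnet
Imports System
Imports System.Globalization

Module DefaultYearSketch

    ' Parses text with the given mask; when the mask has no year component,
    ' the supplied default year is appended to both the text and the mask.
    Function ParseWithDefaultYear(ByVal text As String, ByVal mask As String, _
                                  ByVal defaultYear As Integer) As DateTime
        If mask.IndexOf("y"c) >= 0 Then
            Return DateTime.ParseExact(text, mask, CultureInfo.InvariantCulture)
        End If

        Dim yearText As String = defaultYear.ToString("0000", CultureInfo.InvariantCulture)
        Return DateTime.ParseExact(text & " " & yearText, mask & " yyyy", _
                                   CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim d As DateTime = ParseWithDefaultYear("11 Dec", "dd MMM", 2006)
        Console.WriteLine(d.ToString("yyyy-MM-dd"))   ' prints 2006-12-11
    End Sub

End Module
```

Appending the year before parsing, rather than replacing it afterwards, also means a value such as 29 Feb is validated against the intended year and not against whatever year the code happens to run in.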
 
Stephany,

I am curious: what does this phrase mean? I don't know it.
Now we're cooking with gas.

(Living in Holland, which sits above what was once one of the biggest gas
fields in Europe)

Cor
 
It's an idiom meaning:

Efficiently performing a task after a long period
of inefficient performance or possibly failed
attempts at the entire task or certain steps in the process.

Before we saw the sample data we were 'shooting in the dark'. As soon as the
sample data was posted it all became clear.
 
I see that the code works hard and handles a lot of cases, but isn't it better
to get assistance from the user, who knows what date format is being used?
That is the rationale for letting the user pick the date format mask. Guessing
the date format is tough to get right in all cases. Not only can months and
days be indeterminate at times; it gets worse when a 2-digit year is used. I
have seen some sample data that is way outside the date formats commonly seen
in the US.

Relying on the first 1 or 2 parts being numeric would miss quite a few cases.
Nonetheless, the code can serve as a default in the absence of a user spec.
Thank you very much for that.

Sorry for misleading you with incomplete data samples.
There are sample data sets where the first column is not a date. On the other
hand, sometimes the first 2 columns can both be dates, and occasionally
another column elsewhere can be a date too. This sounds incredible, but that's
what users have to contend with.
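
When the user supplies the mask but the date columns are not known in advance, one possible approach is to try the mask against every field in a line and rewrite only the fields that parse. The sketch below is an illustration under assumptions: the helper name NormalizeLine is hypothetical, fields are assumed to be separated by a pair of spaces as in the earlier sample code, the sample line is made up, and the invariant culture is assumed.

```vbnet
Imports System
Imports System.Globalization

Module ColumnScanSketch

    ' Rewrites every field that matches the user-supplied mask as yyyy-MM-dd
    ' and leaves all other fields untouched.
    Function NormalizeLine(ByVal line As String, ByVal mask As String) As String
        Dim fields As String() = line.Split(New String() {"  "}, StringSplitOptions.None)
        For i As Integer = 0 To fields.Length - 1
            Dim parsed As DateTime
            If DateTime.TryParseExact(fields(i).Trim(), mask, CultureInfo.InvariantCulture, _
                                      DateTimeStyles.None, parsed) Then
                fields(i) = parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            End If
        Next
        Return String.Join("  ", fields)
    End Function

    Sub Main()
        ' Hypothetical line with dates in the 2nd and 3rd columns.
        Dim line As String = "A1234987  11/12/06  12/12/06  Sample Parts description  2"
        Console.WriteLine(NormalizeLine(line, "dd/MM/yy"))
    End Sub

End Module
```

Since the mask comes from the user, a field only qualifies when it fits that mask exactly, which avoids the guessing problems described above.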
 