Find Missing Data

  • Thread starter Thread starter Justin Emlay
  • Start date Start date
J

Justin Emlay

I'm hopping someone can help me out on a payroll project I need to
implement.

To start we are dealing with payroll periods. So we are dealing with an
exact 10 days (Monday - Friday, 2 weeks).

I have a dataset as follows (1 week to keep it short):

Employee 1 - Date 1
Employee 1 - Date 2
Employee 1 - Date 3
Employee 1 - Date 4
Employee 1 - Date 5
Employee 2 - Date 1
Employee 2 - Date 3
Employee 2 - Date 4
Employee 2 - Date 5

You'll notice Employee 2 - Date 2 is missing. I need to develop a routine
that will give me all missed dates per employee.

I'm open to any ideas/theories. I have one of my own but before I get
started on it I wanted some opinions on it. I'm not sure how productive
such a routine will be. It sounds like a cluster f. to me.


Create an array of unique employee numbers

then

Loop
through that array and create a second array of unique dates. If the ubound
of that second array is < 9 then figure out missed date(s) (I have period
start). Dump employee number and date into a third array. Destroy second
array.
Next


Thoughts?
 
Hi Justin,

I assume every date is a row.

Seeing this than there are two question, you want to know if there are dates
missing, that is easy, you set them employee by employee in a dataview, the
count should always be 10.

Than you can show which dates are there, it should than for every programmer
be very easy to know what is missing.

(I would first make an array with dates which should be and test when there
is a missing one. The array because you have to deal with weekends which can
make it complex when you do it in a direct routine calculating everytime the
dates)

I assume that you do not have to test on doubles, which is of course with
that dataview as well a piece of cake.

Or do I see this the simple?

Cor
 
Have you thought about creating a checksum?


For I = 1 To 10
' parse data into line (ie: first line would be "Employee 1 - Date
1")
Do Until Right$(line, I) = I
Missing = Missing + 2^(I - 1)
I = I + 1 'double execute
Loop
' (This is a loop because there might be consecutive missing
records)
' save data at right spot, if data was missing, the counter has been
updated to reflect that, and ' data will still be stored at the
index corresponding to the Index
Next I

' Now you have a Missing variable, which is unique for all occasions. I
guess you don't need explanations how to read the variable again and figure
out the missing links... if you do, post here again, and I'll try to help.
BTW: this is a solution which doesn't require too much memory (only 1
integer which holds the checksum). If you'd rather have a less processor
consumtive way (you will need to read this variable again), then you might
consider just storing the numbers of all the missing lines into an array,
which might be a bit faster. Not much though, and it will take loads of
extra memory.


I'm guessing the distinguishing between dates etc. won't be as easy as
described (ie: the lines won't just end with 'date 1', but with a real
date), but you can always load the supposedly correct dates into an array
and then examine if the input matches array(I).

Hope this helps and wasn't too fuzzy, I'm not great at explaining things...

John
 
The problem extents a little further. If there is no date then there is no
record. There will always be a date and an employee number as a record was
added to the DB. Each record is a "time sheet". If entered, thats great.
We need to know missing timesheets. I wish there was someway to compair
arrays and output the differences.

ArrayDates() - pre defined what the 10 dates are
Array1() - Unique set of employee numbers
Array2() - Unique set of dates per employee Array1(i)

Compair ArrayDates() with Array2() and give me the difference.
 
Hi John,

With a dataview this is.

dim dv as datatable(datatablesheets)
for each employee in tableEmployee
dv.rowfilter = "employee = " & employeenumber
If dv.count <> 10 then
'there is a date missing,
next

And when there are more dates than the datatable with sheets has to be
filled with a select distinct on order date

Why do you want to use that in my opinion more difficult method, I can
assure you that the method I show goes very fast?

Cor
 
Justin,
I would consider doing this in the database itself with correlated outer
joins if possible.

Presumable you really want to know a list of employees with the dates they
are missing, correct?

If you "needed" to do it client side, I would probably use DataTables
instead of arrays...

Here is one possibility:

Dim ds As New DataSet("JustinEmlay")

' define the tables needed
Dim employees As New DataTable("employees")
employees.Columns.Add("id", GetType(Integer))
employees.Columns.Add("name", GetType(String))
employees.PrimaryKey = New DataColumn() {employees.Columns("id")}
ds.Tables.Add(employees)

Dim dates As New DataTable("dates")
dates.Columns.Add("id", GetType(Integer))
dates.Columns.Add("name", GetType(String))
dates.PrimaryKey = New DataColumn() {dates.Columns("id")}
ds.Tables.Add(dates)

Dim timeSheets As New DataTable("timeSheets")
timeSheets.Columns.Add("employeeId", GetType(Integer))
timeSheets.Columns.Add("dateId", GetType(Integer))
timeSheets.PrimaryKey = New DataColumn()
{timeSheets.Columns("employeeId"), timeSheets.Columns("dateId")}
ds.Tables.Add(timeSheets)

Dim missing As New DataTable("missing")
missing.Columns.Add("employeeId", GetType(Integer))
missing.Columns.Add("dateId", GetType(Integer))
missing.PrimaryKey = New DataColumn()
{missing.Columns("employeeId"), missing.Columns("dateId")}
ds.Tables.Add(missing)

' define relationships between the tables
ds.Relations.Add("dateTimeSheets", dates.Columns("id"),
timeSheets.Columns("dateId"))
ds.Relations.Add("employeeTimeSheets", employees.Columns("id"),
timeSheets.Columns("employeeId"))

ds.Relations.Add("dateMissing", dates.Columns("id"),
missing.Columns("dateId"))
ds.Relations.Add("employeeMissing", employees.Columns("id"),
missing.Columns("employeeId"))

' add some sample data
With employees.Rows
.Add(New Object() {1, "Employee 1"})
.Add(New Object() {2, "Employee 2"})
.Add(New Object() {3, "Employee 3"})
End With

With dates.Rows
.Add(New Object() {1, "Date 1"})
.Add(New Object() {2, "Date 2"})
.Add(New Object() {3, "Date 3"})
.Add(New Object() {4, "Date 4"})
.Add(New Object() {5, "Date 5"})
End With

With timeSheets.Rows
.Add(New Object() {1, 1}) 'Employee 1 - Date 1
.Add(New Object() {1, 2}) 'Employee 1 - Date 2
.Add(New Object() {1, 3}) 'Employee 1 - Date 3
.Add(New Object() {1, 4}) 'Employee 1 - Date 4
.Add(New Object() {1, 5}) 'Employee 1 - Date 5

.Add(New Object() {2, 1}) 'Employee 2 - Date 1
.Add(New Object() {2, 3}) 'Employee 2 - Date 3
.Add(New Object() {2, 4}) 'Employee 2 - Date 4
.Add(New Object() {2, 5}) 'Employee 2 - Date 5

.Add(New Object() {3, 2}) 'Employee 2 - Date 5
.Add(New Object() {3, 5}) 'Employee 2 - Date 5
End With

' find the missing dates for each employee
For Each employee As DataRow In employees.Rows
For Each [date] As DataRow In dates.Rows
Dim keys() As Object = {employee!id, [date]!id}
If Not timeSheets.Rows.Contains(keys) Then
missing.Rows.Add(keys)
End If
Next
Next

' display the results
For Each miss As DataRow In missing.Rows
Debug.WriteLine(miss!dateId, miss!employeeId.ToString())

Dim [date] As DataRow = miss.GetParentRow("dateMissing")
Dim employee As DataRow = miss.GetParentRow("employeeMissing")
Debug.WriteLine([date]!name, DirectCast(employee!name, String))
Next

Hope this helps
Jay
 
Jay,

Do not see this as a flame, I am really curious what is wrong with the
solution I advise.

I know that in that solution you only know that it is not complete, however
not what is missing, however that is the most simple part to find then. For
that I have maybe thousand solutions.

I really become in doubt what is wrong with my solution?

Cor
 
Cor,
I don't see anything "wrong" with your solution.

Why would you think there is something wrong with it?

Jay
 
Cor,
Now that you mention it (something wrong) :-)

In John's original post he stated:

<quote>
You'll notice Employee 2 - Date 2 is missing. I need to develop a routine
that will give me all missed dates per employee.
<quote>

Hence my routine that "gives all missed dates per employee"...

You do realize that if you can also define a relationship between the
employee table & the timesheet table you can simply check the length of the
child rows to see if there is a date missing...

Something like (untested)
for each employee in tableEmployee
If employee.GetChildRows("employeeTimeSheets").Length said:
'there is a date missing, End If
next

For Each employee As DataRow In employees.Rows
Const format As String = "employeeId = {0}"
Const expression As String = "count(employeeId)"
Dim filter As String = String.Format(format, employee!id)
Dim count As Integer
count = CInt(timeSheets.Compute(expression, filter))
Next

Which one of the three I used would depend on the requirements of the
problem.

I still don't really see anything wrong with your solution... We both just
have alternatives!

Hope this helps
Jay
 
Doh!

I lost something there...

I wanted to say you can use either GetChildRows or DataTable.Compute, in
addition to a DataView as you showed.

This is an example of using DataTable.Compute:
For Each employee As DataRow In employees.Rows
Const format As String = "employeeId = {0}"
Const expression As String = "count(employeeId)"
Dim filter As String = String.Format(format, employee!id)
Dim count As Integer
count = CInt(timeSheets.Compute(expression, filter))
Next

Jay
 
Thanks for all the info guys!

Looks like I need to get up to speed on child/parent data in tables. I've
never done that before. I should be able to make something work with all
this.

Thanks again!
 
Jay,

This was the routine I had in mind, when there are no weekends it is of
course a lot easier, however here in the EU most people have a work week of
5 days. There are maybe exceptions as sailors and truckers, however than
there is no regular scheme and you can not check if every day exist.

(By the way, you do probably not know how strange it is for us not USA
people to have to use that in my opinion crazy USA date system in VB.net (I
normaly use therefore Cdate). Do you know if in the next version ISO is used
for dates?)

Dim dates(9) As DateTime
count = 0
Do Until count = 10
If startdate.DayOfWeek < 6 _
AndAlso startdate.DayOfWeek > 0 Then
dates(count) = startdate
count += 1
End If
startdate = startdate.AddDays(1)
Loop
Dim dv As New DataView(dt)
dv.RowFilter = "Employee = 100"
Dim sb As New System.Text.StringBuilder
If dv.Count <> 10 Then
For Each testdate As Date In dates
Dim exist As Boolean = False
For Each dr As DataRow In dt.Rows
If testdate = CDate(dr(0)) Then
exist = True
Exit For
End If
Next
If exist = False Then
sb.Append(testdate.ToString("dd-MM-yyyy"))
sb.Append(" ")
End If
Next
MessageBox.Show("Missing are: " & sb.ToString)
End If

Cor
 
Jay,

When you want to test it, than this, I send it seperatly not for you however
I get the idea that some people become afraid that it is to complex when
they see the building of the test.
(I saw that I had not put the declaration of the startdate in the sample
above that is in this).

\\\
Dim dt As New DataTable
dt.Columns.Add(New DataColumn("Dates",
Type.GetType("System.DateTime")))
dt.Columns.Add("Employee")
Dim startdate As DateTime = #7/1/2004#
Dim workdate As DateTime = startdate
Dim count As Integer = 0
Do Until count = 10
If workdate.DayOfWeek < 6 AndAlso workdate.DayOfWeek > 0 Then
Dim dr As DataRow = dt.NewRow
dr(0) = workdate
dr(1) = "100"
dt.Rows.Add(dr)
count += 1
End If
workdate = workdate.AddDays(1)
Loop
dt.Rows.RemoveAt(2)
dt.Rows.RemoveAt(4)
///
Cor
 
Cor,
(By the way, you do probably not know how strange it is for us not USA
people to have to use that in my opinion crazy USA date system in VB.net (I
normaly use therefore Cdate). Do you know if in the next version ISO is used
for dates?)
You should only have to use the "USA date" for date literals in your source.
If you call ToString (without a format) on a date it should follow your
settings under the Regional settings.

I don't see the compiler changing date literals as it would break too much
existing code, although I would expect all literals to be in ISO formats...

Hope this helps
Jay
 
(By the way, you do probably not know how strange it is for us not USA
You should only have to use the "USA date" for date literals in your source.
If you call ToString (without a format) on a date it should follow your
settings under the Regional settings.

I don't see the compiler changing date literals as it would break too much
existing code, although I would expect all literals to be in ISO formats...
That is why I use
dim datum as date = Cdate("01-07-2004") for #07/01/2004#
however it looks weird in my opinion, when you know there is a literal
possibility.

I thought there was no alternative, however maybe you knew.

Thanks

Cor
 
Cor
dim datum as date = Cdate("01-07-2004") for #07/01/2004#

Unfortunately your code is ambiguous! do you mean Jan 7th or July 1st?

Dim datum As Date = CDate("01-07-2004") 'for #07/01/2004#

Debug.WriteLine(datum.ToLongDateString(), "datum")

I get Jan 7th, which I don't think you were expecting!

I would recommend using the date literals, where are defined not to be
region specific, or the DateTime constructor.

Dim datum As New DateTime(2004, 7, 1)

Hope this helps
Jay
 
Hi Jay,

That what you show now is what I was looking for.

I knew it only forgot it, thank you for helping me finding again.
(Practicly that is of course ISO in my opinon).

Thanks

Cor
 
And how about that routine?

That saterday and sunday is the trouble for me in this chalenge, without it
it is even much easier because than you can just calculating go through it.

Cor
 
Back
Top