repeated conversion of data summarized into rows and columns

  • Thread starter Thread starter Ian
  • Start date Start date
I

Ian

I have data ("Usage") with hours 1-24 as the column headers and dates Jan
1-Dec 31 in rows 1-365 for user 1, rows 366-733 for user 2, etc... for 1200
users. The year 2005 is on the first sheet, 2006 on the second, etc....

I'm going to clean up the data by putting it into an access db with only 4
fields (Date, Hour, User, Usage) (31 million rows) then connect with Excel to
use pivot table functions.

Any good ideas about how to unsummaryize the data?

Hour
User Date 1 2 3 4 5
1 1-Jan 5.00 6.00 4.00 4.00 10.00
1 2-Jan 3.00 2.00 5.00 9.00 3.00
1 3-Jan 11.00 6.00 8.00 2.00 2.00
1 4-Jan 7.00 5.00 8.00 4.00 10.00
1 5-Jan 2.00 8.00 4.00 9.00 7.00
 
Possible double post - system hiccuped.

First of all, 31 million rows of data is a pretty hefty load for Access.
But for any database I'd use 5 tables :
UsersTable 1200 entries for your users individual identification information;
DatesTable 365 or 366 entries for each year and
HoursTable with up to 24 Hours entries per date (you could possibly do away
with this one and store the Hour as part of the 4th table);

The 4th table, UsageTable, would be a table that holds pointers to the
UsersTable, Dates and to the HoursDates table and containing a usage entry.

To add new usage entries, you'd choose User, Date and Hour from 3 drop downs
and enter the usage on a form. An entry in UsageTable would contain nothing
but Long values for the 3 pointers, plus an integer or single field for the
usage. Fields in it would be your 4 fields. Single record in it might look
like:
USER DATE HOUR Usage
1 2 3 5.00
which is the same as your User 1, 2-Jan-2005, (hour) 3 5.00 entry in the
sample table.
That's about as compact as I can envision it, and relatively easy to create
queries into, and it actually adheres to normalization rules.

Now, what exactly are you looking for from this forum with your question of
"Any good ideas about how to unsummarize the data?" Code to move into Access
tables?, Way of reorganizing it to do the same? Something else?
 
Thanks a lot -- I'll restructure the tables to accomidate the normalization.
I have no idea how to create the code so I appreciate it. I got the db
working at 1.2GB but it was pushing the boundries. I may move it to a blade
server running SQL.

I'll check here but my email is eye (one letter) furst at cdsg dot canada
(suffix)

Thanks again. Ian.
 
Lets see how this works for you. For the moment I did not try to normalize
anything - this is going to write it all into one huge table. You can see
the database name, table name and field names I used in the code. You will
have to change all of that by changing the "Const" values in the code.

This will do the job in one 'pass', so it may take a while to complete.
This may be a task you want to start just some time before you head off to
bed, or leave from work some day. 31 million of anything is a lot of things
to work through.

To put the code into your workbook:
Start by makiing a copy of your workbook to work/test with!! Nothing like
losing 31M rows of data to just ruin your day.
Open the copy of the workbook. Press [Alt]+[F11] to open the VB Editor
(VBE).
From the VBE menu, choose Insert --> Module
Copy the code below and paste it into the empty module presented to you.
Make edits to the Const values to set it up to access your database and a
table set up to receive the data.
Set up the Tools --> References (again from the VBE menu), to reference the
latest
Microsoft ActiveX Objects x.x Library
where 'x.x' is the version number listed - there may be several versions in
the list.

My thanks to www.exceltip.com for providing the code snippet that made up
the heart of this solution. Saved me a lot of time in reviving old memories
on how to connect to Access!

Sub ADOFromExcelToAccess()
'requires a Tools --> References entry to
'Microsoft ActiveX Data Objects x.x Library
' used 6.0 in this application.
'*********************************************
'Redefine these Const values to correspond to*
'the information about your Access database *
'*********************************************
'your full path to the database file
Const fullPathToDB = "X:\IAN_DB\IansUserDB.mdb"
'the name of the table to put the data into
Const tableName = "tbl_RawDataFromExcel"
'the field names in the table
Const UserIDField = "fUser"
Const DateField = "fDate"
Const HourField = "fHour"
Const UseField = "fUsage"
'the first row with user information on the worksheets
Const firstRow = 2 ' assumes row 1 has labels
'end of user definable values

Dim WS As Worksheet
'user from row being examined
Dim currentUser As String
'date from row being examined
Dim currentDate As Date
' used cells in column A
Dim userListRange As Range
' individual cell within userListRange
Dim anyUser As Range
' Column Pointer to work through Hour entries
Dim CP As Integer

' exports data from the active worksheet to a table in an Access database
' this procedure must be edited before use
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
Dim r As Long

'assume there will be some work to be done, so
'set up link over to the Access database here
Set cn = New ADODB.Connection
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0; " & _
"Data Source=X:\IAN_DB\IansUserDB.mdb;"
' open a recordset
Set rs = New ADODB.Recordset
' connect to the Access database
' all records in a table
rs.Open "tbl_RawDataFromExcel", cn, _
adOpenKeyset, adLockOptimistic, adCmdTable

'works through ALL worksheets in the workbook
'assuming each has user entries and that no
'other type of information is in the workbook.
For Each WS In ThisWorkbook.Worksheets
'set a reference to the used cells in column A
'of the worksheet
r = WS.Range("A" & Rows.Count).End(xlUp).Row
If r >= firstRow Then ' there is data on the sheet
Set userListRange = WS.Range("A" & firstRow & ":A" & r)
For Each anyUser In userListRange
'skip any gaps in user ID entries
If Not IsEmpty(anyUser) Then
r = anyUser.Row
currentUser = anyUser.Value
currentDate = anyUser.Offset(0, 1) ' from col B
'work through the hourly entries col's C through Z
For CP = 3 To 26 'columns C through 26 = hrs 1-24
'if usage is not zero, add a record to Access
If WS.Cells(r, CP) <> 0 Then
With rs
.AddNew ' create a new record
' add values to each field in the record
.Fields(UserIDField) = currentUser
.Fields(DateField) = currentDate
.Fields(HourField) = CP - 2 ' 3-2=1 ... 26-2=24
.Fields(UseField) = WS.Cells(r, CP).Value
.Update ' stores the new record
End With
End If
Next ' next hour column
End If ' end of empty cell test
Next ' end of anyUser loop
End If ' end of r >= test
Next ' move on to next worksheet
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
Set userListRange = Nothing
Set WS = Nothing
End Sub
 
Back
Top