Interested in thoughts on keeping the integrity of historical data

  • Thread starter Thread starter hollyylloh
  • Start date Start date
H

hollyylloh

Just to be clear, I am not looking for an explanation of how relational
databases work in this question, I have been creating relational databases
for many years, please read closer.

I have a database that needs to store information as it was originally
entered. So, for example, normally I would enter client information (name,
address etc) in one table, and specific transaction information in another.
Reports would print out with the appropriate information for now. In the
future, let's say the clients address changes. When I go in and change the
clients address, I am actually changing it for historical purposed as well,
unintentionally of course, and this is usually acceptable. If for example I
need to keep the historical transaction intact as it was originally entered
(with the now old address), I would need to inactivate the old client
information and create basically a new client to store the new address. This
of course is difficult to get the user to do, they will just change the old
address to the new address.

One way to go about this is to force the user to create a new client by not
allowing edits or creating a routine that aids the user in the process.

Another way to go about this is to create what is really a flat file for all
the information that needs to be historically accurate. The client table thus
becomes more of an extended drop down menu for entering multiple values into
the main historical table. I really think this is the better way to go about
this. What do you think?

I am interested in additional thoughts on this, thank you in advance.
 
We've needed to keep "historical records" (not of addresses, but let's use
that as an example)...

Given what you've described, we handled it by creating a person table, and
address table, and a person-at-address table.

That way, when a person took a new address, we didn't need to create a new
person, just the new address and a new person-at-address record.

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
If a client has multiple addresses over time, then there is a one-to-many
relationship between client and client address. When a one-to-many
relationship exists, you need a table on the many side. So you need:
TblClient
ClientID
ClientName
<other client fields that don't change>

TblClientAddress
ClientAddressID
ClientID
NewAddressStartingDate
<address fields>

When you need to retrieve a client's address, you find the current address
by looking for the address associated with the max NewAddressStartingDate.

Steve
(e-mail address removed)
 
Thank you, yes that makes sense. The address was just an example, but I think
with some close planning that is what I need to do. To continue the example,
there would be many possible addresses for the client and they would change
back and forth between each as needed, and the user would not be able to
delete an address.
 
In the example that you gave, you used a transaction as an example of a case
where you needed the historical data (e.g. address) This is an example of a
common areas where this is needed. If this is the main need, you might want
to start databasing the entire transaction events (e.g. invoices, e.g.
including the at-the-time billing address) as entities. (vs. treating only
certain items in them as entities, and "deriving" the invoice each time that
it is printed.)
 
A way of maintaining an archive of old data might be to create a table that
has the same structure as your data table, with extra fields for timestamp
and userID.
Whenever the data in the table changes, you trigger a BeforeUpdate event
which saves the old data to the archive table, and then makes the change.
The user does not see this happen, but you have a complete archive of all
orevious changes to the table.

Stephen
 
Fred, thank you for your thoughts. Just to be clear, I think you are agreeing
with my first inclination as stated above? In saying "databasing the entire
transaction event" do you mean: Store all the data, that has historical
significance, in a single table? Thanks again.
 
Stephen,

Thank you, that is an interesting idea. I will keep that as a possibility as
I make the decision on this.
 
Storing all the data, that has historical significance, in a single table is
a very bad idea! In a single table design, you would need to type in the
address for each transaction. What happens if a user mistypes a part of an
address that was previously used by your client. Now you have two addesses
for the client when actually it should be one - the client used the same
address both times. The correct design is a TblClient and a
TblClientAddress. You just need to select the correct client address at each
transaction. If the client has a new address at the time of the transaction,
you need to first add the new address to TblClientAddress and then select
the new address as you enter the transaction.

Steve
(e-mail address removed)
 
Hello hollyylloh,

I guess I would say it this way. You really haven't told us what your exact
needs are for storing historical data. If your answer is "everything", such
is a big task, and probalby broader than needed.

In my response I was trying to "read between the lines" and guess at the
answer that you didn't give us. You described a very common need/ problem
which is that most Access db's that handle "invoicing" don't treat invoices
as entities to be stored. They treat it as something that is "derived" from
other data (only) at the time of printing. This cause the type of problems
that you describe. You want to look at an old invoice as sent, and
sometimes it no longer possible to do so. Because some of the data has
changed.

If that is the issue, and if it is enough of a problem to be worth some
extra complexity to solve, that would be to consider an invoice to be an
item(entity) that is created and stored (separately from the data that it
was derived from) at the time of invoicing.
 
hollyyloh

I use the approach that Stephen suggested and it works pretty well.

I have a public variable for Name1 (line1), Name2 (line2), Address1,
Address2, City, St, Zip, and an Name Change Flag.

In the On Current event, I set the Name Change Flag to false and save the
orginal address values in the above variables. Then in each field's After
Update event, I check to see if he address had change. Thinking about it,
doing it in the form's Before Update event would probably be better only
because you have all of the logic in one place rather than spread out over
the entire form. In any case, if any of the fields change, I set the Name
Change Flag to true.

In the form's Before Update, I check the the Name Change Flag. If it is
true, then I write out the Address history record.

The only issue with this is all of your reports will point to the current
address record (assuming you are storing just the customer number). Will
this be an issue?

Dennis
 
Thank you Fred I think you understand my question.

Let's say I am storing invoices that have historical significance and need
to be stored as they were originally derived and printed. Yes, I understand
how relational databases pull the information from the various source tables
at the time of printing and derive the printed invoice. And thus, if I change
the address, all invoices show the new address even on old invoices. This is
precisely the question: What is the best way to ensure the original invoice
transaction is stored as it was originally derived.

Are you suggesting that I create a separate table, for storage purposes,
that holds all the derived data from the various tables in a single table? Or
did you have something else in mind?

The ultimate goal is to retain the historical data while maintaining a very
simple user interface that hides any and all complexity of the design.
 
Dennis please see my coments below:

Dennis said:
hollyyloh

I use the approach that Stephen suggested and it works pretty well.

I have a public variable for Name1 (line1), Name2 (line2), Address1,
Address2, City, St, Zip, and an Name Change Flag.

In the On Current event, I set the Name Change Flag to false and save the
orginal address values in the above variables. Then in each field's After
Update event, I check to see if he address had change. Thinking about it,
doing it in the form's Before Update event would probably be better only
because you have all of the logic in one place rather than spread out over
the entire form. In any case, if any of the fields change, I set the Name
Change Flag to true.

In the form's Before Update, I check the the Name Change Flag. If it is
true, then I write out the Address history record.


***What do you mean by "I write our the Address history record?"
The only issue with this is all of your reports will point to the current
address record (assuming you are storing just the customer number). Will
this be an issue?

***If I understand what you are saying, yes this is an issue, and is the
actual problem I am trying to address.
 
Steve,

I am not suggesting that I would use a single table design. I am talking
about using a table to store the data derived from a multi-table design. The
reports would then pull from the storage table.

I understand what you are talking about with Client and ClientAddress
tables, and I understand this is the normal and correct way to design
relational databases.

The reason I am considering stepping away from the normal design scenario is
1) to insure historical accuracy and 2) in a effort to make the user
interface very simple.
 
hollyylloh,

I just re-read your initial comment of "When I go in and change the
clients address, I am actually changing it for historical purposed as well,
unintentionally of course, and this is usually acceptable."

Opps, I forgot that when I responded. Ok, if I understand your
requirements, you need to:

1. Maintain a history of customer name and addresses. Each time a change
is made, you need to keep both the new "current" and the old "current" or
historical record. If you were to change a customer's address 5 time, you
would have 1 current record and 5 historical records.

2. Each transaction needs to be forever linked to the customer address
record that was "current" at the time the transaction was created. In this
way, we can alway retrieve the customer address that was in effect at the
time the transaction.

3. Most report will use the original address record when they print name
and address. Some reports / forms will use the most current (for example,
invoices, statements, etc.).


Is this a valid statement of your requirements?

If so, you will need multiple records per customer. The customer will have
two different keys.

The first key would be the unique address key for each record. The second
key would be account number and would have duplicate entries. You would also
have an CurrRcd flag. The CurRcd flag would be set to “Y†on the most
current record and “N†on the old records.

When the user changes the customer’s address, you will:
1. Set the CurrRcd Flag to “Nâ€.
2. Update the current record
3. Copy the current record to a new record.
4. Set the new record’s CurrRcd flag to “Yâ€
5. Write out the new current record.
6. Make the new “current rcd†the current rcd.

Or something like that.

On you transaction records, you will need to put both the unique address key
and account number key on each transaction record. You will also need to
have to query tables – qryCustAddrTbl and qryCustomerTbl. Both of these
tables would point at the tblCustomer table. The qryCustAddrTbl will
contains ALL customer address records. You would use the Address Key to
access this file. The qryCustomerTbl would only show (via WHERE) those
records where the CurrRcd Flag = “Y†The query would only select those
records CurrRcdFlag = “Yâ€. You would use the customer account number field
as they key in this file.

The queries would permit all of the data to reside in one physical table,
which makes queries easy.

I’m tired and I’m not sure I’m making sense. When you read this and if does
not make sense, please post the questions and I will try to answer them or
rephrase my answer.

Basically, if you are going to want to access two potentially different
records (orig cust addr and curr cust addr) you are going to need multiple
master records. The individual transaction rcd will point to the original
address rcd (orig addr key) that was inforce at the time of its creation and
at the current customer address rcd.

Dennis
 
Hollyylloh,

Sorry it took me a couple of days to get back to you and I forgot your main
issue of maintaining the old data.

Couple of questions / assumptions:

a. If a name or address field is changed, you will ALWAYS create a
historical address record.


Ok, how about this solution:

Database changes:

1. Add a Client Address or History table. This will be keyed by the
ClientAddressId.
- On the Client Address table, include the ClientId number.
- On the Client Address table, include a field call ActiveFlag. This
field will have a value of either “Y†or “Nâ€.

2. In all tables where this is a customer number, I would also add a
CustAddressId field. In those tables, you would set the ClientAddressId
field equal to the customer’s current CliendAddressId key.

As Steve suggested, you could use a date field to match back to the customer
address record that was in affect at the time, but having a direct key to the
history record is so much simpler.

New Query Table qrytblCurrClientAddr:
Create a new query that looks at the current Client Address table. You will
use all fields and select ActiveFlag = “Yâ€. This query will show only active
client address records.

You could join the Client Table to the qrytblCurrClientAddr by ClientID.
This will give you a complete Current Client Table using the customer number.

You could then create another query, called qrytblClientAddr, that joins the
Client Address table to the Client Table by ClientID. The primary key to
this record would be the ClientAddressID. This would provide a complete
customer record for each ClientAddrID value.


Description:

When the user changes an address, your software will:

1. Write out the current customer address record and set its ActiveFlag
field to “Nâ€. Keep the old address information on this record.

2, Write a new customer address record with a new ClientAddrId and the new
address information. The ActiveFlag on this record will be set to “Yâ€.

As you write out transaction record, you will need to post both the ClientID
and ClientAddrID to the transaction record.

Any report that should utilize the original address will need to be modified
to use the qrytblClientAddr table.

Is that acceptable?

Dennis
 
As a preface, the current behavior isn't inherent to relational databases.
It is inherent to the particular database design that every Access template
or forum answer I've seen uses. The alternative is a design which treats
and stores invoices as entities / events rather than a derived "view",

An approach that fully administers and automatically stores invoices as
entities/events would be quite complicated. But I think it would be less
complicated to make 2 tables and 2 append queries (manually launched from one
button) to store it:

In you base work, create and populate an invoice number field.

Make an "Invoice header info" table with all one-per-invoice type info.
Make an append query to load such data into it for the invoice-at-hand.

Make an "invoice details" table for line item type info. Include the
invoice number field. Make an append query to append all of the line items
for the invoice-at-hand.



An lower-tech solution would be to pdf and store the invoice.

An even lower tech solution would be to print and file/store the invoices.

Sincerley,

Fred


One low tech way would be to print and file the invoices.
 
Back
Top