Restated: "Fields are expensive, records are cheap"

  • Thread starter: Dennis

Dennis

Hi,

First let me apologize for the empty question below. I hit the Post button
by mistake and it posted a blank question. Sorry.

Hi,

I am restating my question because, based upon the responses I received, I
obviously stated my question incorrectly. So please let me try again.
Hopefully I will be a bit more successful this time around.


In a previous question, I stated that I added 30 fields to the membership
table (one of many in the system). The initial implementation was SO
successful that the user requested quite a few enhancements, resulting in a
HUGE increase in the scope of the original project. These changes resulted
in the addition of 30 new fields to the membership table.

The last time people responded to my question, they were concerned about
whether or not these were normalized fields. Please let me state that the
row has been normalized to 3rd normal form and these fields are NOT
re-occurring fields.

One MVP who responded to my original question stated "Fields are expensive,
records are cheap".

Maybe I am misinterpreting his comment. But to me this means that it is
better to have smaller rows in multiple linked tables than it is to have one
large row that contains all of the normalized data. (I.e., it is better to
split a large normalized row in a single table into multiple rows in
multiple tables.) I hope I explained that the right way.


My question pertains to the difference in the disk access time it takes to
retrieve one record from the disk over a network versus retrieving multiple
records versus the overhead involved in Access parsing a lot of fields out of
a large row or parsing the same number of fields from multiple smaller rows.


I've always been taught the exact opposite - that "Fields are cheap, records
are expensive" since going to disk is so SLOW versus accessing data in
memory.

Is there something different about Access where the statement "Fields are
expensive, records are cheap" is true? I'm using Access on a local machine
where the front and back ends reside on the same machine, as well as having
multiple front ends on each client's machine tied into a back end which
resides on a file server over a Cat 5 hardwired Ethernet network.

My question is strictly concerning the data access time of multiple rows
over the network versus Access's overhead of extracting data from multiple
small rows versus one large row. And we are assuming a 3rd normal form
database design.

And it may well be that I am totally misinterpreting the "Fields are
expensive, records are cheap" comment.

Thank you for your comments.



Dennis
 
The mantra in the subject line is purely a rule-of-thumb for beginning
database designers. It has no bearing on your question about data access.

Your question revolves around how the data is physically stored on the disk.
You're saying that there is a delay to move the disk heads to different
physical locations to retrieve records, and that delay represents degraded
performance.

While there's some truth in that simple view, the reality is much more
complex than that. Access uses pointers to store the data. If you define a
Text field as 255 characters but enter 2 characters in it, it does not use
up 510 bytes (unicode) of disk space to store those 2 characters.
Internally, the database keeps track of where each field starts in each
record. Some fields (memo, OLE, attachments, MVFs) are not actually stored
in-line with the other data in the record either. So it's not a matter of a
single record being faster to retrieve. There are many other factors at work
here, including whether the database has been compacted recently, whether
you are using page- or record-locking, whether the disk is badly fragmented,
how intelligent are the various layers of caching (physical on the drive, or
the controller, or the operating system, or in Access), the Rushmore
algorithms in JET, and so on. Then we may have to start another mindset to
handle SSD storage issues as well.

In the end, you don't really need to consider these as factors when
designing JET tables. It makes no sense to break a 60-field table down into
one main table with several one-to-one relationships just for performance
reasons. The management of those relationships alone represents unnecessary
complexity for the database and for the developer. You may be forced to go that
route if you are subclassing, or if your record is too wide to fit into the
4K page size; otherwise don't even consider it (which is probably what you
were thinking in posting.)

Having said that, having 60 fields in one table is unusual. The database I
happen to be working on right now has 93 tables, and none of them has more
than 32 fields (including 6 for tracking when and by whom the record was
created, most recently edited, and deactivated.)
 
The last time people responded to my question, they were concerned about
whether or not these were normalized fields. Please let me state that the
row has been normalized to 3rd normal form and these fields are NOT
re-occurring fields.

One MVP who responded to my original question stated "Fields are expensive,
records are cheap".

That was me, and that (non-normal structure) was my concern. It is impossible
to tell from a brief post how knowledgeable the poster might be about
relational design. As you well know, wide-flat, spreadsheetish, non-normalized
designs are a very common trap for beginners.

You're obviously not a beginner (now that I know more about your background!)
so my reply was out of line. My only excuse is that at the time I did NOT know
your level of skill. I apologize for jumping the gun.

What I consider "expensive" is that an improperly normalized table structure
will require far more work in getarounds, contorted queries, form and report
maintenance, and code than would a properly normalized design. Given that you
have (it seems) valid Entities with sixty or more discrete, atomic,
non-interdependent Attributes, I'll just express my mixed admiration and
sympathy and bow out.
 
Allen,

Thank you very much for answering my question.

I understand the issue of disk access is quite complex given all those
issues, and SCSI versus SATA, non-RAID vs. different levels of RAID, speed
of the controller, network switches, routers, and all of the other issues
that you brought up just cause my head to hurt when I even try to think
about them. So you are correct, I did take a VERY simplistic approach in my
question.

I did not realize that the database actually kept track of where each field
starts and stops. The other variable-length record databases I have worked
with just buried a start and stop marker in the record. So while the
"presentation" layer of the database knew where each field started and
ended, the actual engine had no idea.

I will have to go back and check if it would be possible for all of the
fields to be filled out and possibly exceed the 4K record / row limit. I
did not know about the 4K limit. Thank you for bringing that up!!!!! You
may have saved me again.
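One rough way to do that check is to total the declared worst-case field sizes against the documented Jet/ACE record limit (roughly 4,000 bytes, excluding Memo and OLE data, with Text stored as Unicode at 2 bytes per character unless Unicode compression applies). The field list below is hypothetical:

```python
# Rough worst-case row-size check against the approximate Jet/ACE
# record limit. The field list here is hypothetical -- substitute the
# real table's declared field sizes.
RECORD_LIMIT = 4000  # approximate documented Jet/ACE limit, in bytes

fields = {                 # name: declared worst-case size in bytes
    "MemberID": 4,         # Long Integer
    "LastName": 50 * 2,    # Text(50), Unicode worst case
    "FuneralNotes": 255 * 2,
    "DateOfDeath": 8,      # Date/Time
}

worst_case = sum(fields.values())
print(worst_case, "bytes;", "OK" if worst_case <= RECORD_LIMIT else "over limit")
```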

You are correct when you say I was thinking about breaking my 60-field
record into multiple tables. I interpreted the comment from the MVP as
suggesting I should break my 60-field record up into multiple tables. When
an MVP takes time to comment on something I am doing, I do try to follow
their advice. It has not yet led me astray.

I have since been informed that the comment "Fields are expensive..." refers
to the dollar cost of adding the field to the forms, reports, and the future
maintenance cost associated with that field. But that is another story.

My background is with large commercial insurance companies. All of the
companies have had relational databases designed to the 3rd normal form.
Between the policy information, reinsurance tracking, state, federal, rate
development information we have to maintain for each policy, the typical
policy master table had well over 200 fields in it. The policy master table
just had fields that were common to all policies! Any data that applies to a
specific line (auto, homeowners, etc.) is stored in a line-specific master
policy file. Our coverage tables have over 100 fields. Our Claim master
table had over 100 fields in it.

So for me, 60 fields is actually pretty small. However, I will go back and
re-examine my design. There are things that I could put in a separate table,
but they really are properties of the member.

Thank you for responding and supplying all that wonderful information. It
gives me a better understanding of how Access works internally.


Thanks once again,

Dennis
 
John,

I know it was you, but I did not feel it was appropriate for me to use your
name in my questions, hence "an MVP commented".

John, no offence was taken by your comment. I did go back and re-read my
original posting and realized that I did not provide anywhere near enough
information for you to think of anything else. That was entirely my fault
for not being more detailed. I'm still having a problem drawing the line
between not enough detail and too much detail. Sorry about that. I agree
that when you read a brief posting, it is very difficult to tell what a
person's background is. In trying to pay back you MVPs for answering my
questions, I have been trying to answer forum questions that I have
previously asked and received answers to. I know it is impossible to tell
from a brief posting how knowledgeable a poster might be. Fully agree there.

I also agree with your comment about how common wide-flat, spreadsheetish,
non-normalized designs are. I responded to someone tonight about a training
database. They had quite a few issues with their tables. I fully understand
the issue.

I had NO problem whatsoever with your response and you have no reason at
all to apologize. Personally, I feel bad that you feel the way you do. I
never took offence at your comments.

While I have a lot of db experience, I am still a newbie when it comes to
Access. I thought that Access was like other database engines. While that
is somewhat true, it is mostly a false statement because the VBA code and
events work so differently from anything I've used. I'm still beating my
head against that learning cliff.

I interpreted your comment as implying that it was better to have a couple
of small master tables than one large master table. Since it was coming from
an MVP, I figured that I had better ask some more questions. Since I did not
know how to ask you directly, I just put it on the forum.

I agree with your comment about a field being expensive in an improperly
normalized table structure, resulting in all sorts of workarounds. 10 of the
30 new fields had to do with funeral information for a deceased member.
Since all of the information is specific to the individual member, I
included it on the member table. You might be able to argue that it should
go in its own table, but for data entry and reporting simplicity I included
it in my member table.

Actually, your comment did help me because I was reminded about Access's 4K
record size limitation. I'm not sure if a completely filled out record will
exceed 4K or not; I will have to check. So that alone will prevent a problem
from occurring.

If the potential record size will exceed 4K, I will break out the funeral
information into a separate table, as that information is quite lengthy.

I do have one question for you about your "non-interdependent" fields. What
do you mean by that? To me, a policy effective date and expiration date are
interdependent because the effective date has to be before the expiration
date. Also, City, St, and Zip seem to be interdependent. So, what do you
mean by non-interdependent?

Please, never bow out of one of my questions. I have learned so much from
you and the other MVPs that I would be on the losing end of that deal.

Thank you for all of your assistance. I have no way to express my
appreciation except to say if you are ever in the Orlando, FL area I would
be glad to give you an airboat ride in the wilds of the Florida swamps. I'll
show you the real Florida.

Thanks again for all of your help.

Dennis
 
While there's some truth in that simple view, the reality is much
more complex than that. Access uses pointers to store the data. If
you define a Text field as 255 characters but enter 2 characters
in it, it does not use up 510 bytes (unicode) of disk space to
store those 2 characters. Internally, the database keeps track of
where each field starts in each record. Some fields (memo, OLE,
attachments, MVFs) are not actually stored in-line with the other
data in the record either. So it's not a matter of a single record
being faster to retrieve. There are many other factors at work
here, including whether the database has been compacted recently,
whether you are using page- or record-locking, whether the disk is
badly fragmented, how intelligent are the various layers of
caching (physical on the drive, or the controller, or the
operating system, or in Access), the Rushmore algorithms in JET,
and so on. Then we may have to start another mindset to handle SSD
storage issues as well.

And none of this considers the issue of disk caching, such that
there's not that much difference between data loaded into RAM and
data that is being read from/written to disk, since the latter
usually takes place through the caching mechanism, and is not going
to be limited by the speed of the actual storage medium, but by RAM.
This has been the case in Windows since at least c. 1991-2, when
Windows 3.1 was released with disk caching turned on by default. It
was essential for decent performance in Windows, but it also meant
that your databases were going to be speeded up because of the disk
cache, too (although back then it was largely a read-only
improvement, as lazy writes and such had not been implemented in the
DOS disk cache; any modern version of Windows, though, caches both
reads and writes).
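The read-through behaviour David describes can be sketched with a toy LRU page cache (a vastly simplified stand-in for the OS disk cache): repeated reads of the same record touch the "disk" only once, and everything after that runs at RAM speed.

```python
# Toy read-through LRU page cache -- a simplified stand-in for the OS
# disk cache, showing why repeated reads don't hit the physical medium.
from collections import OrderedDict

class PageCache:
    def __init__(self, backing, capacity=64):
        self.backing, self.capacity = backing, capacity
        self.pages = OrderedDict()
        self.physical_reads = 0

    def read(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)      # cache hit: served from RAM
            return self.pages[page_no]
        self.physical_reads += 1                 # miss: touch the "disk"
        data = self.backing[page_no]
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)       # evict least-recently-used page
        return data

disk = {n: f"page-{n}" for n in range(1000)}     # pretend storage medium
cache = PageCache(disk)
for _ in range(100):
    cache.read(42)
print(cache.physical_reads)                      # 1 physical read, 99 RAM hits
```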
 
My background is with large commercial insurance companies. All
of the companies have had relational databases designed to the 3rd
normal form.

The determination of whether or not those tables were actually
normalized depends on the chosen definition of the entity being
modelled in the table. I would consider the 200-field table you
mention later to be unquestionably denormalized, even though I know
nothing at all about the content/function of those fields. That
sounds like a table that has a bunch of fields that are used only
for a single record type, so that an auto insurance policy has one
set of fields, but a renter's insurance policy has a different set
of fields. Any time you're using some fields for some records and
not for others, it's an indication to me that the entity has been
misdefined, and should probably be broken into at least two tables,
with a narrow header table and a long child table, where each row
stores what was formerly a field in the wide table.
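A minimal sketch of that narrow-header/long-child restructuring, using SQLite purely for illustration (all table and column names here are hypothetical, not taken from the systems under discussion):

```python
# Sketch of the "narrow header table + long child table" restructuring,
# where each row of the child stores what was formerly a field in the
# wide table. SQLite is used for illustration; names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Policy (          -- narrow header: fields common to ALL policies
        PolicyID INTEGER PRIMARY KEY,
        EffectiveDate TEXT,
        ExpirationDate TEXT
    );
    CREATE TABLE PolicyAttribute ( -- long child: one row per former wide-table field
        PolicyID  INTEGER REFERENCES Policy(PolicyID),
        AttrName  TEXT,
        AttrValue TEXT,
        PRIMARY KEY (PolicyID, AttrName)
    );
""")
con.execute("INSERT INTO Policy VALUES (1, '2024-01-01', '2025-01-01')")
con.executemany(
    "INSERT INTO PolicyAttribute VALUES (1, ?, ?)",
    [("LineOfBusiness", "Auto"), ("DeductibleUSD", "500")],
)
rows = con.execute(
    "SELECT AttrName, AttrValue FROM PolicyAttribute WHERE PolicyID = 1"
).fetchall()
```

The trade-off, as the thread notes, is that queries against the child table get more contorted than reading named columns from a wide row.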

All that said, my conclusion could be wrong for any particular
application. But "fields are expensive, rows are cheap" is a
generalized rule of thumb, not a hard-and-fast law of nature. It
allows for exceptions for certain purposes, but is a starting point
for evaluating a schema design.
 
I agree with your comment about a field being expensive in an
improperly normalized table structure, resulting in all sorts of
workarounds. 10 of the 30 new fields had to do with funeral
information for a deceased member. Since all of the information
is specific to the individual member, I included it on the member
table. You might be able to argue that it should go in its own
table, but for data entry and reporting simplicity I included it
in my member table.

That sounds like the type of data I'd put in a separate 1:1 table,
as it only applies once a particular threshold has been crossed. A
record in that table also means the person is deceased (though you
may not have the information and might still need to store a date of
death in the main record).

I wouldn't call that denormalized, but I have always found a certain
utility in using the 1:1 table for things that apply only after the
entity has reached a certain milestone.

However, I would likely avoid having multiple 1:1 records, though,
as it then becomes complicated to get a single editable row, unless
the 1:1 records are related in a logical chain that is modelled in
the database in that way.
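A minimal sketch of such a milestone 1:1 table, again using SQLite for illustration (names are hypothetical): a LEFT JOIN still yields one editable row per member whether or not the milestone has been reached.

```python
# Sketch of a 1:1 "milestone" table: funeral details live in their own
# table, and a row there implies the member is deceased. SQLite is used
# for illustration; table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Member (
        MemberID    INTEGER PRIMARY KEY,
        LastName    TEXT,
        DateOfDeath TEXT             -- may be known before funeral details are
    );
    CREATE TABLE MemberFuneral (     -- 1:1 child: at most one row per member
        MemberID    INTEGER PRIMARY KEY REFERENCES Member(MemberID),
        FuneralHome TEXT,
        ServiceDate TEXT
    );
""")
con.execute("INSERT INTO Member VALUES (1, 'Smith', '2024-03-01')")
con.execute("INSERT INTO MemberFuneral VALUES (1, 'Oak Hill', '2024-03-08')")

# LEFT JOIN gives one row per member, deceased or not.
row = con.execute("""
    SELECT m.LastName, f.FuneralHome
    FROM Member m LEFT JOIN MemberFuneral f ON f.MemberID = m.MemberID
""").fetchone()
```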
 
David,

I can see where disk caching would help in a sequential process, but does
disk caching really help in a randomly accessed database during data entry?

The first data entry might access the first record, the next the 1,000th
record, the next the 5,000th record, and so on and so on. So, unless the
entire table is cached, does it really help?

Dennis
 
David,


Hmmm, I see your point and kind of agree with it. My background is on
large and midrange computers where it is nothing to have a 200-field, 30K
record.

However, I realize that Access is a different beast and I'm having to learn
to adjust for its restrictions. Thanks for the insight. Just more to think
about. But then I learn something new also.

Thanks,

Dennis
 
David,

Your Comment: The determination of whether or not those tables were
actually normalized depends on the chosen definition of the entity being
modelled in the table. I would consider the 200-field table you mention later
to be unquestionably denormalized, even though I know nothing at all about
the content/function of those fields.

Response: This is a very common statement for people who do not know the P
& C Insurance industry. The 100- to 200-field master record (depending upon
the vendor) contained ONLY the common and non-recurring policy information
between the different lines of business (types of policies). The insured's
name and address information is not even included in these fields, as that
information is in another table.

Actually, when you examine the data at the "policy" level for different
lines (Personal Auto, Homeowners, Renters, Work Comp, Commercial Auto,
General Liability) you would be surprised to find it is actually quite
similar. It is not until you get to the "coverage" and insured "object"
level that the different lines vary dramatically.

There is a lot of policy information that is never seen by the public. Just
off the top of my head, some common policy information is reinsurance
information, cancellation and reinstatement status (the cancellation &
reinstatement history is maintained in a separate table), statistical
reporting to ISO and each individual state, premium accounting (not Acct
Rec.) with written, earned, unearned, and in-force premiums, renewal offer
tracking, voluntary audit information, and physical audit tracking. You
could break all this information into its own tables, but why? So much of
the above information is inter-related (like effective and expiration date)
that breaking it into separate tables just slows down data access and
increases complexity.

One of the clients I worked for was a state-owned insurer of last resort.
If you could not find insurance anywhere else, you could purchase it from
this company. The company was less than a year old when I started working
there. They were located in a city where there was a lot of banking
expertise but very little insurance expertise. Their staff had all sorts of
banking experience, but no insurance-experienced people. The first thing I
did was sit down with the vendor and go over their system. I learned that
system in less than a week. It was simple to learn not because I am that
brilliant, but because there are only so many ways you can build an
insurance system. And if you understand insurance, you can easily understand
a new system. (Talk to an auto mechanic: it does not take them long to learn
a new "normal passenger" car [I'm excluding the exotic engines] because
there are only so many ways to build a "normal passenger car".) The vendor
commented that they were glad I was there because they had been trying to
teach the banking people and AS/400 people for about a year and no one
really understood the system. Again, that is not because the company did not
have smart people or people that lacked experience. It was because their
people lacked insurance experience. I had to give the insurance company's
CFO the reinsurance premium accounting journal entries for their financial
books. This was not because she was dumb (as a matter of fact she was quite
brilliant); it was because she did not have insurance accounting experience,
which is quite a bit different from normal accounting entries.

But I went through all that just to say the head of the company's IT
department thought the same thing you do (he also came from a banking
background). So, he hired some database design consultants / experts to
review the database's design, who again did not understand insurance. (Had
they understood insurance, they could have taken a quick look and realized
the database was in pretty decent shape.) Instead, they gathered all of the
data, all of the relationships, all of the interdependencies, and did their
thing. Guess what, they came up with some minor suggestions but no major
changes, which is what I told the CIO before he started this effort. But oh
well. That is where the experience comes in.

Also, as I stated in the other discussion on this subject (which I am
surprised you missed, as you are commenting in that discussion also), I've
worked on 12 different vendors' insurance systems over the years. Those
systems have been written DECADES apart with totally different technology
and totally different development groups. At one extreme we have a flat-file
system running on a mainframe, and at the other end we have a Windows-based
object-oriented client/server system using a SQL database. And they have all
had a 100- to 200-field policy master table. (The more capable systems had
the larger number of fields.) It is interesting that you would disagree with
all those people with all that experience. But whatever.


Your comments: That sounds like a table that has a bunch of fields that are
used only for a single record type, so that an auto insurance policy has one
set of fields, but a renter's insurance policy has a different set of fields.

Response: Well, it may sound like that, but again this is a common
statement for a newbie in the P & C Insurance field.

The normal way I've seen the policy master designed is to have a common
policy master where all common fields (all 100 to 200, depending upon the
system) are stored in a single table. Then for each line of business (such
as auto or renters), you have a line-of-business policy master file that
contains those fields specific to that line of business. This table is an
extension of the common policy master table. In a good design, you simply
don't store line-specific fields in the policy master table; you store
them in the line-specific policy master files. One of the reasons the policy
master record is so big is there is a whole lot of "behind the scenes" data
being stored that the policy holder never sees. (See above.)

At the coverage and insured object level, the story is totally different.
While the structure of the coverage tables mirrors the policy master and
line-specific policy master, the coverage master table is actually quite
narrow. That is because there is not a whole lot of common information
(other than effective and expiration dates, policy accounting [not Acct /
Rec. info], statistical accounting, coverage limits, and reinsurance) at the
coverage level. Most of the coverage information is stored in different
line-specific coverage and insured object tables (two or more tables). These
tables are extensions of the coverage master table and children of the
line-specific policy master tables.

The homeowner coverage is actually comprised of multiple coverage tables
because a homeowner policy can cover multiple lines of business. For
example, a homeowner policy can cover fire and property damage (1 line of
business), general liability (another line of business), theft (another line
of business), and work comp for household help (another line of business).
These were just the lines of business that I could think of off the top of
my head. A full implementation of a homeowner policy is extremely involved
and very complicated.

But, back to your example. Your statement is incorrect. The personal auto
policy master, coverage, and insured object tables contain the auto-specific
coverage information, while the renter's policy, coverage, and insured
object tables contain the renter's specific coverage information. The common
information for both the auto and renter's policies is stored in the policy
master table.

Your comment: Any time you're using some fields for some records and not
for others, it's an indication to me that the entity has been misdefined, and
should probably be broken into at least two tables, with a narrow header
table and a long child table, where each row stores what was formerly a field
in the wide table.

Response: You are preaching to the choir here! I totally agree.

However, we are going to have to disagree on the "narrow header table"
issue. The header table is as long as the data model / structure requires it
to be. If it is short, it is short. If it is long, then it is long.

Your comment: All that said, my conclusion could be wrong for any
particular application.

Response: I agree with this point.

Your comment: But "fields are expensive, rows are cheap" is a generalized
rule of thumb, not a hard-and-fast law of nature. It allows for exceptions
for certain purposes, but is a starting point for evaluating a schema design.

Response: I now understand John's logic behind "Fields are expensive, rows
are cheap" and, given the context, I fully agree with it.


Dennis
 
I can see where disk caching would help in a sequential process,
but does disk caching really help in a randomly accessed database
during data entry?

Yes, because every modern database uses b-tree traversal of indexes
to locate records.
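A rough sketch of why that matters: locating a key through an index is a binary-search-style descent that touches on the order of log2(n) pages, not n. Here a sorted Python list and bisect stand in for the B-tree:

```python
# Sketch of why an index helps even for "random" single-record lookups:
# a B-tree finds any key in O(log n) page touches instead of scanning
# the whole table. A sorted list + bisect stands in for the tree here.
import bisect
import math

keys = list(range(0, 1_000_000, 7))   # hypothetical sorted index keys

def index_lookup(key):
    """Binary search: roughly log2(n) probes, never a full scan."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else None

pos = index_lookup(7_000)
print(pos)                            # found, after ~17 probes among ~143,000 keys
print(math.log2(len(keys)) < 18)      # vs. 143,000 touches for a full scan
```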

Next question?
 
Hmmm, I see your point and kind of agree with it. My background
is on large and midrange computers where it is nothing to have a
200-field, 30K record.

However, I realize that Access is a different beast and I'm having
to learn to adjust for it restrictions. Thanks for the insight.
Just more to think about. But then I learn something new also.

Schema design should be as independent of the database engine as
possible, so Access is *not* different in any way, shape or form. I
would recommend *as a starting point* the same normalized design for
any database engine.

We are at least 15-20 years past the days when the developer of a
desktop database app needed to worry about the physical storage of
data. It's only in edge cases where any modern developer is going to
start considering it in the design of the database schema.
 
David,


Your comment: Schema design should be as independent of the database engine
as possible, so Access is *not* different in any way, shape or form. I would
recommend *as a starting point* the same normalized design for any database
engine.

Response: Your comment is self-contradictory. Instead of saying "Schema
design should be independent of the database engine" you stated "Schema
design should be as independent of the database engine as possible." The "as
possible" statement by definition says that things will be different between
database engines. Which throws your whole argument out the door.

Yes, Access is different from DB/400, and Oracle, and D3. It is very
similar, but it is not the same. From what I've read, Access has not
implemented the entire SQL language. Also, Access does not support blobs
very well, where Oracle does. From what I've read, it is highly recommended
that we not store blobs in an Access database. Rather, we store the path and
file name to the blob and let DOS/Windows store the binary file in the
specified directory. From what I've read, Oracle has no problem storing
blobs in its records. I don't know if DB/400 stores blobs. I know D3
does not store blobs, but jBase might. I know D3 supports multi-valued lists
and I think Oracle does also, which are very useful in exploding the parts
of a fully assembled unit (i.e., a car door). Access does not support
multi-valued lists. So much for "Access *not* different in any way, shape,
or form".

Even as a newbie, I knew that statement was false. Even a prima facie
examination of that statement indicates it is false. Are you saying Access
is not different in any way, shape, or form from DB/400, Oracle, MySQL, or
SQL Server? I'll give you one difference. The maximum record sizes on the
different databases are different. The maximum table and database sizes are
different. Those two differences alone negate your statement.

I also know that DB/400 does things differently than Access, because I tried
some DB/400 approaches on Access and found that they did not work on
Access. Given that DB/400 does not have a 4K record size limit as Access
does, that alone could possibly cause a different schema. Access would have
multiple tables where DB/400 would have one.

I also remember that Access has a relatively small (from a commercial
insurance policy standpoint) maximum table / database size. DB/400 and
Oracle don't have that same limitation. To me, this would definitely
influence the schema design. Assuming I had one table (which you would in
commercial policies) that exceeded the size limitation of Access, I would
have to design around that limitation. In DB/400 and Oracle, I would not
have to.

Each database engine has different capabilities, enhancements, different
levels of SQL implementation, and limitations than the next. What it appears
you are saying is we should design our schema to the lowest common
denominator and ignore any additional capability offered by the particular
database. As soon as you move away from this position, you then have to
design a different schema for different engines. Granted, those changes
might be slight. But as soon as you design something different for the
different engines, you have violated your statement that Access is not
different.

I will agree that the general schema should be very similar for the
different database engines, but they will not be the same. The designs of
Pick D3, DB/400, Oracle, and Access schemas would be different, mainly
because of the different capabilities of the different database engines.



Your comment: We are at least 15-20 years past the days when the developer
of a desktop database app needed to worry about the physical storage of data.
It's only in edge cases where any modern developer is going to start
considering it in the design of the database schema.

Response: You are absolutely correct. That is why I was wondering about
John's comment. I thought that he was implying something about Access's
speed. It turns out I simply misunderstood his statement.

Dennis
 
David,

I stated "The first data entry might access the first record, the next the
1,000th record, the next the 5,000th record, and so on and so on. So,
unless the entire table is cached, does it really help?"

Your comment: Yes, because every modern database uses b-tree traversal of
indexes to locate records.


My response: So what if the modern database uses b-tree traversal of
indexes to locate the records? What does that have to do with physically
reading the record from the disk on a server?

I can see where the speed of determining the location of a particular
record would be assisted by this, but knowing where the record is and
getting the record are two totally different things. Once the disk address /
page / whatever is determined, the db engine still has to go to disk to get
the record unless the entire table is in cache or in memory on the local
machine.

So once again, how does all this caching and b-tree traversal speed up the
physical reading of a record that is not in memory? The database engine
still has to go to disk or, worse yet, over the network to get the desired
record.

If I've got it wrong (which I may well have), please explain where I missing
the poing.

Thanks,

Dennis
 
Also, as I stated in the other discussion on this subject (which I'm
surprised you missed, as you are commenting in that discussion
also), I've worked on 12 different vendors' insurance systems
over the years. Those systems have been written DECADES apart with
totally different technology and totally different development
groups.

I have looked at the data structures that have become common
practice for a lot of different application types and have
repeatedly seen bad designs become the norm for those creating those
types of applications.

That many people have used the same data structure doesn't make it a
good one.

Likewise, lots of applications have to support legacy applications
that can't easily handle more normalized relational structures and
thus a lot of structures that were required 15-20 years ago by the
available technology are still in use. This is quite understandable,
of course, but if you're developing a new application using modern
technology, there's no reason to maintain the old data structures
unless they are really the best model of the entities involved.

I see this kind of thing all the time. A 200-field table is just not
a good model, and for Access it is way too close to the 255-field limit
for me to be comfortable with it.

While it's theoretically possible that the structure is not
denormalized, I think it's extremely unlikely, and that there are
better structures to store and manipulate the very same data.
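To make that concrete, here is a minimal sketch (in Python with SQLite, since a Jet database can't be pasted into a post; all table and field names are invented for illustration) of moving repeating attributes out of one wide row into cheap child records:

```python
import sqlite3

# Hypothetical example: instead of one wide table with Phone1/Phone2/...
# columns, the one-to-many parts become rows in a child table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Core entity keeps only attributes that occur exactly once per member.
cur.execute("""
    CREATE TABLE member (
        member_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    )""")

# Attributes that can repeat become child rows, linked back by a
# foreign key -- "records are cheap".
cur.execute("""
    CREATE TABLE member_phone (
        member_id  INTEGER NOT NULL REFERENCES member(member_id),
        phone_type TEXT NOT NULL,        -- e.g. 'home', 'work'
        number     TEXT NOT NULL,
        PRIMARY KEY (member_id, phone_type)
    )""")

cur.execute("INSERT INTO member VALUES (1, 'Ann', 'Smith')")
cur.executemany("INSERT INTO member_phone VALUES (?, ?, ?)",
                [(1, 'home', '555-0101'), (1, 'work', '555-0102')])

# One member row joins to any number of phone rows.
rows = cur.execute("""
    SELECT m.last_name, p.phone_type, p.number
    FROM member m JOIN member_phone p USING (member_id)
    ORDER BY p.phone_type""").fetchall()
print(rows)
```

The same data is stored either way; the narrow tables just never bump into a field limit when a member acquires a third phone number.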
 
So once again, how does all this caching and b-tree traversal
speed up the physical reading of a record that is not in memory?
The database engine still has to go to disk or worse yet - over
the network to get the desired record.

Well, it doesn't, but I thought your question implied that what I
was proposing was somehow going to require reading the entire table
into memory, which is patently not the case. Only as much as is
needed will be retrieved, and once retrieved that will be cached (so
that if it's needed again, there won't be another trip to
disk/server).

This is EXACTLY the way any alternative is going to work, so I just
don't see your point in disputing the question. Jet is a
random-access engine, in that it doesn't have to read the file
sequentially to get to the data, nor does it use record numbers and
offsets (as in a fixed-length record). It retrieves the minimum data
pages needed to fulfill the request, having looked up in the indexes
which are the needed data pages. It will make the trip to disk
only once and cache the result, and go back to disk for a read only
when the record has been updated (according to the locking data,
rather than checking the data page itself).
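A toy sketch of that caching behaviour (my own simplification in Python, not Jet's actual implementation): a page is fetched from disk only on a cache miss, and repeat requests are served from memory.

```python
# Simplified model of an engine's page cache: one trip to "disk" per
# page on a miss; subsequent reads of that page come from memory.
class PageCache:
    def __init__(self, disk_pages):
        self.disk = disk_pages      # simulated on-disk pages
        self.cache = {}             # page number -> page contents
        self.disk_reads = 0

    def get_page(self, page_no):
        if page_no not in self.cache:   # miss: one physical read
            self.disk_reads += 1
            self.cache[page_no] = self.disk[page_no]
        return self.cache[page_no]      # hit: served from memory

disk = {0: "page-0 data", 1: "page-1 data"}
cache = PageCache(disk)
cache.get_page(0)
cache.get_page(0)   # second request for page 0: no disk read
cache.get_page(1)
print(cache.disk_reads)
```

Three requests, two physical reads; the saving grows with every repeat visit to a hot page.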

Your question seems to me to be based on assumptions that don't take
account of how modern random-access database access works, or how
modern OS's and database engines cache and lock data.

But maybe I'm just misunderstanding your basic question.
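To put toy numbers on the index point (again a deliberate simplification, nothing like Jet's real page layout): the index tells the engine *which* data page holds the record, so it fetches one page instead of scanning half the table.

```python
import bisect

RECORDS_PER_PAGE = 100              # invented figure for illustration
keys = list(range(50_000))          # sorted key column, as in an index

def data_pages_scanned(target):
    """Sequential scan: read data pages until the record turns up."""
    return keys.index(target) // RECORDS_PER_PAGE + 1

def data_pages_via_index(target):
    """Index seek: binary-search the index, then fetch one data page."""
    bisect.bisect_left(keys, target)    # cheap traversal of index pages
    return 1

# Worst case for the scan, routine case for the seek.
print(data_pages_scanned(49_999), data_pages_via_index(49_999))
```

The record still has to be read from disk either way; the index just collapses hundreds of page reads into a handful.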
 
Your comment: Schema design should be as independent of the
database engine as possible, so Access is *not* different in any
way, shape or form. I would recommend *as a starting point* the
same normalized design for any database engine.

Response: Your comment is self-contradictory. Instead of saying
"Schema design should be independent of the database engine,"
you stated "Schema design should be as independent of the
database engine as possible." The "as possible" qualifier
by definition means that things will be different between database
engines, which throws your whole argument out the door.

Only in edge cases, which you don't encounter until implementation
time.
Yes, Access is different from DB/400, and Oracle, and D3. It is
very similar, but it is not the same.

Well, 99% of the differences between Jet/ACE (Access is not a
database) and the mentioned databases is that Jet/ACE is a
file-based database engine, and the ones I recognize (D3 is unknown
to me) are server-based databases.
From what I've read, Access has not
implemented the entire SQL language.

There is no such thing as "entire SQL language." There are only SQL
standards, e.g., SQL 89, SQL 92, etc., and each establishes a
baseline meaning for particular SQL commands. No database engine is
required to implement 100% of a SQL spec, though from reading the
fanboy arguments about which db engine is better you'd think that
was the only thing that mattered.

Jet/ACE being a relatively old SQL implementation and predating the
first major SQL standard (SQL 89) is rather idiosyncratic in its SQL
implementation, and has a lot of legacy aspects (some of which would
be quite nice if they were implemented in other SQL dialects, e.g.,
DISTINCTROW). Jet/ACE SQL is also different in that it tends not to
support DDL as well as most other SQL dialects, but I would argue
that this is because Jet/ACE very early on got a very fine
high-level database interface, i.e., DAO, that was closely tied to
Jet/ACE and allowed full control of the database engine without
needing to write DDL SQL. There are still many things that you can
do in DAO that you can't do in Jet/ACE DDL, and not just things that
are Jet/ACE-specific.
Also, Access does not support blobs
very well where Oracle does.

Oh, come on. Who cares? Secondly, Oracle didn't always support BLOB
data well -- it's something that has been added into the engine as
its users needed it. Jet/ACE's BLOB support has not been as
important to its users so it doesn't perhaps support the same level
of features as Oracle's implementation.

But really, there are about a million things that Oracle offers that
Jet/ACE lacks, but that's because the purposes of the two database
engines are completely different.
From what I've read, it is highly recommended
that we not store blobs in an Access database.

Yes. And I think it's mostly advisable in other database engines,
too.
Rather, we store the path and
file name to the blob and let DOS/Windows store the binary file in
the specified directory.

I think you're actually confusing the advice regarding a specific
type of BLOB field, i.e., OLE fields, and BLOB fields in general.
OLE fields have a lot of overhead that makes them problematic, and
BLOB fields lack that overhead (because they are just a bucket in
which to store binary data). OLE fields are easy to use because of
the wrapper, but are inefficient because of it. BLOB fields are more
efficient but harder to use.
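The "store the path, not the blob" pattern mentioned above can be sketched like this (Python with SQLite standing in for the database; paths and names are invented):

```python
import os
import sqlite3
import tempfile

# The binary file lives in the file system; the table keeps only its path.
doc_dir = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_doc (member_id INTEGER, doc_path TEXT)")

# Write the binary data to disk ourselves...
path = os.path.join(doc_dir, "photo_1.jpg")
with open(path, "wb") as f:
    f.write(b"\xff\xd8 fake jpeg bytes")

# ...and record only where it lives.
conn.execute("INSERT INTO member_doc VALUES (1, ?)", (path,))

# Retrieval: look up the path, then read the file directly.
(stored_path,) = conn.execute(
    "SELECT doc_path FROM member_doc WHERE member_id = 1").fetchone()
with open(stored_path, "rb") as f:
    data = f.read()
print(data.startswith(b"\xff\xd8"))
```

The trade-off is that the database no longer guarantees the file exists; the application has to keep paths and files in sync.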

And Oracle doesn't support OLE fields at all.

This is because it has a different intended user than Jet/ACE, and
that's as it should be.
From what I've read, Oracle has no problem storing
blobs in its records.

Nor does Jet/ACE.
I don't know if DB/400 stores blobs. I know D3
does not store blobs, but jBase might. I know D3 supports
multi-valued lists, and I think Oracle does also, which are very
useful in exploding the parts of a fully assembled unit (i.e., a car
door). Access does not support multi-valued lists. So much for "Access
is *not* different in any way, shape, or form".

You don't seem to be paying close attention, as multi-value fields
were added in Access 2007 in the ACE engine's ACCDB format. This was
done for SharePoint compatibility, not because multi-value fields are
a good idea -- they most categorically are *not* a good idea, in
fact, and support for them is not an indication of a good database
engine.
Even as a newbie, I knew that statement was false.

You're arguing against something I never asserted. You seem to have
some sort of reading comprehension problem, as it is patently
obvious that when I said this:

Schema design should be as independent of the database engine as
possible, so Access is *not* different in any way, shape or form.

...the second phrase applies to the first. That is, in regard to the
ideal independence of schema design and database engine
capabilities, Access is not different from other databases.

It is entirely perverse to divorce the second clause from the first
and insist that I was saying there were no differences of any kind
between Access/Jet/ACE and other databases. No reasonable person
would count that as a valid or good-faith interpretation of what was
written.
Even a prima facie
examination of that statement indicates it is false. Are you
saying Access is not different in any way, shape or form from
DB/400, Oracle, MySQL, or SQL Server?

No, I didn't say that. And you know perfectly well that I didn't,
unless you are blazingly stupid.
I'll give you one difference. ....

Who cares? You're off on a crazy tangent that has nothing to do with
what I wrote. I didn't make any assertion that all databases were
alike in all respects, and you know I didn't make that assertion.
Yet you took several paragraphs to provide evidence to contradict
something that was never asserted.

I deleted it. I didn't read it. I didn't need to, as it doesn't have
anything to do with what I actually asserted.
Each database engine has different capabilities, enhancements,
different levels of SQL implementation, and limitations than the
next. What it appears that you are saying is we should design our
schema to the lowest common denominator and ignore any additional
capability offered by the particular database.

No, I'm not saying that at all.

I'm saying that schema design is a logical operation, not a physical
one. You model the entities according to their characteristics and
logical relationships and then you design tables that implement
those relationships and that can store the characteristics and do
all of that in a way that maximizes data integrity and consistency.

Minor details may differ (you may choose different data types in
different engines to store a particular attribute), but the logical
structure and relationships are determined not by the database
engine, but by the characteristics and structure of the real-world
entities being modelled for the purpose of the application at hand.

The last statement has a couple of corollaries:

1. the same entity may be modelled differently in different
applications because the purposes of the applications and role of
the entity in each application may be different.

2. two different structures can be logically equivalent, but in
terms of database performance one or the other may be more efficient
because of the real-world realities of the way database engines
store and retrieve data. In some cases, the particular features of a
particular database engine may make a difference as to which
structure you choose. For instance, if you are putting business
rules in the database schema, an engine that supports triggers is
likely to be a better choice, and this will have an impact on how
you implement the structure.
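As an illustration of that last point, here is what a business rule living in the schema looks like (SQLite trigger syntax via Python, since Jet/ACE has no triggers; the table and rule are invented):

```python
import sqlite3

# Business rule enforced at the engine level via a trigger:
# an order quantity must be positive, no matter which
# application writes to the table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, qty INTEGER);
    CREATE TRIGGER orders_qty_check
    BEFORE INSERT ON orders
    WHEN NEW.qty <= 0
    BEGIN
        SELECT RAISE(ABORT, 'quantity must be positive');
    END;
""")

conn.execute("INSERT INTO orders VALUES (1, 5)")      # allowed
try:
    conn.execute("INSERT INTO orders VALUES (2, 0)")  # rejected by rule
except sqlite3.IntegrityError as e:
    print(e)
```

On an engine without triggers, the same rule has to live in application code or a middle layer, which is exactly the structural difference described above.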

Too often people think of data normalization as a process of taking
a flat table and breaking it into constituent parts. But that gets
things backwards, because you're starting from the denormalized
structure and working backwards to the normalized structure. In my
opinion, you should begin with the normalized structure and only
denormalize when the application and/or database engine forces it on
you. The design of an application is often going to force you to
denormalize some things for practical reasons (usually performance),
but that doesn't mean you start from the denormalized structure
(that would be a form of premature optimization).
As soon as you
move away from this position, you then have to design different
schemas for different engines.

Different in details, but usually not in terms of entity definitions
and logical structure. That is, you might have different data types
for certain fields, but the tables and their relationships will be
more or less the same. Now, I already outlined above a case where
this might not be the case, and that's the one where business rules
are incorporated into the schema design, which very often
necessitates the use of triggers. In that case, your design will
likely be very different structurally for a database that lacks
triggers as opposed to one that supports them.

Of course, modern practices deprecate the whole idea of
incorporating business rules at the engine level. The usual modern
preference is to put those in a layer between the application and
the database so that the business rules can be altered independent
of both the database structure and the application implementation.

Historically this has mostly not been possible with Access, as it's
designed around direct communication between Access/Jet/ACE and the
database. This changes in A2010 with SharePoint integration, because
SharePoint segregates certain things into its own separate layers so
that things you'd normally put in the application or in the database
end up in the layer in between.
Granted, those changes might be
slight. But as soon as you design something different for the
different engines, you have violated your statement that Access is
not different.

This is not a statement that I made, so I really see no reason that
I should have to defend it.
 