Runaway use of Nulls

  • Thread starter Thread starter Paolo
  • Start date Start date
P

Paolo

Hi,

I think I'm really breaking the rules here. I asked a related question
a couple of weeks ago, and was told that it was ok, but this may tip
things over the edge.

I have a table that is filled with nulls and includes calculation
fields. The reason is the following: Field 1 is a Work Breakdown Code
that can be any length, and feature any number of periods or numbers as
text (xxx.x.xxxx or xx.xxxxx.x.xxxx.xxx, whatever). Field 2 is an
optional field that may or may not be applicable, and field 3 through
12 are the parsed remains of field 1 turned into integers(if field 1 is
13.254.12, then field 3 is 13, field 4 is 254, field 5 is 12 and the
remaining fields are null). I use fields 3 through 12 to sort my codes
as Work Breakdown Codes as text fields do not sort well. So we have:

Field 1 Field 2 Field 3 Field 4 Field 5 Field
6-Field 12
13.254.12 Optional 13 254 12 Null

As you can see Field two and Fields 3-12 are going to be populated with
nulls. I know I can create a one to one relationship with another
table to get rid of Field 2 (there are in reality more than one field
like Field 2), but is it really worth it? I know that I can create a
query that will concatenated Fields 3-12 to create an expression that
looks like Field 1 and thus get rid of the calculation, but then I
would have no primary key and Field 3-12 would still have nulls.

I do hope someone has the patience to wade through this.

Thanks in advance.

Paolo

(e-mail address removed), remove NOSPAM
 
Rather than have a lot of fields, many of which can be null, I'm wondering if
you could just do a many-to-many relationship.

tbl_Work
WorkID (PK) [autonumber]
WorkInfo

tbl_Codes
CodeID (PK) [use integers here that are like your field numbers]
CodeInfo

tbl_WorkCodes
WorkID (PK) [using both WorkID and CodeID for primary keys means you]
CodeID (PK) [won't duplicate a code for a particular work]
WorkCodeNumber [This will be the number that applies to this particular code
in this particular work]

Of course, this is just the bare bones of the tables. The advantage to this
is, that you only have an entry in WorkCodes table when there is a number for
a particular code. If you are not doing arithmetic on the WorkCodeNumber,
then you could set it as a text field, and then string all of your existing
WorkCodeNumbers, with periods in between, for your normal work breakdown
code. Then your work breakdown code would be generated as you need it.
 
Hi mnature,

Thanks for replying to my post. Your solution is very elegant, however
I have some questions:

1) If I don't have any use for CodeInfo, I can discard tbl_Codes,
right? And just keep a one to many relationship between tbl_Work and
tblWork_Codes?

2) If each WorkCodeNumber appears in its own record, how am I going to
sort on a work breakdown code? I know this is not really pertinent to
this group, but is there some simple way to do this in DAO or SQL?

3) For the work breakdown code, I suppose I would create a recordset
in DAO for the tblWorkCodes records I am looking for then go through
the recordset, retrieve each WorkCodeNumber and then concatenate? Or
is there a way to do this in SQL? Again, I know this is beyond the
scope of the group.

Thanks much,

Paolo
(e-mail address removed)
 
Paolo said:
field 3 through
12 are the parsed remains of field 1 turned into integers(if field 1 is
13.254.12, then field 3 is 13, field 4 is 254, field 5 is 12 and the
remaining fields are null). I use fields 3 through 12 to sort my codes
as Work Breakdown Codes as text fields do not sort well.

I know that I can create a
query that will concatenated Fields 3-12 to create an expression that
looks like Field 1 and thus get rid of the calculation, but then I
would have no primary key and Field 3-12 would still have nulls.

In practice, if having the elements both concatenated and separated
makes your SQL code (queries/modifications and/or constraints) easier
to write then you should consider this a valid option, even if in
theory you are storing data redundantly. The most important thing is to
ensure data integrity e.g. at the very least you would need a
constraint to ensure that your concatenated field is indeed equivalent
to your fields 3-12.

The only way I can think this could work is if, working from field3 to
field12 (left to right), as soon as a null is encountered the remaining
columns should also be null; otherwise, the following would be
indistinguishable in concatenation

(field3, field4, field5) (1, 2, NULL) -> '1.2'
(field3, field4, field5) (1, NULL, 2) -> '1.2'

To simply things a little, here's a demo with four 'element' columns
(restricted to max three digits) which are constrained
· to be the same in concatenation as the 'composite' column value;
· to ensure nulls are consistent 'left to right';
· for the 'composite' column to be unique ('primary key').

CREATE TABLE Identifiers (
element_1 INTEGER NOT NULL,
CHECK (element_1 BETWEEN 1 AND 999),
element_2 INTEGER,
CHECK (element_2 BETWEEN 1 AND 999),
element_3 INTEGER,
CHECK (element_3 BETWEEN 1 AND 999),
CHECK (NOT (element_2 IS NULL AND element_3 IS NOT NULL)),
element_4 INTEGER,
CHECK (element_4 BETWEEN 1 AND 999),
CHECK (NOT (element_3 IS NULL AND element_4 IS NOT NULL)),
composite_ID VARCHAR(15) NOT NULL UNIQUE,
CHECK (composite_ID =
CSTR(element_1)
& IIF(element_2 IS NULL, '', '.' & CSTR(element_2)
& IIF(element_3 IS NULL, '', '.' & CSTR(element_3)
& IIF(element_4 IS NULL, '', '.' & CSTR(element_4)
)))
)
);

Rather than have the onerous task of concatenating the 'composite'
each, you could define procedures to do it for you e.g.

CREATE PROCEDURE AddIdentifier (
arg_element_1 INTEGER,
arg_element_2 INTEGER = NULL,
arg_element_3 INTEGER = NULL,
arg_element_4 INTEGER = NULL
) AS
INSERT INTO Identifiers (
element_1, element_2,
element_3, element_4,
composite_ID)
SELECT DISTINCT arg_element_1 AS element_1,
arg_element_2 AS element_2,
arg_element_3 AS element_3,
arg_element_4 AS element_4,
CSTR(arg_element_1)
& IIF(arg_element_2 IS NULL, '', '.' & CSTR(arg_element_2)
& IIF(arg_element_3 IS NULL, '', '.' & CSTR(arg_element_3)
& IIF(arg_element_4 IS NULL, '', '.' & CSTR(arg_element_4)
))) AS composite_ID
FROM Identifiers;

It has not escaped my attention that you need to do things the other
way around i.e. you would want a procedure that accepts a single
'composite' which would be parsed into elements. The thing is, in
absence of a Jet REPLACE function, the code to parse the elements would
look very complex, whereas the above procedure was easy for me to write
because it uses the same concatenation as the CHECK in the create table
Field 2 is an
optional field that may or may not be applicable,

I know I can create a one to one relationship with another
table to get rid of Field 2 (there are in reality more than one field
like Field 2), but is it really worth it?

Similar to the above scenario, if putting field2 into a separate table
would make your SQL code (queries/modifications and/or constraints)
easier to write then is it is probably the right design choice.
Symptoms of field2 in a separate table being a bad design choice (a
flaw called 'attribute splitting') would include having to join to the
new table in *every* query that involves the original table. Also,
having values in different values of the same row (i.e. retaining two
columns in the same table) makes constraints easier to define e.g. from
the above example:

CHECK (NOT (element_2 IS NULL AND element_3 IS NOT NULL))

would be more complex to define if element_2 was in a different table
from element_3.

Jamie.

--
 
1) If I don't have any use for CodeInfo, I can discard tbl_Codes,
right? And just keep a one to many relationship between tbl_Work and
tblWork_Codes?

If the CodeID's are going to be specific numbers, and you don't use any
other information than those number, you could keep the CodeID's as one of
the primary keys, and just constrain it to those specific numbers. You are
right, you would not need tbl_Codes in that case.
2) If each WorkCodeNumber appears in its own record, how am I going to
sort on a work breakdown code? I know this is not really pertinent to
this group, but is there some simple way to do this in DAO or SQL?

If you are using a query which combines all of the codes into your Work
Breakdown Code, then you could create a form out of that query which sorts by
the Work Breakdown Code.
3) For the work breakdown code, I suppose I would create a recordset
in DAO for the tblWorkCodes records I am looking for then go through
the recordset, retrieve each WorkCodeNumber and then concatenate? Or
is there a way to do this in SQL? Again, I know this is beyond the
scope of the group.

If I understand your question correctly, I believe you need to find all
WorkID's that are associated with a particular Code and WorkCodeNumber. This
could be done in a simple form/subform, where the form allows you to filter
by a particular Code and a particular WorkCodeNumber, and have the subform
list all of the WorkID's (along with other info on the work).
 
Hi Again mnature,

Thanks for responding. Regarding your answer to question 2) and 3)

2) Problem is Access sorts the work breakdown codes in a funny
way...1.0.2 comes after 1.0.10. That's why I had them in separate
fields to begin with.

3) Mmm...actually the opposite, I wanted to find one WorkID given all
the WorkCodeNumbers and CodeIDs.

Thanks again
Paolo

(e-mail address removed)
 
2) Problem is Access sorts the work breakdown codes in a funny
way...1.0.2 comes after 1.0.10. That's why I had them in separate
fields to begin with.

Aye, that's why if you aren't using the numbers for arithmetic, you make
them text, and then put leading zero's in. Makes everything line up nicely
that way, too. Though you should be able to format integers to have leading
zero's, if you don't want to type them in that way.
3) Mmm...actually the opposite, I wanted to find one WorkID given all
the WorkCodeNumbers and CodeIDs.

Sorry, not quite sure how this is different from what I stated. I'm a
little slow today.
 
The problem starts with the work break down field. Relational database
theory prohibits the storing of multiple pieces of data in a single column.
The technically correct solution is a self-referencing table that joins to
itself recursively since the work breakdown is a hierarchy. However, this
can be difficult to work with so may people would flatten it out and store
10 levels in 10 fields (of course with this method, you're SOL if you need
more than 10 levels which is why the first solution is preferred). Since
all 10 might not be populated, you would need to use an autonumber as a
primary key and a 10 field unique index to enforce the business rule that
the work breakdown must be unique. You would then concatenate the fields
for display purposes.
 
Hi Pat,

Thanks for your input. However did you see mnatures solution to
avoiding the use of nulls in the 10 levels in 10 fields? Any comment.
Also, perhaps more pressing, I believe that I am breaking another rule
in that I am using a nested set solution to make the hierarchy. This
is overkill, no? As I am storing the same piece of information twice:
10 fields with 10 levels + intL and intR for nested set. I guess I
should have thought about this a little bit more. Mind you I'm just
starting out.

So I suppose one method is to use mnature's method to store my levels,
and then develop some method to 'move' through the hierarchy.

On another note, your recursion idea interests me. I am familiar with
the adjacency model which I believe is a self-referencing table that
joins to itself recursively, involving a bit of VB code if I'm correct
if the recursion is indefinite. I have an inkling on how to go about
this, but unless I use mnature's solution here as well I'm not quite
sure how I would avoid storing multiple pieces of data in a single
column or avoid nulls if I break the work breakdown code into multiple
fields.


Paolo

(e-mail address removed)
 
mnature's solution is using a relation table instead of recursion. The m-m
solution allows many parents as well as many children. I don't think that
is the situation you have. In the work break down structure you have, each
item has one and only one parent and that is the foreign key you store when
using a recursive relationship. This is a 1-m relationship. A record
without a parent is at the top of the tree. You don't have missing
intermediate levels in the work breakdown. Levels are filled from the top
to a point and from that point on, there are no lower levels.

This structure is similar to a corporate hierarchy. Each person has only
one supervisor. You use a single person table and one of the columns is
SupervisorID. This creates the recursive relationship. In this structure,
only the CEO should have a null supervisorID.
 
Hello Pat,

Thanks for the clarification. I understand what is going on now. One
final question, though, from my novice's brain: If you are going to
perform recursion, how would you go about doing it? I know that it
involves identifying the parent record from the child record. But
would you really include 1.0 being a parent to 1.0.1 in the child
record? Firstly, aren't multiple pieces of information being stored
here in one field, as you had previously mentioned? Secondly, isn't
1.0 really a calculation? That is to arrive at the parent code you
just cut off at the last period?

Thanks so much!!!

Paolo

(e-mail address removed)
 
Paolo said:
If you are going to
perform recursion, how would you go about doing it?

With the adjacency list model, as suggested by Pat Hartman (noting it
is denormalized model), you have to (self) join as many times as you
*think* may be necessary. When you consider you cannot use a cursor or
other procedural techniques in a Access/Jet SQL procedure, nested sets
is an attractive option... I assume you have the book 'Trees and
Hierarchies in SQL for Smarties' by Joe Celko
(http://www.dbazine.com/ofinterest/oi-articles/celko24)?

Jamie.

--
 
Hi Jamie,

Thanks for following this thread. Yes I am aware that Access does not
have procedural techniques and that I will have to rely on VBA to
create an adjacency model. In tems of the nested set model, I have
also, misguidedly, already written it into my code. However, both
options seem to me to be denormalized at this point, the adjacency
model doubly so, in that the positioning information is already written
into the work breakdown structure number. So why create additional
fields to identify where the node is (either a parent field in the
adjacency model or lft, rgt fields in the nested set model)?

As far as I can see, the least denormalized way of doing this, is to
just parse the work breakdown structure number into separate fields (as
suggested by Pat), leaving some fields null as I will write the code to
house at least 10 levels and to use either SQL or VBA/DAO to identify
parent/child elements (I'm a little fuzzy on how I'm going to go about
doing it at the moment -- whether I should concatenate to the work
breakdown number or search the fields themselves).

Suggestions or comments gladly welcomed,

Paolo

(e-mail address removed)
 
Once you limit the number of joins so that you can use simple queries rather
than VBA loops, you have effectively "flattened" the structure. Given that,
it is probably easier to simply store the ten fields so you don't need any
recursive code or 10-level joins. Just be aware that if you need 11 levels
in the future, you have a lot of changes to make.

If you use the self-referencing table, you are not storing the work
breakdown of the parent record, you are storing its primary key which should
be an autonumber. You need to do a 10-level join to obtain the entire work
breakdown structure because you need to gather the piece for each level and
concatenate them into a string.
 
Hi Pat,

Thanks for replying. Yes I think I am just going to store the 10
levels. It seems like the easiest and safest thing to do. Also, thank
you for clarifying how the recursion would be done, it's clear to me
now.

Seems to me I can do the same thing in a nested set. That is, store
[lft], [rgt] and [WorkCodeNumber] and gather the pieces for each level
of the [WorkCodeNumber] to concatenate into a work breakdown code. I
suppose I would have to have an autonumber here for a PK because [lft],
[rgt], are subject to constant change and [WorkCodeNumber] is not
unique.

Thanks again,

(e-mail address removed)
 
Back
Top