Mario T. Lanza
Greetings,
I don't know about you guys, but on many occasions I've asked myself
whether someone else has already solved a particular programming
problem -- whether they developed a clever pattern for doing some task
quite well. This usually leads me to a search on today's greatest
technical tool, the Internet. I invariably uncover many potential
code snippets, components, and the like, and have to weed through them
to find the best one for my particular circumstance.
Usually, I'll grab hold of any Microsoft-proposed solution first, as I
trust them more than unfamiliar, rogue developers. When building
production-level apps it's not easy to trust non-commercial code,
especially when it's thousands of lines long. I mean, think about it:
most programmers search the web to find solutions that are quick and
easy to implement. If the solutions we find are difficult to
understand and require us to read through thousands of lines of code
and test them ourselves, the process becomes rather cumbersome.
This leads to "Plan B" many times. Many of us know "Plan B". We grow
weary of trying to understand the component we've downloaded. We've
followed all the instructions, and sometimes it just doesn't seem to
work, or it does almost everything we need it to do but not
everything. "Plan B" is, of course, when we decide that the easier
solution is indeed to build it ourselves. We trust ourselves, and by
the end of the development and testing we know the code better than
anyone. If I'm going to have to spend X hours testing the unfamiliar
code, learning its intricacies, determining whether it meets my
needs, often trudging through with frustration, then why does spending
the same X hours (maybe 125% of X) building my own seem so daunting?
The answer: it doesn't.
Of course, I'm not suggesting we keep reinventing the wheel. That
certainly makes no sense. I'd rather have access to a reliable
repository of components that are 1) trustworthy, 2) reliable, and
3) quick and easy to learn and understand. I guess the real culprit
here is documentation and presentation.
The documentation needs to be complete and outlined at a high level
(in addition to the very detailed level) so that it can be absorbed
quickly. I know that I only want to review about one or two pages of
text to determine whether the component will meet my specific needs
and whether integrating it can be done painlessly. There needs to be
sample code exercising the component in many real-world use cases.
Examples are almost always the most useful.
I have to offer kudos to Microsoft for their openness. They offer all
sorts of freebies that could easily be sold commercially. One thing
that immediately comes to mind is their Application Blocks.
I recently reviewed one -- the Updater Application Block -- and it is
very well documented. Granted, they could simplify it a bit more by
adding an "Updater Application Block for Dummies" section that makes
it seem almost effortless to handle any of a number of enumerated
real-world scenarios by offering STEP-BY-STEP EXAMPLES. I'm not saying
that I'm a dummy, or that most programmers are; it's just that we
already spend so much of the day at such an intense level of
concentration that being able to break out of it periodically --
because someone else took the extra effort to make their component
intuitively easy to use -- would be WONDERFUL.
I suppose having a standard format for component documentation and
presentation would be a step in the right direction. Furthermore, a
rating system could accompany the standard to indicate how well
packaged the component is in a number of areas -- documentation, the
number of practical examples available, ease of use per developer
feedback, etc. Now I'm getting a bit too analytical on the whole
topic, so I'll stop. Really, I'm just hoping for some good ideas for
making reuse (especially of open source and other non-commercial
components) an easier process.
Let me give you one real-world example where I had to decide between
using a pre-packaged solution that seemed overly complex and
developing my own: REPLICATION. I have a solution that requires data
to be replicated between 60 SQL Server machines. The machines are not
on a common network, so replication had to take place over the
Internet. I spent several days reviewing the SQL Server documentation
on replication and, let me tell you my friends, it didn't seem easy to
set up. Anyway, my frustration/confusion led me to develop my own
solution. My own approach seemed appealing and the way to go because
it rested comfortably in my mind -- that is, I could envision the
whole thing, the steps involved, and how they worked together. I had
a mental grip on it. This is where most reusable components fail:
they don't make it QUICK or EASY for the developer to get a mental
grip on reusing them.
Here's what I did to implement REPLICATION:
1. All tables have an Identity field for a Primary Key (CustomerID,
LocationID, etc.)
2. All tables have two fields -- DateAdded, DateUpdated -- that hold
obvious date values.
3. All locations running a SQL Server instance were assigned a machine
number (1 for location 1, 2 for location 2, etc.).
4. The machine number was used to assign an identity range to each
table. Machine 1 uses identities in the range of 1000000 to 1999999.
This was easy to set using "DBCC CHECKIDENT (@TableName, RESEED,
@Seed)" (see the first sketch after this list).
5. I wrote a replication program that nightly iterates through all
tables looking for records that have either been added or updated (per
the dates) within the last 24-hour period (12am to 11:59:59pm) and
adds them to a DataSet. When all of the records have been collected
this way, it uses the DataSet.WriteXml method to create an XML file
containing all of the latest updates for a particular machine (see the
export sketch after this list).
6. Those updates are uploaded to a central location that runs the same
replication program, which can also read the XML file back into a
DataSet. Once the DataSet is restored in memory at the central
location, I iterate through each record in each table and attempt a
forced INSERT (this requires "SET IDENTITY_INSERT [@TableName] ON").
If the insert fails (which normally indicates that the given identity
already exists -- exception checking can confirm this), I attempt to
update the record, being careful to observe the DateAdded and
DateUpdated values so that I don't accidentally restore an older
version of the record (see the merge sketch after this list).
7. Eventually, after all locations have sent their updates to the
central location and those updates have been processed, the central
location repackages the collective updates into an outgoing file that
is downloaded by each location.
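
To make step 4 concrete, here is roughly what the reseeding might look
like when driven from a small .NET program. This is only a sketch: the
connection string, database name, and table list are placeholders for
illustration, not my actual code.

    // Reseed each table's identity to the start of the range owned by this
    // machine (machine N owns N*1000000 through N*1000000 + 999999).
    using System;
    using System.Data.SqlClient;

    class IdentityRangeSeeder
    {
        static void Main()
        {
            int machineNumber = 1;                        // assigned per location
            long seed = machineNumber * 1000000L;         // start of this machine's range
            string[] tables = { "Customer", "Location" }; // illustrative table names

            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=FieldDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (string table in tables)
                {
                    // DBCC CHECKIDENT will not take the table name as a parameter,
                    // so the command text is built per table.
                    string sql = string.Format(
                        "DBCC CHECKIDENT ('{0}', RESEED, {1})", table, seed);
                    using (SqlCommand cmd = new SqlCommand(sql, conn))
                    {
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }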
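
Step 5 boils down to filling a DataSet with yesterday's added or
updated rows, table by table, and calling WriteXml. Something along
these lines, where again the table names, connection string, and
output file name are only illustrative:

    // Nightly export: collect rows added/updated during the previous day
    // (12am up to, but not including, the following midnight) and write
    // them all to a single XML file for this machine.
    using System;
    using System.Data;
    using System.Data.SqlClient;

    class NightlyExporter
    {
        static void Main()
        {
            string[] tables = { "Customer", "Location" }; // illustrative table names
            DateTime start = DateTime.Today.AddDays(-1);  // yesterday, 12:00am
            DateTime end = DateTime.Today;                // up to (not including) midnight

            DataSet updates = new DataSet("Updates");
            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=FieldDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (string table in tables)
                {
                    string sql = string.Format(
                        "SELECT * FROM [{0}] " +
                        "WHERE (DateAdded >= @start AND DateAdded < @end) " +
                        "   OR (DateUpdated >= @start AND DateUpdated < @end)", table);
                    SqlDataAdapter adapter = new SqlDataAdapter(sql, conn);
                    adapter.SelectCommand.Parameters.Add("@start", SqlDbType.DateTime).Value = start;
                    adapter.SelectCommand.Parameters.Add("@end", SqlDbType.DateTime).Value = end;
                    adapter.Fill(updates, table);         // one DataTable per source table
                }
            }

            // Include the schema so the central site can rebuild the DataTables.
            updates.WriteXml("updates_machine1.xml", XmlWriteMode.WriteSchema);
        }
    }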
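
Step 6 is essentially "try a forced insert, fall back to a guarded
update." Here is a simplified sketch for a single, hypothetical
Customer table with one Name column; the real program loops over every
table and column, and inspects the exception to confirm a key
violation rather than catching everything:

    // Central-site merge: read a location's XML file back into a DataSet,
    // force-insert each row with its original identity value, and fall back
    // to an update when the identity already exists and the incoming row is newer.
    using System;
    using System.Data;
    using System.Data.SqlClient;

    class CentralMerger
    {
        static void Main()
        {
            DataSet updates = new DataSet();
            updates.ReadXml("updates_machine1.xml");      // file produced by a location

            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=CentralDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (DataRow row in updates.Tables["Customer"].Rows)
                {
                    try
                    {
                        // IDENTITY_INSERT ON lets us keep the identity value that was
                        // generated at the originating location.
                        string insert =
                            "SET IDENTITY_INSERT [Customer] ON; " +
                            "INSERT INTO [Customer] (CustomerID, Name, DateAdded, DateUpdated) " +
                            "VALUES (@id, @name, @added, @updated); " +
                            "SET IDENTITY_INSERT [Customer] OFF;";
                        using (SqlCommand cmd = new SqlCommand(insert, conn))
                        {
                            cmd.Parameters.Add("@id", SqlDbType.Int).Value = row["CustomerID"];
                            cmd.Parameters.Add("@name", SqlDbType.NVarChar).Value = row["Name"];
                            cmd.Parameters.Add("@added", SqlDbType.DateTime).Value = row["DateAdded"];
                            cmd.Parameters.Add("@updated", SqlDbType.DateTime).Value = row["DateUpdated"];
                            cmd.ExecuteNonQuery();
                        }
                    }
                    catch (SqlException)
                    {
                        // The row already exists; update it, but only if the incoming
                        // copy is newer than what the central site already has.
                        string update =
                            "UPDATE [Customer] SET Name = @name, DateUpdated = @updated " +
                            "WHERE CustomerID = @id AND DateUpdated < @updated";
                        using (SqlCommand cmd = new SqlCommand(update, conn))
                        {
                            cmd.Parameters.Add("@id", SqlDbType.Int).Value = row["CustomerID"];
                            cmd.Parameters.Add("@name", SqlDbType.NVarChar).Value = row["Name"];
                            cmd.Parameters.Add("@updated", SqlDbType.DateTime).Value = row["DateUpdated"];
                            cmd.ExecuteNonQuery();
                        }
                    }
                }
            }
        }
    }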
As you can see, my solution is much easier for your mind to grasp than
the replication built into SQL Server. There are some finer points
that I omitted, but not many. The most difficult aspect was the
logistics of making sure that all files were produced and received in
a timely manner.
In any case, this is just one example where reinventing the wheel
seemed the easier way out.
Have any of you run into this dilemma whereby coding it yourself
seemed easier than using a canned solution? Do you have any ideas for
making well-executed components (and patterns) available to the public
in a way that eliminates (or reduces) this difficult decision -- to
reuse or to reinvent?
Mario T. Lanza
Clarity Information Architecture, Inc.