Mario T. Lanza
Greetings,
I don't know about you guys, but on many occasions I've asked myself
whether someone else has already solved a particular programming
problem -- whether they developed a clever pattern for doing some task
quite well. This usually leads me to a search on today's greatest
technical tool, the Internet. I invariably uncover many potential
code snippets, components, and the like, and have to weed through them
to find the best one for my particular circumstance.
Usually, I'll grab hold of any Microsoft-proposed solution first, as I
trust them more than unfamiliar, rogue developers. When building
production-level apps it's not easy to trust non-commercial code,
especially when it's thousands of lines long. I mean, think about it:
most programmers search the web to find solutions that are quick and
easy to implement. If the solutions we find are difficult to
understand and require us to read through thousands of lines of code
and test them ourselves, the process becomes rather cumbersome.
This leads to "Plan B" many times. Many of us know "Plan B". We grow
weary of trying to understand the component we've downloaded. We've
followed all the instructions, and sometimes it just doesn't seem to
work, or it does almost everything we need it to do but not
everything. "Plan B" is, of course, when we decide that the easier
solution is indeed to build it ourselves. We trust ourselves, and by
the end of the development and testing we know the code better than
anyone. If I'm going to have to spend X hours testing the unfamiliar
code, learning its intricacies, determining whether it meets my
needs, often trudging through with frustration, then why does spending
the same X hours (maybe 125% of X) building my own seem so daunting?
The answer: it doesn't.
Of course, I'm not suggesting we keep reinventing the wheel. That
certainly makes no sense. I'd rather have access to a reliable
repository of components that are 1) trustworthy, 2) reliable, and
3) quick and easy to learn and understand. I guess the real culprit
here is documentation and presentation.
The documentation needs to be complete and outlined at a high level
(in addition to the very detailed level) so that it can be absorbed
quickly. I know that I only want to review about one or two pages of
text to determine whether the component will meet my specific needs
and whether integrating it can be done painlessly. There needs to be
sample code exercising the component in many real-world use cases.
Examples are almost always the most useful.
I have to offer kudos to Microsoft for their openness. They offer all
sorts of freebies that could easily be sold commercially. One thing
that immediately comes to mind is their Application Blocks.
I recently reviewed one -- the Updater Application Block -- and it is
very well documented. Granted, they could simplify it a bit more by
adding an "Updater Application Block for Dummies" section that makes
it seem almost effortless to handle any of a number of enumerated
real-world scenarios by offering STEP-BY-STEP EXAMPLES. I'm not saying
that I'm a dummy, or that most programmers are; it's just that we
already spend so much of the day at such an intense level of
concentration that being able to break out of it periodically --
because someone else took the extra effort to make their component
intuitively easy to use -- would be WONDERFUL.
I suppose having a standard format for component documentation and
presentation would be a step in the right direction. Furthermore, a
rating system could accompany the standard to indicate how well
packaged the component is in a number of areas -- documentation, the
number of practical examples available, ease of use per developer
feedback, etc. Now I'm getting a bit too analytical on the whole
topic, so I'll stop. Really, I'm just hoping for some good ideas for
making reuse (especially of open source and other non-commercial
components) an easier process.
Let me give you one real-world example where I had to decide between
using a pre-packaged solution that seemed overly complex and
developing my own: REPLICATION. I have a solution that requires data
to be replicated between 60 SQL Server machines. The machines are not
on a common network, so replication had to take place over the
Internet. I spent several days reviewing the SQL Server documentation
on replication and, let me tell you my friends, it didn't seem easy to
set up. Anyway, my frustration/confusion led me to develop my own
solution. My own approach seemed appealing and the way to go because
it rested comfortably in my mind -- that is, I could envision the
whole thing, the steps involved, and how they worked together. I had
a mental grip on it. This is where most reusable components fail:
they don't make it QUICK or EASY for the developer to get a mental
grip on reusing them.
Here's what I did to implement REPLICATION:
1. All tables have an Identity field for a Primary Key (CustomerID,
LocationID, etc.)
2. All tables have two fields -- DateAdded, DateUpdated -- that hold
obvious date values.
3. All locations running a SQL Server instance were assigned a machine
number (1 for location 1, 2 for location 2, etc.).
4. The machine number was used to assign an identity range to each
table. Machine 1 uses identities in the range of 1000000 to 1999999.
This was easy to set using "DBCC CHECKIDENT (@TableName, RESEED,
@Seed)" (see the first sketch after this list).
5. I wrote a replication program that nightly iterates through all
tables looking for records that have either been added or updated (per
the dates) within the last 24-hour period (12am to 11:59:59pm) and
adds them to a DataSet. When all of the records have been collected
this way, it uses the DataSet.WriteXml method to create an XML file
containing all of the latest updates for a particular machine (see the
export sketch after this list).
6. Those updates are uploaded to a central location that runs the same
replication program, which can also read the XML file back into a
DataSet. Once the DataSet is restored in memory at the central
location, I iterate through each record in each table and attempt a
forced INSERT (this requires "SET IDENTITY_INSERT [@TableName] ON").
If the insert fails (which normally indicates that the given identity
already exists -- exception checking can confirm this), I attempt to
update the record, being careful to observe the DateAdded and
DateUpdated values so that I don't accidentally restore an older
version of the record (see the merge sketch after this list).
7. Eventually, after all locations have sent their updates to the
central location and those updates have been processed, the central
location repackages the collective updates into an outgoing file that
is downloaded by each location.
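
To make step 4 concrete, here is roughly what the reseeding might look
like when driven from a small .NET program. This is only a sketch: the
connection string, database name, and table list are placeholders for
illustration, not my actual code.

    // Reseed each table's identity to the start of the range owned by this
    // machine (machine N owns N*1000000 through N*1000000 + 999999).
    using System;
    using System.Data.SqlClient;

    class IdentityRangeSeeder
    {
        static void Main()
        {
            int machineNumber = 1;                        // assigned per location
            long seed = machineNumber * 1000000L;         // start of this machine's range
            string[] tables = { "Customer", "Location" }; // illustrative table names

            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=FieldDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (string table in tables)
                {
                    // DBCC CHECKIDENT will not take the table name as a parameter,
                    // so the command text is built per table.
                    string sql = string.Format(
                        "DBCC CHECKIDENT ('{0}', RESEED, {1})", table, seed);
                    using (SqlCommand cmd = new SqlCommand(sql, conn))
                    {
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }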
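
Step 5 boils down to filling a DataSet with yesterday's added or
updated rows, table by table, and calling WriteXml. Something along
these lines, where again the table names, connection string, and
output file name are only illustrative:

    // Nightly export: collect rows added/updated during the previous day
    // (12am up to, but not including, the following midnight) and write
    // them all to a single XML file for this machine.
    using System;
    using System.Data;
    using System.Data.SqlClient;

    class NightlyExporter
    {
        static void Main()
        {
            string[] tables = { "Customer", "Location" }; // illustrative table names
            DateTime start = DateTime.Today.AddDays(-1);  // yesterday, 12:00am
            DateTime end = DateTime.Today;                // up to (not including) midnight

            DataSet updates = new DataSet("Updates");
            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=FieldDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (string table in tables)
                {
                    string sql = string.Format(
                        "SELECT * FROM [{0}] " +
                        "WHERE (DateAdded >= @start AND DateAdded < @end) " +
                        "   OR (DateUpdated >= @start AND DateUpdated < @end)", table);
                    SqlDataAdapter adapter = new SqlDataAdapter(sql, conn);
                    adapter.SelectCommand.Parameters.Add("@start", SqlDbType.DateTime).Value = start;
                    adapter.SelectCommand.Parameters.Add("@end", SqlDbType.DateTime).Value = end;
                    adapter.Fill(updates, table);         // one DataTable per source table
                }
            }

            // Include the schema so the central site can rebuild the DataTables.
            updates.WriteXml("updates_machine1.xml", XmlWriteMode.WriteSchema);
        }
    }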
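
Step 6 is essentially "try a forced insert, fall back to a guarded
update." Here is a simplified sketch for a single, hypothetical
Customer table with one Name column; the real program loops over every
table and column, and inspects the exception to confirm a key
violation rather than catching everything:

    // Central-site merge: read a location's XML file back into a DataSet,
    // force-insert each row with its original identity value, and fall back
    // to an update when the identity already exists and the incoming row is newer.
    using System;
    using System.Data;
    using System.Data.SqlClient;

    class CentralMerger
    {
        static void Main()
        {
            DataSet updates = new DataSet();
            updates.ReadXml("updates_machine1.xml");      // file produced by a location

            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=CentralDb;Integrated Security=true"))
            {
                conn.Open();
                foreach (DataRow row in updates.Tables["Customer"].Rows)
                {
                    try
                    {
                        // IDENTITY_INSERT ON lets us keep the identity value that was
                        // generated at the originating location.
                        string insert =
                            "SET IDENTITY_INSERT [Customer] ON; " +
                            "INSERT INTO [Customer] (CustomerID, Name, DateAdded, DateUpdated) " +
                            "VALUES (@id, @name, @added, @updated); " +
                            "SET IDENTITY_INSERT [Customer] OFF;";
                        using (SqlCommand cmd = new SqlCommand(insert, conn))
                        {
                            cmd.Parameters.Add("@id", SqlDbType.Int).Value = row["CustomerID"];
                            cmd.Parameters.Add("@name", SqlDbType.NVarChar).Value = row["Name"];
                            cmd.Parameters.Add("@added", SqlDbType.DateTime).Value = row["DateAdded"];
                            cmd.Parameters.Add("@updated", SqlDbType.DateTime).Value = row["DateUpdated"];
                            cmd.ExecuteNonQuery();
                        }
                    }
                    catch (SqlException)
                    {
                        // The row already exists; update it, but only if the incoming
                        // copy is newer than what the central site already has.
                        string update =
                            "UPDATE [Customer] SET Name = @name, DateUpdated = @updated " +
                            "WHERE CustomerID = @id AND DateUpdated < @updated";
                        using (SqlCommand cmd = new SqlCommand(update, conn))
                        {
                            cmd.Parameters.Add("@id", SqlDbType.Int).Value = row["CustomerID"];
                            cmd.Parameters.Add("@name", SqlDbType.NVarChar).Value = row["Name"];
                            cmd.Parameters.Add("@updated", SqlDbType.DateTime).Value = row["DateUpdated"];
                            cmd.ExecuteNonQuery();
                        }
                    }
                }
            }
        }
    }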
As you can see, my solution is much easier for your mind to grasp than
the replication built into SQL Server. There are some finer points
that I omitted, but not many. The most difficult aspect was the
logistics of making sure that all files were produced and received in
a timely manner.
In any case, this is just one example where reinventing the wheel
seemed the easier way out.
Have any of you run into this dilemma whereby coding it yourself
seemed easier than using a canned solution? Do you have any ideas for
making well-executed components (and patterns) available to the public
in a way that eliminates (or reduces) this difficult decision -- to
reuse or to reinvent?
Mario T. Lanza
Clarity Information Architecture, Inc.