extended stored proc programming

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Hi
I'm looking to write an extended stored procedure, the job of which will
basically to read data from one table, process it using a COM object, and
write (insert) rows out to another table.
I don't really know where to start with finding the functions necessary to
read from an existing table. Can anyone be so kind as to point me in the
right direction?

Thanks very much
 
I'm curious as to why you want to retrieve and insert the rows within the
extended proc rather than retrieve the rows, pass them into the extended
proc, then pass the data back for re-insertion? IMO, that would be a more
streamlined approach than having the proc do all of that work itself...
 
Rows inserted by an XP can be inserted outside the transaction context, so
that if the transaction is rolled back, the XP-inserted rows remain. Good
for auditing/logging/troubleshooting.
 
I don't know, I'm curious as to learn what the best way would be.
For me, 'best' == fastest.
I'm not bothered in the slightest about rollbacks or auditing.

However, I am still open to suggestions about what the best method might be.
My intial perception is that reading the data in the xp will be faster than
reading it via SQL and passing it into the xp for processing and then
inserting when back in SQL, as it's not having to constantly cross (marshal?)
between the SQL script and the xp. I intially tried this approach, but
instead of an xp, I wrote a load of SQL code that instantiates the COM object
that does the processing via sp_OAcreate, sp_OAmethod, etc., but it was
horrendously slow. Any ideas why this might be, and why an xp might be faster?

I just get the impression that if it's just the same block of C++ code that
does the entire lot of the work, then it will be faster.

If you still think your approach will be faster, please tell me how to 'pass
a row' to the xp - but please someone tell me how to read data using the xp
so I can test the methods against each other for speed.
 
Bonj said:
However, I am still open to suggestions about what the best method might be.
My intial perception is that reading the data in the xp will be faster than
reading it via SQL and passing it into the xp for processing and then

I guess, like anything else, it depends. Is it faster to cursor through
the data in C++ or T-SQL? How expensive is each initialization of the XP?
Will it be more expensive for the XP to have to make new database
connections to read/write, rather than just using the connection that
spawned the XP?
instead of an xp, I wrote a load of SQL code that instantiates the COM object
that does the processing via sp_OAcreate, sp_OAmethod, etc., but it was
horrendously slow. Any ideas why this might be, and why an xp might be
faster?

There is, IMO, no reason to believe that an XP would be that much
faster. Did you profile your COM objects to figure out where the
performance drain was?
If you still think your approach will be faster, please tell me how to 'pass
a row' to the xp - but please someone tell me how to read data using the xp
so I can test the methods against each other for speed.

I apologize if I implied that you can pass a row as-is -- you can't.
You can only pass scalar datatypes. Reading data using an XP is the same as
reading it from any other code -- set up a connection to the DB and query
the data.
 
I guess, like anything else, it depends. Is it faster to cursor through
the data in C++ or T-SQL?

I'm guessing C++. That's why I want to check it out with the whole lot
running in C++.
How expensive is each initialization of the XP?

Don't know. Probably average relatively... but the COM object has to be
called ~78m times, at the most recent count of the output rows. It currently
takes about 30 hours to do all that lot with a VB program. But that includes
all the time it takes it to send the data to SQL server. So I'm guessing that
to keep one connection open with the code that's calling the COM object all
the way through would be the best.

I just can't accept that initializing a stored procedure, or initalizing and
opening a connection, or pretty much initializing *anything* 78 million times
can be a good thing. That's why I want to do it all within the C++ code of an
extended proc.

Will it be more expensive for the XP to have to make new database
connections to read/write, rather than just using the connection that
spawned the XP?

That's what I'm thinking, yes.
faster?

There is, IMO, no reason to believe that an XP would be that much
faster. Did you profile your COM objects to figure out where the
performance drain was?

The COM object isn't under our control. But it is at the very very very
centre of all the loops. It is the absolute bottleneck. And before you ask,
no we can't write it in SQL because the company that wrote it have protected
it with a dongle, so I would think they'll be unlikely to do anything that
might be tantamount to giving the algorithm away.
I apologize if I implied that you can pass a row as-is -- you can't.
You can only pass scalar datatypes. Reading data using an XP is the same as
reading it from any other code -- set up a connection to the DB and query
the data.

I'm afraid I insist. Can you please tell me how to read a table from within
an XP, even if just to satisfy my own curiousity regarding testing how fast
it actually will be?
 
Depends on what version of C++.

If it's MS C++ 6, then you can simply use the ADO libraries/objects. If
it's C++.Net, then you can either use ADO with unmanaged C++, or you can
write mixed managed/unmanaged code, and use ADO.Net libraries for data
access.

If you're using another version of C++, then you'll either have to use the
ADO COM libraries or use whatever data access is built into the version of
C++ you're using. (You can use the SQL API, but I wouldn't recommend that
approach. It can be faster, but it's a royal PITA to write & troubleshoot,
and it's nearly impossible to find decent documentation on its
implementation.)

If you were to write a front-end application that needed to access the
database, and wrote it in C++, how would you access the database? It's the
same method, no matter whether it's an executable or a dll or an xp.
 
Bonj said:
Don't know. Probably average relatively... but the COM object has to be
called ~78m times, at the most recent count of the output rows. It currently

centre of all the loops. It is the absolute bottleneck. And before you ask,
no we can't write it in SQL because the company that wrote it have protected
it with a dongle, so I would think they'll be unlikely to do anything that
might be tantamount to giving the algorithm away.

Okay, I'm going to suggest that if you're going to be looping 78 million
times, you don't use an XP at all for this! It will be much more efficient
if you pull the data out to a flat file, manipulate it however you need to,
then BULK INSERT or BCP the data back in. The time saved by the BULK INSERT
alone as compared with a row-by-row reinsertion will be gigantic...
I'm afraid I insist. Can you please tell me how to read a table from within
an XP, even if just to satisfy my own curiousity regarding testing how fast
it actually will be?

As I said, open a connection to the database and query it. Just as you
would from any client app. I don't know what connection library you're
using in C++ -- ADO, MFC's ODBC connection library, ATL's OLE DB Consumer
Templates library, etc -- use that however you normally would. There is no
"loopback" feature in the XP architecture, if that's what you're looking
for.
 
hey now that's a good idea.
I might have to learn how to do that.

Can I just add a managed class and as long as I do the #import <mscorlib>
thing and put, like "__gc class...{}" rather than just "class...{}" then
it'll be able to use the managed ADO.NET?

Thanks for that!
 
As I said, open a connection to the database and query it. Just as you
would from any client app. I don't know what connection library you're
using in C++ -- ADO, MFC's ODBC connection library, ATL's OLE DB Consumer
Templates library, etc -- use that however you normally would. There is no
"loopback" feature in the XP architecture, if that's what you're looking
for.

Yes unfortunately that is what I was hoping for. What about "DB-Library for
C" - is that fast?

Also any suggestions about how to implement a managed class inside this
would be helpful!
 
The reason i don't want to use ADO is because it's COM - the overhead might
be small but when multiplied by 70-odd mill it's quite significant.

I've just had another idea actually, which is to challenge the company that
wrote the COM object to write it as an extended stored proc, and they would
know full well they would still be able to protect it with their dongle and
yet it could run fast on SQL server.
 
Bonj said:
Yes unfortunately that is what I was hoping for. What about "DB-Library for
C" - is that fast?

No, don't touch the DB Library, it's very old and, I believe,
deprecated. The fastest, IMO, is the ATL Consumer Templates OLE DB
implementation. But it's also a b*tch to work with. Easiest IMO is ADO,
followed by the MFC ODBC implementation.
 
Bonj said:
I've just had another idea actually, which is to challenge the company that
wrote the COM object to write it as an extended stored proc, and they would
know full well they would still be able to protect it with their dongle and
yet it could run fast on SQL server.

As I said, an extended stored proc is not going to help you here,
performance wise. If you want performance, you're going to have to do this
operation in bulk. Any row-by-row solution that requires marshalling of
data across processes/components is going to slow you down. If this were 78
rows instead of 78 million it might not be a problem... But even just
invoking the xp_hello example extended procedure 78 million times would
probably take quite a while (I'll leave that test to you :) )
 
I'm not really sure how it's done (mixed C++). Anything I write managed I
do in C#, and anything I need pure C++ for I use MSC++6.

I do know that you can write mixed code in C++.Net, I'm just not sure about
all the semantics, or restrictions.
 
As I said, an extended stored proc is not going to help you here,
performance wise. If you want performance, you're going to have to do this
operation in bulk.

Yes - what I was hoping to do is have an extended stored procedure do the
bulk operation, I thought being as it's operating directly within SQL
server's own process space, there wouldn't be any cross-process marshalling
going on all the time between SQL and COM.

Any row-by-row solution that requires marshalling of
data across processes/components is going to slow you down. If this were 78
rows instead of 78 million it might not be a problem... But even just
invoking the xp_hello example extended procedure 78 million times would
probably take quite a while (I'll leave that test to you :) )

That's exactly why I want an extended proc that will do all the 78m rows
from one invocation, called once, reads the rows in one fell swoop and spits
them out to a table, by doing "insert outputtable(....) exec xp_myproc"

Anyhow, I've written a test project to practice all the techniques I'll
need, and it was a lot easier than I imagined actually. I have used a static
library project to contain the managed code, and it seems to work. I'll do
speed testing on it, and if it's significantly faster then so be it, but at
least I'll be able to try it out.

Thanks
 
mmm... that's interesting.

managed C++ seems to be alright.
I initially was suspicious of it but it seems to work quite well.
The only thing is, the 'managed-ness' seems to be at project-wide level. So
if you have a project that's compiled with /clr, and you write bits in
unmanaged style, then will they compile as managed? Who knows...

also interesting about c#, I initially preferred it and have been using it
for months and I prefer it to VB.NET for all the reasons most people do,
looks better, more professional, few more fringe features, etc.
But I switched to VB.NET, because I just couldn't stand C#'s insistence on
case sensitivity any more, when there's something there that can do the same
thing that isn't case sensitive. I have actually found that I can write
programs faster.
There are a few advantages aswell, such as you can return a value from a
function and then do clean-up acts and such like. But I would never dream of
switching option strict or option explicit off.
 
Bonj said:
I initially was suspicious of it but it seems to work quite well.

That's true.
The only thing is, the 'managed-ness' seems to be at project-wide level.
So
if you have a project that's compiled with /clr, and you write bits in
unmanaged style, then will they compile as managed? Who knows...

I do. :-) That's not so. In fact a project can have a module (file) in which
there are both managed and unmanaged functions. In my view it is that
capability which makes MC++ now, and C++/CLI in the future, so attractive in
the case wher one has to straddle the fence between the host platform and
..Net.

Regards,
Will
 
I do. :-) That's not so. In fact a project can have a module (file) in which
there are both managed and unmanaged functions.

As I've got in my project. And the unmanaged function is called seamlessly
from the (completely) unmanaged extended proc. DLL project, and it seamlessly
creates an instance of a managed object (System::String).
But I've heard about this thing called IJW which is whereby they say "you
can take your existing C++ code, but compile it under .NET as managed code -
no changes to be made". So how does it know that's not what I want to do, and
what if I *did* want to do that, how would I do it?
Maybe I've got the concept of IJW comletely wrong?
Do my unmanaged functions compile to *actual* unmanaged (i.e. non-MSIL)
machine code, or is it just called unmanaged because it's written as
unmanaged, but when at machine code level, it compiles down to MSIL? If the
former, how come I'm able to use a managed object in this code?
 
would you mind also lending your expertise to my question about "using COM in
C++ (either unmanaged or managed)" in vc.language please?


Cheers
 
Back
Top