Is there an XMLDataAdapter?

I have two dynamic XML files that contain data. My customer insists on using
XML for a number of quite valid reasons. Being new (read: naive) to XML, I
thought there might be some form of SELECT statement that could be used with
XML files. A number of books have referred me to the .NET SDK; however, I
can't seem to find what I need.

Is there a data adapter for XML? If so, where might it be documented? If
not, must I program my own queries against the contents of the various DataSet
tables (read with DataSet.ReadXml)? I must stay within the Microsoft-provided
.NET Framework and not purchase software from another vendor. I must also
assume that the user will have neither SQL Server nor Access available.

TIA,

Gus
 
Gus - you could write one if you wanted to. In Sahil Malik's first book,
he's got an example of doing just that. I know NSoftware, for instance, wrote
one for RSS feeds, which is pretty cool.

However, you may not need to. If the data isn't going to a database (and
even if it is, depending on the details), you can just use DataSet.ReadXml,
which, depending on the structure, will create a DataSet for you. Once it's in
a DataSet, you can use the DataTable's Select method, which is not exactly SQL
but very close, along with Compute, Find, etc. The main job of the data adapter
is moving data, marshalling it between the source and a destination. So if you
have the XML files, you may just want to work with them locally. XPath is a
way to query XML directly, which may suit your needs.
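For example, something roughly like this (the file name and the filter column are just placeholders; substitute your own) reads an XML file into a DataSet and filters it in memory:

// Minimal sketch: load an XML file into a DataSet and filter it in memory.
// "customers.xml" and the "City" column are hypothetical placeholders.
using System;
using System.Data;

class ReadXmlDemo
{
    static void Main()
    {
        DataSet ds = new DataSet();
        ds.ReadXml("customers.xml");   // infers a schema from the XML structure

        DataTable table = ds.Tables[0];
        // Select takes a SQL-like filter expression and returns the matching rows.
        DataRow[] rows = table.Select("City = 'Columbia'");

        foreach (DataRow row in rows)
            Console.WriteLine(row[0]);
    }
}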

Also, if the DataSet schema is unacceptable to you once you load it, you can
use the XSLT libraries to transform it into whatever structure you want.
The learning curve is not trivial, but it's not all that difficult, and it's
a skill that carries over to other technologies and, IMHO, will be well worth
having.
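A transform can be as small as this (XslCompiledTransform is the .NET 2.0 class; XslTransform on 1.1 works much the same way, and the file names here are just placeholders):

// Minimal sketch: reshape an XML file with an XSLT stylesheet.
// "input.xml", "reshape.xslt" and "output.xml" are hypothetical names.
using System.Xml.Xsl;

class TransformDemo
{
    static void Main()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("reshape.xslt");                 // compile the stylesheet
        xslt.Transform("input.xml", "output.xml"); // write the reshaped document
    }
}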
 
Hey Gus,

Writing your own data adapter isn't rocket science. As Bill pointed out,
Chapter 15 of book #1 deals with exactly that. What is difficult, however, is
writing a database engine that understands XML and lets you fire SELECT
queries against it.

Frankly, you are better off with System.Xml, XPath, etc.

Look up "XPath" - that seems to be what you need.
 
Gus,

An XML file is just not a database.

There is a difference between an XML file as, let's say, a document (with
attributes etc.), which you load with XmlDocument.LoadXml, and an XML file
that is a DataSet, which you can get with ReadXml (as Bill mentioned).

However, both are just files, nothing more. Because there are tons of methods
for accessing a DataSet internally, you can use those to get at the data and
select whatever subset you want.

The pain comes with updating. It is just a file, and therefore you can only
write a complete XML file back. This means that it is almost (in fact, is)
unusable for multiuser purposes.

(And of course it is very slow; every update means roughly the sequence below:)

Reading the complete file
Renaming the old file
Writing the complete new file
Deleting the old file when everything is OK
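A rough sketch of that sequence, assuming the data was read into a DataSet and using a made-up file name:

// Minimal sketch of the whole-file write-back described above.
// "data.xml" (passed in as path) is a hypothetical file name.
using System.Data;
using System.IO;

class WriteBackDemo
{
    static void Save(DataSet ds, string path)
    {
        string temp = path + ".tmp";
        string backup = path + ".bak";

        if (File.Exists(backup))
            File.Delete(backup);        // clear out any leftover backup

        ds.WriteXml(temp);              // write the complete new file
        if (File.Exists(path))
            File.Move(path, backup);    // rename the old file out of the way
        File.Move(temp, path);          // put the new file in its place
        if (File.Exists(backup))
            File.Delete(backup);        // delete the old file when everything is OK
    }
}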

I hope this gives you an idea.

Cor
 
Cor Ligthert said:
Gus,

An XML file is just not a database.
-Cor, I say this with the utmost respect, but I think that statement,
without some qualification, is a bit misleading. The reason I say this is
that while XML is certainly not an optimal database in multiuser scenarios,
it is a decent db in many other instances. XML is a lot better than
relational databases for many hierarchical scenarios, and in many cases you
can deal with it directly, negating the need for advanced extraction
routines. Yes, the last part of that point is not valid across the board,
but what I mean is that you can grab an XML document and deal with it as a
unit quite easily.
There is a difference between an XML file as, let's say, a document (with
attributes etc.), which you load with XmlDocument.LoadXml, and an XML file
that is a DataSet, which you can get with ReadXml (as Bill mentioned).

However, both are just files, nothing more.
--Again, I'd just caution that this point could be a little confusing if you
don't qualify it. Take ISAM databases, which were the predominant way to
handle data for a long time. They were single files. Access is a single-file
database. Even SQL Server is a single file in respect to how the actual db
file is created (although admittedly you can split up filegroups). Excel,
CSV, etc. are all single-file systems in the truest sense of the word, and
they all suffice as a database quite well in limited-user scenarios. In fact,
I've used ISAM databases that were much faster than a SQL Server equivalent
and actually worked pretty well in multiuser scenarios, because the I/O time
associated with reading and writing values was so fast that locking was not
much of an issue. In no way, though, am I saying that XML, Excel, CSV, etc.
are the optimal way to handle large multiuser scenarios, but there is
definitely a place for this approach in many applications.
Because there are tons of methods for accessing a DataSet internally, you can
use those to get at the data and select whatever subset you want.

The pain comes with updating. It is just a file, and therefore you can only
write a complete XML file back. This means that it is almost (in fact, is)
unusable for multiuser purposes.
--I am not saying that it's great for multiuser scenarios, but single files,
including XML, can often perform every bit as well as, say, Access. I've used
ISAM dbs with well over 100 users and had minimal problems. Indexes needed to
be rebuilt and maintained, but that's certainly not unique to single-file
systems. Moreover, the I/O in many cases can happen so quickly that users may
never realize that someone else was writing to the file. Access is another
example: all the changes made on remote machines need to propagate to one
file, and while Access is hardly a good tool, IMHO, for multiuser scenarios,
many use it successfully - it really depends on the nature of your
application. Also, look at an application like log4net, which writes to one
file at a time. The app is written to allow lazy writes, and we used it for
a large-scale application written for the Dept. of Revenue of the State of
SC. There were huge numbers of users, and at first we had logging set up to
be very verbose. Things happened so fast that even with all the writing going
on, it was not a problem. Sure, effectively one machine was serving as the
'user', but there were hundreds of users at a time hitting that machine, and
each request was writing to the file. The same can be implemented with XML,
and while I certainly am not advocating it as the best way to go in multiuser
scenarios, when RDBMS systems aren't available, as in the case Gus mentions,
I wouldn't write it off.
 
Bill,

Most of this I agree with, so there is not much to write about. I wrote
exactly what I wrote, and nothing more, about databases.

The OP said very clearly that he doesn't want a database, so I tried to show
him the other side of that. In the past I had the same idea as he does, and
using it showed me that it was a very limited approach and even dangerous,
because you cannot put it in a safe area (which is the same with, for
instance, Access and Excel).

One small point: Excel can, AFAIK, be used as a (limited) multiuser database.
The reason this does not work with XML files is that an XML file is nothing
more than what I call a flat text file, just as a CSV file is. In databases
you can change parts of the data without reading and writing the complete
file; databases are not *just* files, they are a special kind of file with a
different kind of access.

You will have seen in a lot of my past messages that I always say that XML
datasets can be a very good replacement in a lot of situations, especially
for things like saving settings or as the basis for supporting different
languages. However, those uses are limited to the start and end of a program
and/or bound to one user.

There will never again be any thought of my using an XML file as a real
database. I still find the Access database the first replacement for that
(with all its drawbacks).

Cor
 
Cor Ligthert said:
Bill,

Most of this I agree with, so there is not much to write about. I wrote
exactly what I wrote, and nothing more, about databases.

The OP said very clearly that he doesn't want a database, so I tried to show
him the other side of that. In the past I had the same idea as he does, and
using it showed me that it was a very limited approach and even dangerous,
because you cannot put it in a safe area (which is the same with, for
instance, Access and Excel).
--I agree with the assertion that it's easier in Excel and Access, but you
can still do this with XML. You have to put in some workarounds to get it to
work cleanly, but it can still be done in many cases and it's not all that
difficult.
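For what it's worth, one such workaround is simply to serialize the read-modify-write cycle across processes; here's a rough sketch (the Mutex name and file name are made up for the example):

// Minimal sketch: guard a shared XML file with a system-wide named Mutex so
// only one process rewrites it at a time. "shared.xml" is a hypothetical name.
using System.Data;
using System.Threading;

class GuardedUpdateDemo
{
    static void Update()
    {
        using (Mutex guard = new Mutex(false, "Global\\SharedXmlFile"))
        {
            guard.WaitOne();               // wait until no other process holds the file
            try
            {
                DataSet ds = new DataSet();
                ds.ReadXml("shared.xml");  // read the current contents
                // ... modify rows here ...
                ds.WriteXml("shared.xml"); // write the complete file back
            }
            finally
            {
                guard.ReleaseMutex();
            }
        }
    }
}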
One small point: Excel can, AFAIK, be used as a (limited) multiuser database.
The reason this does not work with XML files is that an XML file is nothing
more than what I call a flat text file, just as a CSV file is. In databases
you can change parts of the data without reading and writing the complete
file; databases are not *just* files, they are a special kind of file with a
different kind of access.
--Right, but they are just one file in most cases; they just have code around
them which changes their behavior. The same can be implemented with XML. I
harken back to ISAM databases, which, in most cases, can be opened up and are
pure text files. There are entire XML-based databases which wrap functionality
to update just parts of the file without affecting the rest of it, the same
way other db formats do with other file types. So the only difference is that
there are wrappers that handle the nuances of updating only segments. If you
think about it, XML is just another file format, so the same mechanisms that
allow you to update segments can be (and are) employed with XML in many
implementations.
You will have seen in a lot of my past messages that I always say that XML
datasets can be a very good replacement in a lot of situations, especially
for things like saving settings or as the basis for supporting different
languages. However, those uses are limited to the start and end of a program
and/or bound to one user.
--I don't dispute that they are better suited, if you use them directly, for
single-user scenarios; my only contention is that you can use them in
multiuser scenarios just as you would other flat files, which people have
been doing for years.
There will never again be any thought of my using an XML file as a real
database. I still find the Access database the first replacement for that
(with all its drawbacks).
--Ideally, in most cases you would want to use an RDBMS, particularly since
MSDE and SQL Server 2005 Express are free, as are Firebird and many others.
But if for some reason you can't use one, it's definitely possible to use XML
directly. I would argue that it's hard to imagine too many scenarios where
you really can't use an RDBMS, but if the boss is insistent, it's definitely
possible to use XML, depending on the needs of the system.
 
Cor - after rereading your post, I think we're talking about two different
things. Are you referring specifically to using WriteXml from a DataSet to
persist the data? I'm guessing that's it. Just to be clear, I am referring to
using XPath/XQuery to navigate the document, replace the values, and save it.
So obviously, in the latter case, you can definitely just write out or change
specific pieces of the document. Back in the day, before I learned XML a
little better, I used WriteXml as the main way to transform datasets into
XML, but I haven't used it for a while. From reading your post, though, I'm
guessing this is what you were referring to, so we are probably advocating
two entirely different approaches.
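Just so we're on the same page, here is roughly what I mean by the XPath approach (the file name, the XPath expression, and the values are made up for the example):

// Minimal sketch: load the document, change one node found via XPath, save it back.
// "orders.xml" and the expression "//Order[@Id='1042']/Status" are hypothetical.
using System.Xml;

class NodeUpdateDemo
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("orders.xml");

        // Find a single node with an XPath expression and replace its value.
        XmlNode status = doc.SelectSingleNode("//Order[@Id='1042']/Status");
        if (status != null)
            status.InnerText = "Shipped";

        doc.Save("orders.xml");            // write the document back to disk
    }
}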
 
Bill,

An ISAM database, and what you call sequential files in multiuser
environments, have in the past been at least so-called index sequential
files, which is in fact what an ISAM file is. In the past there was always an
extra index file alongside the data; nowadays it is built in, and the best
choice, in my opinion, is a B-tree.

The pieces of data that the index points to are always directly addressable
and directly updatable.

Not that this is where I learned it, but I would rather show it using
Wikipedia than type a lot myself:

http://en.wikipedia.org/wiki/ISAM

I have used those separate index and sequential files endlessly in the past.

An XML file has no index; you have to write and rewrite everything again.

In the past I also worked with a relative database, which had, besides
indexes, zero or more list structures after every row; using those was really
very fast. Such databases are no more, probably because searching an index is
now fast enough.

Using sequential files was in the past always done in batch, where the first
record of (for instance) a tape was read and at the same time rewritten to
another tape. That was not multiprocessing.

Cor
 
Bill,

Maybe my most important point, however, is that a real multiuser environment
cannot be handled with *one* XML file.

If you use many XML files, which is a method that is not impossible and which
I have advised once in the past (structured document processing with the goal
of using that data), then your solution is, in my opinion, one of the better
ones, because you are not stuck with the, in fact, simple format of the
DataSet.

Using *one* XML file in a multiuser environment means that if one user needs
to use the file, the other one has to wait until the first one is done (a
complete file lock).

That makes it, in my opinion, impossible in a real multiuser environment
(unless it is something used or updated only once a month), or it would have
to be a file for, for instance, single use on a PDA that is processed
afterwards. (Thinking about it, that is probably what you are referring to.)

Cor
 
Cor Ligthert said:
Bill,

An ISAM database, and what you call sequential files in multiuser
environments, have in the past been at least so-called index sequential
files, which is in fact what an ISAM file is. In the past there was always an
extra index file alongside the data; nowadays it is built in, and the best
choice, in my opinion, is a B-tree.
--I'm just using ISAM as an example of a purely one-file system. You can
navigate an XML document the same way you can use an index to find your
position. From the point of view of I/O there's no difference; the disk drive
doesn't care how you found the position you want to write to.
The pieces of data that the index points to are always directly addressable
and directly updatable.

Not that this is where I learned it, but I would rather show it using
Wikipedia than type a lot myself:

http://en.wikipedia.org/wiki/ISAM

I have used those separate index and sequential files endlessly in the past.

An XML file has no index; you have to write and rewrite everything again.
--Actually, you don't. That's exactly my point. I was using ISAM as an
example of a one-file system that fits your definition. You can load the XML
document and use XPath to write specific nodes out to it. This can be done
very quickly, and while you have the file locked, it may only be for a
millisecond, so it's not necessarily an issue.
In the past I also worked with a relative database, which had, besides
indexes, zero or more list structures after every row; using those was really
very fast. Such databases are no more, probably because searching an index is
now fast enough.

Using sequential files was in the past always done in batch, where the first
record of (for instance) a tape was read and at the same time rewritten to
another tape. That was not multiprocessing.
--Sure, but flat-file dbs are still in use in some places today. All it is is
a different file format. There are pure XML databases out there that use one
file and that are used by multiple people. It's not the best configuration
for tons of concurrent users, but it can and does work in many situations. My
point is simply that all dbs write to some file format. XML is just another
format for writing out your data. The db engine handles how the data is
written, and the same can be done using an XML structure.
 
BTW, here's an example of an XML database file that we use, which typically
has about 15 applications writing to it. For any record that's written,
another application may find it and write/change the values. This includes
adding new records, deleting them, and changing them. It's done entirely with
one file, and there are currently about 25,000 records in it. We actually use
this for a bunch of services and apps so that the error records aren't
dependent on a db. However, this file is shared among them, and when a
Message is processed and corrected, the node is found and updated or deleted.
So in every sense this works just like it would with an RDBMS. While
performance can be a little sluggish at times, it's pretty quick, and it's
been running for a good 5 months now without any noteworthy problem. The main
point worth noting, though, is that application A may go and find something
written out by application C (fileWatcher) and change the values that C
wrote. So even though it's one file, the only time we overwrite things is
when we specifically want to. All this, btw, is done with System.Xml and
System.Xml.XPath - then we periodically create reports on it using XSLT
(definitely a lot easier and more fun than the old VB6 days of manipulating
the DOM).

<Exceptions position='4'>
  <MessageID>1999232</MessageID>
  <Message>An error occurred while parsing EntityName. Line 38, position 3.</Message>
  <Job_x0020_Number>\\aug-filesrv1\Transcription\Signed\WS_EG_10013.xml</Job_x0020_Number>
  <Exception>9/22/2005 10:33:08 AM</Exception>
</Exceptions>
<Exceptions position='10002'>
  <MessageID>1896232</MessageID>
  <Message>An error occurred while parsing EntityName. Line 32, position 3.</Message>
  <Job_x0020_Number>\\aug-filesrv1\Transcription\Signed\WS_EG_10017.xml</Job_x0020_Number>
  <Exception>10/18/2005 12:34:26 PM</Exception>
</Exceptions>
<Exceptions position='67'>
  <MessageID>2009174</MessageID>
  <Message>An error occurred while parsing EntityName. Line 40, position 3.</Message>
  <Job_x0020_Number>\\aug-filesrv1\Transcription\Signed\WS_EG_10056.xml</Job_x0020_Number>
  <Exception>9/21/2005 10:28:11 PM</Exception>
</Exceptions>
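The lookup/update itself is nothing special. Roughly like this (the file name and root handling are assumptions; the real store differs in the details):

// Minimal sketch: find one of the Exceptions records above by MessageID and
// update or remove it. "exceptions.xml" is a hypothetical file name.
using System.Xml;

class ExceptionStoreDemo
{
    static void Resolve(string messageId)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("exceptions.xml");

        // Locate the record whose MessageID matches.
        XmlNode record = doc.SelectSingleNode(
            "//Exceptions[MessageID='" + messageId + "']");
        if (record == null)
            return;

        // Either update a value on the record...
        record.SelectSingleNode("Message").InnerText = "Corrected";

        // ...or remove the whole record once it has been processed:
        // record.ParentNode.RemoveChild(record);

        doc.Save("exceptions.xml");
    }
}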
 