Struggling with XML CDATA

  • Thread starter: AGP

AGP

I have looked at several articles and help on reading CDATA from an XML node
but can't say I understand it. I don't want to iterate through all subnodes
to find the CDATA. Is there a way in VB.NET to just query that section
directly and read it for each MSG node? My XML looks something like this:


<MSG ID="8W">
<ROAD SEGLEN="2"/>
<ROAD SEGLEN="5"/>
<ROAD SEGLEN="2"/>
<![CDATA[
READING TAKEN BY NSH. TRUCK
]]>
</MSG>

<MSG ID="9W">
<ROAD SEGLEN="3.5"/>
<ROAD SEGLEN="1"/>
<ROAD SEGLEN="6.4"/>
<![CDATA[
READINGS DONE PER WBSMITH. <BR> DAYTIME. STANDARD EQ.
]]>
</MSG>
 
With .NET 3.5 you can use LINQ to XML as follows (assuming your MSG
elements have a common root element):

' Requires: Imports System.Linq and Imports System.Xml.Linq
Dim doc As XDocument = XDocument.Load("..\..\XMLFile1.xml")
' For each MSG child of the root, pair its ID attribute with its first CDATA child.
Dim query = _
    From msg In doc.Root.<MSG> _
    Select New With { _
        .Id = msg.@ID, _
        .Text = msg.Nodes().OfType(Of XCData)().First().Value _
    }
For Each item In query
    Console.WriteLine("Id: {0}; Text: ""{1}""", item.Id, item.Text)
Next

Output for your provided XML sample is

Id: 8W; Text: "
READING TAKEN BY NSH. TRUCK
"
Id: 9W; Text: "
READINGS DONE PER WBSMITH. <BR> DAYTIME. STANDARD EQ.
"

I realize you asked just for the CDATA, but I have kept the ID of the MSG
element to be able to relate the CDATA to its MSG element. It should be
easy to select only the CDATA if you want that.
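
For what it's worth, a minimal sketch of the CDATA-only variant might look like the following. It assumes .NET 3.5 or later. The inline sample string, the ROOT wrapper element and the names sampleXml, texts and cdataText are made up for illustration and are not from the original post.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module CDataOnly
    Sub Main()
        ' Hypothetical inline sample; the <ROOT> wrapper is an assumption.
        Dim sampleXml As String = _
            "<ROOT>" & _
            "<MSG ID=""8W""><ROAD SEGLEN=""2""/><![CDATA[READING TAKEN BY NSH. TRUCK]]></MSG>" & _
            "<MSG ID=""9W""><ROAD SEGLEN=""1""/><![CDATA[READINGS DONE PER WBSMITH. <BR> DAYTIME.]]></MSG>" & _
            "</ROOT>"
        Dim doc As XDocument = XDocument.Parse(sampleXml)

        ' Take every CDATA child of every MSG element and keep only its text.
        ' Trim() drops any line breaks surrounding the CDATA content.
        Dim texts As IEnumerable(Of String) = _
            From msg In doc.Root.<MSG> _
            From cdata In msg.Nodes().OfType(Of XCData)() _
            Select cdata.Value.Trim()

        For Each cdataText As String In texts
            Console.WriteLine(cdataText)
        Next
    End Sub
End Module
```

Using a second From clause instead of First() means a MSG element without any CDATA section simply contributes nothing rather than throwing an exception.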

LINQ to XML is documented here:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
 
If you don't use .NET 3.5 then here is a solution with XmlDocument:

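' Requires: Imports System.Xml
Dim doc As New XmlDocument()
doc.Load("..\..\XMLFile1.xml")
' "*/MSG" selects the MSG children of the root element.
For Each msg As XmlElement In doc.SelectNodes("*/MSG")
    ' Picks the first child text node that is not whitespace-only.
    Console.WriteLine("Id: {0}; Text: ""{1}""", _
        msg.GetAttribute("ID"), _
        msg.SelectSingleNode("text()[normalize-space()]").InnerText)
Next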


Note, however, that SelectSingleNode("text()[normalize-space()]") selects
the first text child node that is not whitespace-only, whether or not it is
a CDATA section node. For the XML you posted that should not matter, but in
general XPath does not distinguish between normal text nodes and CDATA
section text nodes.
 
Martin Honnen said:
Note however that the SelectSingleNode("text()[normalize-space()]") will
select any kind of non-empty text child node, not only a CDATA section
node. For the XML you posted that should not matter but generally XPath
does not distinguish between normal text nodes and CDATA section text
nodes.

OK, I looked at your first post and that may not work for me since I use
VB.NET 2005. I am curious as to why one can't use a more direct reading
method when detecting it and writing it seem to be supported by

XmlNodeType.CDATA
XmlWriter.WriteCData


AGP
 
Perhaps you could give a more concrete example of requirements?
Do you want to read all CDATA sections in your XML, or just read/write
specific elements that are known in advance?
 
AGP said:
OK, I looked at your first post and that may not work for me since I use
VB.NET 2005. I am curious as to why one can't use a more direct reading
method when detecting it and writing it seem to be supported by

XmlNodeType.CDATA
XmlWriter.WriteCData

Well, if you care about CDATA sections, then loop through ChildNodes or
through SelectNodes("text()") and check the NodeType; that way you can
detect them. XPath does not care about them as they are merely syntactic
sugar, e.g.
<foo><![CDATA[a < b && b < c]]></foo>
is equivalent to
<foo>a &lt; b &amp;&amp; b &lt; c</foo>
and gives that foo element the same contents (string value) in the XPath
data model.
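
The ChildNodes loop described above could be sketched roughly as follows. It sticks to features that should be available in VB 2005 (explicit types, explicit line continuations, no LINQ). The helper name FirstCData, the sampleXml string and the ROOT wrapper element are assumptions made for illustration only.

```vbnet
Imports System
Imports System.Xml

Module CDataByNodeType
    ' Hypothetical helper: returns the text of the first CDATA child
    ' of the given element, or Nothing if the element has none.
    Function FirstCData(ByVal element As XmlElement) As String
        For Each child As XmlNode In element.ChildNodes
            If child.NodeType = XmlNodeType.CDATA Then
                Return child.Value
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical inline sample; the <ROOT> wrapper is an assumption.
        Dim sampleXml As String = _
            "<ROOT><MSG ID=""8W""><ROAD SEGLEN=""2""/>" & _
            "<![CDATA[READING TAKEN BY NSH. TRUCK]]></MSG></ROOT>"
        Dim doc As New XmlDocument()
        doc.LoadXml(sampleXml)

        For Each msg As XmlElement In doc.SelectNodes("*/MSG")
            Console.WriteLine("Id: {0}; Text: ""{1}""", _
                msg.GetAttribute("ID"), FirstCData(msg))
        Next
    End Sub
End Module
```

Only the direct children of each MSG element are inspected, so this is a short loop per element rather than a walk over the whole document.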

The main problem with your sample is that those MSG elements have mixed
content of elements and text or CDATA sections, so it takes some effort
if someone wants only the text or CDATA section contents. Otherwise you
could simply access the InnerText property of a MSG element to get the
text contents. Again, however, it would not matter whether the markup
contains a text node or a CDATA section node.
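
As a rough illustration of the InnerText route: in the posted sample the ROAD elements are empty, so InnerText on a MSG element happens to return just the text or CDATA content. If the ROAD elements ever contained text of their own, it would be concatenated in as well. The sampleXml string, the ROOT wrapper and the plain-text second MSG are assumptions added to show that both kinds of node are read the same way.

```vbnet
Imports System
Imports System.Xml

Module InnerTextDemo
    Sub Main()
        ' Hypothetical inline sample: one MSG uses a CDATA section,
        ' the other uses ordinary escaped text.
        Dim sampleXml As String = _
            "<ROOT>" & _
            "<MSG ID=""8W""><ROAD SEGLEN=""2""/><![CDATA[READING TAKEN BY NSH. TRUCK]]></MSG>" & _
            "<MSG ID=""9W""><ROAD SEGLEN=""1""/>PLAIN &lt;BR&gt; TEXT</MSG>" & _
            "</ROOT>"
        Dim doc As New XmlDocument()
        doc.LoadXml(sampleXml)

        For Each msg As XmlElement In doc.SelectNodes("*/MSG")
            ' InnerText concatenates the values of all descendant
            ' text and CDATA nodes of the element.
            Console.WriteLine("Id: {0}; Text: ""{1}""", _
                msg.GetAttribute("ID"), msg.InnerText.Trim())
        Next
    End Sub
End Module
```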
 
Joe Fawcett said:
Perhaps you could give a more concrete example of requirements?
Do you want to read all CDATA sections in your XML, or just read/write
specific elements that are known in advance?

Well, the requirement was to read the XML example and, for each MSG node, to
read some of the attributes and ROAD child nodes and do something with them.
That part was straightforward and I've already done that. The last part is
to read the CDATA of each MSG node and then do something with the contents.
Reading the CDATA is all I'm after.

AGP
 
Martin Honnen said:
The main problem with your sample is that those MSG elements have mixed
content of elements and text or CDATA sections, so it takes some effort
if someone wants only the text or CDATA section contents. Otherwise you
could simply access the InnerText property of a MSG element to get the
text contents. Again, however, it would not matter whether the markup
contains a text node or a CDATA section node.

Let me try the InnerText idea, and if that doesn't pan out then I will have to
iterate through the nodes and get the node type; once I hit an
XmlNodeType.CDATA then I'll know that's my data.

AGP
 