XML to plain text

  • Thread starter: Big D

Big D

I have a simple XML file that contains, in part, content that is in HTML. I
am enclosing that content in <![CDATA[ ]]> sections. This works fine.

However, my application needs to output the XML file (from a strongly typed
dataset) to plain text. I am doing

theText = myDataset.GetXml()

Which works, but it doesn't "remember" the portions that were in CDATA
sections, so that content gets escaped, turning every HTML tag into
&lt;b&gt;, etc...

Is there a simple way to output that data without escaping it, and forcing
certain nodes to use a CDATA section? The GetXml function accepts no
parameters.

Thanks!

MCD
 
Since I believe this is something that can't be fixed in five seconds, try:
theText = Replace(theText, "&lt;", "<")
theText = Replace(theText, "&gt;", ">")
 
Hi BigD,

Did you mean this?

\\\
' Needs at the top of the file: Imports System.Xml.Serialization
' Start by building a sample dataset: one table, 10 columns, 10 rows
Dim ds As New DataSet
Dim dt As New DataTable("parameters")
For c As Integer = 1 To 10
    Dim dc As New DataColumn("elem" & c.ToString)
    dt.Columns.Add(dc)
Next
For r As Integer = 1 To 10
    Dim dr As DataRow = dt.NewRow
    For c As Integer = 1 To 10
        ' the column could also be indexed by position; the name is used to show the idea
        dr("elem" & c.ToString) = r.ToString & c.ToString
    Next
    dt.Rows.Add(dr) ' can also be added before filling, but I find this nicer
Next
ds.Tables.Add(dt)
' End of building the sample dataset

' Serialize the dataset into a MemoryStream and read it back as a string
Dim ser As XmlSerializer = New XmlSerializer(GetType(DataSet))
Dim ms As New IO.MemoryStream
Dim sw As IO.TextWriter = New IO.StreamWriter(ms)
ser.Serialize(sw, ds)
sw.Flush() ' make sure everything has reached the MemoryStream
Dim b As Long = ms.Length ' size in bytes, handy to inspect in the debugger
ms.Position = 0
Dim sr As IO.TextReader = New IO.StreamReader(ms)
Dim xmlstring As String = sr.ReadToEnd
sw.Close()
sr.Close()
ms.Close()
///
I hope this helps a little bit?

Cor
 
Hi Big D,

I have reviewed your issue. I will spend some time to do some research on
this issue.

I will reply to you ASAP. Thanks for your understanding.

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
Hi Jeffrey,

You wrote:

"I have reviewed your issue. I will spend some time to do some research on
this issue. I will reply to you ASAP. Thanks for your understanding."

Yet there are already two answers in this thread, and nobody has said
whether they fit.

Is there a difference in how MOPS treats the different people who ask
questions in this newsgroup?

If this is not an accident, I think it is embarrassing.

Cor
 
Richard,

Thanks for the reply.

Yes, obviously I could do that. However, for one, I'm not confident that
"<" and ">" are the only characters getting escaped. Secondly, the problem is
that even if I could do a find and replace on all the escaped characters,
this information would not sit inside a <![CDATA[ ]]> section, so it won't be
a valid XML document. (The GetXml() function just places the escaped data in
between the tags without "knowing" that previously it was in cdata)
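
On the worry about which characters are affected: XML defines five predefined entities (&lt;, &gt;, &amp;, &apos; and &quot;), so a find-and-replace would have to cover all of them, not just the angle brackets. The following is only a minimal sketch of that idea. The helper name UnescapeXmlText is hypothetical, it does not handle numeric character references, and it is meant for a single text value rather than a whole document, because un-escaping a whole document would leave invalid XML, as noted above.

```vb
Imports System

Module UnescapeSketch
    ' Hypothetical helper (not a framework API): reverses the five
    ' predefined XML entities in a single text value.
    Function UnescapeXmlText(ByVal escaped As String) As String
        Dim result As String = escaped
        result = result.Replace("&lt;", "<")
        result = result.Replace("&gt;", ">")
        result = result.Replace("&quot;", """")
        result = result.Replace("&apos;", "'")
        ' "&amp;" goes last so that "&amp;lt;" ends up as "&lt;", not "<".
        result = result.Replace("&amp;", "&")
        Return result
    End Function

    Sub Main()
        ' Prints: <b>R&D</b>
        Console.WriteLine(UnescapeXmlText("&lt;b&gt;R&amp;D&lt;/b&gt;"))
    End Sub
End Module
```

Replacing "&amp;" last is the one design choice that matters here: doing it first would turn text that was meant to read "&lt;" literally into a real "<".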

Is this a part of the schema that I need to adjust to notify it to expect
these characters? If so, how?

Thanks for the input!

-MCD

 
Hey Cor,

Thanks for the reply. I haven't tried the code bit, but it doesn't seem
like what I need. First off, it appears that you are building the dataset
programmatically, not from the schema... and building it from the schema is
a necessity for my design. The cool part of how I have it working is that
since it's a strongly typed dataset, it's super easy to work with, I don't
have to know everything about the schema in order to operate on parts of it,
and the GetXml() function is EXACTLY what I want to do, EXCEPT of course that
it is escaping the "<" characters and such.

To me it seems like a schema issue. Previously I have just manually entered
the CDATA tag into fields where I knew that there would be HTML. It seems
like VS should be able to know from a setting in the xsd that the element
contains illegal characters. That way, when GetXML reads the schema to
output the data in the dataset, it would know what to do.
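
As far as I know, XML Schema does not distinguish a CDATA section from escaped text, so there is no xsd setting to flip for this. One possible workaround, sketched here under assumptions, is to post-process the GetXml() output: load it into an XmlDocument and swap the text of chosen elements for CDATA sections. The helper WithCData and the names "doc", "item" and "body" are hypothetical, and this is not a feature of DataSet itself.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Xml

Module CDataSketch
    ' Hypothetical helper: rewrites the text of every element named
    ' elementName as a CDATA section and returns the resulting XML.
    Function WithCData(ByVal xmlText As String, ByVal elementName As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)

        ' Take a snapshot first so the document is not edited while
        ' the live node list is being enumerated.
        Dim targets As New List(Of XmlNode)
        For Each n As XmlNode In doc.GetElementsByTagName(elementName)
            targets.Add(n)
        Next

        For Each node As XmlNode In targets
            If Not node.HasChildNodes Then Continue For
            Dim content As String = node.InnerText
            ' Remove the existing children but keep any attributes.
            While node.HasChildNodes
                node.RemoveChild(node.FirstChild)
            End While
            node.AppendChild(doc.CreateCDataSection(content))
        Next
        Return doc.OuterXml
    End Function

    Sub Main()
        ' Small untyped dataset standing in for the strongly typed one.
        Dim ds As New DataSet("doc")
        Dim dt As New DataTable("item")
        dt.Columns.Add("body", GetType(String))
        dt.Rows.Add(New Object() {"<b>bold</b>"})
        ds.Tables.Add(dt)

        Console.WriteLine(WithCData(ds.GetXml(), "body"))
    End Sub
End Module
```

A value that itself contains "]]>" cannot go into a single CDATA section, so a real implementation would need to handle that case separately.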

Maybe I'm dreaming.

;-)

Thanks!

MCD
 
Hi Big D,

Sorry to have kept you waiting so long.

After consulting the product team, I now know the cause of the problem.

Actually, this behavior is by design.

This is the way XML is supposed to be serialized to a string. The "<"
character is not allowed to occur in text or attribute content because it
marks the beginning of a markup, therefore we escape it as &lt;. We also
escape ">" for compatibility reasons. If you look at the dataset content
though, you should see "<" and ">" in the value unescaped.
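
A minimal sketch of that point, assuming a small untyped dataset with hypothetical names ("doc", "item", "body"): the value stored in the row keeps its raw characters, the GetXml() string carries them in escaped form, and reading that string back restores them.

```vb
Imports System
Imports System.Data
Imports System.IO

Module RoundTripSketch
    Sub Main()
        ' Hypothetical dataset, table and column names, for illustration only.
        Dim ds As New DataSet("doc")
        Dim dt As New DataTable("item")
        dt.Columns.Add("body", GetType(String))
        dt.Rows.Add(New Object() {"<b>bold</b> & more"})
        ds.Tables.Add(dt)

        ' The value held in the dataset is not escaped.
        Console.WriteLine(CStr(ds.Tables("item").Rows(0)("body")))

        ' The serialized form carries the markup characters as entities.
        Dim xmlText As String = ds.GetXml()
        Console.WriteLine(xmlText)

        ' Reading the string back into a dataset with the same structure
        ' restores the original characters.
        Dim copy As DataSet = ds.Clone()
        copy.ReadXml(New StringReader(xmlText))
        Console.WriteLine(CStr(copy.Tables("item").Rows(0)("body")))
    End Sub
End Module
```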

XML spec section 2.4:

The ampersand character (&) and the left angle bracket (<) may appear in
their literal form only when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they must be escaped using either numeric character references
or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>)
may be represented using the string "&gt;", and must, for compatibility, be
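escaped using "&gt;" or a character reference when it appears in the string
"]]>" in content, when that string is not marking the end of a CDATA section.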

So as a workaround, you may follow Richard's suggestion to parse the string
yourself.

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
 
Hi Jeffrey,

Now I am curious: can you tell me why I do not see that by-design behaviour
with the sample I sent?

Cor
 
Hi,

Looking over my sample, I see that I forgot to mention that it needs an
import of System.Xml.Serialization, or else you have to put that namespace
in front of XmlSerializer.

Cor
 
Hi Cor,

Sorry, but I cannot get your solution to work.

In your solution you build the dataset yourself, so it contains no CDATA
section and no "special" characters (such as "<" or ">").

I tested your approach in C# with a dataset read from a file, and it does
not work either:

// Requires: using System.Data; using System.IO; using System.Xml.Serialization;
private void button1_Click(object sender, System.EventArgs e)
{
    // Load a file that contains a CDATA section
    DataSet ds = new DataSet();
    ds.ReadXml(@"D:\newtest.xml");

    // Serialize the dataset into a MemoryStream
    XmlSerializer ser = new XmlSerializer(typeof(DataSet));
    MemoryStream ms = new MemoryStream();
    TextWriter sw = new StreamWriter(ms);
    ser.Serialize(sw, ds);
    sw.Flush(); // make sure everything has reached the MemoryStream

    long b = ms.Length;
    ms.Position = 0;

    // Read the serialized XML back as a string
    TextReader sr = new StreamReader(ms);
    string xmlstring = sr.ReadToEnd();
    sw.Close();
    sr.Close();
    ms.Close();
}

Then, in the debugger, you will see that the CDATA section from my
"D:\newtest.xml" is also escaped (that is, "<" becomes "&lt;").


Best regards,
Jeffrey Tan
Microsoft Online Partner Support
 
Hi Jeffrey,

Thank you for your message. It cleared up my confusion completely.

I was converting the dataset as a whole to a string, while the problem is a
portion of HTML text stored as a string in a dataset.

Thinking it over, the answer for Big D is of course very simple.

To get at those portions, read the file with dataset.ReadXml(path) and then
just write the items out as needed with a StreamWriter to disk, or use them
directly.

(The answer may be "by design", but I think it should then be added that
when the data is written this way by ds.WriteXml, it is read back in its
proper form by ds.ReadXml.)

Just my thoughts

Cor
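
Cor's suggestion can be sketched roughly as follows. This is a sketch under assumptions, not a tested recipe for the original file: the element names ("doc", "item", "title", "body") are hypothetical, the XML comes from an in-memory string instead of a path, and a StringWriter stands in for the StreamWriter that would write to disk.

```vb
Imports System
Imports System.Data
Imports System.IO

Module PlainTextSketch
    Sub Main()
        ' Hypothetical document; the element names are illustrative only.
        Dim source As String = _
            "<doc><item><title>Hello</title>" & _
            "<body><![CDATA[<b>bold</b> & more]]></body></item></doc>"

        ' ReadXml infers a table "item" with columns "title" and "body".
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(source))

        ' The column values hold the raw characters, so writing them out
        ' directly gives unescaped plain text.
        Using writer As New StringWriter()
            For Each row As DataRow In ds.Tables("item").Rows
                writer.WriteLine(CStr(row("title")))
                writer.WriteLine(CStr(row("body")))
            Next
            Console.Write(writer.ToString())
        End Using
    End Sub
End Module
```

For output to a file, the StringWriter could be swapped for a New StreamWriter(path), with the rest of the loop unchanged.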
 