Keith G Hicks
I'm trying to parse some XML files that contain newspaper articles. Each
file is a separate article. Each element in the file is going to be posted
to a database. I wrote some code previously to read XML files that were laid
out rigidly and had no trouble. But these are not cooperating. They contain
lots of spacing, are not organized nicely line by line and some of the
elements are going to contain html tags (for example the article itself will
have <p>, <b>, <i> and other formatting tags in them). I need to be able to
read the XML tags into variables that I can post to the database. But my old
code for reading XML is not working in this situation. I've tried some
different examples I found on various sites, but nothing has worked so
far.
Here is a sample file:
<company_main>
<articles>
<id>
558960
</id>
<location_id>
1
</location_id>
<title>
<p>NY Times counsel</p>
<p>speaks at MSU Law</p>
</title>
<summary>
This is just a bunch of summary information about the article that is in
this file.......
</summary>
<author_id>
1
</author_id>
<text>
<p>
This is<i> paragraph</i> 1 of the article itself. Lorem ipsum dolor sit
amet, consectetur adipiscing elit. Duis nec lorem a tellus pulvinar dapibus.
Proin ut lectus magna. Morbi velit mi, faucibus a malesuada non, vehicula a
leo. Nam dolor elit, adipiscing blandit aliquet non, pellentesque sit amet
justo. Nulla tempor risus in sapien rhoncus mollis. Suspendisse potenti.
Integer vel pulvinar risus.
</p>
<p>
This is<i> paragraph</i> 1 of the article itself. Mauris non dolor erat,
vitae elementum nisl. <b>Sed ac ante ac purus</b> hendrerit tincidunt quis
eget augue. Nam orci mauris, pulvinar vitae faucibus ac, varius quis nunc.
Vestibulum sed feugiat magna.
</p>
<p>
This is<i> paragraph</i> 1 of the article itself. Nam bibendum aliquam
adipiscing. Sed congue rutrum sagittis. Ut neque felis, scelerisque a
adipiscing sit amet, pulvinar sed nisl. Praesent metus tortor, iaculis vitae
tempor at, rhoncus eu felis. Proin luctus, magna sit amet dapibus bibendum,
leo urna semper velit, venenatis dictum quam enim at sem.
</p>
<p>
This is<i> paragraph</i> 1 of the article itself. Proin quis dolor vel
mauris vehicula lobortis in vel nunc. Nullam neque neque, auctor et rutrum
vitae, ultrices in nunc. Sed adipiscing interdum risus et euismod.
</p>
</text>
<date>
10/27/09
</date>
<type>
Published
</type>
<url>
</url>
</articles>
</company_main>
I'm sure it's obvious but I need to read the following:
id
location_id
title
summary
author_id
text
date
type
url
This didn't work (kept finding tags that are not actually XML elements):
Dim xrdr As New XmlTextReader(textFilesLocation & sArticleToPost)
xrdr.WhitespaceHandling = WhitespaceHandling.None
While xrdr.Read()
    If String.Compare(xrdr.Name, "id", True) = 0 Then
        ArticleID = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "location_id", True) = 0 Then
        LocationID = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "title", True) = 0 Then
        ArticleTitle = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "summary", True) = 0 Then
        ArticleSummary = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "text", True) = 0 Then
        ArticleText = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "author_id", True) = 0 Then
        AuthorID = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "date", True) = 0 Then
        ArticleDate = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "type", True) = 0 Then
        ArticleType = Trim(xrdr.ReadElementString())
    End If
    If String.Compare(xrdr.Name, "url", True) = 0 Then
        ArticleURL = Trim(xrdr.ReadElementString())
    End If
End While
xrdr.Close()
And this errored as well (error said it found invalid encoding):
Dim m_xmld As XmlDocument
Dim m_nodelist As XmlNodeList
Dim m_node As XmlNode
'Create the XML Document
m_xmld = New XmlDocument()
'Load the Xml file
m_xmld.Load(textFilesLocation & sArticleToPost)
'Get the list of name nodes
m_nodelist = m_xmld.SelectNodes("/company_main/articles")
'Loop through the nodes
For Each m_node In m_nodelist
    ArticleID = m_node.Attributes.GetNamedItem("id").Value
    LocationID = m_node.Attributes.GetNamedItem("location_id").Value
    ArticleTitle = m_node.Attributes.GetNamedItem("title").Value
Next
Any help would be greatly appreciated!
Keith
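
A note on the two attempts above, with a rough sketch of one way to read this layout. To the XML parser, <p>, <b> and <i> are ordinary elements, which is likely why the reader loop kept stopping on them, and ReadElementString is only meant for elements whose content is plain text. In the second attempt, id, location_id and title are child elements of <articles> in the sample file, not attributes, so Attributes.GetNamedItem would return Nothing for them. The sketch below is not a definitive fix: it assumes the structure of the sample file, uses a cut-down inline copy of it so it runs on its own, and introduces a hypothetical helper, ChildContent, that is not part of the original code. It takes InnerXml for the fields that carry markup and InnerText for the rest.

```vbnet
Imports System
Imports System.Xml

Module ArticleReaderSketch

    ' Hypothetical helper (not from the original post): returns the trimmed
    ' content of a named child element, or an empty string if it is missing.
    ' With keepMarkup = True it returns InnerXml, so nested <p>, <b>, <i>
    ' tags are kept as text; otherwise it returns InnerText (tags dropped).
    Function ChildContent(ByVal parent As XmlNode, ByVal name As String, _
                          Optional ByVal keepMarkup As Boolean = False) As String
        Dim child As XmlNode = parent.SelectSingleNode(name)
        If child Is Nothing Then Return String.Empty
        Return If(keepMarkup, child.InnerXml, child.InnerText).Trim()
    End Function

    Sub Main()
        ' Cut-down inline copy of the sample file (assumption: the real
        ' files follow the same /company_main/articles structure).
        Dim xml As String = _
            "<company_main><articles>" & _
            "<id> 558960 </id>" & _
            "<location_id> 1 </location_id>" & _
            "<title><p>NY Times counsel</p><p>speaks at MSU Law</p></title>" & _
            "<text><p>This is<i> paragraph</i> 1 of the article.</p></text>" & _
            "<url> </url>" & _
            "</articles></company_main>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        For Each article As XmlNode In doc.SelectNodes("/company_main/articles")
            ' Plain-text fields: InnerText, trimmed.
            Dim ArticleID As String = ChildContent(article, "id")
            Dim LocationID As String = ChildContent(article, "location_id")
            Dim ArticleURL As String = ChildContent(article, "url")
            ' Fields that carry HTML formatting: keep the inner markup.
            Dim ArticleTitle As String = ChildContent(article, "title", True)
            Dim ArticleText As String = ChildContent(article, "text", True)

            Console.WriteLine("id:    " & ArticleID)
            Console.WriteLine("loc:   " & LocationID)
            Console.WriteLine("url:   " & ArticleURL)
            Console.WriteLine("title: " & ArticleTitle)
            Console.WriteLine("text:  " & ArticleText)
        Next
    End Sub

End Module
```

For the real files, doc.Load(textFilesLocation & sArticleToPost) would replace LoadXml. The cause of the "invalid encoding" error can't be determined from the post alone. One common cause is a file saved in a non-UTF-8 code page with no XML declaration, in which case opening it through a StreamReader with an explicit Encoding and passing that reader to doc.Load may help, but that is a guess.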