Christopher Robin
I'm inserting a SharePoint List into a SQL Database, but some of the text has
oddly formed HTML tags. I want to remove these tags with a regular
expression, but I'm having some difficulty. My code is below.

    Imports System
    Imports System.Net
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Runtime
    Imports System.Xml
    Imports SharePointServices
    Imports SharePointServices.NorthwindSync
    Imports System.Text.RegularExpressions
    Imports System.IO

    Public Class ScriptMain
        Public Sub Main()
            Dim DocLoc As String
            Dim TextDoc As TextWriter
            Dim listService As New Lists()
            Dim node As XmlNode
            Dim strHtmlString As String
            Dim pattern As String = "<[/]?(font|span|div|del|ins|color:\w+)[^>]*?"

            DocLoc = "\\MYSERVER\MyFolder\MyFile.xml"
            listService.PreAuthenticate = True
            listService.Credentials = CredentialCache.DefaultNetworkCredentials

            Try
                node = ListHelper.GetAllListItems(listService, "My List Name")
                strHtmlString = node.InnerXml()
                Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()
                TextDoc = File.CreateText(DocLoc)
                TextDoc.WriteLine(strHtmlString)
                TextDoc.Flush()
                TextDoc.Close()
            Catch ex As Exception
                'Raise the error again and set the result to failure.
                Dts.Events.FireError(1, ex.TargetSite.ToString(), ex.Message, "", 0)
                Dts.TaskResult = Dts.Results.Failure
            End Try

            Dts.TaskResult = Dts.Results.Success
        End Sub
    End Class

And here are a few samples of what I'm trying to remove with the Regex.
"<div></div>"
"<font size=2 color="#1F497D">"
"</font><br> "
Any help would be greatly appreciated.
Thanks,
Chris
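
A few things in the code above look like likely causes. The pattern ends in a lazy [^>]*? with no closing ">", so it would match only the opening "<font" or "<div" and leave the rest of the tag behind. The value returned by Regex.Replace is never assigned, so strHtmlString is written to the file unchanged (.NET strings are immutable). Separately, Dts.TaskResult = Dts.Results.Success runs after End Try, so it overwrites the Failure set in the Catch block. The following is a minimal sketch of one possible fix for the first two points, not a definitive implementation. The helper name StripFormattingTags is hypothetical, adding br to the tag list is an assumption based on the "<br>" in the samples, and color:\w+ is dropped because it is not a tag name. It also assumes the tags appear literally in the string; if the list XML carries them entity-encoded (for example &lt;div&gt;), they would need decoding first.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TagStripSketch
    ' Hypothetical helper (not part of the original script task).
    ' Removes opening and closing tags for a fixed list of tag names.
    Function StripFormattingTags(ByVal input As String) As String
        ' </?     optional slash, so closing tags match too
        ' \b      stops "div" from matching a longer name such as "divider"
        ' [^>]*>  consumes any attributes up to and including the closing ">"
        Dim pattern As String = "</?(font|span|div|del|ins|br)\b[^>]*>"

        ' Regex.Replace returns a new string; the result has to be kept.
        Return Regex.Replace(input, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()
    End Function

    Sub Main()
        ' The three samples from the post, concatenated around some text.
        Dim sample As String = "<div></div><font size=2 color=""#1F497D"">Hello</font><br> "
        Console.WriteLine(StripFormattingTags(sample)) ' prints "Hello"
    End Sub
End Module
```

In the script task, the same idea would amount to assigning the result back, e.g. strHtmlString = Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim(), and moving the Success assignment inside the Try block so a caught exception still reports Failure.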