HTML file handling/modification in Visual Basic .NET application


Jay Kim

Hi,

We're implementing a Windows application using Visual
Basic .NET.

One of the key features we need to implement is that we
should be able to get the accurate byte offset of user
selected text in the file.

We've been trying to use the RichTextBox control to load
a file, and get the offset when the user selects some
text in the file. It's been working ok with regular text
files, but we can't use the RichTextBox control to handle
an HTML file.
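If the text in the control is exactly the file's decoded text, the byte offset of a selection is the encoded length of everything before it. Below is a minimal sketch under two assumptions: you know the file's encoding, and offsets are counted after any byte-order mark. OffsetHelper and CharToByteOffset are hypothetical names. RichTextBox may count a CR LF pair as a single character, so its SelectionStart would need adjusting before it matches an index into the file text.

```vbnet
Imports System
Imports System.Text

Module OffsetHelper
    ' Hypothetical helper: converts a character index into the file's decoded
    ' text into a byte offset in the file. Assumes the text was decoded with
    ' the same encoding and that offsets are counted after any byte-order mark.
    Public Function CharToByteOffset(ByVal fileText As String, ByVal charIndex As Integer, ByVal enc As Encoding) As Integer
        If charIndex < 0 OrElse charIndex > fileText.Length Then
            Throw New ArgumentOutOfRangeException("charIndex")
        End If
        ' The encoded length of everything before the index is the number
        ' of bytes that precede that position in the file.
        Return enc.GetByteCount(fileText.Substring(0, charIndex))
    End Function
End Module
```

For example, with the file read into fileText using the same Encoding, CharToByteOffset(fileText, selStart, enc) gives the start offset. CharToByteOffset(fileText, selStart + selLength, enc) gives the end offset.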

We're looking for a control that loads an HTML file, displays it
the way a web browser does, and

1. can return the actual position or byte offset of the user-selected
text

OR

2. lets us modify the HTML source and insert tags at the location
the user selected


We want to keep the original HTML source as much as
possible if we need to put our own tags into the file to
remember the selected location.

We've tried some third-party controls that can load an HTML
file in a VB form, but none of them have the features that
we need, and some of them modify (convert) the original
HTML source, which we don't want.

Please let us know how to implement this.

Thanks.
 
Hi Jay,
1. can return the actual position or byte offset of the user-selected
text

I think this is hard to do.
For example, a page can use JavaScript (document.write) to generate HTML
at run time. Text produced that way does not exist in the HTML file, so
there is no position in the file to locate.

OR

2. lets us modify the HTML source and insert tags at the location
the user selected

I think you could try MSHTML.

Here is a simple sample (VB6):
Private Sub Command1_Click()
    Dim doc As MSHTML.HTMLDocument
    Set doc = WebBrowser1.Document
    Dim nd As MSHTML.HTMLDivElement
    ' "mntl" must be the id of a DIV element on the loaded page
    Set nd = doc.getElementById("mntl")
    If nd Is Nothing Then Exit Sub
    nd.contentEditable = True
    nd.innerText = "hello"
End Sub

Private Sub Form_Load()
    WebBrowser1.Navigate "http://www.yahoo.com/"
End Sub

Introduction to MSHTML Editing
http://msdn.microsoft.com/library/default.asp?url=/workshop/entry.asp

Regards,
Peter Huang
Microsoft Online Partner Support
Get Secure! www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
--------------------
 
Hi Peter,
Thanks for your response.
I'm trying to test with MSHTML as you suggested.
I was reading through the page in the link that you gave
me, but I'm not sure how to use MSHTML in VB .NET.
Could you tell me more about MSHTML in detail?
Thanks.

Jay
 
Hi,

MSHTML is a class library used to manipulate the HTML Document Object Model.
To use it in .NET, you first need to add a reference to Microsoft.mshtml:
right-click References in Solution Explorer, choose Add Reference, and on
the .NET tab select Microsoft.mshtml.

Here is a helpful link:

document Object
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/objects/obj_document.asp


Here is some code I wrote in VB.NET.

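Imports mshtml

Public Class Form1
    Inherits System.Windows.Forms.Form

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' The document is only available after the page has finished loading.
        Dim doc As mshtml.HTMLDocument = CType(AxWebBrowser1.Document, mshtml.HTMLDocument)
        ' "mntl" must be the id of a DIV element on the loaded page.
        Dim nd As mshtml.HTMLDivElement = CType(doc.getElementById("mntl"), mshtml.HTMLDivElement)
        If nd Is Nothing Then Return
        nd.contentEditable = "true"
        nd.innerText = "hello"
    End Sub

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles MyBase.Load
        AxWebBrowser1.Navigate("http://www.yahoo.com")
    End Sub
End Class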

If you have any concerns about this issue, please post here.

Regards,
Peter Huang
 
Hi Peter,
Thank you very much for your help.
I'm trying to test as you suggested.
I'll let you know if it works.
Thanks again.

Jay
 
Hi Jay,

I look forward to hearing from you.
If you have any concerns about this problem, please feel free to post here.

Regards,
Peter Huang
 
Thanks for your info.
I took a quick look at it. It is not something I can use in
my project, but it was helpful for getting more ideas.
Thanks.

Jay
 
Hi Peter,
I really appreciate your help on this.
I think it is what I wanted. I'm almost positive that we
can implement all the features that we need with these,
but I'm having difficulty figuring them out.
I was able to open up an HTML file using WebBrowser
control, and manipulate it using MSHTML.
But, I still have some issues to solve.
First, how do I save the document after modifying it
without asking the user? That is, we don't want to show
the "Save As" dialog to users; we want to save it
programmatically.
Second, when I select some text in the browser control
and click a button, I want it to insert an HTML tag, such
as an <a name=...> tag or a <font> tag, around the text.
For example,
Text in HTML : 1234567
if you have selected "345"
the source would become : 12<a name="tag_1">345</a>67

I got this working, but it doesn't always work. For
example,
Original source : 12<a name="tag_1">345</a>67
If you then select "56"
The source HTML becomes : 12<a name="tag_1">34</a><a name="tag_2"> <a name="tag_1">5</a>6</a>7

So it sometimes breaks the existing HTML tags.
Please take a look at my code sample below, and advise me
if you have a better solution for my case.
Thanks very much again for your help.

################################################
# Sample VB Code
################################################
Public Class frmTest
    Inherits System.Windows.Forms.Form

    Private doc As mshtml.HTMLDocument
    Private idx As Long

    Private Sub frmTest_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        AxWebBrowser1.Navigate2("C:\test.html")
        idx = 0
    End Sub

    Private Sub Button1_Click_1(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Get the document here; right after Navigate2 it has not loaded yet.
        doc = CType(AxWebBrowser1.Document, mshtml.HTMLDocument)

        Dim rg As mshtml.IHTMLTxtRange
        Dim sID As String

        idx = idx + 1
        sID = "myID_" & idx

        If doc.selection.type = "Text" Then
            ' The numbered calls are alternative attempts; enable only one at a time.
            '(1.1) Let MSHTML wrap the selection in <a name="..."> itself
            'doc.execCommand("CreateBookmark", False, sID)

            rg = CType(doc.selection.createRange(), mshtml.IHTMLTxtRange)
            If Not rg Is Nothing Then
                '(2.1)
                'rg.pasteHTML("<a name=""" & sID & """></a><span class=""my_class"">" & rg.htmlText & "</span>")
                '(2.2)
                'rg.pasteHTML("<a name=""" & sID & """></a><span class=""my_class"">" & rg.text & "</span>")

                '(3.1)
                'rg.pasteHTML("<a name=""" & sID & """><span class=""my_class"">" & rg.htmlText & "</span></a>")
                '(3.2)
                'rg.pasteHTML("<a name=""" & sID & """><span class=""my_class"">" & rg.text & "</span></a>")

                '(4.1)
                rg.pasteHTML("<span id=""" & sID & """ class=""my_class"">" & rg.htmlText & "</span>")
                '(4.2)
                'rg.pasteHTML("<span id=""" & sID & """ class=""my_class"">" & rg.text & "</span>")
            Else
                MsgBox("Invalid selection")
            End If
        End If
    End Sub

End Class


################################################
# Sample HTML - test.html
################################################
<html>
<head>
<title>Test</title>
<style type="text/css">
.my_class {
color: rgb(255,0,0);
}
</style>
</head>
<body>
<h1>Test HTML page</h1>

<p>12<font color="#ff0000">34</font>567890 ABCDEF<font color="#ffff00">GHIJKL</font>MNOPQRSTUVWXYZ</p>

<p>123<a name="myID_100">456</a>7890 ABCD<span class="my_class">EFG</span>HIJKLMNOPQRSTUVWXYZ</p>
</body>
</html>
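Variants (2.2), (3.2) and (4.2) in the sample paste rg.text straight into the markup. Because pasteHTML parses its argument as HTML, any <, > or & in the selected text would be read as markup rather than as text. Below is a minimal sketch that escapes the text first. MarkupHelper, HtmlEscape and BuildMarkedSpan are hypothetical names, and the span mirrors variant (4.2).

```vbnet
Imports System.Text

Module MarkupHelper
    ' Replaces the characters that HTML would otherwise treat as markup.
    Public Function HtmlEscape(ByVal s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            Select Case c
                Case "&"c : sb.Append("&amp;")
                Case "<"c : sb.Append("&lt;")
                Case ">"c : sb.Append("&gt;")
                Case """"c : sb.Append("&quot;")
                Case Else : sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function

    ' Hypothetical helper: builds the same <span> that variant (4.2) pastes,
    ' but escapes the id and the plain text of the selection first.
    Public Function BuildMarkedSpan(ByVal id As String, ByVal plainText As String) As String
        Return "<span id=""" & HtmlEscape(id) & """ class=""my_class"">" & HtmlEscape(plainText) & "</span>"
    End Function
End Module
```

In the click handler, this would be rg.pasteHTML(MarkupHelper.BuildMarkedSpan(sID, rg.text)). Escaping only keeps the text intact. It does not stop the new span from overlapping existing tags.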
 
Hi Jay,

#1
If you want to save the file without prompting the user: sorry, that is
not possible with the WebBrowser control alone. You could probably work
around it with SendKeys, etc. But if you really need to download a file
to disk without prompting the user, your best bet is URL Monikers,
specifically the URLDownloadToFile API. You can find more documentation
on URL Monikers at
<http://msdn.microsoft.com/workshop/networking/moniker/overview/overview.asp>
and on URLDownloadToFile at
<http://msdn.microsoft.com/workshop/networking/moniker/reference/functions/URLDDownloadToFile.asp>.
Alternatively, you can consider using the WinInet APIs directly
(<http://msdn.microsoft.com/workshop/networking/wininet/overview/overview.asp>).
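URLDownloadToFile copies whatever resource a URL points to. For a page loaded from C:\test.html, it would give back the file as it is on disk, not the edits made through MSHTML. One alternative, not from this thread, is to take the markup that MSHTML holds in memory and write it out yourself. You could use, for example, the outerHTML of the document element, which is available through the IHTMLDocument3 interface. Below is a minimal sketch of the writing step. HtmlSaver and SaveHtml are hypothetical names. Because MSHTML re-serializes the DOM, the saved file may differ from the original source: tag case and attribute quoting are not guaranteed, and the DOCTYPE line is not part of the document element's outerHTML.

```vbnet
Imports System.IO
Imports System.Text

Module HtmlSaver
    ' Hypothetical helper: writes markup to a file without showing any dialog.
    ' The caller supplies the markup and the encoding to write it in.
    Public Sub SaveHtml(ByVal filePath As String, ByVal html As String, ByVal enc As Encoding)
        ' False = overwrite any existing file rather than append to it.
        Dim writer As New StreamWriter(filePath, False, enc)
        Try
            writer.Write(html)
        Finally
            writer.Close()
        End Try
    End Sub
End Module
```

To persist the edits, pass the serialized DOM. For example, SaveHtml("C:\test.html", CType(doc, mshtml.IHTMLDocument3).documentElement.outerHTML, enc), where enc is the encoding of the original file.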


#2
Since MSHTML works on HTML nodes, I think you will need to do the
parsing work yourself.

=============================================
Text in HTML : 1234567
if you have selected "345"
the original source would be : 12<a name="tag_1">345</a>67
=============================================
In this case everything is fine, because the whole selection is inside a single node.

=================================================
I could make it somehow, but it doesn't always work, for
example,
Original source : 12<a name="tag_1">345</a>67
If you have selected "56"
The source HTML would be : 12<a name="tag_1">34</a><a name="tag_2"> <a name="tag_1">5</a>6</a>7
================================================
In this case, I think you could check whether the 5 and the 6 are in the
same node.
If not, decide whether the 6 should get the same format as the 5.
If so, change
<a name="tag_1">345</a>6 to <a name="tag_1">3456</a>
That is, get the node "tag_1" and change its text, and at the same
time delete the 6 from its original node.

In other words, every operation should be based on nodes.



Regards,
Peter Huang
 