AAH
Is there a freeware utility that can convert
the web pages in .htm and .mht format to
text format?
Thanks
omega said:
You should have no problem finding progs that will batch convert .html
to .txt. For the .mht, I think you should first convert those to .html.
For that, I recommend the MHT-HTML convertor which Anthony Giorgianni
found for us last month.
microsoft.com, description page:
http://tinyurl.com/49qlu [...]
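For anyone who would rather script the .mht step than use a separate converter, here is a minimal Python sketch. It relies only on the fact that an .mht file is a MIME multipart message, so the standard library's email parser can pull out the HTML part. The function name mht_to_html is hypothetical, and the sketch assumes the page's HTML lives in the first text/html part, ignoring embedded images and other resources.

```python
import email
from email import policy


def mht_to_html(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHT (MIME HTML) archive.

    Hypothetical helper: assumes the main page is the first text/html part.
    """
    # An .mht file is a MIME multipart/related message, so the stdlib
    # email parser can walk its parts like any other message.
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and decodes the charset.
            return part.get_content()
    raise ValueError("no text/html part found")
```

For example, mht_to_html(open("page.mht", "rb").read()) returns the HTML as a string, which can be saved as page.html and then fed to any HTML-to-text converter.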
Thanks for the url.
Did you end up finding an HTML-to-text converter you liked? There are a
great number around, and I haven't compared them systematically. But
yesterday I had some directories I needed to process this way, and I ran
six or more programs in turn before finding one whose interface and output
were both very satisfying for my project.
http://www.nirsoft.net/utils/htmlastext.html
From the fine, honorable Nir Sofer, a 25k download.
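The directory job described above can also be approximated in a few lines of Python. This is a rough sketch, not how HTMLAsText works: the names html_to_text and batch_convert are hypothetical, the input files are assumed to be UTF-8, and the only formatting kept is a line break at a handful of common block-level tags.

```python
from html.parser import HTMLParser
from pathlib import Path

# Start tags that begin a new output line (a rough, assumed list).
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    # Collapse whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def batch_convert(directory: str) -> list[Path]:
    """Write a .txt file next to every .htm/.html file under `directory`."""
    written = []
    # sorted() materialises the listing first, so new .txt files are not revisited.
    for src in sorted(Path(directory).rglob("*")):
        if src.is_file() and src.suffix.lower() in (".htm", ".html"):
            html = src.read_text(encoding="utf-8", errors="replace")
            dst = src.with_suffix(".txt")
            dst.write_text(html_to_text(html) + "\n", encoding="utf-8")
            written.append(dst)
    return written
```

Calling batch_convert on a folder writes a .txt beside every .htm/.html file it finds, subdirectories included, which is the kind of whole-directory run that otherwise means clicking through a GUI.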
MLC said: _omega_, Saturday, 11 Dec 2004:
Great, great, great, great, great, great!!!
I needed something like this, because I'm going to revise some of my web pages,
and you found it: no install, a 25k download, converts multiple files at once,
many useful options, and it even runs from the command line!
Thank you!
There is one thing I should mention about this particular converter,
though: it doesn't preserve the link targets inside href attributes. I
didn't need that feature for my project, but I'm not sure whether you will.
MLC said: _omega_, Saturday, 11 Dec 2004:
Do you mean that
<a href="somepage.html">blabla</a> is rendered as just
blabla?
Maybe it's not much of a problem, because I've just registered a domain and
I'll need to change a lot of things, links included.
Right now I'm in a fog (do you use this expression in English?):
I'd like to change some of the content and the presentation too, but I
haven't made a decision yet...
omega said: Yes, that's what I mean. It's a fairly rare feature for batch
converters to also pull out the links, but I've seen it mentioned.
Right now, though, I can only spot one on my local drive that does
it: Web2Text v1.6 (1999). It has a bad interface (a PITA to enter target
paths, no wildcards, almost no formatting options) as well as fairly
poor output. If you don't need that feature, so much the better, since you
can choose HTMLAsText, which does things so nicely.
To clarify: yes, most of them, including HTMLAsText, will have only
"blabla" in the output file. Only a very few make a point of also
putting "somepage.html" in the output.
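To show what the rare link-preserving converters do differently, here is a small Python sketch that keeps the href target after the anchor text, so <a href="somepage.html">blabla</a> comes out as "blabla [somepage.html]". The bracketed output format and the names used here are assumptions for illustration; Web2Text's actual output may look different, and this sketch does no whitespace or block-level formatting.

```python
from html.parser import HTMLParser


class LinkKeepingExtractor(HTMLParser):
    """Extracts text and appends each link's href after its anchor text.

    Hypothetical example class; the "text [href]" format is an assumption.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._hrefs: list[str] = []  # stack of hrefs for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            self._hrefs.append(dict(attrs).get("href") or "")

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:  # anchors without an href (e.g. <a name=...>) add nothing
                self.parts.append(f" [{href}]")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text_with_links(html: str) -> str:
    parser = LinkKeepingExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Dropping the append in handle_endtag gives back the plain "blabla" behaviour that most converters, including HTMLAsText, produce.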
Susan Bugher said: What interface? I just drag the HTML file I want converted onto
Web2Text's icon and voilà! There's a text file with the same name in
the same directory.
MLC said: A little note, since you love registry matters: as you know, no-install
doesn't always mean that a program doesn't write keys into the registry.
It's a pity that this wonderful little prog does write them, because it
doesn't need to. Since it saves your config settings in a .cfg file of your
choice, I don't understand why it also writes the same settings to new keys under
HKEY_USERS\...\Software\NirSoft\HTMLAsText
These sorts of things drive me nuts, because I think of people who don't
know about the registry, or don't care to look into it, while it grows and
grows, full of garbage.
I see that its development is active (Copyright (c) 2004), so maybe it would be
worth asking them not to touch the registry. Not me, though, because of my English!
That aside, I like it very much, and I think you should add it to the
KISS thread!
omega said: OK, but drag-and-drop only handles a single file. To process a
directory, you have to use the interface. There is no field to paste a path
into; instead you have to carpal-click 25 times to get to the desired location
on disk. It also doesn't remember recent paths, so you have to re-enter them
every time you relaunch it.