Web Pages

  • Thread starter: AAH

AAH

Is there a freeware utility that can convert
the web pages in .htm and .mht format to
text format?
Thanks
 
AAH said:
Is there a freeware utility that can convert
the web pages in .htm and .mht format to
text format?

You should have no problem finding progs that will batch convert .html
to .txt. For the .mht, I think you should first convert those to .html.
For that, I recommend the MHT-HTML convertor which Anthony Giorgianni
found for us last month.

microsoft.com, description page:
http://tinyurl.com/49qlu
direct download (190k):
http://download.microsoft.com/download/7/b/7/7b703034-1449-4cf1-9610-631c9a0b318c/wa_setup.EXE

It doesn't need MSO. It does need a couple of Outlook Express system files,
for the mht handling.
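
For anyone who would rather script the .mht step: an .mht file is a MIME archive, so Python's standard email package can usually pull out the HTML part. This is only a minimal sketch, not a replacement for the tool above. The function name mht_to_html and the sample bytes are made up for illustration, and it assumes the archive holds at least one text/html part.

```python
from email import message_from_bytes, policy


def mht_to_html(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHT (MIME HTML) archive."""
    msg = message_from_bytes(mht_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding and the charset.
            return part.get_content()
    raise ValueError("no text/html part found")


# Tiny hand-written archive, used only to exercise the function.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<html><body><p>caf=C3=A9</p></body></html>\r\n"
    b"--XX--\r\n"
)
html = mht_to_html(sample)
```

For a real file, read it with open(path, "rb").read() and pass the bytes in. Images and stylesheets stored in the archive are simply ignored here.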
 
Thanks for the url.


 
AAH said:
omega said:
Is there a freeware utility that can convert
the web pages in .htm and .mht format to
text format?

You should have no problem finding progs that will batch convert .html
to .txt. For the .mht, I think you should first convert those to .html.
For that, I recommend the MHT-HTML convertor which Anthony Giorgianni
found for us last month.

microsoft.com, description page:
http://tinyurl.com/49qlu [...]

Thanks for the url.

Did you end up finding an html to text convertor you liked? There are a
great number around, and I've not systematically compared them. However,
yesterday I had some directories I needed to process for this, ran about
six+ programs in turn, before finding one whose interface -- and output --
were very satisfying for my project.

http://www.nirsoft.net/utils/htmlastext.html

From the fine, honorable Nir Sofer, a 25k download.
 
_omega_, Saturday 11/Dec/2004:
Did you end up finding an html to text convertor you liked? There are a
great number around, and I've not systematically compared them. However,
yesterday I had some directories I needed to process for this, ran about
six+ programs in turn, before finding one whose interface -- and output --
were very satisfying for my project.

http://www.nirsoft.net/utils/htmlastext.html

From the fine, honorable Nir Sofer, a 25k download.

Great great great great great great!!! :))

I needed something like this, because I'm going to revise some of my web pages,
and you found it: no install, 25k download, even converts multiple files,
many useful options, and also converts from the command line!

Thank you :)
 
MLC said:
_omega_, Saturday 11/Dec/2004:


Great great great great great great!!! :))

I needed something like this, because I'm going to revise some of my web pages,
and you found it: no install, 25k download, even converts multiple files,
many useful options, and also converts from the command line!

Thank you :)

My pleasure! Nir Sofer's abundance of tiny tools is great stuff,
and it's ideal when one is custom-fit for a project at hand. Yet
there is one caveat I should mention about this particular converter.
It doesn't preserve the link targets held in href attributes. I didn't
need that feature for my project, but I'm not sure whether you will.
 
_omega_, Saturday 11/Dec/2004:
Yet there is one caveat I should mention about this particular converter. It
doesn't preserve the link targets held in href attributes. I didn't need that
feature for my project, but I'm not sure whether you will.

Do you mean that a
<a href="somepage.html">blabla</a> is rendered as
blabla ?

Maybe it's not much of a problem, because I've just registered a domain and
I'll need to change a lot of things, links included.

Now I'm in the fog (do you use this expression in English?): I'd like to
change some contents and the presentation too, but I haven't taken a
decision yet...
 
MLC said:
_omega_, Saturday 11/Dec/2004:


Do you mean that a
<a href="somepage.html">blabla</a> is rendered as
blabla ?

Yes, that's what I mean. It's a pretty rare feature for the batch
convertors to also pull out the links, but I've seen it mentioned.
Though right now I'm only spotting one on my local drive that does
it: Web2Text v1.6 (1999). It has a bad interface (pita to enter target
paths, no wildcards, almost no formatting options) as well as fairly
poor output. If you don't need that feature, so much the better, since
you can choose HTMLAsText, which does things so nicely.

Maybe it's not much of a problem, because I've just registered a domain and
I'll need to change a lot of things, links included.

Now I'm in the fog (do you use this expression in English?):

Yes, we do. Colloquial English would be "in a fog," but the way you
have it is more poetic.

I'd like to change some contents and the presentation too, but I
haven't taken a decision yet...

I could see the advantage to looking over a set of text files first.
Although on the matter of site design, and conceiving plans related
to that, you're far beyond me. I'm in such a thick fog that it's hard
to make out much -- I only hear the waves breaking in the distance. :)
 
Just to clarify: yes, most of them, including HTMLAsText, will have
only "blabla" in the output file. Only a rare few make a point of
also putting "somepage.html" in the output.
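
To make the difference concrete, here is a rough sketch in Python of both behaviours, using only the standard html.parser module. It is not how HTMLAsText or Web2Text work internally; the class name TextWithLinks, the keep_links switch and the " <target>" output style are all invented for this example.

```python
from html.parser import HTMLParser


class TextWithLinks(HTMLParser):
    """Collect the text of a page; optionally append each link's href."""

    def __init__(self, keep_links: bool = False):
        super().__init__()
        self.keep_links = keep_links
        self.chunks = []
        self._hrefs = []  # href values of the <a> tags currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if self.keep_links and href:
                self.chunks.append(f" <{href}>")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str, keep_links: bool = False) -> str:
    parser = TextWithLinks(keep_links)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


snippet = '<a href="somepage.html">blabla</a>'
plain = html_to_text(snippet)                        # "blabla"
with_links = html_to_text(snippet, keep_links=True)  # "blabla <somepage.html>"
```

The sketch does no layout work at all (no line wrapping, no table handling) and would also emit the contents of script and style blocks, so treat it as an illustration of the link question only.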
 
omega said:
Yes, that's what I mean. It's a pretty rare feature for the batch
convertors to also pull out the links, but I've seen it mentioned.
Though right now I'm only spotting one on my local drive that does
it. Web2Text v1.6 (1999). It has a bad interface (pita to enter target
paths, no wildcards, almost no formatting options) as well as fairly
poor output. If you don't need that feature, so much the better, since
you can choose HTMLAsText, which does things so nicely.

What interface? ;) I just drag the HTML file I want converted on to
Web2Text's icon and voila! - there's a text file with the same name in
the same directory.

Susan
 
_omega_, Saturday 11/Dec/2004:
Just to clarify: yes, most of them, including HTMLAsText, will have
only "blabla" in the output file. Only a rare few make a point of
also putting "somepage.html" in the output.

Your words were already clear in the previous message :-)
Anyway I don't see a problem here, so I'll stick to HTMLAsText.

A little note, since you love the registry things: as you know, no-install
doesn't always mean that a program writes no keys to the registry.
It's a pity that this wonderful little prog does write them, because it
wouldn't need to. Since it saves your config settings in a .cfg file of your
choice, I don't understand why it writes the same things in new keys under

HKEY_USERS\...\Software\NirSoft\HTMLAsText

This sort of thing drives me nuts, because I think about people who don't
know/care to look into the registry; then it grows and grows, full of
garbage.
I see that its development is active (Copyright (c) 2004), so maybe it would be
worth asking the author not to touch the registry. Not me, because of my English :)

This note aside, I like it very much and I think you should put it in the
KISS thread :)
 
Susan Bugher said:
What interface? ;) I just drag the HTML file I want converted on to
Web2Text's icon and voila! - there's a text file with the same name in
the same directory.

OK, but the drag-drop only does a single file. To process a directory, the
interface must be used. There is no place to paste the path. Instead you
have to carpal click 25 times to get to the desired location on disk.
It also does not store recent paths; you have to re-enter them upon relaunch.
That might have to do with its 1999 vintage. It seems that a lot
of older software has this torture going on for file/directory opens.

If I was doing a -single- file, or a small number of files, and needed
the feature of the href links included, Notetab Lite would be a very
good candidate. (Even though it means loading and clicking things). It
seems to be quite adept with formatting the output. Its whole scripting
customization is also available, for anyone who would put time into
that.

For good directory processing, when you don't need that one feature,
HTMLAsText really looks the best of all to me.

For an honorable mention, a different type of deal... One prog with
interesting output is the cmdline util HTMSTRIP. It has the drawback
of not handling LFNs. But other than that, it's worth a look for anyone
interested in getting serious about finely honing text output. Not that
I've read its 84,000 line manual; I mainly gave it a test run at basic defaults.
For instance, on the tables from the pricelesswarehome.org site, it gave
a very nice-looking text rendition.

http://users.erols.com/waynesof/bruce.htm
http://www.erols.com/waynesof/HTMS0208.ZIP
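
On the directory-processing point, a scripted alternative is also possible. The following is a bare-bones Python sketch under several assumptions: the names strip_tags and convert_directory are invented, input files are read as UTF-8 with bad bytes replaced, and the output is raw tag-stripped text with none of the formatting options the tools above offer.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextOnly(HTMLParser):
    """Keep the character data, drop the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def convert_directory(src_dir: str) -> list[Path]:
    """Write a .txt file next to every .htm/.html file under src_dir."""
    written = []
    # sorted() takes a snapshot, so files written below are not revisited.
    for path in sorted(Path(src_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            target = path.with_suffix(".txt")
            html = path.read_text(encoding="utf-8", errors="replace")
            target.write_text(strip_tags(html), encoding="utf-8")
            written.append(target)
    return written
```

Calling convert_directory on a folder walks its subfolders too and returns the list of text files it wrote. An existing .txt file with the same base name would be overwritten, so try it on a copy first.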
 
MLC said:
A little note, since you love the registry things: as you know, no-install
doesn't always mean that a program writes no keys to the registry.

True, there's no correlation between the two things... Just as so many programs
that don't themselves write to the registry are shipped with installers,
we also have programs shipped no-install that do write to the reg.

It's a pity that this wonderful little prog does write them, because it
wouldn't need to. Since it saves your config settings in a .cfg file of your
choice, I don't understand why it writes the same things in new keys under

HKEY_USERS\...\Software\NirSoft\HTMLAsText

This sort of thing drives me nuts, because I think about people who don't
know/care to look into the registry; then it grows and grows, full of
garbage.
I see that its development is active (Copyright (c) 2004), so maybe it would be
worth asking the author not to touch the registry. Not me, because of my English :)

I've never mailed a request, but have wished that Nir Sofer would convert
all his software to green. Maybe he'll do a Google vanity search and see
our wish? His stuff is so close to being clean already. Tiny, polite, no
writes to HKCR, no dlls\ocx, no-install. It would be a small step
for him to switch to a local file, instead of automatically forcing
settings into HKCU.

This note aside, I like it very much and I think you should put it in the
KISS thread :)

You're right, it is KISS. I'm willing to rtfm, and stumble around some,
on software that's hard to figure out... But then it's such a nice treat
to open a KISS prog like this, where right from the main screen, it's all
clear and easy.
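
As an illustration of the "local file instead of registry" idea, here is a small Python sketch that keeps settings in a plain .cfg file through the standard configparser module. It says nothing about how HTMLAsText stores its own settings; the section name "options" and the option names in the usage note are hypothetical.

```python
import configparser
from pathlib import Path


def save_settings(cfg_path: Path, settings: dict) -> None:
    """Write all settings to one local file; nothing else is touched."""
    parser = configparser.ConfigParser()
    parser["options"] = {key: str(value) for key, value in settings.items()}
    with cfg_path.open("w", encoding="utf-8") as fh:
        parser.write(fh)


def load_settings(cfg_path: Path) -> dict:
    """Read the settings back as strings; empty dict if the file is absent."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path, encoding="utf-8")
    if not parser.has_section("options"):
        return {}
    return dict(parser["options"])
```

For example, save_settings(Path("convert.cfg"), {"chars_per_line": 75}) followed by load_settings(Path("convert.cfg")) gives the value back as the string "75". Deleting the program's folder then removes every trace of it, which is the point of a green app.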
 
omega said:
OK, but the drag-drop only does a single file. To process a directory, the
interface must be used. There is no place to paste the path. Instead you
have to carpal click 25 times to get to the desired location on disk.
It also does not store recent paths; you have to re-enter them upon relaunch.

For my very occasional use ATM this app is handy. KISS - I can
*remember* how to run it. ;)

The info on the pros and cons of these apps is Very Much Appreciated. It
sounds like there are several that are better (especially for heavy
use). Thanks much. :) :) :)

Susan
 