---- cut here (makefile) ----
# "make FAQ" reads FAQ.html and writes the plain-text file FAQ.
# Recipe lines must begin with a tab character.
%: %.html
	$(RM) temp.html temp.out
	sed -e 's=</*I>=_=g' -e 's=</*STRONG>=\*=g' \
	    -e 's=BLOCKQUOTE=PRE=g' -e 's=<HR>=-------=' $< > temp.out
	awk -f faq.awk temp.out > temp.html
	$(RM) temp.out
	lynx -dump temp.html > temp.out
	tail -n +5 temp.out | \
	sed \
	    -e 's/^ -------/----------------------------------------------------------------------/' \
	    -e 's/^ -------/------------------------------/' \
	    -e 's/ *$$//' -e 's/^ //' | uniq > $@
	$(RM) temp.html temp.out
---- and here (faq.awk) ----
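#!/bin/awk -f
#
# Looks for <PRE> pre-formatted regions and removes <BR> from those regions.
#
BEGIN {
    pre = 0;
}
# Match literal angle brackets; in gawk, "\<" and "\>" are
# word-boundary operators, so they are not escaped here.
$1 ~ /<PRE>/ {
    pre = 1;
}
$1 ~ /<\/PRE>/ {
    pre = 0;
}
$NF ~ /<BR>/ {
    if (pre == 1) {
        # Scan back to the last "<BR>" and keep everything before it.
        k = length($0) - 3;
        while (substr($0, k, 4) != "<BR>")
            k = k - 1;
        # awk strings are 1-indexed, so the substring starts at 1.
        s = substr($0, 1, k - 1);
        print s;
    }
    else
        print $0;
    next;
}
{
    print $0;
}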
---- and here ----
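As a rough illustration of what the awk step does, here is a condensed sketch. It is not the script above: the function name strip_br_in_pre and the sample lines are made up, and it only strips a <BR> that ends the line (allowing trailing blanks), which is the common case.

```shell
#!/bin/sh
# Sketch only: a condensed filter with the same intent as faq.awk.
# strip_br_in_pre is a hypothetical name; the input below is invented.
strip_br_in_pre() {
  awk '
    $1 ~ /<PRE>/   { pre = 1 }                 # entering a pre-formatted region
    $1 ~ /<\/PRE>/ { pre = 0 }                 # leaving it
    pre && $NF ~ /<BR>/ { sub(/<BR>[ \t]*$/, "") }  # drop a trailing <BR>
    { print }
  '
}

printf '%s\n' \
  '<PRE>' \
  '            quoted line<BR>' \
  '</PRE>' \
  'outside text<BR>' | strip_br_in_pre
# prints:
# <PRE>
#             quoted line
# </PRE>
# outside text<BR>
```

Only the <BR> inside the <PRE> region is removed; the one outside is left for lynx to render as a line break.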
Check out one of my FAQs for the input format I use, which must be
followed to get good-looking text out of the system. For example, the
Chalkhills FAQ is at "http://idaho.ig.com/chalkhlls/html/FAQ.html".
Specifically, I use <BLOCKQUOTE> for quotations, but the text in both
<BLOCKQUOTE> and <PRE> sections must be indented 12 spaces and
formatted as if <PRE>-formatted. The text version actually changes
all <BLOCKQUOTE>s to <PRE>s before formatting. Note the answers are
all in a <UL> list. The script also converts <HR>s to something
vaguely resembling RFC 1153 format (very vaguely). I also convert
<I>talics to _underlines_ (because I deal with a lot of album titles),
and <STRONG>s to *emphasis here*, just to get the point across.
Anyway, as I say, the best way to get an idea of the format is
actually to look at the HTML source for one of my FAQs.
I suppose I could re-write this in Perl, but I haven't. Sorry.
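To make the tag conversions described above concrete, here is a small sketch that runs the same sed substitutions as the makefile's first step on a few sample lines. The function name inline_markup_to_text and the sample text are assumptions for illustration, not taken from any real FAQ.

```shell
#!/bin/sh
# Sketch only: the first sed pass from the makefile, wrapped in a
# hypothetical function so it can be tried on its own.
inline_markup_to_text() {
  sed -e 's=</*I>=_=g' -e 's=</*STRONG>=*=g' \
      -e 's=BLOCKQUOTE=PRE=g' -e 's=<HR>=-------='
}

# Invented sample input.
printf '%s\n' \
  'I like <I>Some Album</I>, <STRONG>a lot</STRONG>.' \
  '<BLOCKQUOTE>' \
  '<HR>' | inline_markup_to_text
# prints:
# I like _Some Album_, *a lot*.
# <PRE>
# -------
```

The short run of dashes that replaces <HR> is only a placeholder; the second sed pass in the makefile widens it after lynx has rendered the page.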
-- John
-- http://www.ig.com/~relph/
© Copyright The Landfield Group, 1997
All rights reserved