* Cleaning Up a Website
From: John J. Boyer
To: blinux-list
Hello,
I have gotten myself committed to maintaining a website that was
originally created with some Windows program. It is full of redundant
code, and I need to clean it up so I can see what I am doing. Using Lynx,
if I just print a page, all the formatting, even paragraph separations,
vanishes, so I don't even get a good text file. Is there some way to
preserve at least the separation of paragraphs and headings?
Any other hints would also be greatly appreciated.
Thanks,
John
--
Computers to Help People, Inc.
http://www.chpi.org
825 East Johnson; Madison, WI 53703
* Re: Cleaning Up a Website
From: Fredrik Larsson
To: blinux-list
Hi,
I suggest running the pages through tidy, the W3C tool for cleaning up
HTML. It will at least remove redundant markup.
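For a whole site, the tidy run can be scripted. What follows is only a minimal sketch, assuming the `tidy` command-line program is installed and on the PATH; the function names `tidy_command` and `tidy_site` and the directory layout are made up for illustration. It writes cleaned copies to a separate directory so the original pages stay untouched.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(src, dst):
    """Build the argument list for one tidy run (nothing is executed here)."""
    # -q: quiet, -i: indent the output, -o: write the cleaned page to dst
    return ["tidy", "-q", "-i", "-o", str(dst), str(src)]


def tidy_site(src_dir, out_dir):
    """Run tidy over every .html file in src_dir; return {filename: exit status}."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy is not installed or not on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for page in sorted(Path(src_dir).glob("*.html")):
        # tidy exits with 0 (clean), 1 (warnings) or 2 (errors), so a nonzero
        # status is recorded rather than treated as a failure of this script.
        proc = subprocess.run(tidy_command(page, out / page.name))
        results[page.name] = proc.returncode
    return results
```

Something like `tidy_site("site", "site-clean")` would then give one exit status per page, which shows which pages tidy complained about.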
If you don't find any <p> tags, chances are that paragraphs are made using
<span class="x">, with stylesheets setting the layout properties for class x.
Another approach is to look for a <meta name="generator" content="..."> tag
to find out what was used to generate the site. Then you can read up on how
that tool generates its code.
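Both checks can be done in one pass over a page. This is a rough sketch using only Python's standard `html.parser`; the class name `MarkupSurvey` and the helper `survey` are invented for the example, and it assumes the page is already loaded into a string.

```python
from collections import Counter
from html.parser import HTMLParser


class MarkupSurvey(HTMLParser):
    """Collect the generator name plus counts of <p> tags and <span> classes."""

    def __init__(self):
        super().__init__()
        self.generator = None          # content of <meta name="generator">, if any
        self.p_count = 0               # number of <p> start tags seen
        self.span_classes = Counter()  # class attribute value -> number of spans

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.generator = attrs.get("content")
        elif tag == "p":
            self.p_count += 1
        elif tag == "span" and attrs.get("class"):
            self.span_classes[attrs["class"]] += 1


def survey(html_text):
    """Parse one page and return the filled-in MarkupSurvey object."""
    parser = MarkupSurvey()
    parser.feed(html_text)
    parser.close()
    return parser
```

A page with `p_count` of zero and one or two heavily used span classes is probably doing its paragraphs through the stylesheet, as described above.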
Hope that helps!
Fredrik
* Re: Cleaning Up a Website
From: jude dashiell
To: blinux-list
The W3C site has a tool called HTML Tidy, written by Dave Raggett, that is
designed specifically for your needs.
* cleaning up a website
From: Karl Dahlke
To: blinux-list
Use edbrowse.
You can read in the html file locally, or read it off the net.
Enter the b command to browse,
then the w command to write to a file.
All the paragraphs and headings will be there.
Lists too, numbered or bullet,
and a modest attempt at tables,
using | separators,
although this doesn't always work out that well.
http://www.eklhad.net/linux/app/
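For comparison, the same kind of text dump, with paragraphs and headings kept apart by blank lines, can be roughed out in Python's standard library. This is not how edbrowse works, just a small sketch of the idea; the names `TextDump` and `html_to_text` and the choice of which tags count as block breaks are assumptions made for the example. It makes no attempt at lists numbering or tables.

```python
import re
from html.parser import HTMLParser

# Tags that start or end a block of text (a crude, assumed selection).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "br"}
# Tags whose contents should never appear in the text dump.
SKIP_TAGS = {"script", "style"}


class TextDump(HTMLParser):
    """Gather page text, inserting a break marker around block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse line wraps and runs of spaces from the HTML source.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html_text):
    """Return the page text with one blank line between blocks."""
    parser = TextDump()
    parser.feed(html_text)
    parser.close()
    blocks = (b.strip() for b in "".join(parser.parts).split("\n\n"))
    return "\n\n".join(b for b in blocks if b)
```

Writing the result of `html_to_text(page)` to a file gives a plain text version where each heading and paragraph stands on its own.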
Karl