public inbox for blinux-list@redhat.com
* Cleaning Up a Website
From: John J. Boyer
  To: blinux-list

Hello,

I have gotten myself committed to maintaining a website that was 
originally created with some Windows program. It is full of redundant 
code, and I need to clean it up so I can see what I am doing. Using Lynx, 
if I just print a page, all the formatting, even paragraph separations, 
vanishes, so I don't even get a good text file. Is there some way to 
preserve at least the separation of paragraphs and headings?

Any other hints would also be greatly appreciated.

Thanks,
John


-- 
Computers to Help People, Inc.
http://www.chpi.org
825 East Johnson; Madison, WI 53703





* Re: Cleaning Up a Website
From: Fredrik Larsson
  To: blinux-list

Hi,

I suggest running the pages through tidy, the W3C tool for cleaning up
HTML. At the very least it will remove redundant markup.
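
A minimal sketch of how such a run could be scripted, assuming the tidy
program is installed and on the PATH. The function names tidy_command and
clean_page are made up for illustration; the options used (-q, -i, -c, -o)
are standard tidy command-line options.

```python
import shutil
import subprocess


def tidy_command(src, dst):
    """Build a tidy command line that writes a cleaned copy of src to dst."""
    return [
        "tidy",
        "-q",       # quiet: suppress nonessential output
        "-i",       # indent element content so nesting is easier to follow
        "-c",       # clean: replace FONT, NOBR and CENTER tags with CSS
        "-o", dst,  # write the result to dst instead of standard output
        src,
    ]


def clean_page(src, dst):
    """Run tidy on src; return its exit status, or None if tidy is missing."""
    if shutil.which("tidy") is None:
        return None
    # tidy exits with 0 on success, 1 if it issued warnings, 2 on errors.
    return subprocess.run(tidy_command(src, dst)).returncode
```

Calling clean_page("index.html", "index.clean.html") would leave the
original page untouched and write the cleaned copy next to it; both file
names are placeholders.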

If you don't find any <p> tags, chances are that the paragraphs are made
with <span class="x"> and a stylesheet that sets layout properties for
class x.

Another approach is to look for a <meta name="generator" content="...">
tag to find out which program generated the site. Then you can read up on
how that tool generates its code.
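
As a rough illustration, the generator tag could be looked up with
Python's standard html.parser module. The names GeneratorFinder and
find_generators are hypothetical, and this is only a sketch, not a
complete tool.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def find_generators(html_text):
    """Return the generator strings found in html_text, in document order."""
    finder = GeneratorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.generators
```

An empty list means the page carries no generator tag, in which case the
markup itself is the only clue to the authoring program.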

Hope that helps!

Fredrik




* Re: Cleaning Up a Website
From: jude dashiell
  To: blinux-list

The W3C site has a tool called HTML Tidy, written by Dave Raggett, that is
designed for exactly this kind of cleanup.




* cleaning up a website
From: Karl Dahlke
  To: blinux-list

Use edbrowse.
You can read in the html file locally, or read it off the net.
Enter the b command to browse,
then the w command to write to a file.
All the paragraphs and headings will be there.
Lists too, numbered or bulleted,
and a modest attempt at tables,
using | separators,
although this doesn't always work out that well.
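
For comparison, the basic idea of keeping paragraph and heading breaks
when dumping a page to text can be sketched with Python's standard
html.parser module. This is not edbrowse and does not reproduce its list
or table handling; the names TextDumper and html_to_text and the choice
of block-level tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an assumed, incomplete selection).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}
BREAK = "\x00"  # internal marker for a paragraph boundary


class TextDumper(HTMLParser):
    """Collect page text, marking a boundary around each block-level tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # greater than zero while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(BREAK)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(BREAK)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html_text):
    """Return the text of html_text with one blank line between blocks."""
    dumper = TextDumper()
    dumper.feed(html_text)
    dumper.close()
    paragraphs = []
    for block in "".join(dumper.chunks).split(BREAK):
        words = " ".join(block.split())  # collapse runs of whitespace
        if words:
            paragraphs.append(words)
    return "\n\n".join(paragraphs)
```

Pages that build their paragraphs from styled <span> elements rather than
block-level tags would come out as a single run of text with this sketch.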

http://www.eklhad.net/linux/app/

Karl


