From: Igor Gueths <igueths@attbi.com>
To: speakup@braille.uwo.ca
Subject: Re: word documents
Date: Sun, 16 Mar 2003 09:56:00 -0500 (EST)
Message-ID: <Pine.LNX.4.44.0303160953190.2847-100000@igueths>
In-Reply-To: <15988.10894.462403.836384@localhost.localdomain>
Hi Dave. I had a not-so-good result with pdftotext. I converted a fairly
large manual from PDF to text, and the resulting text file didn't have
the line wrap set properly: each line held only about half of its text,
and the other half had been lost. Needless to say, the conversion was not
accurate. Have you been able to get around this problem, and if so, how?
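
For anyone who wants to experiment with this, here is a rough sketch of
two things that might be worth trying. It is not a known fix for the lost
text. It assumes a pdftotext build that accepts the -layout and -raw
options, and the helper names (pdftotext_command, convert, rewrap) and
the file names are made up for illustration. The first helper builds a
command line that asks pdftotext for a different text ordering; the
second re-flows whatever text comes out to a fixed width.

```python
import subprocess
import textwrap

# Assumption: the installed pdftotext accepts -layout and -raw.
_MODES = {"layout", "raw", None}


def pdftotext_command(pdf_path, txt_path, mode="layout"):
    """Build (but do not run) a pdftotext command line.

    mode="layout" asks pdftotext to keep the physical page layout,
    mode="raw" keeps the text in content-stream order,
    mode=None uses the program's default behaviour.
    """
    if mode not in _MODES:
        raise ValueError("mode must be 'layout', 'raw', or None")
    cmd = ["pdftotext"]
    if mode is not None:
        cmd.append("-" + mode)
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path, mode="layout"):
    """Run pdftotext; raises CalledProcessError if the conversion fails."""
    subprocess.run(pdftotext_command(pdf_path, txt_path, mode), check=True)


def rewrap(text, width=72):
    """Re-flow each blank-line-separated paragraph to the given width."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width) for p in paragraphs
    )
```

Calling convert("manual.pdf", "manual.txt", mode="raw") and then passing
the file's contents through rewrap() would show whether the missing text
is a layout problem or is really absent from the output. Re-wrapping can
only tidy text that pdftotext actually produced; it cannot bring back
anything that was dropped during conversion.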
May you code in the power of the source,
may the kernel, libraries, and utilities be with you,
throughout all distributions until the end of the epoch.
On Sun, 16 Mar 2003, Dave Hunt wrote:
> There are Word document viewers for Linux console. The one I use is
> called wv. Another is called antiword. No doubt, there are more.
> Because Word is a proprietary format, and the specification is not
> available, the authors of programs such as wv have had to
> reverse-engineer a bit. Because of this, certain things in the Word
> document may not decode as well as we'd like. Nonetheless, I use wv
> and get reasonable results when converting from Word to html. The
> resulting html source is quite bloated, but, it's there.
>
> For pdf conversion, there's pdftotext. This is part of the xpdf
> package, and may already be on your system. Surprise, it was already
> on my stock installation of RH 7.2. The one thing I don't like about
> pdftotext's rendering is that hyperlinks get lost. To preserve the
> navigability of pdf documents, I visit <access.adobe.com>, and submit
> the url of a pdf document (assuming I've found it on the web) to the
> form. What comes back is a nice html rendering (links and all).
>
>
> Hope this helps,
>
>
> -Dave