* pdf documents
From: Daniel Dalton @ UTC (permalink / raw)
To: Linux for blind general Discussion
Hi,
Does anyone know if it is possible to read pdf documents in a console with
speakup?
I don't want to have to use a gui.
If so what application should I use?
Thanks for any help,
--
Daniel Dalton
http://members.iinet.net.au/~ddalton/
d.dalton@iinet.net.au
* Re: pdf documents
From: Tony Baechler @ UTC (permalink / raw)
To: Linux for blind general discussion
Daniel Dalton wrote:
> Does anyone know if it is possible to read pdf documents in a console
> with speakup?
Yes.
> I don't want to have to use a gui.
So don't use one.
> If so what application should I use?
Try the pdftotext utility from the xpdf package. You didn't say which
distro you run, but I know it's prepackaged for Debian and it is probably
available as an .rpm too. You could of course compile from source as well.
Note that pdftotext won't read files with the no-copy and no-print bits
set. By default it writes the text to a file with the same name as the pdf
but with a .txt extension, which you can read with less(1) or more(1). It
may overwrite an existing file of that name, so check before you convert.
You don't need the other programs in the package. You used to get it from
http://www.foolabs.com/ but this might have changed. You can always get
source from Debian ftp mirrors.
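
For anyone who wants to script the conversion described above, here is a minimal sketch in Python. It assumes pdftotext is installed and on the PATH; the function names text_output_path and convert_pdf are made up for this example. It guards against the overwrite problem by refusing to replace an existing .txt file unless asked to.

```python
import subprocess
from pathlib import Path


def text_output_path(pdf_path):
    """Return the .txt name pdftotext uses by default for this pdf."""
    return Path(pdf_path).with_suffix(".txt")


def convert_pdf(pdf_path, overwrite=False):
    """Run pdftotext on pdf_path without clobbering an existing .txt file."""
    out_path = text_output_path(pdf_path)
    if out_path.exists() and not overwrite:
        raise FileExistsError(
            f"{out_path} already exists; pass overwrite=True to replace it"
        )
    # Naming the output file explicitly keeps the destination unambiguous.
    subprocess.run(["pdftotext", str(pdf_path), str(out_path)], check=True)
    return out_path
```

Calling convert_pdf("manual.pdf") would leave the text in manual.txt, ready to page through with less under speakup.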
* Re: pdf documents
From: Matt Barnes @ UTC (permalink / raw)
To: Linux for blind general discussion
Tesseract is an OCR and can convert pdf's and images to text. I haven't
gotten around to installing it and trying it out, but it seems like the OCR
of choice, located here:
http://sourceforge.net/project/showfiles.php?group_id=158586
On Dec 27, 2007 4:54 AM, Daniel Dalton <d.dalton@iinet.net.au> wrote:
> Hi,
>
> Does anyone know if it is possible to read pdf documents in a console with
> speakup?
> I don't want to have to use a gui.
> If so what application should I use?
* Re: pdf documents
From: Tony Baechler @ UTC (permalink / raw)
To: Linux for blind general discussion
Yes, but unless I'm badly mistaken, it is very old and doesn't support
directly extracting images from pdf files. You would still need to
install the xpdf package to get the pdfimages utility so you can process
the images as single files. I read about the OCR package you describe
but I'm fairly sure it's old and unmaintained. Maybe someone was going
to take over development, I'm not sure. I've noticed that most pdf
files are text and don't have page images, or if they do, the images are
pictures so would be useless anyway. Also, what is the accuracy rate
for this OCR package? What about accessibility?
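
The two-step route described above (pull the page images out with pdfimages, then run OCR on each one) could be sketched in Python roughly as follows. This is only an illustration under assumptions: both pdfimages and tesseract must be on the PATH, the helper names are invented for this example, and whether tesseract accepts the image format pdfimages writes by default depends on the tesseract build, so a conversion step may be needed in between.

```python
import subprocess
from pathlib import Path


def extract_images_cmd(pdf_path, image_root):
    """pdfimages writes one file per embedded image, named <image_root>-NNN.<ext>."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_cmd(image_path):
    """tesseract takes an image and an output base name and writes <base>.txt."""
    image_path = Path(image_path)
    return ["tesseract", str(image_path), str(image_path.with_suffix(""))]


def ocr_pdf_images(pdf_path, workdir):
    """Extract every image from the pdf into workdir, then OCR each in turn."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_images_cmd(pdf_path, workdir / "img"), check=True)
    for image in sorted(workdir.glob("img-*")):
        if image.suffix == ".txt":
            continue  # skip text left over from an earlier run
        subprocess.run(ocr_cmd(image), check=True)
```

As noted in this message, this only helps when the pdf really contains scanned page images; for ordinary text pdfs, pdftotext is the simpler route.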
Matt Barnes wrote:
> Tesseract is an OCR and can convert pdf's and images to text. I
> haven't gotten around to installing it and trying it out, but it seems
> like the OCR of choice, located here:
> http://sourceforge.net/project/showfiles.php?group_id=158586
* Re: pdf documents
From: Geoff Shang @ UTC (permalink / raw)
To: Linux for blind general discussion
Hi,
In addition to pdftotext, I can also recommend pstotext (in Debian it's in
its own package).
pstotext outputs to standard output by default. You can save the output
by either using > to redirect to a file, or by using the -output command
line option.
I have both pstotext and pdftotext installed here. Results seem to vary as
to which is better and you may want to try both if a document is proving
difficult to read and see which gives the best results.
Geoff.
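
The two ways of saving pstotext output mentioned above can be sketched in Python like this. It is a rough example, not a finished tool: run_to_file and pstotext_cmd are names invented here, and pstotext itself is assumed to be installed. run_to_file does the same job as the shell's > redirection, while pstotext_cmd builds a command line that uses the -output option instead.

```python
import subprocess


def run_to_file(cmd, dest):
    """Run cmd and save whatever it prints on standard output to dest."""
    with open(dest, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return dest


def pstotext_cmd(src, output=None):
    """Build a pstotext command line, optionally with the -output option."""
    cmd = ["pstotext"]
    if output is not None:
        cmd += ["-output", str(output)]
    cmd.append(str(src))
    return cmd
```

For example, run_to_file(pstotext_cmd("paper.pdf"), "paper.ps.txt") captures standard output, and subprocess.run(pstotext_cmd("paper.pdf", "paper.ps.txt"), check=True) lets pstotext write the file itself. Using a different output name from the one pdftotext picks makes it easy to keep both results and compare them.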
* Re: pdf documents
From: Tony Baechler @ UTC (permalink / raw)
To: Linux for blind general discussion
Hi,
Have you experimented with the pdftotext -layout and -raw options? I've
noticed that running it with no command line options usually produces
usable output, but sometimes the -raw option works better. Then again,
sometimes it makes it worse. The output file is generally smaller when
using the -raw option. Also, since presumably the
interest of most people here is in making pdf documents accessible or
somehow finding a way to use the text from them, do you know of a way to
get around the protection bits? I'm not trying to pirate or anything
else, but I know of at least one major audio editing package which ships
manuals that can't be read with pdftotext because of the no-print and
no-copy bits. I know the text is there in the pdf file because they
sent me an unprotected copy upon request, but they have been bought out
by a major media company now.
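
Since it is hard to predict which of the pdftotext modes mentioned above reads best for a given document, one approach is to produce all three and compare. The sketch below assumes pdftotext is on the PATH; the MODES table, the function names, and the <name>.<mode>.txt naming scheme are choices made for this example only.

```python
import subprocess
from pathlib import Path

# The three variants discussed: no options, -layout, and -raw.
MODES = {"default": [], "layout": ["-layout"], "raw": ["-raw"]}


def variant_cmd(pdf_path, mode):
    """Return the pdftotext command for one mode and the file it writes."""
    pdf_path = Path(pdf_path)
    out_path = pdf_path.with_name(f"{pdf_path.stem}.{mode}.txt")
    return ["pdftotext", *MODES[mode], str(pdf_path), str(out_path)], out_path


def convert_all(pdf_path):
    """Convert the pdf in every mode and report each output size in bytes."""
    sizes = {}
    for mode in MODES:
        cmd, out_path = variant_cmd(pdf_path, mode)
        subprocess.run(cmd, check=True)
        sizes[mode] = out_path.stat().st_size
    return sizes
```

Running convert_all("manual.pdf") would leave manual.default.txt, manual.layout.txt and manual.raw.txt side by side, so each can be read in turn. The sizes are only a rough hint; a smaller file is not necessarily a better one.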
Geoff Shang wrote:
> I have both pstotext and pdftotext installed here. Results seem to
> vary as to which is better and you may want to try both if a document
> is proving difficult to read and see which gives the best results.
* Re: pdf documents
From: Mike Gorse @ UTC (permalink / raw)
To: Linux for blind general discussion
pdftotext has a few lines of code that simply check the permission bits,
and, if they are set to disallow copying, print an error message and exit.
-- Mike Gorse / AIM:linvortex / http://mgorse.freeshell.org --
* Re: pdf documents
From: Tony Baechler @ UTC (permalink / raw)
To: Linux for blind general discussion
Fine, but that doesn't help us non-programmers like me. I admit that I
haven't looked at the source but I wouldn't know what I'm looking at
anyway. I took a brief look but it meant nothing to me. Again, I'm not
trying to pirate or anything else, I just want equal access to pdf
documents without using an OCR package. The OCR package I used in the
past created errors that weren't in the original document.
Mike Gorse wrote:
> pdftotext has a few lines of code that simply check the permission
> bits, and, if they are set to disallow copying, print an error message
> and exit.
* Re: pdf documents
From: Mike Gorse @ UTC (permalink / raw)
To: Linux for blind general discussion
If you know the error string that gets displayed, then using grep -r
should point you to the relevant code. Actually, it doesn't look as
though my version of pdftotext enforces permissions (I have the version
included with poppler, which is a fork of xpdf made into a library and
used by evince and possibly other projects). It will enforce the
permissions if ENFORCE_PERMISSIONS is defined at compile time, but it is
not by default.
-- Mike Gorse / AIM:linvortex / http://mgorse.freeshell.org --
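
For readers without grep to hand, the recursive search suggested above can be approximated in Python. This is a minimal sketch: find_string is a name invented here, and the source directory in the usage note is hypothetical.

```python
from pathlib import Path


def find_string(root, needle):
    """Return (file, line number, line) for every line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

With the source unpacked in a directory such as ./poppler-src (a made-up path), find_string("./poppler-src", "ENFORCE_PERMISSIONS") would list the places where the compile-time check is made, which is roughly what grep -r ENFORCE_PERMISSIONS ./poppler-src prints.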
On Sat, 29 Dec 2007, Tony Baechler wrote:
> Fine, but that doesn't help us non-programmers like me. I admit that I
> haven't looked at the source but I wouldn't know what I'm looking at anyway.
> I took a brief look but it meant nothing to me. Again, I'm not trying to
> pirate or anything else, I just want equal access to pdf documents without
> using an OCR package. The OCR package I used in the past created errors that
> weren't in the original document.