* Re: pdf to html
From: Andor Demarteau
To: blinux-list
On Thu, 26 Apr 2001, Karl Dahlke wrote:
> access.adobe.com/tools.html
Is this a Linux/Unix tool, or is it Windows-only garbage?
> In other words, Adobe knows its "standard"
> is totally inaccessible, and they're trying to do something about it.
> I guess that's something.
> I ran my 6 megabyte pdf administrator's guide through it,
> and out came the 2 megabyte html equivalent.
> As a conversion utility, I'd give it a C+, maybe a B-.
> But at least it worked, and I can read the manual
> and get on with my job.
>
> One of the most irritating features of the tool is its tendency
> to print its error messages right in the text,
> and there are plenty of them.
> At this point I'm glad I wrote my own editor browser.
> One line of perl code strips them out.
> If you're using standard software such as lynx,
> you might be able to save the html page with the -source option,
> run the following perl command on it (sed won't do),
> and then view the modified local file.
> That will get rid of the errors.
> Here's the relevant chunk of perl code from my editor.
>
> # One of the common problems in the translation is
> # the following meaningless string, that appears over and over again.
> # I'm removing it here.
> $text =~ s/Had\strouble\sresolving\sdest\snear\sword\s(<[\w_]+>\s)?action\stype\sis\sGoToR//g;
>
>
>
> _______________________________________________
> Blinux-list mailing list
> Blinux-list@redhat.com
> https://listman.redhat.com/mailman/listinfo/blinux-list
>
slainte mhaith (good health)
slainte (cheers)
-----------
Andor Demarteau E-mail: ademarte@students.cs.uu.nl
student computer science www: http://www.students.cs.uu.nl/~ademarte/
Utrecht University webpage has been updated :)
-----------
Believe in yourself, know what you want, and make it happen!
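The Perl substitution quoted in this message can be reproduced outside Karl's editor. The following is a minimal Python sketch, not his code: the names strip_converter_errors and clean_file are hypothetical, and only the regular expression itself comes from the message. Because \s also matches a newline, applying the pattern to the whole file as one string catches the message even when it is wrapped across lines.

```python
import re

# Regular expression copied from the quoted Perl substitution.
# The optional group matches a bracketed word such as "<name> ".
ERROR_PATTERN = re.compile(
    r"Had\strouble\sresolving\sdest\snear\sword\s(<[\w_]+>\s)?action\stype\sis\sGoToR"
)


def strip_converter_errors(html):
    """Return html with every embedded converter error message removed."""
    return ERROR_PATTERN.sub("", html)


def clean_file(src_path, dst_path):
    """Hypothetical helper: read a saved page, strip the errors, write a copy."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        cleaned = strip_converter_errors(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

A page saved with lynx -source could be passed through clean_file and the cleaned copy viewed locally, which is the workflow the quoted message describes.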
* Re: pdf to html
From: Jason White
To: blinux-list
There already exist useful, albeit in certain respects limited, free
(open-source) tools which can convert PDF documents to text or HTML.
Specifically, the pdftotext utility supplied as part of the Xpdf
package (http://www.foolabs.com/xpdf/) will convert PDF files to ASCII
text. HTML conversion, preserving links and font details, is provided
by pdftohtml:
http://www.ra.informatik.uni-stuttgart.de/~gosho/pdftohtml/
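As a rough illustration of how these two tools are driven, here is a Python sketch that builds and runs their command lines. It assumes pdftotext and pdftohtml are installed and on the PATH, and that each accepts an input PDF followed by an output file name. The function names and the output naming scheme are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_command(pdf_path: str, to_html: bool = False) -> List[str]:
    """Build the argument list for pdftotext (default) or pdftohtml."""
    pdf = Path(pdf_path)
    if to_html:
        # Assumed form: pdftohtml <input.pdf> <output.html>
        return ["pdftohtml", str(pdf), str(pdf.with_suffix(".html"))]
    # Assumed form: pdftotext <input.pdf> <output.txt>
    return ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))]


def convert(pdf_path: str, to_html: bool = False) -> None:
    """Run the converter, failing early if the tool is not installed."""
    cmd = build_command(pdf_path, to_html)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(cmd[0] + " is not installed or not on PATH")
    subprocess.run(cmd, check=True)
```

Calling convert("manual.pdf") would write manual.txt next to the input, and convert("manual.pdf", to_html=True) would ask pdftohtml for manual.html. The exact set of files pdftohtml produces depends on its version and options.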

It should also be noted that PDF version 1.3 allows for the full
representation of document logical structure, essentially as an
element hierarchy as in SGML or XML, with application-specific
attributes. There is what might be described as a two-way linking
mechanism connecting the logical structure to the relevant content in
the PDF page descriptions. Details can be found in the PDF
specification, which is published by Adobe and available on their web
site. The latest version is the PDF Reference, 2nd ed., version 1.3,
available in PDF at
http://partners.adobe.com/asn/developer/acrosdk/docs/PDFRef.pdf

The pdftotext and pdftohtml tools do not currently support the
"logical structure" features of PDF 1.3. If there are any programmers
on the list who would like to implement this capability in an
open-source conversion tool, for instance by extending pdftotext or
pdftohtml, this would make for a valuable and worthwhile project.

I have already contacted the authors of pdftotext and pdftohtml, who
would be interested in adding this feature but lack the
resources to implement it at present.

With the growing recognition of access concerns, including their
social and legal implications, it is to be expected that PDF
generation tools will increasingly be able to produce structured PDF
documents which, given the availability of appropriate tools, will be
accessible. It is important to make such tools available as
open-source solutions for the Unix and Linux platforms.
* Re: pdf to html
From: B. Alan Mattison
To: blinux-list
Hi Karl,
Window-Eyes by GW Micro has, in conjunction with Adobe, come up with a very
good PDF reader. It's part of Window-Eyes 4.1, which is about to be
released, I think on Friday. You might give it a look.
----- Original Message -----
From: "Karl Dahlke" <eklhad@home.com>
To: <blinux-list@redhat.com>
Sent: Thursday, April 26, 2001 12:54
Subject: pdf to html
> A couple months ago I asked you kind folks how to turn pdf into html,
> since one of my customers *required* pdf documents.
> (Personally I hate pdf, with a passion,
> but it's getting more and more popular,
> so we're going to have to find ways of dealing with it.)
> You directed me to an htmldoc program that works
> pretty well, if you don't use many fancy tags.
> If I were grading the package, I'd give it a strong B.
>
> Now I am in the opposite situation.
> A major software developer supplies its documentation in print
> or pdf, period!
> I called and asked; it's pdf or hit the highway.
> So I searched the net again and found the site
> access.adobe.com/tools.html
> In other words, Adobe knows its "standard"
> is totally inaccessible, and they're trying to do something about it.
> I guess that's something.
> I ran my 6 megabyte pdf administrator's guide through it,
> and out came the 2 megabyte html equivalent.
> As a conversion utility, I'd give it a C+, maybe a B-.
> But at least it worked, and I can read the manual
> and get on with my job.
>
> One of the most irritating features of the tool is its tendency
> to print its error messages right in the text,
> and there are plenty of them.
> At this point I'm glad I wrote my own editor browser.
> One line of perl code strips them out.
> If you're using standard software such as lynx,
> you might be able to save the html page with the -source option,
> run the following perl command on it (sed won't do),
> and then view the modified local file.
> That will get rid of the errors.
> Here's the relevant chunk of perl code from my editor.
>
> # One of the common problems in the translation is
> # the following meaningless string, that appears over and over again.
> # I'm removing it here.
> $text =~ s/Had\strouble\sresolving\sdest\snear\sword\s(<[\w_]+>\s)?action\stype\sis\sGoToR//g;
* Re: pdf to html
From: David Poehlman
To: blinux-list
With JFW (JAWS for Windows) soon to follow.
----- Original Message -----
From: "B. Alan Mattison" <mattison@cnsp.com>
To: <blinux-list@redhat.com>
Sent: Thursday, April 26, 2001 6:56 PM
Subject: Re: pdf to html
Hi Karl,
Window-Eyes by GW Micro has, in conjunction with Adobe, come up with a very
good PDF reader. It's part of Window-Eyes 4.1, which is about to be
released, I think on Friday. You might give it a look.
* Re: pdf to html
From: Andor Demarteau
To: blinux-list
On Thu, 26 Apr 2001, B. Alan Mattison wrote:
> Hi Karl,
> Window-Eyes by GW Micro has, in conjunction with Adobe, come up with a very
> good PDF reader. It's part of Window-Eyes 4.1, which is about to be
> released, I think on Friday. You might give it a look.
That's not an option on this list, because it is not concerned with Linux.
Sorry.
* Re: pdf to html
From: Andor Demarteau
To: blinux-list
On Thu, 26 Apr 2001, Karl Dahlke wrote:
> Now I am in the opposite situation.
> A major software developer supplies its documentation in print
> or pdf, period!
> I called and asked; it's pdf or hit the highway.
> So I searched the net again and found the site
> access.adobe.com/tools.html
I just tested it with a file containing game results from yesterday's
NFLE game.
There are a lot of tables in it, which the Adobe tool can't handle at all.
Pity.
* Re: pdf to html
From: Jude DaShiell
To: blinux-list
Those aren't even the biggest problems with the PDF standard and Adobe's
implementation of it. If you want to look at the biggest problems with it,
find Steny Hoyer's web page on the House of Representatives web site and
just try to read any of the articles posted on that page. Unless
they've fixed things (highly unlikely, given it's Congress), you'll find it
completely impossible, even with all the muscle Adobe has available. The
reason is that Adobe software accepts optically scanned input and
doesn't require ASCII as input. So all of those documents might as well be
pictures for all the good they're going to do the blind community. Sort of
puts a new meaning on franking privilege, doesn't it?
Jude <dashiell@starpower.net>
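One rough way to spot such image-only PDFs before trying to read them is to extract the text and see whether anything comes out. The sketch below is a heuristic under stated assumptions, not a feature of any Adobe or Xpdf tool: the function name and the threshold of 20 characters per page are arbitrary choices, and the input is assumed to be text already produced by an extractor such as pdftotext.

```python
def looks_image_only(extracted_text, page_count, min_chars_per_page=20):
    """Guess whether a PDF is scanned images with no real text layer.

    extracted_text is whatever a text extractor produced for the file;
    page_count is the number of pages in the PDF.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Drop all whitespace (including form feeds between pages) before counting.
    visible = "".join(extracted_text.split())
    return len(visible) < page_count * min_chars_per_page
```

A scanned document typically yields little more than page separators, so the function returns True for it. A sparse but genuine text PDF can also trip the threshold, so the result is a hint rather than a verdict.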