public inbox for blinux-list@redhat.com
From: Linux for blind general discussion <blinux-list@redhat.com>
To: Linux for blind general discussion <blinux-list@redhat.com>
Subject: Re: Convert unwrapped paragraphs to hard wrapped paragraphs when there's no blank lines.
Date: Sat, 28 Mar 2020 10:39:33 +0300
Message-ID: <20200328103933.7d2e4a3b@telaviv1.shlomifish.org>
In-Reply-To: <CAJ1g4g-4CYHo6BAnvuC9y-QPbQpSeBqKSfBfVSFumE3Da8OL4w@mail.gmail.com>

Hi Paul,

On Fri, 27 Mar 2020 14:43:01 -0700
Linux for blind general discussion <blinux-list@redhat.com> wrote:

> > I don't understand how paragraphs start and end in these files. Otherwise
> > you can try using one of the text processing tools mentioned here:
> >
> > * https://www.shlomifish.org/open-source/resources/text-processing-tools/
> >
> > * https://www.computerhope.com/unix/ufold.htm
> >
> > * https://en.wikipedia.org/wiki/Fmt_(Unix)
> >
> > * https://en.wikipedia.org/wiki/Par_(command)
> >
> > Note that you may have better luck converting EPUBs (assuming they lack
> > https://en.wikipedia.org/wiki/Digital_rights_management ) to plaintext using
> > tools such as https://pandoc.org/ ,
> > https://metacpan.org/search?q=html%3A%3Awikiconverter&size=20 , etc.  
> 
> Of that list of programs, I'd be inclined to use Pandoc. It permits
> you to write filters in (embedded) Lua, which is a quick-to-learn
> programming language. For example, this Lua one-liner converts a
> string ("s") to add a line break after each existing line break:
> 
> s = string.gsub(s, "<BR>", "<BR>\n<BR>")
> 

Other tools on that list may work as well. Furthermore, your HTML substitution
will not work if the document uses "<br>", "<br />" or "<br/>" for line breaks
(Lua patterns are case-sensitive), or if it uses the more recommended
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p element.
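
To illustrate, here is a rough sketch in Lua (the function name
html_breaks_to_text is made up for this example, and I have not tried it on
real-world EPUB HTML) that at least accepts the common <br> spellings in any
letter case and turns a closing </p> into a blank line. It is still
pattern-based rather than a real HTML parser, so it shares the weaknesses
described in the links below.

```lua
-- Rough sketch: turn HTML line-break and paragraph markup into plain-text
-- newlines using Lua patterns. This is NOT a real HTML parser: comments,
-- CDATA sections, attributes containing ">" and similar will confuse it.
local function html_breaks_to_text(s)
  -- <br>, <BR>, <br/>, <br /> (any letter case, optional space and slash).
  -- gsub also returns a match count; only the string is kept here.
  s = s:gsub("<[Bb][Rr]%s*/?>", "\n")
  -- Opening <p> tags without attributes...
  s = s:gsub("<[Pp]>", "")
  -- ...and with attributes, e.g. <p class="x"> (but not <pre> or <param>).
  s = s:gsub("<[Pp]%s[^>]*>", "")
  -- A closing </p> ends a paragraph: emit a blank line.
  s = s:gsub("</[Pp]%s*>", "\n\n")
  return s
end
```

For example, html_breaks_to_text("one<br>two<BR/>three<br />four") yields the
four words on separate lines, which the "<BR>" one-liner would not do.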

Also see:

* https://perl-begin.org/uses/text-parsing/

* https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
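
As for the original question of hard-wrapping: once each paragraph sits on a
single line, a simple greedy filler takes only a few lines of Lua. This is a
minimal sketch, assuming every non-empty input line is one paragraph and a
default width of 72 columns (both are my assumptions; wrap_paragraph and
wrap_lines are hypothetical names). fmt(1) and par(1) use more refined
line-breaking rules, so prefer them for real use.

```lua
-- Greedy word wrap of a single paragraph. Assumptions (hypothetical, not
-- taken from any tool): words are separated by whitespace, a word longer
-- than the width goes on its own line unbroken, and the width is measured
-- in bytes (#line), so UTF-8 text with non-ASCII letters may wrap early.
local function wrap_paragraph(text, width)
  width = width or 72
  local lines, line = {}, ""
  for word in text:gmatch("%S+") do
    if line == "" then
      line = word
    elseif #line + 1 + #word <= width then
      line = line .. " " .. word
    else
      lines[#lines + 1] = line
      line = word
    end
  end
  if line ~= "" then
    lines[#lines + 1] = line
  end
  return table.concat(lines, "\n")
end

-- Treat every non-empty input line as one unwrapped paragraph and separate
-- the wrapped paragraphs with a blank line.
local function wrap_lines(input, width)
  local out = {}
  for para in input:gmatch("[^\n]+") do
    local wrapped = wrap_paragraph(para, width)
    if wrapped ~= "" then
      out[#out + 1] = wrapped
    end
  end
  return table.concat(out, "\n\n")
end
```

For example, io.write(wrap_lines(io.read("*a"), 72), "\n") would refill
standard input, adding the blank lines between paragraphs that the original
files lack.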

> On writing Pandoc filters with Lua, see <https://pandoc.org/lua-filters.html>.
> 
> Best regards,
> 
> Paul
> 

-- 

Shlomi Fish       https://www.shlomifish.org/
https://is.gd/MQHVF3 - The Atom Text Editor edits a 2,000,001B file

Joel’s Generalisation: If it happens to you, it happens to everybody.
(Or: It’s never only you.)
    — Based on http://www.joelonsoftware.com/news/20020402.html

Please reply to list if it's a mailing list post - http://shlom.in/reply .