From: Linux for blind general discussion <blinux-list@redhat.com>
To: Linux for blind general discussion <blinux-list@redhat.com>
Subject: Re: Convert unwrapped paragraphs to hard wrapped paragraphs when there's no blank lines.
Date: Sat, 28 Mar 2020 10:39:33 +0300
Message-ID: <20200328103933.7d2e4a3b@telaviv1.shlomifish.org>
In-Reply-To: <CAJ1g4g-4CYHo6BAnvuC9y-QPbQpSeBqKSfBfVSFumE3Da8OL4w@mail.gmail.com>
Hi Paul,
On Fri, 27 Mar 2020 14:43:01 -0700
Linux for blind general discussion <blinux-list@redhat.com> wrote:
> > I don't understand how paragraphs start and end in these files. Otherwise you
> > can try using one of the text processing tools mentioned here:
> >
> > * https://www.shlomifish.org/open-source/resources/text-processing-tools/
> >
> > * https://www.computerhope.com/unix/ufold.htm
> >
> > * https://en.wikipedia.org/wiki/Fmt_(Unix)
> >
> > * https://en.wikipedia.org/wiki/Par_(command)
> >
> > Note that you may have better luck converting EPUBs (assuming they lack
> > https://en.wikipedia.org/wiki/Digital_rights_management ) to plaintext using
> > tools such as https://pandoc.org/ ,
> > https://metacpan.org/search?q=html%3A%3Awikiconverter&size=20 , etc.
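For the question in the subject line, here is a minimal Python sketch (not one
of the tools linked above). It assumes that every unwrapped paragraph sits on
exactly one line, so each newline marks a paragraph boundary even though there
are no blank lines. It uses the standard textwrap module to wrap each line on
its own and puts a blank line between paragraphs. The name hard_wrap and the
72-column width are arbitrary choices for illustration:

```python
import textwrap


def hard_wrap(text, width=72):
    """Hard-wrap text in which each non-empty line is one whole paragraph.

    Assumption: one paragraph per input line; empty lines are dropped.
    """
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # textwrap.fill breaks a paragraph at whitespace so no line exceeds width.
    wrapped = [textwrap.fill(p, width=width) for p in paragraphs]
    # A blank line between paragraphs keeps them distinguishable afterwards.
    return "\n\n".join(wrapped) + "\n"
```

To process a file, read it and pass its contents in, for example
print(hard_wrap(open("book.txt").read()), end="") where book.txt is a
placeholder name.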
>
> Of that list of programs, I'd be inclined to use Pandoc. It permits
> you to write filters in (embedded) Lua, which is a quick-to-learn
> programming language. For example, this Lua one-liner converts a
> string ("s") to add a line break after each existing line break:
>
> s = string.gsub(s, "<BR>", "<BR>\n<BR>")
>
Other tools may work as well. Furthermore, your substitution matches only the
literal, upper-case "<BR>" (Lua patterns are case-sensitive), so it will miss
line breaks written as "<br>", "<br />", or "<br/>". It also does nothing for
documents that use the more recommended
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p element.
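For comparison, a case-insensitive pattern catches "<BR>", "<br>", "<br/>" and
"<br />" alike. Below is a minimal sketch, written in Python rather than Lua,
that produces the same tag, newline, tag output as the one-liner. It assumes
the tags carry no attributes (so <br class="x"> is still missed), and
double_br is a hypothetical name:

```python
import re

# Matches <BR>, <br>, <br/> and <br /> in any letter case.
# Assumption: no attributes, so a tag such as <br class="x"> is not matched.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def double_br(html):
    """Replace each line-break tag with: tag, newline, tag."""
    # m.group(0) is the tag exactly as written, so its original spelling is kept.
    return BR_TAG.sub(lambda m: m.group(0) + "\n" + m.group(0), html)
```

It still leaves <p> elements alone, and regular expressions remain a fragile
way to process HTML in general.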
Also see:
* https://perl-begin.org/uses/text-parsing/
* https://blog.codinghorror.com/parsing-html-the-cthulhu-way/
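Along the lines of the second link, here is a rough sketch that lets Python's
standard html.parser module do the tokenizing instead of a regular expression.
It turns every spelling of <br> into a newline and starts a new paragraph at
each <p>. ParagraphText and html_to_text are hypothetical names, and it ignores
everything else (for instance, the contents of <script> and <style> elements
are not filtered out):

```python
import re
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text, turning <br> into a newline and <p> into a paragraph break."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes &amp; and friends in text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <BR> and <br> both arrive as "br".
        # Self-closing forms like <br/> and <br /> also land here, because the
        # default handle_startendtag calls handle_starttag.
        if tag == "br":
            self.parts.append("\n")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Consecutive <p>/<br> tags can stack up newlines; keep at most one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

The output has one paragraph per block separated by blank lines, which can
then be hard-wrapped with fmt, par, or similar tools.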
> On writing Pandoc filters with Lua, see <https://pandoc.org/lua-filters.html>.
>
> Best regards,
>
> Paul
>
--
Shlomi Fish https://www.shlomifish.org/
https://is.gd/MQHVF3 - The Atom Text Editor edits a 2,000,001B file
Joel’s Generalisation: If it happens to you, it happens to everybody.
(Or: It’s never only you.)
— Based on http://www.joelonsoftware.com/news/20020402.html
Please reply to list if it's a mailing list post - http://shlom.in/reply .