From: Linux for blind general discussion
To: Linux for blind general discussion
Date: Sat, 28 Mar 2020 10:39:33 +0300
Subject: Re: Convert unwrapped paragraphs to hard wrapped paragraphs when there's no blank lines.
Message-ID: <20200328103933.7d2e4a3b@telaviv1.shlomifish.org>
References: <20200327192532.120f151d@telaviv1.shlomifish.org>
Reply-To: blinux-list@redhat.com
List-Id: Linux for blind general discussion

Hi Paul,

On Fri, 27 Mar 2020 14:43:01 -0700
Linux for blind general discussion wrote:

> > I don't understand how paragraphs start and end in these files.
> > Otherwise you can try using one of the text processing tools mentioned
> > here:
> >
> > * https://www.shlomifish.org/open-source/resources/text-processing-tools/
> >
> > * https://www.computerhope.com/unix/ufold.htm
> >
> > * https://en.wikipedia.org/wiki/Fmt_(Unix)
> >
> > * https://en.wikipedia.org/wiki/Par_(command)
> >
> > Note that you may have better luck converting EPUBs (assuming they lack
> > https://en.wikipedia.org/wiki/Digital_rights_management ) to plaintext using
> > tools such as https://pandoc.org/ ,
> > https://metacpan.org/search?q=html%3A%3Awikiconverter&size=20 , etc.
>
> Of that list of programs, I'd be inclined to use Pandoc. It permits
> you to write filters in (embedded) Lua, which is a quick-to-learn
> programming language. For example, this Lua one-liner converts a
> string ("s") to add a line break after each existing line break:
>
> s = string.gsub(s, "<br>", "<br>\n<br>")
>

Other tools may work as well. Furthermore, your HTML processing substitution
will not work if one has "<br/>" or "<br />" or "<BR>" for newlines or uses the
more recommended https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p
element. Also see:

* https://perl-begin.org/uses/text-parsing/

* https://blog.codinghorror.com/parsing-html-the-cthulhu-way/

> On writing Pandoc filters with Lua, see .
>
> Best regards,
>
> Paul
>

-- 
Shlomi Fish       https://www.shlomifish.org/
https://is.gd/MQHVF3 - The Atom Text Editor edits a 2,000,001B file

Joel’s Generalisation: If it happens to you, it happens to everybody. (Or: It’s
never only you.)
    — Based on http://www.joelonsoftware.com/news/20020402.html

Please reply to list if it's a mailing list post - http://shlom.in/reply .
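
To illustrate the point above about different spellings of the HTML line-break
tag, here is a minimal sketch in Python (chosen for illustration only; the
thread itself used Lua). It assumes the quoted one-liner's intent was to emit a
second line-break tag, preceded by a newline, after each existing one. The
names BR_TAG and double_line_breaks are hypothetical, and the pattern still
ignores tags with attributes and <p> elements, so a real HTML parser remains
the safer choice.

```python
import re

# Assumption: a line break is written as <br>, <br/>, <br /> or any
# upper/lower-case mix of those. Tags carrying attributes are NOT matched.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def double_line_breaks(html: str) -> str:
    """Follow every line-break tag with a newline and a copy of the same tag."""
    return BR_TAG.sub(lambda m: m.group(0) + "\n" + m.group(0), html)


if __name__ == "__main__":
    sample = "one<br>two<br/>three<br />four<BR>five"
    print(double_line_breaks(sample))
```

A fixed-string substitution would only catch the first of the four spellings in
the sample; the case-insensitive pattern catches all of them, but it is still a
regular expression applied to HTML and inherits the limits discussed in the
links above.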