From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Dec 2018 13:48:19 -0500
From: Linux for blind general discussion
To: Linux for blind general discussion
Subject: Re: extracting text from png files
Message-ID: <20181218184819.GA8150@rednote.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
User-Agent: Mutt/1.10.1 (2018-07-13)
Reply-To: blinux-list@redhat.com
List-Id: Linux for blind general discussion

OK, this is a nit, but the O in OCR stands for "Optical," not "Ocular."
It's about the process based on vision, not on the organ that is
sensitive to light. Machines don't have eyes; biological beings have eyes.

Linux for blind general discussion writes:
> What you're looking for is Ocular Character Recognition or OCR for short.
>
> I've never managed to figure out its command line syntax, but I
> believe tesseract is considered the current standard option for Linux.
>
> There's also Cuneiform, which I have actually used with some success
> in the past, but I believe it's either contrib or non-free under
> Debian, so you might need to enable extra repositories depending on
> how strict your distro is about sticking to FOSS principles.
>
> I will warn you, in my experience, OCR is as likely to produce
> gibberish as legible text. A scan of a page of prose typeset in a
> standard font will probably OCR well, but the more mixed text is with
> graphics, the fancier the font, and the more complicated the page
> layout, the more likely errors are. I've tried OCR'ing scanlated
> manga (Japanese comics) in the past and have gotten results that
> included unpredictable patterns of letters and numbers misidentified
> as others (S and 5, P and D, I and 1, LI and U, and B and g were just
> some of the common substitutions I encountered trying to fix the OCR'd
> text), characters my screenreader couldn't identify or identified as
> characters I'm unfamiliar with, and even when the text was clear,
> paragraphs out of order weren't uncommon.
>
> --
> Sincerely,
>
> Jeffery Wright
> Bachelor of Computer Science
> President Emeritus, Nu Nu Chapter, Phi Theta Kappa.
>
> _______________________________________________
> Blinux-list mailing list
> Blinux-list@redhat.com
> https://www.redhat.com/mailman/listinfo/blinux-list

--

Janina Sajka

Linux Foundation Fellow
Executive Chair, Accessibility Workgroup: http://a11y.org

The World Wide Web Consortium (W3C), Web Accessibility Initiative (WAI)
Chair, Accessible Platform Architectures http://www.w3.org/wai/apa
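
Since the quoted message mentions not having worked out tesseract's command-line syntax, here is a minimal sketch of one way to call it from Python. It assumes the `tesseract` binary is installed and on the PATH, and it relies on the documented form `tesseract IMAGE stdout -l LANG`, which prints the recognized text instead of writing a file. The function names (`tesseract_command`, `extract_text`, `suspect_words`) are made up for this illustration, and the mixed letter/digit check is only a rough heuristic for the S/5 and I/1 style mix-ups described above, not a real correction step.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the argument list for: tesseract IMAGE stdout -l LANG."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


def suspect_words(text: str) -> List[str]:
    """Return words that mix letters and digits (hypothetical heuristic).

    Words such as "5ome" or "w1th" often come from S/5 or I/1 confusion,
    so they are worth a second look when proofreading OCR output.
    """
    suspects = []
    for word in text.split():
        core = word.strip(".,;:!?\"'()")
        has_alpha = any(c.isalpha() for c in core)
        has_digit = any(c.isdigit() for c in core)
        if has_alpha and has_digit:
            suspects.append(core)
    return suspects
```

Calling `extract_text("page.png")` on a real file returns the OCR'd text as a string, and passing that string to `suspect_words` lists candidates to check by hand. The heuristic will also flag legitimate tokens such as "fc29", so treat its output as hints only.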