Subject: Re: extracting text from png files
From: Linux for blind general discussion
To: blinux-list@redhat.com
Date: Mon, 17 Dec 2018 23:56:00 +0100

Howdy,

I use tesseract for this. I noticed that with version 4.0, which was just released, the results improved a lot here (for German and English use cases).

Some official numbers can be found here:
https://github.com/tesseract-ocr/docs/raw/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf

The languages improve by between 10 and 80 percent, depending on the language and its previous level of support. It seems it got a new OCR engine based on a neural network.

Cheers,
chrys

On 17.12.18 at 16:57, Linux for blind general discussion wrote:
> Disclaimer: I don't know which image formats either program supports
> directly, nor do I know of a good way to convert between image
> formats, though I'm pretty sure cuneiform supports at least .jpg and
> .png files directly.
>
> I also remember at least one OCR tutorial recommending some
> preprocessing to make images easier for the OCR program to work with,
> and I believe they used the convert command provided by imagemagick to
> do so, but I forget the details.
>
> Also, it's been a while since I've attempted any OCR'ing myself (how
> often I had to manually clean up the output kind of put me off), so
> there might be others on this list who can provide better and more
> specific advice on this subject.
>
> Still, I hope I've at least got you started on the right track.
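
For anyone who wants to script the tesseract approach described above, here is a minimal sketch in Python. It assumes the tesseract command is on your PATH and that the language data you ask for (for example "deu" and "eng") is installed. The function names and the file name "page.png" are only placeholders for illustration, not anything from the original message.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages=("eng",)):
    """Return the argument list for one tesseract run.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file. Several languages
    are joined with "+" for the -l option.
    """
    return ["tesseract", str(image_path), "stdout", "-l", "+".join(languages)]


def extract_text(image_path, languages=("eng",)):
    """Run tesseract on image_path and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,  # collect the text tesseract prints
        text=True,            # decode the output to str
        check=True,           # raise if tesseract exits with an error
    )
    return result.stdout


# Hypothetical usage with a placeholder file name:
#     print(extract_text("page.png", languages=("deu", "eng")))
```

Building the command in its own function keeps it easy to check what will be run before any image is processed. How good the output is still depends on the image and on the tesseract version installed.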