Saturday, January 26, 2013

Pytesser only digits recognition

Recently I needed a Python library that recognizes digits in an image. I decided to use Pytesser, a wrapper for tesseract.exe, the OCR program originally developed by HP and later by Google.
It worked fine on standard text examples.
I had a few images containing only digits. They came from really simple captchas (with the noise already removed, and so on). Using pytesser's image_to_string function, I was getting stray characters, commas, and so on. :/
I looked for an option to read only digits, but when I found it, it didn't work. I realized that the standard Tesseract bundled with pytesser doesn't support that option.
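If upgrading Tesseract isn't possible, a crude fallback (a different technique from the upgrade this post recommends, and only a sketch) is to post-filter whatever image_to_string returns and keep only the digits. The helper name keep_digits is hypothetical. It strips stray commas and other characters, but it cannot recover a digit that Tesseract misread as a letter.

```python
import re


def keep_digits(ocr_text):
    """Drop every character that is not an ASCII digit 0-9.

    Post-processing fallback only: it removes stray commas, letters,
    whitespace, etc., but cannot fix a digit that OCR misread as a
    letter (e.g. '0' read as 'O' is simply dropped).
    """
    return re.sub(r"[^0-9]", "", ocr_text)

# Usage with pytesser (assumes pytesser is importable and PIL is installed):
#   from pytesser import image_to_string
#   from PIL import Image
#   digits = keep_digits(image_to_string(Image.open("captcha.png")))
```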
The solution: get the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list.
Install it into the pytesser directory (for me it was C:/Python27/Lib/pytesser). This replaces the old tesseract.exe with the new one.
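To confirm the install actually landed in the pytesser directory, a small check like the one below can help. It is a sketch that assumes the layout described in this post: tesseract.exe next to pytesser.py, and a digits file in tessdata\configs. The function name missing_tesseract_files is hypothetical.

```python
import os

# Directory used in this post; adjust to where pytesser lives on your machine.
PYTESSER_DIR = "C:/Python27/Lib/pytesser"


def missing_tesseract_files(pytesser_dir):
    """Return the expected files that are NOT present under pytesser_dir.

    Assumes tesseract.exe sits next to pytesser.py and that the new
    Tesseract install put a 'digits' config into tessdata/configs.
    """
    expected = [
        os.path.join(pytesser_dir, "tesseract.exe"),
        os.path.join(pytesser_dir, "tessdata", "configs", "digits"),
    ]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_tesseract_files(PYTESSER_DIR)
    if missing:
        print("Missing files (was the new Tesseract installed here?):")
        for path in missing:
            print("  " + path)
    else:
        print("tesseract.exe and the digits config are both in place.")
```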
Find this line in pytesser.py:
args = [tesseract_exe_name, input_filename, output_filename]
Change it to:
args = [tesseract_exe_name, input_filename, output_filename, 'nobatch', 'digits']
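Here is a minimal sketch of what the patched call amounts to, written as standalone functions. It is not pytesser's actual source: the function names are hypothetical, and TESSERACT_EXE stands in for pytesser's tesseract_exe_name (assumed here to be on your PATH). Tesseract treats the arguments after the output base name as names of config files from its tessdata directory, which is why appending 'nobatch' and 'digits' changes what it recognizes.

```python
import subprocess

# Stand-in for pytesser's tesseract_exe_name; adjust the path for your install.
TESSERACT_EXE = "tesseract"


def build_tesseract_args(input_filename, output_basename,
                         configs=("nobatch", "digits")):
    """Build the command line used by the patched pytesser call.

    Trailing arguments are Tesseract config names; 'digits' restricts
    the recognized characters via its tessedit_char_whitelist.
    """
    return [TESSERACT_EXE, input_filename, output_basename] + list(configs)


def call_tesseract_digits(input_filename, output_basename):
    """Run Tesseract; it writes its result to output_basename + '.txt'."""
    args = build_tesseract_args(input_filename, output_basename)
    retcode = subprocess.call(args)
    if retcode != 0:
        raise RuntimeError("tesseract exited with code %d" % retcode)
```

Keeping the argument list in its own function makes it easy to swap 'digits' for a different config name without touching the subprocess call.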

For me it works fine!
PS:

This configuration also recognizes the dot ('.') and minus ('-') characters. If you don't want that, go to the tessdata\configs directory, open the digits file, and change:
tessedit_char_whitelist 0123456789.-
into
tessedit_char_whitelist 0123456789
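Instead of editing the shipped digits file, another option (a sketch, assuming Tesseract picks up any config file placed in tessdata\configs by name) is to write your own one-line config next to it and pass its name in place of 'digits' in the args list. The config name 'mydigits' and the helper write_whitelist_config are hypothetical. The same idea could cover the digits, colon, and alphabet case raised in the comments by listing those characters in the whitelist, though results with a larger whitelist may differ.

```python
import os


def write_whitelist_config(configs_dir, name, allowed_chars):
    """Write a Tesseract config file that sets tessedit_char_whitelist.

    Uses the same one-line 'variable value' format as the shipped
    'digits' config. Returns the path of the written file.
    """
    if not allowed_chars or "\n" in allowed_chars or "\r" in allowed_chars:
        # The setting must fit on a single config line.
        raise ValueError("whitelist must be a non-empty single line")
    path = os.path.join(configs_dir, name)
    with open(path, "w") as f:
        f.write("tessedit_char_whitelist %s\n" % allowed_chars)
    return path

# Usage (hypothetical name; adjust the directory to your install):
#   configs_dir = "C:/Python27/Lib/pytesser/tessdata/configs"
#   write_whitelist_config(configs_dir, "mydigits", "0123456789")
# then end the args list in pytesser.py with 'nobatch', 'mydigits'.
```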

Comments:

  1. Thank you very much, you saved me!

  2. Hi Przemek,

    That digits file doesn't exist on my pytesser installation. Did you create it yourself?

    I am trying to allow only digits, a colon, and the alphabet. Could I just add those characters to the "digits" file?

    Thanks!

  3. Did you perhaps skip the "Get the latest version of Tesseract ..." step? Installing the latest Tesseract version also replaces your old 'tessdata' directory with a new one that contains the 'digits' file.

    I'm not sure how to add the alphabet to the digits configuration, sorry. Try experimenting.
