Programmer's blog: January 2013

Thursday, January 31, 2013

Using Boost.Python under Eclipse

This post describes how to configure Eclipse under Linux to build a Boost.Python C++ shared library. I assume you already have Boost and Eclipse Indigo installed.

The first step is to choose File -> New -> C++ Project. Then select Shared Library -> Empty Project, give the project a name and click Finish. Now click Project -> Properties, expand C/C++ Build and choose Settings. Under GCC C++ Compiler, click Includes and add /usr/include/boost and /usr/include/python to the include paths. These paths are correct for me, but yours might be different (for example /usr/include/python2.7).
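If you are not sure where the Python headers are on your system, the interpreter can tell you. This is a small sketch using the standard sysconfig module (available since Python 2.7). The helper names are my own. The Boost directory still has to be found by hand.

```python
import os
import sysconfig


def python_include_dir():
    """Return the directory that should contain Python.h for this interpreter."""
    return sysconfig.get_paths()["include"]


def has_python_header(include_dir):
    """True if Python.h exists there, i.e. the development headers are installed."""
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


print(python_include_dir())
```

Run it with the same interpreter you will later import the module into. If has_python_header returns False, the Python development package is probably not installed.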

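The next step is to change the name of the shared library. It must match the name in your BOOST_PYTHON_MODULE(<name>). So if your module is called 'hello', set the artifact name to hello (Settings -> Build Artifact) and make sure the output file is hello.so without the 'lib' prefix, because Python looks for a file named exactly after the module.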
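The name matters because `import hello` looks for a file called hello plus an extension-module suffix. Python then calls the init function that BOOST_PYTHON_MODULE(hello) generates (inithello in Python 2, PyInit_hello in Python 3). A file named libhello.so is never found. Here is a small sketch that prints the file names the current interpreter would accept. It assumes Python 3, where importlib.machinery.EXTENSION_SUFFIXES lists the suffixes, and the function name is hypothetical.

```python
import importlib.machinery


def accepted_filenames(module_name):
    """File names under which `import module_name` can find a C/C++ extension."""
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


for name in accepted_filenames("hello"):
    print(name)
```

On Linux the list normally includes plain hello.so. A lib-prefixed name never appears in it.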

Next, add the -fPIC flag to the compiler settings (GCC C++ Compiler -> Miscellaneous).

Now the library will compile, but you also have to add the boost_python library to the linker settings (GCC C++ Linker -> Libraries).

You are now ready to build the library and import it in Python with the statement 'import hello'.
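`import hello` only works if the folder that holds hello.so is on Python's module search path. Eclipse usually puts the built library in a Debug or Release folder inside the project. The sketch below adds such a folder to sys.path before importing. The helper name is hypothetical, and the folder you pass in is an assumption about your workspace layout.

```python
import importlib
import os
import sys


def import_built_module(module_name, build_dir):
    """Import module_name from build_dir (e.g. an Eclipse 'Debug' folder)."""
    build_dir = os.path.abspath(build_dir)
    if build_dir not in sys.path:
        # Put the build folder first so it wins over any installed copy.
        sys.path.insert(0, build_dir)
    return importlib.import_module(module_name)
```

For example, `hello = import_built_module("hello", "/path/to/workspace/hello/Debug")`, where the path is a placeholder. Starting Python from inside that folder has the same effect.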

Saturday, January 26, 2013

Pytesser: recognizing only digits

Recently I needed a Python library that recognizes digits in an image. I decided to use Pytesser, a wrapper for tesseract.exe, an OCR program developed first by HP and later by Google.
It worked fine with standard text examples.
I had a few images containing only digits. They came from really simple captchas (with the noise already removed, and so on). With the pytesser function image_to_string I was getting stray characters, commas and so on mixed in with the digits.
I tried to find an option to read only digits. When I found it, it didn't work: the old Tesseract bundled with pytesser doesn't support it.
The solution: get the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list.
Install it in the pytesser directory (for me it was C:/Python27/Lib/pytesser). This replaces the old tesseract.exe with the new one.
Find this line in pytesser.py:
 args = [tesseract_exe_name, input_filename, output_filename]
Change it to:
 args = [tesseract_exe_name, input_filename, output_filename, 'nobatch', 'digits']

It works fine for me!
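The edited line passes two extra config names, nobatch and digits, after the input and output file names on the tesseract command line. If you prefer not to patch pytesser.py, you can make the same call directly with subprocess. This is a sketch that assumes Python 3, a tesseract executable on the PATH (or a full path passed as exe), and an image already saved to disk. Tesseract writes its result to the output base name plus .txt. The function names are my own.

```python
import os
import subprocess
import tempfile


def tesseract_args(exe, input_filename, output_base, configs=("nobatch", "digits")):
    """Command line equivalent to the patched pytesser call."""
    return [exe, input_filename, output_base] + list(configs)


def image_file_to_digits(input_filename, exe="tesseract"):
    """Run tesseract with the digits config and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = os.path.join(tmp, "result")
        subprocess.check_call(tesseract_args(exe, input_filename, output_base))
        # tesseract appends .txt to the output base name
        with open(output_base + ".txt") as f:
            return f.read().strip()
```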
PS:

This configuration also recognizes the dot and minus characters. If you don't want that, go to the tessdata\configs directory, open the digits file and change:
 tessedit_char_whitelist 0123456789.-
into
 tessedit_char_whitelist 0123456789
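Instead of editing the shipped digits file, you could write your own config file into tessdata\configs and pass its name in place of 'digits' in the args list. The sketch below (Python 3, hypothetical helper names) writes such a file. It also adds a post-processing filter that drops everything outside the whitelist from the text that image_to_string returns. The filter only removes characters. Unlike the whitelist, it cannot make Tesseract choose a digit instead of a letter.

```python
import os


def write_whitelist_config(configs_dir, name, allowed="0123456789"):
    """Write a Tesseract config file that restricts recognition to `allowed`."""
    path = os.path.join(configs_dir, name)
    with open(path, "w") as f:
        f.write("tessedit_char_whitelist " + allowed + "\n")
    return path


def keep_only(text, allowed="0123456789"):
    """Drop every character of OCR output that is not in `allowed`."""
    return "".join(ch for ch in text if ch in allowed)
```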

Monday, January 14, 2013

boost::python::dict to std::map conversion

This post contains code listings for a std::map wrapper for Python. All files can be downloaded from https://sites.google.com/site/ppiotrowblog/home/files (file map.zip).

The first listing is the header of a class that converts a boost::python::dict into a std::map.

 #ifndef MAPHOLDER_HPP
 #define MAPHOLDER_HPP
 
 #include <map>
 #include <string>
 #include <boost/python.hpp>
 
 typedef std::map<std::string,std::string> StringMap;
 
 class MapHolder{
 public:
      /*Constructors*/
      MapHolder();
      MapHolder(boost::python::dict& py_dict);
 
      /*Modifiers*/
      void update_map(boost::python::dict& py_dict);
      void clear();
 
      /*Accessors*/
      boost::python::dict get_dict();
      size_t size();
 protected:
      StringMap map_;
 };
 
 #endif

It lets you create an empty map or a map filled with values from a Python dict. You can also clear the map or add elements with the update_map method. Here is the implementation of the methods:

 #include "MapHolder.hpp"
 #include <iostream>
 
 MapHolder::MapHolder(){
 }
 
 MapHolder::MapHolder(boost::python::dict& py_dict){
      update_map(py_dict);
 }
 
 // Copies every str -> str pair from py_dict into map_.
 // Entries whose key or value is not a string are skipped with a warning.
 void MapHolder::update_map(boost::python::dict& py_dict){
      boost::python::list keys = py_dict.keys();
      for (int i = 0; i < len(keys); ++i) {
           boost::python::extract<std::string> extracted_key(keys[i]);
           if(!extracted_key.check()){
                std::cout<<"Key invalid, map might be incomplete"<<std::endl;
                continue;
           }
           std::string key = extracted_key;
           boost::python::extract<std::string> extracted_val(py_dict[key]);
           if(!extracted_val.check()){
                std::cout<<"Value invalid, map might be incomplete"<<std::endl;
                continue;
           }
           std::string value = extracted_val;
           map_[key] = value;
      }
 }
 
 // Builds a new Python dict; entries are inserted in std::map's sorted key order.
 boost::python::dict MapHolder::get_dict(){
      boost::python::dict py_dict;
      for(StringMap::const_iterator it = map_.begin(); it != map_.end(); ++it)
           py_dict[it->first] = it->second;
      return py_dict;
 }
 
 void MapHolder::clear(){
      map_.clear();
 }
 
 // size() is declared in MapHolder.hpp and exposed to Python,
 // so it needs a definition here as well.
 size_t MapHolder::size(){
      return map_.size();
 }
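To see what update_map does without compiling anything, here is a rough pure-Python model of MapHolder (Python 3). The class name PyMapHolder is hypothetical and not part of the module. It copies only str keys and values and skips everything else with the same warning. Because std::map iterates over its keys in sorted order, the model's to_dict returns them sorted as well.

```python
class PyMapHolder(object):
    """Pure-Python model of the C++ MapHolder (illustration only)."""

    def __init__(self, py_dict=None):
        self._map = {}
        if py_dict is not None:
            self.update_map(py_dict)

    def update_map(self, py_dict):
        for key in list(py_dict.keys()):
            # Mirrors extract<std::string>(key).check() failing
            if not isinstance(key, str):
                print("Key invalid, map might be incomplete")
                continue
            value = py_dict[key]
            if not isinstance(value, str):
                print("Value invalid, map might be incomplete")
                continue
            self._map[key] = value

    def clear(self):
        self._map.clear()

    def size(self):
        return len(self._map)

    def to_dict(self):
        # std::map iterates in sorted key order; mimic that.
        return dict(sorted(self._map.items()))
```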

With the implementation done, let's expose it to Python.

 #include <boost/python.hpp>
 #include "MapHolder.hpp"
 #include "FooTicTacToe.hpp"
 
 BOOST_PYTHON_MODULE(cpp_collections)
 {
   using namespace boost::python;
   class_<MapHolder>("map", init<>())
     .def(init<boost::python::dict&>())
     .def("update_map", &MapHolder::update_map)
     .def("clear", &MapHolder::clear)
     .def("size", &MapHolder::size)
     .def("to_dict", &MapHolder::get_dict);
 
   class_<FooTicTacToe, bases<MapHolder> >("FooTicTacToe", init<>())
     .def(init<boost::python::dict&>())
     .def("is_winner", &FooTicTacToe::FooIsWinner);
 }

Ignore the FooTicTacToe class for now. Compile the module with this SConstruct file:

 # Przemek  -*- mode: Python; -*-  
 ##Linux Parameters  
 boost_path='/usr/include/boost/'  
 python_path='/usr/include/python2.7/'  
 ##End  
   
 import platform, os  
   
 boost_libs=['boost_python']  
 env = Environment()  
 if(platform.system() == "Linux"):  
   env.Append(CPPPATH=[boost_path,python_path])  
   # Python module
   env.SharedLibrary(target='cpp_collections.so',source=['python_wraper.cpp', 'MapHolder.cpp', 'FooTicTacToe.cpp'], LIBS=boost_libs, SHLIBPREFIX='')
 else:
   print(platform.system() + ' not supported')

If you run this Python script:

 import cpp_collections  
   
 sample_dict = {'EUR':'European','USD':'United States Dolar','RON':'Romanian leu','PLN':'Polish Zloty','HUF':'Hungarian forint', '1':'Number as a text'}  
 sample_map = cpp_collections.map(sample_dict)  
 print sample_map.size()  
 print sample_map.to_dict()  
   
 sample_map.update_map({'UKC':'Unknown Currency'})  
 print sample_map.size()  
 print sample_map.to_dict()  
   
 sample_map.clear()
 print sample_map.size()
 print sample_map.to_dict()
 
 sample_map.update_map({2:'Number as a number'})
 print sample_map.size()  
 print sample_map.to_dict()

you will see this output:

 6  
 {'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'EUR': 'European'}  
 7  
 {'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'UKC': 'Unknown Currency', 'EUR': 'European'}  
 0  
 {}  
 Key invalid, map might be incomplete  
 0  
 {}
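The last call shows that the integer key 2 is rejected, because extract<std::string> only accepts Python strings. If you want to store numeric keys or values, one option is to convert them to strings on the Python side before calling update_map. This is my own suggestion, not something the module does. The hypothetical helper below does that conversion.

```python
def to_string_dict(py_dict):
    """Return a copy of py_dict with every key and value converted with str()."""
    return dict((str(key), str(value)) for key, value in py_dict.items())
```

With it, `sample_map.update_map(to_string_dict({2: 'Number as a number'}))` would store the entry under the key '2'. Keys 2 and '2' then collapse into a single entry.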

So we have a working boost::python::dict to std::map converter. Let's use it to create a foo TicTacToe game, a deliberately simplistic toy.

 #ifndef FooTicTacToe_HPP
 #define FooTicTacToe_HPP
 
 #include <boost/python.hpp>
 #include "MapHolder.hpp"
 
 class FooTicTacToe: public MapHolder{
 public:
      FooTicTacToe();
      FooTicTacToe(boost::python::dict& py_dict);
      bool FooIsWinner();
 };
 
 #endif

and the method implementation:

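 #include "FooTicTacToe.hpp"
 #include <string>
 
 namespace {
 // True if 'field' is in the map and holds "x". find() is used instead of
 // operator[] so that checking a field never inserts an empty entry into map_.
 bool holds_x(const StringMap& board, const std::string& field){
      StringMap::const_iterator it = board.find(field);
      return it != board.end() && it->second == "x";
 }
 }
 
 FooTicTacToe::FooTicTacToe():
      MapHolder()
      {}
 
 FooTicTacToe::FooTicTacToe(boost::python::dict& py_dict):
      MapHolder(py_dict)
      {}
 
 bool FooTicTacToe::FooIsWinner(){
 // TODO: this method is really naive, it only checks the a1-a2-a3 line for "x"
      return holds_x(map_, "a1") && holds_x(map_, "a2") && holds_x(map_, "a3");
 }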

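Because FooTicTacToe derives from MapHolder, and that inheritance is declared to Boost.Python with bases<MapHolder>, the inherited methods update_map and clear are already available for simulating moves. Below is a Python script that simulates a really foo game: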

   
 import cpp_collections
 
 #TicTacToe game
 ttt_game = cpp_collections.FooTicTacToe({'a1':'x'})
 
 #make moves
 ttt_game.update_map({'b1':'o'})
 ttt_game.update_map({'a2':'x'})
 print 'Is winner?',ttt_game.is_winner()
 ttt_game.update_map({'b2':'o'})
 ttt_game.update_map({'a3':'x'})
 print 'Is winner?',ttt_game.is_winner()
 ttt_game.clear()

It produces this output:

 Is winner? False  
 Is winner? True
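As the TODO comment admits, FooIsWinner only checks the a1-a2-a3 line. A complete check has to look at all three rows, all three columns and both diagonals. Here is a Python sketch of that logic. It assumes fields are named by a letter a-c and a digit 1-3, as in the game above, and that the board is a plain dict like the one passed to update_map. The same loop could be ported to FooIsWinner.

```python
ROWS = "abc"
COLS = "123"

# All 8 winning lines: 3 rows, 3 columns, 2 diagonals.
LINES = ([[r + c for c in COLS] for r in ROWS] +
         [[r + c for r in ROWS] for c in COLS] +
         [[ROWS[i] + COLS[i] for i in range(3)],
          [ROWS[i] + COLS[2 - i] for i in range(3)]])


def is_winner(board, player="x"):
    """True if `player` occupies every field of at least one line."""
    # board.get() returns None for empty fields and never adds entries.
    return any(all(board.get(field) == player for field in line)
               for line in LINES)
```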

Boost.Python vs. Python: execution time of simple functions

Recently I decided to compare the execution time of simple functions written in Python and in C++. To do that, I wrote some simple C++ code and exposed it as a Python module with the Boost.Python library.

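 #include <algorithm>
 #include <boost/python.hpp>
 
 // Wrapper around std::max. Passing std::max<int> directly to def() needs an
 // explicit return_value_policy (it returns const int&) and becomes ambiguous
 // in C++11, where std::max also has an initializer_list overload.
 int max_int(int a, int b){
   return std::max(a, b);
 }
 
 // Recursive Euclidean algorithm, the same as the Python version below.
 int gcd(int a, int b){
   if (b != 0)
     return gcd(b, a % b);
   else
     return a;
 }
 
 BOOST_PYTHON_MODULE(cpp_compare)
 {
   using namespace boost::python;
   def("max", max_int);
   def("gcd", gcd);
 }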

The module can be compiled on Linux with the following SConstruct file:

 # Przemek  -*- mode: Python; -*-  
 ##Linux Parameters  
 boost_path='/usr/include/boost/'  
 python_path='/usr/include/python2.7/'  
 ##End  
 import platform, os  
 boost_libs=['boost_python']  
 env = Environment()  
 if(platform.system() == "Linux"):  
   env.Append(CPPPATH=[boost_path,python_path])  
   # Python module
   env.SharedLibrary(target='cpp_compare.so',source=['python_wraper.cpp'], LIBS=boost_libs, SHLIBPREFIX='')
 else:
   print(platform.system() + ' not supported')

Last but not least, here is the Python file:

 import random
 import datetime
 import cpp_compare
 
 capacity = 1000000
 min_val = -10000
 max_val = 100000
 # One million random pairs of integers, shared by all comparisons
 compare_data = [(random.randint(min_val, max_val), random.randint(min_val, max_val)) for i in xrange(capacity)]
 
 # Recursive Euclidean algorithm, the same as the C++ version
 def gcd(a, b):
   if b != 0:
     return gcd(b, a % b)
   else:
     return a
 
 def compare_time(fun_name, python_fun, cpp_fun):
   print 'Comparing', fun_name, 'execution time:'
   # Time the pure Python function over all pairs
   start = datetime.datetime.now()
   for i, j in compare_data:
     python_fun(i, j)
   end = datetime.datetime.now()
   print 'Python time:', (end - start).total_seconds()
   # Time the C++ function exposed by Boost.Python over the same pairs
   start = datetime.datetime.now()
   for i, j in compare_data:
     cpp_fun(i, j)
   end = datetime.datetime.now()
   print 'Cpp time:', (end - start).total_seconds()
 
 compare_time('Max', max, cpp_compare.max)
 compare_time('Greatest common divisor', gcd, cpp_compare.gcd)

Results of running python main.py:
 
 Comparing Max execution time:
 Python time: 0.283549
 Cpp time: 0.589304
 
 Comparing Greatest common divisor execution time:
 Python time: 4.236021
 Cpp time: 0.746045

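The results seem pretty clear. Simple calculations like max are better left to Python's built-ins. The built-in max is already implemented in C, so the C++ version has little to gain, and it probably spends most of its time on the overhead of calling into the C++ module. But when the calculation is more complex, as with the recursive gcd, it is worth writing it in C++.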
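If you want to extend this comparison, here is a sketch. It assumes Python 3, the names are my own, and I have not re-run the benchmark with it. It replaces datetime.now() with the timeit module, which uses a high-resolution timer, and keeps the best of several runs. It also adds two more Python candidates for gcd. The first is an iterative Euclid loop, which avoids one Python function call per step. The second is math.gcd, the standard-library function implemented in C (Python 3.5+). math.gcd always returns a non-negative value, while the Euclid versions can return a negative one for negative inputs.

```python
import math
import random
import timeit


def gcd_recursive(a, b):
    """Recursive Euclid, as in the benchmark script."""
    if b != 0:
        return gcd_recursive(b, a % b)
    return a


def gcd_iterative(a, b):
    """The same algorithm as a loop: no extra Python call per step."""
    while b != 0:
        a, b = b, a % b
    return a


def loop_time(fun, pairs, repeat=3):
    """Best-of-`repeat` time in seconds for calling fun(a, b) on every pair."""
    def run():
        for a, b in pairs:
            fun(a, b)
    # number=1: each measurement is one full pass over the data
    return min(timeit.repeat(run, number=1, repeat=repeat))


def compare(candidates, pairs):
    """Print and return the loop time of each (name, function) candidate."""
    results = {}
    for name, fun in candidates:
        results[name] = loop_time(fun, pairs)
        print("%s: %f s" % (name, results[name]))
    return results
```

For example, `compare([("recursive", gcd_recursive), ("iterative", gcd_iterative), ("math.gcd", math.gcd), ("cpp", cpp_compare.gcd)], compare_data)` would time all four on the same data, once the module is built.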

Thursday, January 10, 2013

PMX Crossover

My implementation of PMX (partially mapped crossover), written in R.

 bag.crossover <- function (parent1, parent2) {
 # PMX (partially mapped crossover) of two permutations of equal length.
 # Created with inspiration from http://algorytmy-genetyczne.eprace.edu.pl/664,Implementacja.html ,
 # which I think has some errors, corrected below.
      if (length(parent1) != length(parent2))
           stop("Parents have different lengths")
      parentLength <- length(parent1)
      crossLength <- sample(1:parentLength, 1, replace = TRUE) # length of the crossing segment
      ibeg <- sample(1:(parentLength - crossLength + 1), 1, replace = TRUE) # index of the beginning of the crossing segment
      iend <- (ibeg + crossLength - 1) # index of the end
      # crossing segment of both parents: row 1 from parent1, row 2 from parent2
      SegmentParent <- matrix(data = c(parent1[ibeg:iend], parent2[ibeg:iend]), byrow = TRUE, nrow = 2, ncol = crossLength)
      child <- c()
      child[ibeg:iend] <- SegmentParent[1,] # copy parent1's segment into the child
      for (locus in 1:parentLength) {
           soughtAllele <- parent2[locus]
           saPosParent1 <- which(SegmentParent[1,] == soughtAllele) # soughtAllele position in parent1's segment
           saPosParent2 <- which(SegmentParent[2,] == soughtAllele) # soughtAllele position in parent2's segment
           if (length(saPosParent1)) { # allele is already in the child (copied segment)
                next
           }
           else if (length(saPosParent2)) { # allele occurs in parent2's segment: follow the mapping
                newSoughtAllele <- soughtAllele
                while (length(saPosParent2)) {
                     newLocus <- saPosParent2[1]
                     newSoughtAllele <- SegmentParent[1, newLocus]
                     saPosParent2 <- which(SegmentParent[2,] == newSoughtAllele)
                }
                saPosParent2 <- which(parent2 == newSoughtAllele)
                newLocus <- saPosParent2[1]
                child[newLocus] <- soughtAllele
           }
           else { # allele does not occur in either crossing segment: it stays where it is
                child[locus] <- soughtAllele
           }
      } # for
      return (child)
 }

The raw code can be downloaded from http://pastebin.com/7FMfrzfp
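For comparison, here is a sketch of the same PMX procedure in Python (Python 3, names are my own). Indices are 0-based and the segment end is exclusive, unlike the 1-based, inclusive indices in the R code. The segment choice is split out into pmx_with_segment so that the mapping step can be checked on a fixed example.

```python
import random


def pmx_with_segment(parent1, parent2, ibeg, iend):
    """PMX child of two permutations, with parent1[ibeg:iend] as the crossing segment."""
    n = len(parent1)
    child = [None] * n
    child[ibeg:iend] = parent1[ibeg:iend]        # copy parent1's segment
    in_segment1 = set(parent1[ibeg:iend])
    pos_in_parent2 = {allele: i for i, allele in enumerate(parent2)}
    for i in range(ibeg, iend):
        allele = parent2[i]
        if allele in in_segment1:                 # already in the child
            continue
        # Follow the mapping parent1[pos] -> its position in parent2
        # until we leave the segment; that free position gets the allele.
        pos = i
        while ibeg <= pos < iend:
            pos = pos_in_parent2[parent1[pos]]
        child[pos] = allele
    # Every remaining position keeps parent2's allele.
    for i in range(n):
        if child[i] is None:
            child[i] = parent2[i]
    return child


def pmx_crossover(parent1, parent2, rng=random):
    """Random segment chosen the same way as in the R code."""
    if len(parent1) != len(parent2):
        raise ValueError("Parents have different lengths")
    n = len(parent1)
    cross_length = rng.randint(1, n)
    ibeg = rng.randint(0, n - cross_length)
    return pmx_with_segment(parent1, parent2, ibeg, ibeg + cross_length)
```

For parents 1..9 and [4, 5, 2, 1, 8, 7, 6, 9, 3] with the segment at indices 3 to 6, the child is [1, 8, 2, 4, 5, 6, 7, 9, 3].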