This post describes how to configure Eclipse under Linux to build a Boost.Python C++ shared library. I assume you already have Boost and Eclipse Indigo installed.
The first step is to choose File -> New -> C++ Project. Then select 'Shared Library' -> 'Empty Project', give it a name and finish. Now click Project -> Properties, expand C/C++ Build and choose Settings. Under GCC C++ Compiler, click Directories and add /usr/include/boost and /usr/include/python. These paths are correct for me; for you they might be different (for example /usr/include/python2.7).
The next step is to change the name of the dynamic library. It must be the same as the name passed to BOOST_PYTHON_MODULE(<its name>). So if your module is called 'hello', the built file must be named hello.so, without the default 'lib' prefix.
The next thing is to add the -fPIC flag to your compiler settings.
Now your library will compile, but you still have to add the boost_python library to your linker settings.
You are now ready to build the library and import it in Python with the 'import hello' statement.
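For readers who want to see what the Eclipse settings amount to, the sketch below assembles a roughly equivalent one-step g++ command line. It is only an approximation: Eclipse compiles and links in separate steps, the helper name build_command and the source file hello.cpp are made up for illustration, and the include paths are the ones from this post. The sketch is written for Python 3.

```python
def build_command(module_name, sources, include_dirs, libs):
    """Assemble a one-step g++ command line for a Python extension module."""
    cmd = ["g++", "-fPIC", "-shared"]            # position-independent shared object
    cmd += ["-I" + d for d in include_dirs]      # the include directories added in Eclipse
    cmd += list(sources)
    cmd += ["-o", module_name + ".so"]           # file name must match BOOST_PYTHON_MODULE
    cmd += ["-l" + lib for lib in libs]          # the library added to the linker settings
    return cmd


# 'hello.cpp' is a hypothetical source file name.
command = build_command(
    "hello",
    ["hello.cpp"],
    ["/usr/include/boost", "/usr/include/python"],
    ["boost_python"],
)
print(" ".join(command))
```

Each argument group maps onto one of the steps above: the include paths, the -fPIC flag, the artifact name and the boost_python linker entry.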
Thursday, January 31, 2013
Saturday, January 26, 2013
Pytesser digits-only recognition
Recently I needed a Python library that recognizes digits in an image. I decided to use Pytesser, which is a wrapper for tesseract.exe, a program developed first by HP and then by Google.
It worked fine with standard text examples.
I had a few images containing only digits. They came from really simple captchas (with the noise already removed and so on). I was using the pytesser function image_to_string and getting stray characters, commas and so on.
I tried to find an option to read only digits. When I found one, it didn't work. I realized that the Tesseract version bundled with pytesser doesn't support it.
The solution: get the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list .
Install it in the pytesser directory (for me it was C:/Python27/Lib/pytesser). This replaces the old tesseract.exe with the new one.
Find this line in pytesser.py:
args = [tesseract_exe_name, input_filename, output_filename]
Change it to:
args = [tesseract_exe_name, input_filename, output_filename, 'nobatch', 'digits']
For me it works fine!
PS:
This configuration also recognizes the dot and minus characters. If you don't want that, go to the tessdata\configs directory, open the digits file and change:
tessedit_char_whitelist 0123456789.-
into
tessedit_char_whitelist 0123456789
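If editing the Tesseract config is not convenient, a cheap safety net is to filter the string returned by image_to_string on the Python side. The sketch below is not how Tesseract applies its whitelist internally; it is a plain post-processing step, and the function name keep_whitelisted is made up for illustration. It is written for Python 3.

```python
def keep_whitelisted(text, whitelist="0123456789"):
    """Drop every character of text that is not in whitelist."""
    allowed = set(whitelist)
    return "".join(ch for ch in text if ch in allowed)


# A string such as OCR might return for a noisy digits-only image.
raw = "12,3.4-5 \n"
print(keep_whitelisted(raw))                   # digits only
print(keep_whitelisted(raw, "0123456789.-"))   # digits plus dot and minus
```

Unlike the whitelist in the digits file, this cannot help the recognizer choose a digit over a similar-looking letter; it only removes what was already misread.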
Monday, January 14, 2013
Boost::python::dict to std::map conversion
This post contains code listings for wrapping std::map for Python. All files can be downloaded from https://sites.google.com/site/ppiotrowblog/home/files (file: map.zip).
The first listing is the header of a class that converts boost::python::dict into std::map.
#include <map>
#include <boost/python.hpp>
#include <string>
#ifndef MAPHOLDER_HPP
#define MAPHOLDER_HPP
typedef std::map<std::string,std::string> StringMap;
class MapHolder{
public:
    /*Constructors*/
    MapHolder();
    MapHolder(boost::python::dict& py_dict);
    /*Modifiers*/
    void update_map(boost::python::dict& py_dict);
    void clear();
    boost::python::dict get_dict();
    size_t size();
protected:
    StringMap map_;
};
#endif
It lets you create an empty map or a map filled with values from a Python dict. It also lets you clear the map or add elements with the update_map method.
Here is the implementation of the methods:
#include "MapHolder.hpp"
#include <iostream>
MapHolder::MapHolder(){
}
MapHolder::MapHolder(boost::python::dict& py_dict){
    update_map(py_dict);
}
/*Copies every string->string pair from py_dict into map_; other pairs are skipped with a warning*/
void MapHolder::update_map(boost::python::dict& py_dict){
    boost::python::list keys = py_dict.keys();
    for (int i = 0; i < len(keys); ++i) {
        boost::python::extract<std::string> extracted_key(keys[i]);
        if(!extracted_key.check()){
            std::cout<<"Key invalid, map might be incomplete"<<std::endl;
            continue;
        }
        std::string key = extracted_key;
        boost::python::extract<std::string> extracted_val(py_dict[key]);
        if(!extracted_val.check()){
            std::cout<<"Value invalid, map might be incomplete"<<std::endl;
            continue;
        }
        std::string value = extracted_val;
        map_[key] = value;
    }
}
/*Builds a new Python dict from the current contents of map_*/
boost::python::dict MapHolder::get_dict(){
    boost::python::dict py_dict;
    for(StringMap::const_iterator it = map_.begin(); it != map_.end(); ++it)
        py_dict[it->first]=it->second;
    return py_dict;
}
void MapHolder::clear(){
    map_.clear();
}
/*size() is declared in MapHolder.hpp; this one-line definition is assumed here*/
size_t MapHolder::size(){
    return map_.size();
}
With the implementation in place, let's expose the class to Python.
#include <algorithm>
#include <boost/python.hpp>
#include "MapHolder.hpp"
#include "FooTicTacToe.hpp"
BOOST_PYTHON_MODULE(cpp_collections)
{
    using namespace boost::python;
    class_<MapHolder> ("map", init<>())
        .def(init<boost::python::dict&>())
        .def("update_map", &MapHolder::update_map)
        .def("clear", &MapHolder::clear)
        .def("size", &MapHolder::size)
        .def("to_dict",&MapHolder::get_dict);
    class_<FooTicTacToe, bases<MapHolder> >("FooTicTacToe", init<>())
        .def(init<boost::python::dict&>())
        .def("is_winner", &FooTicTacToe::FooIsWinner);
}
For now, ignore the FooTicTacToe class.
Compile the module with the following SConstruct file:
# Przemek -*- mode: Python; -*-
##Linux Parameters
boost_path='/usr/include/boost/'
python_path='/usr/include/python2.7/'
##End
import platform, os
boost_libs=['boost_python']
env = Environment()
if(platform.system() == "Linux"):
    env.Append(CPPPATH=[boost_path,python_path])
    #Python module
    env.SharedLibrary(target='cpp_collections.so',source=['python_wraper.cpp', 'MapHolder.cpp', 'FooTicTacToe.cpp'], LIBS=boost_libs, SHLIBPREFIX='')
else:
    print platform.system() + ' not supported'
If you run this Python script:
import cpp_collections
sample_dict = {'EUR':'European','USD':'United States Dolar','RON':'Romanian leu','PLN':'Polish Zloty','HUF':'Hungarian forint', '1':'Number as a text'}
sample_map = cpp_collections.map(sample_dict)
print sample_map.size()
print sample_map.to_dict()
sample_map.update_map({'UKC':'Unknown Currency'})
print sample_map.size()
print sample_map.to_dict()
sample_map.clear()
print sample_map.size()
print sample_map.to_dict()
sample_map.update_map({2:'Number as a number'})
print sample_map.size()
print sample_map.to_dict()
you will see this result:
6
{'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'EUR': 'European'}
7
{'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'UKC': 'Unknown Currency', 'EUR': 'European'}
0
{}
Key invalid, map might be incomplete
0
{}
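The filtering behaviour shown in this output (a non-string key is reported and skipped) can be modelled in pure Python, which makes it easy to experiment without building the extension. This is only an illustrative stand-in written for Python 3: the class name MapHolderModel is made up, and isinstance(..., str) merely approximates what boost::python::extract<std::string>::check() accepts.

```python
class MapHolderModel(object):
    """Pure-Python stand-in for the C++ MapHolder (illustration only)."""

    def __init__(self, py_dict=None):
        self._map = {}
        if py_dict is not None:
            self.update_map(py_dict)

    def update_map(self, py_dict):
        # Mirror the C++ loop: skip any pair whose key or value is not a string.
        for key, value in py_dict.items():
            if not isinstance(key, str):
                print("Key invalid, map might be incomplete")
                continue
            if not isinstance(value, str):
                print("Value invalid, map might be incomplete")
                continue
            self._map[key] = value

    def clear(self):
        self._map.clear()

    def size(self):
        return len(self._map)

    def to_dict(self):
        return dict(self._map)


model = MapHolderModel({"EUR": "European", "1": "Number as a text"})
model.update_map({2: "Number as a number"})  # rejected: the key is not a string
print(model.size())
```

As in the C++ version, a rejected pair leaves the map untouched, so the size stays the same after the failed update.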
So we have a working boost::python::dict to std::map converter. Let's use it to create a Foo TicTacToe game.
#include "MapHolder.hpp"
#include <boost/python.hpp>
#ifndef FooTicTacToe_HPP
#define FooTicTacToe_HPP
class FooTicTacToe: public MapHolder{
public:
    FooTicTacToe();
    FooTicTacToe(boost::python::dict& py_dict);
    bool FooIsWinner();
};
#endif
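and the method implementation:
#include "FooTicTacToe.hpp"
FooTicTacToe::FooTicTacToe():
    MapHolder()
{}
FooTicTacToe::FooTicTacToe(boost::python::dict& py_dict):
    MapHolder(py_dict)
{}
bool FooTicTacToe::FooIsWinner(){
    //TODO this method is really stupid: it only checks the a1-a2-a3 line for 'x'
    //find() is used because map_["a1"] would insert an empty entry for a missing key
    static const char* const line[] = {"a1", "a2", "a3"};
    for (int i = 0; i < 3; ++i) {
        StringMap::const_iterator it = map_.find(line[i]);
        if (it == map_.end() || it->second != "x")
            return false;
    }
    return true;
}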
Because FooTicTacToe inherits from MapHolder through Boost.Python, the methods needed to simulate moves are already there. Below is a Python script that simulates a really foo game.
#TicTacToe game
import cpp_collections
ttt_game = cpp_collections.FooTicTacToe({'a1':'x'})
#make moves
ttt_game.update_map({'b1':'o'})
ttt_game.update_map({'a2':'x'})
print 'Is winner?',ttt_game.is_winner()
ttt_game.update_map({'b2':'o'})
ttt_game.update_map({'a3':'x'})
print 'Is winner?',ttt_game.is_winner()
ttt_game.clear()
It produces the output:
Is winner? False
Is winner? True
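The winner check in the C++ class only looks at one line for one player. A fuller check could look like the pure-Python sketch below, which works on the same kind of dict the game stores. The names LINES and winner are made up, and the board layout (columns a-c, rows 1-3, marks 'x' and 'o') is an assumption based on the keys used in the script. The sketch is written for Python 3.

```python
# All eight winning lines on a board with cells a1..c3.
LINES = (
    [[c + r for r in "123"] for c in "abc"]      # a1 a2 a3, b1 b2 b3, c1 c2 c3
    + [[c + r for c in "abc"] for r in "123"]    # a1 b1 c1, a2 b2 c2, a3 b3 c3
    + [["a1", "b2", "c3"], ["a3", "b2", "c1"]]   # the two diagonals
)


def winner(board):
    """Return 'x' or 'o' if that mark fills a whole line, otherwise None."""
    for line in LINES:
        marks = {board.get(cell) for cell in line}  # missing cells count as None
        if len(marks) == 1:
            mark = marks.pop()
            if mark in ("x", "o"):
                return mark
    return None


board = {"a1": "x", "b1": "o", "a2": "x", "b2": "o", "a3": "x"}
print("Winner:", winner(board))
```

The same loop could be ported to FooIsWinner in C++ by iterating over the eight lines and looking each cell up with find().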
Boost.Python vs. Python: execution time of simple functions
Recently I decided to compare the execution time of simple functions written in Python and in C++. To do that I created some simple C++ code exposed as a Python module with the Boost.Python library.
#include <algorithm>
#include <boost/python.hpp>
int gcd(int a,int b){
    if (b!=0)
        return gcd(b,a%b);
    else
        return a;
}
BOOST_PYTHON_MODULE(cpp_compare)
{
    using namespace boost::python;
    def("max",
        std::max<int>,return_value_policy<copy_const_reference>());
    def("gcd",gcd);
}
The module can be compiled on Linux with the following SConstruct file:
# Przemek -*- mode: Python; -*-
##Linux Parameters
boost_path='/usr/include/boost/'
python_path='/usr/include/python2.7/'
##End
import platform, os
boost_libs=['boost_python']
env = Environment()
if(platform.system() == "Linux"):
    env.Append(CPPPATH=[boost_path,python_path])
    #Python module
    env.SharedLibrary(target='cpp_compare.so',source=['python_wraper.cpp'], LIBS=boost_libs, SHLIBPREFIX='')
else:
    print platform.system() + ' not supported'
Last but not least, the Python file:
import random
import cpp_compare
import datetime
capacity= 1000000
min_val = -10000
max_val = 100000
compare_data = [(int(random.randint(min_val,max_val)), int(random.randint(min_val,max_val))) for i in xrange(capacity)]
def gcd(a, b):
    if b!=0:
        return gcd(b, a%b)
    else:
        return a
def compare_time(fun_name, python_fun, cpp_fun):
    print 'Comparing', fun_name, 'execution time:'
    #time the Python function over all pairs
    start= datetime.datetime.now()
    for i, j in compare_data:
        python_fun(i,j)#bigger = max(i, j)
    end = datetime.datetime.now()
    print 'Python time:', (end-start).total_seconds()
    #time the C++ function over the same pairs
    start= datetime.datetime.now()
    for i, j in compare_data:
        cpp_fun(i,j)#bigger = cpp_compare.max(i, j)
    end = datetime.datetime.now()
    print 'Cpp time:',(end-start).total_seconds()
compare_time('Max',max, cpp_compare.max)
compare_time('Greatest common divisor',gcd, cpp_compare.gcd)
Results from executing python main.py:
Comparing Max execution time:
Python time: 0.283549
Cpp time: 0.589304
Comparing Greatest common divisor execution time:
Python time: 4.236021
Cpp time: 0.746045
The results seem pretty clear. Simple operations like max should use Python's built-in functions: most of the C++ time was probably spent on the overhead of calling into the extension module. But when the calculation is heavier, as with the recursive gcd here (roughly 5.7 times faster in C++), it's worth writing it in C++.
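The same effect can be observed without building anything, using only the standard library. In the sketch below math.gcd, which is implemented in C, stands in for the compiled function; it is not the Boost.Python module from this post, so absolute numbers will differ. The helper name time_pairs is made up, only positive inputs are used so that both functions return the same values, and the code is written for Python 3.

```python
import math
import random
import timeit


def gcd(a, b):
    """Recursive Euclid, the same algorithm as the pure-Python gcd above."""
    if b != 0:
        return gcd(b, a % b)
    return a


def time_pairs(fun, pairs, repeat=3):
    """Best-of-repeat wall time for calling fun once on every pair."""
    def run():
        for a, b in pairs:
            fun(a, b)
    return min(timeit.repeat(run, number=1, repeat=repeat))


random.seed(0)  # fixed seed so both functions see the same data on every run
pairs = [(random.randint(1, 100000), random.randint(1, 100000)) for _ in range(10000)]
print("pure Python gcd:", time_pairs(gcd, pairs))
print("math.gcd (C):   ", time_pairs(math.gcd, pairs))
```

Taking the minimum of a few repeats with timeit tends to be less noisy than a single datetime difference, which matters when the measured gap is small, as in the max comparison.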
Thursday, January 10, 2013
PMX Crossover
My implementation of PMX (partially mapped crossover), written in R.
bag.crossover <- function (parent1, parent2) {
  # Inspired by http://algorytmy-genetyczne.eprace.edu.pl/664,Implementacja.html ,
  # which I think has some errors, corrected below.
  if(length(parent1)!=length(parent2))
    stop("Parents have different lengths")
  parentLength <- length(parent1)
  crossLength <- sample(1:parentLength,1,replace=T) # length of the crossing segment
  ibeg <- sample(1:(parentLength-crossLength+1),1,replace=T) # index of the beginning of the crossing segment
  iend <- (ibeg+crossLength-1) # index of the end
  # crossing segment for both parents (row 1: parent 1, row 2: parent 2)
  SegmentParent <- matrix(data=c(parent1[ibeg:iend], parent2[ibeg:iend]), byrow = T,nrow=2, ncol=crossLength)
  child <-c()
  child[ibeg:iend] <-SegmentParent[1,]
  for(locus in 1:parentLength){
    soughtAllele <-parent2[locus]
    saPosParent1 <- which( SegmentParent[1,] == soughtAllele) # soughtAllele position in parent 1's segment
    saPosParent2 <- which( SegmentParent[2,] == soughtAllele) # soughtAllele position in parent 2's segment
    if (length(saPosParent1)) { # Already present in the copied crossing segment
      next
    }
    else if (length(saPosParent2) ){ # Occurs in parent 2's (skipped) segment
      newSoughtAllele <- soughtAllele
      # follow the mapping until it leads outside the crossing segment
      while (length(saPosParent2)){
        newLocus <- saPosParent2[1]
        newSoughtAllele <- SegmentParent[1,newLocus]
        saPosParent2 <- which( SegmentParent[2,] == newSoughtAllele)
        if(!length(saPosParent2)){
          break
        }
      }
      saPosParent2 <- which( parent2 == newSoughtAllele)
      newLocus <- saPosParent2[1]
      child[newLocus] <- soughtAllele
    }
    else{ # Does not occur in either crossing segment; keep it where it is
      child[locus] <- soughtAllele
    }
  }#for
  return (child)
}
The raw source can be downloaded from http://pastebin.com/7FMfrzfp
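For comparison, here is a Python 3 sketch of the same procedure. It is a re-expression of the R function above, not a separate reference implementation: the function names are made up, indices are 0-based with an exclusive end, and the segment-taking step is split out so that a fixed segment can be passed in and the result checked by hand.

```python
import random


def pmx_with_segment(parent1, parent2, begin, end):
    """PMX child that keeps parent1[begin:end] and fills the rest from parent2."""
    n = len(parent1)
    child = [None] * n
    child[begin:end] = parent1[begin:end]
    in_segment = set(parent1[begin:end])
    pos_in_p2 = {gene: i for i, gene in enumerate(parent2)}
    for locus, gene in enumerate(parent2):
        if gene in in_segment:
            continue  # already copied from parent1
        target = locus
        # If the slot is taken by the copied segment, follow the mapping
        # parent1[target] -> its position in parent2 until a free slot is found.
        while begin <= target < end:
            target = pos_in_p2[parent1[target]]
        child[target] = gene
    return child


def pmx_crossover(parent1, parent2, rng=random):
    """Pick a random crossing segment, as the R version does, and apply PMX."""
    if len(parent1) != len(parent2):
        raise ValueError("Parents have different lengths")
    n = len(parent1)
    cross_len = rng.randint(1, n)           # length of the crossing segment
    begin = rng.randint(0, n - cross_len)   # 0-based start of the segment
    return pmx_with_segment(parent1, parent2, begin, begin + cross_len)


p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx_with_segment(p1, p2, 3, 7))  # → [9, 3, 2, 4, 5, 6, 7, 1, 8]
```

Both parents are assumed to be permutations of the same set of values; with that assumption the mapping loop always terminates and the child is again a permutation.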