Friday, April 26, 2013

[Solved] R SVM test data does not match model

Hi,

Here is my solution to the error "test data does not match model !". It occurs when you try to predict on test data with an SVM model from e1071, like below:
predict(mySVMmodel, type="class", testset)
I found a hint here http://r.789695.n4.nabble.com/Levels-in-new-data-fed-to-SVM-td4654969.html , but it wasn't exactly my case. I lost a few hours, but I have a solution now.

 You have to set the factor levels of ALL your columns to be exactly the same as in the training data, not only the class column.
So you can use something like:
(edit: thanks to Ting Chi) 

testset$foocolname <- factor(
    testset$foocolname,levels = levels(trainset$foocolname)
)
testset$goocol <- factor(
    testset$goocol,levels = levels(trainset$goocol)
)
etc...
 If it helps, let me know:)

Edit: some tips
  • The error "length of 'center' must equal the number of columns of 'x'" might be connected with the factor levels problem. I don't know why, but the tips from this post solved that error for me too.
  • When you assign factor levels you might get the error "number of levels differs". It means that the left-side column contains more factor levels than the right-side column, and your idea is probably wrong.
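
To show why mismatched levels can break prediction, here is a minimal Python sketch (it is not e1071 code). It assumes, as a simplification, that every factor column is expanded into one indicator column per level before it reaches the model, so a test column with fewer levels produces fewer columns than the model expects. The helper `one_hot` and the sample data are hypothetical.

```python
def one_hot(values, levels):
    """Expand a categorical column into one 0/1 indicator column per level."""
    return [[1 if value == level else 0 for level in levels] for value in values]


train_col = ["red", "green", "blue", "green"]
test_col = ["green", "red"]  # "blue" never appears in the test data

train_levels = sorted(set(train_col))  # levels seen while training
test_levels = sorted(set(test_col))    # levels inferred from the test data alone

# Encoding the test column with its own levels yields too few columns.
mismatched = one_hot(test_col, test_levels)
# Reusing the training levels keeps the column layout the model was built on.
aligned = one_hot(test_col, train_levels)

print(len(train_levels), len(mismatched[0]), len(aligned[0]))
```

Reusing the training levels is the same idea as the factor(..., levels = levels(trainset$...)) calls above.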


Thursday, March 28, 2013

Scrum Open Assessment

Below I present three questions from the Scrum Open Assessment that were too hard for me. There are also correct answers taken from the Open Assessment feedback.

Question 1:

The Development Team should not be interrupted during the Sprint. The work it selects for the Sprint should not be changed. The Sprint Goal should remain intact. All of these attributes of a Sprint foster creativity, quality and productivity. Based on this, which of the following is false?

A) The Product Owner can help clarify or optimize the Sprint when asked by the Development Team
B) The Sprint Backlog and its contents are fully formulated in the Sprint Planning meeting and do not change during the Sprint.
C) As a decomposition of the selected Product Backlog Items, the Sprint Backlog changes and may grow as the work emerges.
D) The Development Team may work with the Product Owner to remove or add work if it finds it has more or less capacity than it expected.

Question 2:

Which statement best describes the Sprint Review?

A) It is a review of the team's activities during the Sprint.
B) It is when the Scrum Team and stakeholders inspect the outcome of the Sprint and figure out what to do in the upcoming Sprint.
C) It is a demo at the end of the Sprint for everyone in the organization to provide feedback on the work done.
D) It is used to congratulate the Development Team if it did what it committed to doing, or to punish the Development Team if it failed to meet its commitments.

Feedback:
Every event in Scrum, besides the Sprint which is a container for the other events, is an opportunity to Inspect AND Adapt.

Question 3:

Which two (2) things does the Development Team not do during the first Sprint?
A) Deliver an increment of potentially shippable functionality.
B) Nail down the complete architecture and infrastructure.
C) Develop and deliver at least one piece of functionality.
D) Develop a plan for the rest of the project.

I hope this helps you too while preparing for (or taking) the real test. The blog is open for discussion about these questions.

Saturday, February 16, 2013

Songsterr free premium account


Hi,

this post isn't really connected with programming but with free music tabs. Today (16th February) I registered a free account. Then I went to https://www.songsterr.com/a/wa/plus and spotted an image.
So after I clicked that image I got my own reflink.

Then I pasted it into my browser, pressed Enter, and now I have premium for two weeks, so I can for example print tabs :)

Let your friends know!

Thursday, February 7, 2013

[Linux] capturing desktop video tool

In my opinion the best tool for recording video on Linux is recordMyDesktop. It has a good GUI, gtk-recordMyDesktop, and really good performance. You can define its capture area. All encoding runs after the capture finishes, so it doesn't slow down your computer while recording.

The result is written to an .ogv file. It can be converted to mp4 or avi with little loss of quality using the following commands:


ffmpeg -i input.ogv -vcodec libx264 -vpre medium -crf 24 -threads 0 -acodec libfaac output.mp4


ffmpeg -sameq -i input.ogv output.avi

IxWebHosting promo code

I've received this message and I don't need the code now.
 Instead, from February 7th to February 15th, we'll be having a Valentine's Day Sale! This sale includes: Up to 50% discount on all shared hosting plans and Expert plans will be discounted to a seductively low $2.96! All you have to do is use our sweet coupon code "bemine".

Thursday, January 31, 2013

Using Boost.Python under Eclipse

This post describes how to configure Eclipse under Linux to build a Boost.Python C++ shared library. I assume you have Boost and Eclipse Indigo already installed.

The first step is to choose File -> New -> C++ project. Then select 'shared library' -> empty project. Give it a name and finish. Now click Project -> Properties, expand C/C++ Build and choose Settings. In GCC C++ Compiler click on Directories and add /usr/include/boost and /usr/include/python. Those paths are correct for me; for you they might be different.


The next step is to change the dynamic library name. It must be the same as the name in your BOOST_PYTHON_MODULE(<its name>). So if your module is called 'hello', the build must produce hello.so (without the 'lib' prefix).


The next thing is to add the -fPIC flag to your compiler settings.


Now your library will compile, but you still have to add the boost_python library to your linker settings.
You are now ready to build the library and import it in Python with the 'import hello' statement.

Saturday, January 26, 2013

Pytesser only digits recognition

Recently I needed a Python library that recognizes digits in an image. I decided to use Pytesser, which is a wrapper for tesseract.exe, a program developed first by HP and then by Google.
It worked fine with standard text examples.
I had a few images containing only digits. They came from really simple captchas (with noise removed and so on). I was using the pytesser function image_to_string and getting stray characters, commas, ... :/
I tried to find an option to read only digits. When I found it, it didn't work. I realized that the Tesseract bundled with pytesser doesn't support it.
The solution is: get the latest version of Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list .
Install it in the pytesser directory (for me it was C:/Python27/Lib/pytesser). It will replace the old tesseract.exe with the new one.
Find this line in pytesser.py :
args = [tesseract_exe_name, input_filename, output_filename]
Change it to:
args = [tesseract_exe_name, input_filename, output_filename, 'nobatch', 'digits']

For me it works fine!
PS:

This configuration also recognizes the dot and the minus sign. If you don't want that, go into the tessdata\configs directory, open the digits file and change:
tessedit_char_whitelist 0123456789.-
into
tessedit_char_whitelist 0123456789
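
The change to pytesser.py can also be sketched as a small helper, so that digit mode stays optional. This is only an illustration: `build_tesseract_args` and `keep_digits` are hypothetical names, not part of pytesser. The second function is a fallback I am assuming you might want when the installed Tesseract ignores the whitelist: it simply filters the recognized text afterwards.

```python
def build_tesseract_args(exe_name, input_filename, output_filename, digits_only=False):
    """Build the command line for tesseract, optionally with the 'digits' config."""
    args = [exe_name, input_filename, output_filename]
    if digits_only:
        # Same two extra arguments as in the edited pytesser.py line.
        args += ["nobatch", "digits"]
    return args


def keep_digits(text, allowed="0123456789"):
    """Drop every character that is not in the allowed set."""
    return "".join(ch for ch in text if ch in allowed)


print(build_tesseract_args("tesseract", "captcha.bmp", "out", digits_only=True))
print(keep_digits("12,3-4.5\n"))
```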

Monday, January 14, 2013

Boost::python::dict to std::map conversion

This post contains code listings for a std::map wrapper exposed to Python. All files can be downloaded from https://sites.google.com/site/ppiotrowblog/home/files (file: map.zip).

The first listing is the header of a class that converts a boost::python::dict into a std::map.

 #include <map>  
 #include <boost/python.hpp>  
 #include <string>  
   
 #ifndef MAPHOLDER_HPP  
 #define     MAPHOLDER_HPP  
 typedef std::map<std::string,std::string> StringMap;  
 class MapHolder{  
 public:  
      /*Constructors*/  
      MapHolder();  
      MapHolder(boost::python::dict& py_dict);  
        
      /*Modifiers*/  
      void update_map(boost::python::dict& py_dict);  
      void clear();  
   
      boost::python::dict get_dict();  
      size_t size();  
 protected:  
      StringMap map_;  
        
 };  
 #endif  
It lets you create an empty map or a map filled with values from a Python dict. It also lets you clear the map or add elements with the update_map method. Here is the implementation of the methods:
 #include "MapHolder.hpp"  
 #include <iostream>  
 MapHolder::MapHolder(){  
   
 };  
   
   
 MapHolder::MapHolder(boost::python::dict& py_dict){  
      update_map(py_dict);  
 }  
   
   
 void MapHolder::update_map(boost::python::dict& py_dict){  
      boost::python::list keys = py_dict.keys();  
        for (int i = 0; i < len(keys); ++i) {  
           boost::python::extract<std::string> extracted_key(keys[i]);  
           if(!extracted_key.check()){  
                std::cout<<"Key invalid, map might be incomplete"<<std::endl;  
                continue;                 
           }  
           std::string key = extracted_key;  
           boost::python::extract<std::string> extracted_val(py_dict[key]);  
           if(!extracted_val.check()){  
                std::cout<<"Value invalid, map might be incomplete"<<std::endl;  
                continue;                 
           }  
           std::string value = extracted_val;  
           map_[key] = value;  
      }  
 }  
   
   
 boost::python::dict MapHolder::get_dict(){  
      boost::python::dict py_dict;  
      for(StringMap::const_iterator it = map_.begin(); it != map_.end(); ++it)   
           py_dict[it->first]=it->second;        
      return py_dict;  
 }  
   
   
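 void MapHolder::clear(){  
      map_.clear();  
 }  
   
   
 /*size() is declared in the header and exposed to Python, so it needs a definition*/  
 size_t MapHolder::size(){  
      return map_.size();  
 }  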
After the implementation, let's expose it in Python.
 #include <algorithm>  
 #include <boost/python.hpp>  
 #include "MapHolder.hpp"  
 #include "FooTicTacToe.hpp"  
   
 BOOST_PYTHON_MODULE(cpp_collections)  
 {  
   using namespace boost::python;  
    class_<MapHolder> ("map", init<>())  
       .def(init<boost::python::dict&>())  
        .def("update_map", &MapHolder::update_map)  
       .def("clear", &MapHolder::clear)  
       .def("size", &MapHolder::size)  
        .def("to_dict",&MapHolder::get_dict);  
   
    class_<FooTicTacToe, bases<MapHolder> >("FooTicTacToe", init<>())  
        .def(init<boost::python::dict&>())  
        .def("is_winner", &FooTicTacToe::FooIsWinner);  
 }  
For now, ignore the FooTicTacToe class. Compile the module with this SConstruct file:
 # Przemek  -*- mode: Python; -*-  
 ##Linux Parameters  
 boost_path='/usr/include/boost/'  
 python_path='/usr/include/python2.7/'  
 ##End  
   
 import platform, os  
   
 boost_libs=['boost_python']  
 env = Environment()  
 if(platform.system() == "Linux"):  
   env.Append(CPPPATH=[boost_path,python_path])  
 #Python module  
   env.SharedLibrary(target='cpp_collections.so',source=['python_wraper.cpp', 'MapHolder.cpp', 'FooTicTacToe.cpp'], LIBS=boost_libs, SHLIBPREFIX='')  
 else:  
   print platform.system() + ' not supported'  
If you run this Python script:
 import cpp_collections  
   
 sample_dict = {'EUR':'European','USD':'United States Dolar','RON':'Romanian leu','PLN':'Polish Zloty','HUF':'Hungarian forint', '1':'Number as a text'}  
 sample_map = cpp_collections.map(sample_dict)  
 print sample_map.size()  
 print sample_map.to_dict()  
   
 sample_map.update_map({'UKC':'Unknown Currency'})  
 print sample_map.size()  
 print sample_map.to_dict()  
   
 sample_map.clear();  
 print sample_map.size()  
 print sample_map.to_dict()  
   
   
 sample_map.update_map({2:'Number as a number'});  
 print sample_map.size()  
 print sample_map.to_dict()  
you will see this result:
 6  
 {'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'EUR': 'European'}  
 7  
 {'HUF': 'Hungarian forint', 'USD': 'United States Dolar', 'RON': 'Romanian leu', '1': 'Number as a text', 'PLN': 'Polish Zloty', 'UKC': 'Unknown Currency', 'EUR': 'European'}  
 0  
 {}  
 Key invalid, map might be incomplete  
 0  
 {}  
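
The last call leaves the map empty because the key 2 is not a string. As a rough pure-Python picture of what update_map does (only an analogy, the real checks are done by boost::python::extract on the C++ side), every pair whose key or value is not a string is reported and skipped:

```python
def update_map(target, py_dict):
    """Copy only str -> str pairs into target, reporting and skipping the rest."""
    for key, value in py_dict.items():
        if not isinstance(key, str):
            print("Key invalid, map might be incomplete")
            continue
        if not isinstance(value, str):
            print("Value invalid, map might be incomplete")
            continue
        target[key] = value
    return target


# Sample data is made up for this sketch.
converted = update_map({}, {"1": "Number as a text", 2: "Number as a number", "PLN": 3})
print(len(converted), converted)
```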
   
So we created a working boost::python::dict to std::map converter. Let's use it to create a Foo TicTacToe game.
 #include "MapHolder.hpp"  
 #include <boost/python.hpp>  
   
 #ifndef FooTicTacToe_HPP  
 #define     FooTicTacToe_HPP  
   
 class FooTicTacToe: public MapHolder{  
 public:  
      FooTicTacToe();  
      FooTicTacToe(boost::python::dict& py_dict);  
      bool FooIsWinner();  
 };  
   
 #endif  
and the method implementations:
 #include "FooTicTacToe.hpp"  
   
 FooTicTacToe::FooTicTacToe():  
      MapHolder()  
      {}  
   
   
 FooTicTacToe::FooTicTacToe(boost::python::dict& py_dict):  
      MapHolder(py_dict)  
      {}  
   
 bool FooTicTacToe::FooIsWinner(){  
 //TODO this method is really naive: it checks only one winning line (a1, a2, a3)  
   
 return map_["a1"]=="x" && map_["a2"]=="x" && map_["a3"] =="x";  
   
 }  
Because we used Boost.Python class inheritance, the methods for simulating moves are already there. Below is a Python script that simulates a really foo game.
   
 import cpp_collections  
   
 #TicTacToe game  
 ttt_game = cpp_collections.FooTicTacToe({'a1':'x'})  
   
 #make moves  
 ttt_game.update_map({'b1':'o'})  
 ttt_game.update_map({'a2':'x'})  
 print 'Is winner?',ttt_game.is_winner()  
 ttt_game.update_map({'b2':'o'})  
 ttt_game.update_map({'a3':'x'})  
 print 'Is winner?',ttt_game.is_winner()  
 ttt_game.clear()  
It produces the output:
 Is winner? False  
 Is winner? True  
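
FooIsWinner checks only the line a1, a2, a3. A fuller check could look like the Python sketch below. It is only a sketch under my own assumptions: a 3x3 board with cells named a1 to c3, stored in a plain dict the same way the moves are passed to update_map. `WINNING_LINES` and `is_winner` are hypothetical names.

```python
# All eight winning lines of an assumed 3x3 board with cells a1..c3.
WINNING_LINES = [
    ("a1", "a2", "a3"), ("b1", "b2", "b3"), ("c1", "c2", "c3"),  # same letter
    ("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3"),  # same number
    ("a1", "b2", "c3"), ("a3", "b2", "c1"),                      # diagonals
]


def is_winner(board, mark):
    """Return True if mark occupies every cell of at least one winning line."""
    return any(all(board.get(cell) == mark for cell in line) for line in WINNING_LINES)


board = {"a1": "x", "b1": "o", "a2": "x", "b2": "o"}
print("Is winner?", is_winner(board, "x"))
board["a3"] = "x"
print("Is winner?", is_winner(board, "x"))
```

The same loop over the lines could be moved into the C++ method, since map_ holds exactly this kind of cell to mark mapping.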

Boost.Python vs. Python: simple function execution time comparison

Recently I decided to compare the execution time of simple functions written in Python and in C++. To do that, I created simple C++ code exposed as a Python module with the Boost.Python library.

 #include <algorithm>  
 #include <boost/python.hpp>  
 int gcd(int a,int b){  
   if (b!=0)  
     return gcd(b,a%b);  
   else  
     return a;  
 }  
 BOOST_PYTHON_MODULE(cpp_compare)  
 {  
   using namespace boost::python;  
   def("max",   
     std::max<int>,return_value_policy<copy_const_reference>());  
   def("gcd",gcd);  
 }  
The module can be compiled on Linux with the following SConstruct file:


 # Przemek  -*- mode: Python; -*-  
 ##Linux Parameters  
 boost_path='/usr/include/boost/'  
 python_path='/usr/include/python2.7/'  
 ##End  
 import platform, os  
 boost_libs=['boost_python']  
 env = Environment()  
 if(platform.system() == "Linux"):  
   env.Append(CPPPATH=[boost_path,python_path])  
 #Python module  
   env.SharedLibrary(target='cpp_compare.so',source=['python_wraper.cpp'], LIBS=boost_libs, SHLIBPREFIX='')  
 else:  
   print platform.system() + ' not supported'  
Last but not least comes the Python file:
 import random  
 import cpp_compare  
 import datetime  
 capacity= 1000000  
 min_val = -10000  
 max_val = 100000  
 compare_data = [(int(random.randint(min_val,max_val)), int(random.randint(min_val,max_val))) for i in xrange(capacity)]  
 def gcd(a, b):  
   if b!=0:  
     return gcd(b, a%b)  
   else:  
       return a  
 def compare_time(fun_name, python_fun, cpp_fun):   
     print 'Comparing ', fun_name,' execution time:'  
     start= datetime.datetime.now()  
     for i, j in compare_data:  
       python_fun(i,j)#bigger = max(i, j)  
     end = datetime.datetime.now()  
     print 'Python time:', (end-start).total_seconds()  
     start= datetime.datetime.now()  
     for i, j in compare_data:  
        cpp_fun(i,j)#bigger = cpp_compare.max(i, j)  
     end = datetime.datetime.now()  
     print 'Cpp time:',(end-start).total_seconds()  
 compare_time('Max',max, cpp_compare.max)  
 compare_time('Greatest common divisor',gcd, cpp_compare.gcd)  
Results from executing python main.py:
Comparing Max execution time:
Python time: 0.283549
Cpp time: 0.589304

Comparing Greatest common divisor execution time:
Python time: 4.236021
Cpp time: 0.746045

The results seem pretty clear. Simple calculations like max should be left to Python's built-in functions. Most of the time was probably spent on the overhead of calling into the C++ module. But when calculations are more complicated, it's worth writing them in C++.
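
The call-overhead guess can be checked without any C++ at all. The sketch below is my own illustration, not the measurement above: it uses the standard timeit module to time a direct max call, the same call wrapped in one extra Python function, and a small gcd loop. The names `wrapped_max` and `calls` and the sample numbers are made up, and the timings will differ between machines.

```python
import timeit


def wrapped_max(a, b):
    """Same result as max(a, b), but with one extra function call on top."""
    return max(a, b)


def gcd(a, b):
    """Iterative Euclid, the same algorithm as the recursive version above."""
    while b != 0:
        a, b = b, a % b
    return a


calls = 100000
direct_time = timeit.timeit(lambda: max(3, 7), number=calls)
wrapped_time = timeit.timeit(lambda: wrapped_max(3, 7), number=calls)
gcd_time = timeit.timeit(lambda: gcd(99991, 1071), number=calls)

print("direct max :", direct_time)
print("wrapped max:", wrapped_time)
print("python gcd :", gcd_time)
```

When the function body is as cheap as max, the extra call is a large share of the total, which fits the numbers above. For gcd the body dominates, so moving it to C++ pays off.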

Thursday, January 10, 2013

PMX Crossover

My implementation of PMX crossover written in R.
 bag.crossover <- function (parent1, parent2) {  
 #Inspired by http://algorytmy-genetyczne.eprace.edu.pl/664,Implementacja.html ,
 # which I think has some errors that are corrected below.  
   if(length(parent1)!=length(parent2))  
           stop("Parents have different lengths")  
      parentLength <- length(parent1)  
      crossLength <- sample(1:parentLength,1,replace=T)#length of crossing segment  
      ibeg <- sample(1:(parentLength-crossLength+1),1,replace=T)#index of beginning of crossing segment  
      iend <- (ibeg+crossLength-1) #index of end  
      SegmentParent <- matrix(data=c(parent1[ibeg:iend], parent2[ibeg:iend]), byrow = T,nrow=2, ncol=crossLength)#crossing 
   #segment for both parents  
      child <-c()  
      child[ibeg:iend] <-SegmentParent[1,]  
      for(locus in 1:parentLength){  
           soughtAllele <-parent2[locus]  
           saPosParent1 <- which( SegmentParent[1,] == soughtAllele) #soughtAllele position in parent 1  
           saPosParent2 <- which( SegmentParent[2,] == soughtAllele) #soughtAllele position in parent 2  
           if (length(saPosParent1)) {#Allele is already in the child via the copied crossing segment  
                next  
           }  
           else if (length(saPosParent2) ){#Occurred in skipped segment  
                newSoughtAllele <- soughtAllele  
                while (length(saPosParent2)){  
                     newLocus <- saPosParent2[1]  
                     newSoughtAllele <- SegmentParent[1,newLocus]  
                     saPosParent2 <- which( SegmentParent[2,] == newSoughtAllele)  
                     if(!length(saPosParent2)){  
                          break  
                     }                      
                }  
                saPosParent2 <- which( parent2 == newSoughtAllele)  
                newLocus <- saPosParent2[1]  
                child[newLocus] <- soughtAllele  
           }  
           else{#Does not occur in the crossing segments. Let it stay where it is now  
                child[locus] <- soughtAllele  
                }  
      }#for  
      return (child)  
 }  
The raw source can be downloaded from http://pastebin.com/7FMfrzfp
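
For comparison, here is a short Python sketch of the same PMX idea. It is not a line-by-line port of the R function: the segment is passed in explicitly as `start` and `end` (0-based, end exclusive) instead of being drawn at random, and it assumes both parents are permutations of the same values. The name `pmx_crossover` and the sample parents are my own choices.

```python
def pmx_crossover(parent1, parent2, start, end):
    """PMX: copy parent1[start:end] into the child, fill the rest from parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("Parents have different lengths")
    size = len(parent1)
    child = [None] * size
    child[start:end] = parent1[start:end]

    position_in_parent2 = {value: index for index, value in enumerate(parent2)}
    copied = set(parent1[start:end])

    # Alleles of parent2's segment that were not copied must be relocated.
    for locus in range(start, end):
        allele = parent2[locus]
        if allele in copied:
            continue
        target = locus
        # Follow the parent1 -> parent2 mapping until we leave the segment.
        while start <= target < end:
            target = position_in_parent2[parent1[target]]
        child[target] = allele

    # Everything still empty keeps the allele parent2 has at that locus.
    for locus in range(size):
        if child[locus] is None:
            child[locus] = parent2[locus]
    return child


p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx_crossover(p1, p2, 3, 7))
```

To get random behaviour like in the R version, start and end can be drawn with random.randint before the call.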