Using libsvm - part[1]

Author: ChunYu_Wang | Published: 2017-05-05 09:27

    Purpose

    libsvm is a collection of tools for SVMs (Support Vector Machines), developed by Chih-Chung Chang and Chih-Jen Lin at National Taiwan University (NTU).

    The current version, 3.22, provides interfaces for MATLAB, Octave, Python and more. I will try to introduce the usage of this powerful toolbox in a practical way.

    SVM - Support Vector Machines

    • It is very hard to explain this concept without heavy math or numerical procedures. Refer to the original libsvm paper ("LIBSVM: A Library for Support Vector Machines" by Chih-Chung Chang and Chih-Jen Lin) if you want to know more than just how to use it.

    • The idea behind SVMs is to make the original problem linearly separable by applying a non-linear mapping function. The SVM then finds the optimal separating hyperplane in the mapped space, which means we can predict labels for future data by checking which side of this hyperplane each point falls on. So, under the hood, an SVM is a tool for CLASSIFICATION and PREDICTION whose accuracy depends heavily on the choice of mapping method.

    • Basic steps of an SVM procedure:

        1. Select a training set of instance-label pairs P[i]=(x[i],y[i]), where x[i] holds the quantitative properties of P[i] and y[i] is its binary label, so y[i] can take only two values (1 or 0 here; the libsvm examples below use +1 and -1);
        2. Select a mapping function framework for the target SVM; its parameters are then obtained by solving an equivalent optimization problem;
        3. Select the hyperplane in the mapped space that marks the boundary between the two values of y;
        4. Classify y[j] for each P[j] in the test set by applying the mapping function to P[j] and checking its position relative to the hyperplane selected in step 3.
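
The steps above can be sketched on a toy problem. This is only an illustration, not what libsvm does internally: the 1-D data set, the mapping `phi` and the hyperplane coefficients `W` and `B` are hand-picked assumptions, whereas a real SVM obtains the hyperplane by solving an optimization problem.

```python
# Toy 1-D training set (step 1): points near zero are labeled +1, the rest -1.
# No single threshold on x separates the two classes.
train = [(-3.0, -1), (-2.0, -1), (-0.5, +1), (0.0, +1), (0.5, +1), (2.0, -1), (3.0, -1)]


def phi(x):
    """Hand-picked non-linear mapping (step 2): lift x into 2-D as (x, x^2)."""
    return (x, x * x)


# Hand-picked hyperplane w.z + b = 0 in the mapped space (step 3), i.e. x^2 = 2.
# A real SVM would compute W and B from the training data.
W = (0.0, -1.0)
B = 2.0


def classify(x):
    """Step 4: map the point and report which side of the hyperplane it is on."""
    z = phi(x)
    score = W[0] * z[0] + W[1] * z[1] + B
    return +1 if score > 0 else -1


# Every training point ends up on the correct side of the hyperplane.
assert all(classify(x) == y for x, y in train)

# Unseen points are classified the same way.
print(classify(1.0), classify(-2.5))
```

In the original 1-D space the +1 points sit between the -1 points, so no threshold works; after the mapping, a straight line in the (x, x^2) plane separates them.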

    Using the libsvm package to solve a problem

    Installing the libsvm package

      1. Download the libsvm package from the Download section of its homepage;
      2. Untar/unzip the tarball/zip file to obtain the source code;
      3. Check all the Makefiles inside the package. If you are not familiar with make, treat the Makefiles as lists of recipes for converting the source code into binaries;
      4. Run make in the top-level directory if you only need the command-line tools, or run it inside the subdirectories to build other interfaces such as Python;
      • 4.1. If you are blocked by "make: g++: Command not found", install the "gcc-c++" package (Fedora) or another C++ compiler.
          # An installation example on FC rawhide
          [chunwang@localhost matlab]$ uname -r 
          4.11.0-0.rc7.git3.1.fc27.x86_64
      
          # Download packages
          [chunwang@localhost libsvm]$ export LIBSVM_URL="http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+tar.gz"
          [chunwang@localhost libsvm]$ wget "$LIBSVM_URL" -O libsvm.tar.gz &>/dev/null; echo $?
          0
      
          # Untar to obtain source code
          [chunwang@localhost libsvm]$ (tar -xvf ./libsvm.tar.gz && rm -f libsvm.tar.gz) &>/dev/null; echo $?
          0
      
          # Check and Make
          [chunwang@localhost libsvm]$ cd libsvm-3.22/
          [chunwang@localhost libsvm-3.22]$ find . -name Makefile
          ./java/Makefile
          ./svm-toy/qt/Makefile
          ./svm-toy/gtk/Makefile
          ./python/Makefile
          ./matlab/Makefile
          ./Makefile
          [chunwang@localhost libsvm-3.22]$ cat ./Makefile|grep all:
          all: svm-train svm-predict svm-scale
          [chunwang@localhost libsvm-3.22]$ rpm -q gcc-c++ || sudo yum install -y gcc-c++
          gcc-c++-7.0.1-0.16.fc27.x86_64
      
          #- Make binary directly
          [chunwang@localhost libsvm-3.22]$ make all &>/dev/null; echo $?
          0
          #- Make for python
          [chunwang@localhost libsvm-3.22]$ cd python/; make &>/dev/null; echo $?; cd ~-
          0
          #- Make for octave
          [chunwang@localhost libsvm-3.22]$ cd matlab/
          [chunwang@localhost matlab]$ octave --eval "make octave" &>/dev/null; echo $?; cd ~-
          0
      

    Using libsvm for analysis

      1. Convert the data into the libsvm input format;
      • Judging from the example file shipped with the libsvm package, the format is very easy to parse:
          [chunwang@localhost libsvm-3.22]$ cat ./heart_scale | head -1 
          +1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1
      
          # Line[i] == "y[i] j:x[i][j] ..." where y[i] is +1/-1 and j is the integer index of the feature; zero-valued features may be omitted (sparse format)
          # A conversion example using AWK
          [chunwang@localhost libsvm-3.22]$ echo 32,-2,+1 | awk -F"," '{print $NF" 1:"$1" 2:"$2}'
          +1 1:32 2:-2
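
The same conversion can be sketched in Python. This is a minimal sketch, assuming the CSV layout of the AWK example (feature values first, label in the last column); `csv_row_to_libsvm` and `parse_libsvm_line` are hypothetical helper names, not part of libsvm.

```python
def csv_row_to_libsvm(row):
    """Turn 'x1,x2,...,label' into 'label 1:x1 2:x2 ...' (indices start at 1)."""
    fields = row.strip().split(",")
    label, values = fields[-1], fields[:-1]
    pairs = ["%d:%s" % (j, v) for j, v in enumerate(values, start=1)]
    return " ".join([label] + pairs)


def parse_libsvm_line(line):
    """Parse 'label j:value ...' into (label, {j: value}).

    Indices that do not appear on the line are simply absent from the dict.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


# Same input as the AWK example.
print(csv_row_to_libsvm("32,-2,+1"))

# A shortened line in the style of heart_scale; index 11 is omitted.
label, features = parse_libsvm_line("+1 1:0.708333 2:1 12:1 13:-1")
print(label, sorted(features))
```

Keeping each instance as a dictionary keyed by feature index is one simple way to represent the sparse format, since omitted indices need no placeholder.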
      
      2. Train a model on the processed input file and obtain the results (using heart_scale as an example, with its first 200 lines as the training set and the remaining 70 lines as the test set).
      • Refer to the Graphic Interface section of the libsvm homepage for more information on the parameters of svm-train
      • A very simple example using the default model (using the binaries directly):
          # Split the original data file into a training set and a test set
          [chunwang@localhost libsvm-3.22]$ head -200 ./heart_scale > ./heart_scale_train
          [chunwang@localhost libsvm-3.22]$ tail -70 ./heart_scale > ./heart_scale_test
      
          # Train the model by optimization
          [chunwang@localhost libsvm-3.22]$ ./svm-train heart_scale_train 
          *
          optimization finished, #iter = 147
          nu = 0.453249
          obj = -75.742327, rho = 0.439634
          nSV = 105, nBSV = 78
          Total nSV = 105
      
          # Predict and store result into target output file
          [chunwang@localhost libsvm-3.22]$ ./svm-predict heart_scale_test heart_scale_train.model heart_scale_test_output
          Accuracy = 81.4286% (57/70) (classification)
      
          # All predictions are stored in this output file; line i holds the predicted y[i] for line i ("y[i] j:x[i][j] ...") of the test set
          [chunwang@localhost libsvm-3.22]$ cat ./heart_scale_test_output | sort | uniq
          1
          -1
      
          # Some concepts in the svm-train and svm-predict output:
          iter     : Number of iterations
          nu       : Parameter of the equivalent nu-SVM formulation (not a kernel parameter)
          obj      : Optimal objective value of the dual SVM problem
          rho      : Bias term in the decision function sgn(w^T x - rho)
          nSV      : Number of support vectors
          nBSV     : Number of bounded support vectors (alpha[i] == C)
          Accuracy = Correctly predicted data / Total testing data × 100%
      
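      • Equivalent processes with Python or Octave
         # python
      
         [chunwang@localhost python]$ cat ./test.py
         from svmutil import *  # Python interface built in libsvm-3.22/python
      
         # Read the labels y and the features x from the data file
         y, x = svm_read_problem('../heart_scale')
         # Train on the first 200 instances with the default parameters
         model = svm_train(y[:200], x[:200])
         # Predict the remaining 70 instances; p_acc holds the accuracy figures
         p_label, p_acc, p_val = svm_predict(y[200:], x[200:], model)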
      
         --------------------------------------------------------------
      
         # octave
      
         # MATLAB and Octave take x[i] and y[i] as matrices, so the input procedure is different
         >> [label, data] = libsvmread("../heart_scale")    # Read from data file using libsvmread
         >> model = svmtrain(label(1:200,:), data(1:200,:)) # Generate target SVM model
      
         >> svmpredict(label(201:270,:), data(201:270,:), model)     # Predict with test set and SVM model
         Accuracy = 81.4286% (57/70) (classification)
         ans =
      
            1
         ...
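
The Accuracy figure printed by svm-predict can be recomputed from the test file and the prediction output file. The following is a minimal sketch, assuming the files produced by the command-line example (heart_scale_test and heart_scale_test_output); `accuracy` and `accuracy_from_files` are hypothetical helpers, not part of libsvm.

```python
def accuracy(true_labels, predicted_labels):
    """Return (percentage, correct, total) for two equally long label lists."""
    total = len(true_labels)
    if total == 0 or total != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / total, correct, total


def accuracy_from_files(test_path, output_path):
    """Compare the label (first token) of each test line with the matching output line."""
    with open(test_path) as f:
        truth = [float(line.split()[0]) for line in f if line.strip()]
    with open(output_path) as f:
        predicted = [float(line.split()[0]) for line in f if line.strip()]
    return accuracy(truth, predicted)


# Small in-memory check: 3 of 4 predictions match.
print("Accuracy = %g%% (%d/%d)" % accuracy([1, -1, 1, -1], [1, -1, -1, -1]))
```

With the files from the example, `accuracy_from_files('heart_scale_test', 'heart_scale_test_output')` should reproduce the 57/70 reported by svm-predict.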
      

    Useful materials

    • [Google Scholar] Chih-Jen Lin - Link
    • [Quora] How to use libsvm in Matlab? - Link
