Classification Algorithms: A Hands-On Test

Author: 匪_3f3e | Published: 2018-11-15 21:12

Decision Tree

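  The dataset contains the heights and weights of male and female students at a school.
Columns: height (body height), weight (body weight), category (class label: 1 = male, 2 = female, as encoded in the rows below), rand (a random number; the rows shown are sorted by it), features (the feature vector built from height and weight)
Source code: ClassifyDecisionTree.scala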

+------+------+--------+--------------------+------------+
|height|weight|category|                rand|    features|
+------+------+--------+--------------------+------------+
| 177.0|  68.2|       1|0.001924715176714...|[177.0,68.2]|
| 163.2|  55.9|       2|0.002309495097651...|[163.2,55.9]|
| 160.0|  50.2|       2|0.007363393934636031|[160.0,50.2]|
| 175.3|  86.4|       1|0.010757645346996858|[175.3,86.4]|
| 172.7|  66.2|       1|0.010887433925890977|[172.7,66.2]|
| 155.0|  49.2|       2|0.013443416084848447|[155.0,49.2]|
| 162.6|  54.5|       2|0.013660993453225356|[162.6,54.5]|
| 158.0|  55.5|       2|0.017720709335255047|[158.0,55.5]|
| 177.8|  80.9|       1|0.017746365296897437|[177.8,80.9]|
| 179.1|  89.1|       1|0.017900567625543484|[179.1,89.1]|
| 182.9|  85.0|       1|0.020723550349353026|[182.9,85.0]|
| 180.3|  82.6|       1| 0.02109944226439564|[180.3,82.6]|
| 161.3|  70.5|       2|0.021308838567239197|[161.3,70.5]|
| 175.3|  70.9|       1|0.021448847738743337|[175.3,70.9]|
| 180.6|  72.7|       1|0.021585780869558646|[180.6,72.7]|
| 164.5|  70.0|       1|0.025281037559654607|[164.5,70.0]|
| 157.5|  76.8|       2|0.026563453573523965|[157.5,76.8]|
| 188.0|  85.9|       1| 0.03103171572557828|[188.0,85.9]|
| 157.0|  63.0|       2| 0.03160072136660996|[157.0,63.0]|
| 182.4|  74.5|       1|0.032118109945869944|[182.4,74.5]|
+------+------+--------+--------------------+------------+
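
The source file itself is not reproduced in this article. As a rough sketch only, assuming Spark ML's DataFrame API and a hypothetical file `height_weight.csv` with header columns height, weight and category, a pipeline that yields columns like the ones above might look like this. The file name, the 80/20 split and the seed are assumptions, not the author's settings.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

object DecisionTreeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DecisionTreeSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input path; the article does not say where the data lives.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("height_weight.csv")
      .withColumn("rand", rand())

    // Pack height and weight into the single vector column Spark ML expects.
    val data = new VectorAssembler()
      .setInputCols(Array("height", "weight"))
      .setOutputCol("features")
      .transform(raw)
      .orderBy("rand")

    // Assumed 80/20 split; the article does not state the ratio it used.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    val model = new DecisionTreeClassifier()
      .setLabelCol("category")
      .setFeaturesCol("features")
      .fit(train)

    // Adds the rawPrediction, probability and prediction columns.
    val predictions = model.transform(test)
    predictions.show()

    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("category")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)
    println(s"accuracy is $accuracy")

    spark.stop()
  }
}
```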

Training and test results

+------+------+--------+--------------------+------------+---------------+--------------------+----------+
|height|weight|category|                rand|    features|  rawPrediction|         probability|prediction|
+------+------+--------+--------------------+------------+---------------+--------------------+----------+
| 177.0|  68.2|       1|0.001924715176714...|[177.0,68.2]|  [0.0,5.0,1.0]|[0.0,0.8333333333...|       1.0|
| 175.3|  70.9|       1|0.021448847738743337|[175.3,70.9]| [0.0,13.0,1.0]|[0.0,0.9285714285...|       1.0|
| 180.6|  72.7|       1|0.021585780869558646|[180.6,72.7]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 182.4|  74.5|       1|0.032118109945869944|[182.4,74.5]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 197.1|  90.9|       1| 0.03559399587317391|[197.1,90.9]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 165.7|  73.1|       2| 0.04257206175621786|[165.7,73.1]|[0.0,41.0,28.0]|[0.0,0.5942028985...|       1.0|
| 176.0|  86.4|       1| 0.04607047165343936|[176.0,86.4]|  [0.0,9.0,3.0]|     [0.0,0.75,0.25]|       1.0|
| 165.1|  64.1|       2|  0.0608822940489292|[165.1,64.1]|[0.0,41.0,28.0]|[0.0,0.5942028985...|       1.0|
| 159.2|  51.8|       2|  0.0781592434755799|[159.2,51.8]| [0.0,0.0,66.0]|       [0.0,0.0,1.0]|       2.0|
| 177.3|  73.2|       1| 0.10412537855836879|[177.3,73.2]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 175.5|  63.2|       1| 0.11522293393520988|[175.5,63.2]|  [0.0,0.0,5.0]|       [0.0,0.0,1.0]|       2.0|
| 167.0|  59.8|       2| 0.13131909453868662|[167.0,59.8]| [0.0,0.0,12.0]|       [0.0,0.0,1.0]|       2.0|
| 177.8|  86.4|       1| 0.14061099947455546|[177.8,86.4]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 167.4|  53.9|       1| 0.15083577087167999|[167.4,53.9]| [0.0,2.0,24.0]|[0.0,0.0769230769...|       2.0|
| 180.3|  83.2|       1| 0.15463786724535922|[180.3,83.2]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 160.0|  55.4|       2| 0.17400393305299244|[160.0,55.4]| [0.0,0.0,66.0]|       [0.0,0.0,1.0]|       2.0|
| 152.4|  46.5|       2|  0.1906697626456868|[152.4,46.5]| [0.0,0.0,13.0]|       [0.0,0.0,1.0]|       2.0|
| 183.5|  74.8|       1|  0.1920772640392049|[183.5,74.8]| [0.0,98.0,1.0]|[0.0,0.9898989898...|       1.0|
| 173.5|  81.8|       1| 0.20261133506541862|[173.5,81.8]|[0.0,41.0,28.0]|[0.0,0.5942028985...|       1.0|
| 158.8|  49.1|       2| 0.21372194219989293|[158.8,49.1]| [0.0,0.0,66.0]|       [0.0,0.0,1.0]|       2.0|
+------+------+--------+--------------------+------------+---------------+--------------------+----------+
only showing top 20 rows

accuracy is 0.8297872340425532
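
In the table above, each probability vector is the rawPrediction vector divided by its sum (for example [0.0,9.0,3.0] becomes [0.0,0.75,0.25]), and prediction is the index of the largest entry. Index 0 stays at zero because the labels in this dataset are 1 and 2. A small plain-Scala sketch of that relationship follows; the object and method names are made up for illustration.

```scala
object LeafProbability {
  /** Divide per-class counts by their total so the result sums to 1. */
  def normalize(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    if (total == 0.0) counts.clone() else counts.map(_ / total)
  }

  /** Index of the largest entry, as a Double like the prediction column. */
  def predict(probabilities: Array[Double]): Double =
    probabilities.indexOf(probabilities.max).toDouble

  def main(args: Array[String]): Unit = {
    val raw = Array(0.0, 9.0, 3.0) // taken from the row with height 176.0
    val prob = normalize(raw)
    println(prob.mkString("[", ",", "]")) // [0.0,0.75,0.25]
    println(predict(prob))                // 1.0
  }
}
```

Read this way, a row such as 175.5 / 63.2 (category 1, rawPrediction [0.0,0.0,5.0]) is a misclassification: it landed in a leaf that held only class-2 training rows.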

Naive Bayes

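The dataset is the Iris flower dataset.
_c0, _c1, _c2 and _c3 are the four numeric feature values measured for each flower (in the standard Iris column order: sepal length, sepal width, petal length and petal width).
label is the species of the flower (0, 1 or 2).
Source code: ClassifyNativeBayes.scala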

+---+---+---+---+-----+--------------------+
|_c0|_c1|_c2|_c3|label|                rand|
+---+---+---+---+-----+--------------------+
|4.3|3.0|1.1|0.1|    0|0.003326979325281032|
|5.4|3.4|1.7|0.2|    0|0.009592673729602486|
|6.1|3.0|4.6|1.4|    1|  0.0152037806503027|
|7.9|3.8|6.4|2.0|    2|0.015503439675020214|
|6.7|3.0|5.0|1.7|    1|0.020042734198535972|
|6.4|3.1|5.5|1.8|    2| 0.05476692766370894|
|5.5|2.5|4.0|1.3|    1| 0.05686437116523335|
|4.7|3.2|1.3|0.2|    0|  0.0595954341070446|
|6.9|3.1|5.4|2.1|    2| 0.06726753463099477|
|7.2|3.0|5.8|1.6|    2| 0.07696980523890262|
|6.7|3.3|5.7|2.5|    2| 0.08444880519447917|
|4.6|3.1|1.5|0.2|    0| 0.08524222662857528|
|5.6|2.9|3.6|1.3|    1| 0.10158676661407073|
|4.8|3.0|1.4|0.1|    0| 0.10675364426248701|
|6.3|2.5|4.9|1.5|    1| 0.11310239503362629|
|5.6|2.7|4.2|1.3|    1| 0.11453388616504145|
|5.5|3.5|1.3|0.2|    0| 0.11468327229190811|
|5.8|2.7|5.1|1.9|    2| 0.12196158211354247|
|5.1|3.5|1.4|0.3|    0| 0.12551737888690984|
|4.8|3.4|1.6|0.2|    0| 0.15533175180704428|
+---+---+---+---+-----+--------------------+
only showing top 20 rows
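
As with the decision tree, the source file is not shown here. The following is a rough sketch under assumptions: Spark ML's `NaiveBayes` with its default settings, and a hypothetical headerless file `iris.csv` whose fifth column is already a numeric label (0, 1 or 2). The split ratio and seed are also assumptions.

```scala
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

object NaiveBayesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NaiveBayesSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical headerless CSV: four feature columns, then a numeric label.
    val raw = spark.read
      .option("inferSchema", "true")
      .csv("iris.csv")
      .toDF("_c0", "_c1", "_c2", "_c3", "label")
      .withColumn("rand", rand())

    val data = new VectorAssembler()
      .setInputCols(Array("_c0", "_c1", "_c2", "_c3"))
      .setOutputCol("features")
      .transform(raw)
      .orderBy("rand")

    // Assumed 80/20 split; the article does not state the ratio it used.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Defaults: labelCol = "label", featuresCol = "features".
    // The default model type requires non-negative features, which Iris satisfies.
    val model = new NaiveBayes().fit(train)

    val predictions = model.transform(test)
    predictions.show()

    val accuracy = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
      .evaluate(predictions)
    println(s"accuracy is $accuracy")

    spark.stop()
  }
}
```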

Model training and test results

+---+---+---+---+-----+--------------------+-----------------+--------------------+--------------------+----------+
|_c0|_c1|_c2|_c3|label|                rand|         features|       rawPrediction|         probability|prediction|
+---+---+---+---+-----+--------------------+-----------------+--------------------+--------------------+----------+
|4.3|3.0|1.1|0.1|    0|0.003326979325281032|[4.3,3.0,1.1,0.1]|[-9.8758096263421...|[0.74188275591212...|       0.0|
|6.1|3.0|4.6|1.4|    1|  0.0152037806503027|[6.1,3.0,4.6,1.4]|[-22.662166461937...|[0.04640246775348...|       1.0|
|6.4|3.1|5.5|1.8|    2| 0.05476692766370894|[6.4,3.1,5.5,1.8]|[-26.182568778203...|[0.01557191175966...|       1.0|
|5.5|2.5|4.0|1.3|    1| 0.05686437116523335|[5.5,2.5,4.0,1.3]|[-20.166766552586...|[0.05458454513502...|       1.0|
|6.9|3.1|5.4|2.1|    2| 0.06726753463099477|[6.9,3.1,5.4,2.1]|[-27.433208055916...|[0.01236259344067...|       1.0|
|7.2|3.0|5.8|1.6|    2| 0.07696980523890262|[7.2,3.0,5.8,1.6]|[-26.494335876097...|[0.01825264119093...|       1.0|
|5.6|2.9|3.6|1.3|    1| 0.10158676661407073|[5.6,2.9,3.6,1.3]|[-19.897110451150...|[0.09237937741037...|       1.0|
|4.8|3.4|1.6|0.2|    0| 0.15533175180704428|[4.8,3.4,1.6,0.2]|[-11.997775483590...|[0.70646055942144...|       0.0|
|7.4|2.8|6.1|1.9|    2| 0.18358013977287357|[7.4,2.8,6.1,1.9]|[-28.090693830596...|[0.00891391025433...|       1.0|
|4.5|2.3|1.3|0.3|    0| 0.24053166847543628|[4.5,2.3,1.3,0.3]|[-10.370756938066...|[0.56456409063282...|       0.0|
|5.7|3.8|1.7|0.3|    0| 0.24371079476801594|[5.7,3.8,1.7,0.3]|[-13.627203949814...|[0.74744750899684...|       0.0|
|6.1|2.9|4.7|1.4|    1| 0.25897191452004664|[6.1,2.9,4.7,1.4]|[-22.747269522018...|[0.04072359198491...|       1.0|
|6.1|2.6|5.6|1.4|    2| 0.32632952248541935|[6.1,2.6,5.6,1.4]|[-24.165921622143...|[0.01746092235489...|       1.0|
|7.7|2.6|6.9|2.3|    2| 0.34150870108653764|[7.7,2.6,6.9,2.3]|[-31.090843380520...|[0.00261340157005...|       2.0|
|5.2|4.1|1.5|0.1|    0| 0.34961811399305576|[5.2,4.1,1.5,0.1]|[-12.484838515158...|[0.82939929861274...|       0.0|
|4.8|3.1|1.6|0.2|    0| 0.35223492445532156|[4.8,3.1,1.6,0.2]|[-11.671413203895...|[0.66862768689173...|       0.0|
|5.9|3.0|5.1|1.8|    2| 0.35296188357024383|[5.9,3.0,5.1,1.8]|[-24.944438710599...|[0.01785017728176...|       1.0|
|5.1|3.7|1.5|0.4|    0|  0.5390894438157275|[5.1,3.7,1.5,0.4]|[-13.069681739915...|[0.71488057398769...|       0.0|
|5.1|3.8|1.9|0.4|    0|  0.5457874776234811|[5.1,3.8,1.9,0.4]|[-13.954031113067...|[0.66260455445175...|       0.0|
|7.7|2.8|6.7|2.0|    2|  0.5473906288859796|[7.7,2.8,6.7,2.0]|[-29.829888190450...|[0.00523190601445...|       2.0|
+---+---+---+---+-----+--------------------+-----------------+--------------------+--------------------+----------+
only showing top 20 rows

accuracy is 0.7777777777777778
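
The rawPrediction values in the table above are large negative numbers, which is what per-class log-scores look like. Assuming that reading, the probability column is obtained by exponentiating the scores and normalizing them (a softmax). The sketch below shows the calculation in plain Scala. The table truncates every vector after its first entry, so the input values here are hypothetical and are not the article's data.

```scala
object LogScoreToProbability {
  /** Softmax over log-scores, shifted by the maximum for numerical stability. */
  def softmax(logScores: Array[Double]): Array[Double] = {
    val maxLog = logScores.max
    val shifted = logScores.map(s => math.exp(s - maxLog))
    val total = shifted.sum
    shifted.map(_ / total)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical log-scores for classes 0, 1 and 2.
    val logScores = Array(-9.9, -11.0, -12.5)
    val prob = softmax(logScores)
    println(prob.mkString("[", ",", "]"))
    println(s"predicted class: ${prob.indexOf(prob.max)}")
  }
}
```

Subtracting the maximum before calling `math.exp` keeps the computation from underflowing to zero when all scores are very negative.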

Support Vector Machine (SVM)

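This test also uses the Iris dataset.
The SVM implementation used here supports only binary classification, so the dataset was filtered to keep only the rows whose label is 0 or 1.
Source code: ClassifySVM.scala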

+---+---+---+---+-----+--------------------+
|_c0|_c1|_c2|_c3|label|                rand|
+---+---+---+---+-----+--------------------+
|5.1|3.5|1.4|0.2|    0|0.005383118037440182|
|5.7|4.4|1.5|0.4|    0|0.007194431761283537|
|7.0|3.2|4.7|1.4|    1|0.033787938439531984|
|4.6|3.2|1.4|0.2|    0| 0.03515755168692547|
|6.7|3.1|4.4|1.4|    1|0.047194768581304225|
|5.5|2.4|3.8|1.1|    1|0.053851496474066396|
|4.9|3.1|1.5|0.1|    0| 0.05504111221690233|
|5.7|2.8|4.1|1.3|    1| 0.05782788372655445|
|4.9|3.0|1.4|0.2|    0|0.060189662689951184|
|4.8|3.4|1.6|0.2|    0| 0.06897490026440856|
|5.4|3.4|1.7|0.2|    0| 0.09155599582098428|
|5.8|2.7|3.9|1.2|    1| 0.09326583469757688|
|6.1|2.8|4.0|1.3|    1|  0.0982254496580297|
|4.9|2.4|3.3|1.0|    1| 0.12326679062811396|
|6.2|2.9|4.3|1.3|    1| 0.12413265352469693|
|6.0|2.9|4.5|1.5|    1| 0.13204735458660521|
|4.4|3.2|1.3|0.2|    0|  0.1403506514781847|
|5.6|3.0|4.1|1.3|    1| 0.14172346739032382|
|6.5|2.8|4.6|1.5|    1| 0.14371681994803165|
|5.1|2.5|3.0|1.1|    1| 0.18510325676932826|
+---+---+---+---+-----+--------------------+
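
The filtering step and the training can be sketched as follows. This assumes Spark ML's `LinearSVC`, which fits the output columns of this test (rawPrediction and prediction, but no probability), although the article does not name the class it used. The file name `iris.csv`, the split ratio and the `maxIter` and `regParam` values are illustrative assumptions.

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, rand}

object LinearSvcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LinearSvcSketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical headerless CSV with numeric labels 0, 1 and 2.
    val iris = spark.read
      .option("inferSchema", "true")
      .csv("iris.csv")
      .toDF("_c0", "_c1", "_c2", "_c3", "label")

    // Keep only two classes, because LinearSVC is a binary classifier.
    val binary = iris
      .filter(col("label").isin(0, 1))
      .withColumn("rand", rand())

    val data = new VectorAssembler()
      .setInputCols(Array("_c0", "_c1", "_c2", "_c3"))
      .setOutputCol("features")
      .transform(binary)
      .orderBy("rand")

    // Assumed 80/20 split; the article does not state the ratio it used.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Illustrative hyperparameters, not the article's settings.
    val model = new LinearSVC()
      .setMaxIter(100)
      .setRegParam(0.1)
      .fit(train)

    val predictions = model.transform(test)
    predictions.show()

    val accuracy = new MulticlassClassificationEvaluator()
      .setMetricName("accuracy")
      .evaluate(predictions)
    println(s"accuracy is $accuracy")

    spark.stop()
  }
}
```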

+---+---+---+---+-----+--------------------+-----------------+--------------------+----------+
|_c0|_c1|_c2|_c3|label|                rand|         features|       rawPrediction|prediction|
+---+---+---+---+-----+--------------------+-----------------+--------------------+----------+
|6.1|2.8|4.7|1.2|    1|0.012481961930157603|[6.1,2.8,4.7,1.2]|[-1.6509634270222...|       1.0|
|5.0|3.3|1.4|0.2|    0|0.016082628759263806|[5.0,3.3,1.4,0.2]|[1.29354104026248...|       0.0|
|4.4|2.9|1.4|0.2|    0| 0.22290326246094538|[4.4,2.9,1.4,0.2]|[1.05281732989969...|       0.0|
|5.4|3.0|4.5|1.5|    1|  0.2668875621875405|[5.4,3.0,4.5,1.5]|[-1.4835570568200...|       1.0|
|5.4|3.9|1.3|0.4|    0|  0.3533812726039295|[5.4,3.9,1.3,0.4]|[1.63057317590279...|       0.0|
|6.6|3.0|4.4|1.4|    1|  0.3553239162288241|[6.6,3.0,4.4,1.4]|[-1.6624321643118...|       1.0|
|5.1|3.3|1.7|0.5|    0|  0.5343838606275636|[5.1,3.3,1.7,0.5]|[0.87191902881306...|       0.0|
|5.1|3.4|1.5|0.2|    0|  0.5482515144522366|[5.1,3.4,1.5,0.2]|[1.32991481935905...|       0.0|
|6.7|3.1|4.4|1.4|    1|  0.8046227561337921|[6.7,3.1,4.4,1.4]|[-1.5893019484248...|       1.0|
|5.6|3.0|4.5|1.5|    1|  0.8385862859176035|[5.6,3.0,4.5,1.5]|[-1.5353542100051...|       1.0|
|6.0|2.2|4.0|1.0|    1|  0.9669924229306907|[6.0,2.2,4.0,1.0]|[-1.7716397981170...|       1.0|
+---+---+---+---+-----+--------------------+-----------------+--------------------+----------+

accuracy is 1.0
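
In the results table, every row whose rawPrediction starts with a negative number is predicted as 1.0, and every row that starts with a positive number is predicted as 0.0. That pattern matches a two-entry raw vector of the form [-margin, margin] with class 1 chosen when the margin exceeds a threshold of 0. The table truncates the second entry, so the sketch below assumes it is the negation of the first and uses rounded values.

```scala
object SvmDecision {
  /**
   * raw(1) is treated as the signed margin for class 1 and raw(0) as its
   * negation. This layout is an assumption inferred from the table.
   */
  def predict(raw: Array[Double], threshold: Double = 0.0): Double =
    if (raw(1) > threshold) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    // First entries rounded from the table; second entries are assumed.
    println(predict(Array(-1.65, 1.65))) // 1.0
    println(predict(Array(1.29, -1.29))) // 0.0
  }
}
```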

Summary

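  Overall, the classification results are satisfactory: accuracy is fairly high in all three tests and could probably be raised further by tuning parameters. Compared with regression algorithms, classification algorithms reach acceptable results more easily, and they predict reasonably well even on small datasets. Because all the datasets used here are small, the results are only suitable for testing and parameter tuning.
  The decision tree reached an accuracy of about 0.830, Naive Bayes about 0.778, and SVM 1.0. The 1.0 is partly down to chance: the SVM test set shown above has only 11 rows, and such a score is unlikely to hold on a large dataset. Over repeated runs the SVM accuracy stayed above 0.8. Note, however, that the SVM used here supports only binary classification.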
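
One way to act on both points (parameter tuning, and not trusting a single lucky split) is k-fold cross-validation over a small parameter grid. The sketch below assumes Spark ML's `CrossValidator` and a DataFrame that already has a `features` column and a `category` label column, as in the decision tree test. The grid values and the fold count are arbitrary illustrations, not tuned recommendations.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
import org.apache.spark.sql.DataFrame

object TuningSketch {
  /** Cross-validate a decision tree over a small, illustrative grid. */
  def tuneDecisionTree(data: DataFrame, labelCol: String = "category"): CrossValidatorModel = {
    val tree = new DecisionTreeClassifier()
      .setLabelCol(labelCol)
      .setFeaturesCol("features")

    // Illustrative candidate values only.
    val grid = new ParamGridBuilder()
      .addGrid(tree.maxDepth, Array(2, 3, 5, 8))
      .addGrid(tree.minInstancesPerNode, Array(1, 5, 10))
      .build()

    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol(labelCol)
      .setMetricName("accuracy")

    // Each grid point is scored as the mean accuracy over 5 folds.
    new CrossValidator()
      .setEstimator(tree)
      .setEstimatorParamMaps(grid)
      .setEvaluator(evaluator)
      .setNumFolds(5)
      .setSeed(42L)
      .fit(data)
  }
}
```

The returned model's `avgMetrics` holds the mean accuracy for each grid point, which is a steadier figure than the accuracy from one random split, and `bestModel` is the tree refit with the best-scoring settings.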


      Article link: https://www.haomeiwen.com/subject/vixoxqtx.html