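Source program download address. A Java environment must be installed on the local machine; for the installation steps, search Baidu or Google.
A program (Python) that uses mRMR to select features from a feature set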
#import necessary packages
import csv  #used to write the csv file
import pandas as pd
import numpy as np
import re
import os  #used to call the external program
#change the default working directory
os.chdir("XXX")
#input path name
datapath ="XXX"
#output path name
outputpath="XXX"
"""
mrmr and svm
"""
#read csv data from path
train_data = pd.read_csv(datapath, header=None, index_col=None)
X = np.array(train_data)
#the first half of the rows is labelled 1, the second half 0
Y = [1] * (len(train_data) // 2)
Y2 = [0] * (len(train_data) // 2)
Y.extend(Y2)
Y=np.array(Y)
#reshape to a single column without hard-coding the row count
Y=Y.reshape(-1,1)
#concatenate class and data
full_csv_with_class=np.concatenate([Y,X],axis=1)
print(full_csv_with_class)
#print the shapes of the original csv data and the final full data
print("the shape of data:"+str(X.shape))
print("the shape of data and class:"+str(full_csv_with_class.shape))
#generating virtual headers
columns=["class"]
columns_numbers=np.arange(full_csv_with_class.shape[1]-1)
columns.extend(columns_numbers)
# Write data into files
csvFile2 = open(outputpath,'w')
writer = csv.writer(csvFile2)
m = len(full_csv_with_class)
writer.writerow(columns)
for i in range(m):
    writer.writerow(full_csv_with_class[i])
csvFile2.close()
[[ 1. 1. 1. ..., 0. 1. 0.075]
[ 1. 0. 0. ..., 1. 1. 0.1 ]
[ 1. 1. 0. ..., 1. 0. 0.175]
...,
[ 0. 0. 0. ..., 1. 1. 0.075]
[ 0. 0. 0. ..., 0. 1. 0.025]
[ 0. 0. 0. ..., 0. 1. 0.05 ]]
the shape of data:(2260, 200)
the shape of data and class:(2260, 201)
os.system("./mRMR/mrmr -i "+outputpath+" -n 200 >mRMR/output.mrmrout")
print("complete")
complete
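#read the mrmr output file
fn=open("mRMR/output.mrmrout",'r')
location_mark=0
final_set=[]
for line in fn.readlines():
    #a blank line ends the current block
    if line.strip() =="":
        location_mark=0
    #inside the mRMR block: skip the header row, keep the feature index
    if location_mark==1 and line.split()[1]!="Fea":
        final_set.append(int(line.split()[1]))
    #the title line of the mRMR feature block switches collection on
    if re.findall(r"mRMR",line) and re.findall(r"feature",line):
        location_mark=1
fn.close()
print(final_set)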
[133, 135, 140, 130, 145, 110, 115, 105, 120, 125, 150, 102, 185, 190, 180, 195, 100, 160, 165, 155, 170, 175, 101, 5, 85, 95, 98, 90, 99, 200, 177, 33, 50, 14, 8, 149, 109, 94, 121, 134, 113, 84, 21, 156, 71, 31, 6, 59, 189, 158, 122, 176, 58, 46, 64, 188, 10, 1, 38, 184, 19, 138, 2, 159, 81, 181, 44, 199, 26, 63, 82, 45, 148, 114, 172, 183, 32, 7, 48, 131, 146, 163, 83, 39, 49, 171, 80, 132, 197, 77, 88, 56, 9, 157, 198, 75, 164, 147, 70, 76, 196, 27, 182, 25, 96, 127, 13, 57, 126, 65, 107, 34, 108, 60, 139, 69, 55, 89, 30, 35, 40, 106, 20, 15, 104, 97, 111, 18, 103, 41, 78, 116, 61, 192, 3, 43, 67, 23, 118, 191, 4, 11, 194, 119, 66, 17, 87, 137, 136, 167, 141, 53, 117, 154, 28, 86, 42, 151, 52, 74, 68, 193, 51, 22, 179, 153, 62, 186, 152, 169, 12, 161, 129, 112, 166, 93, 47, 79, 162, 128, 29, 16, 143, 36, 187, 168, 144, 73, 124, 91, 54, 174, 178, 24, 173, 37, 142, 72, 123, 92]
precision_copy=0
recall_copy=0
SN_copy=0
SP_copy=0
GM_copy=0
TP_copy=0
TN_copy=0
FP_copy=0
FN_copy=0
ACC_copy=0
F1_Score_copy=0
F_measure_copy=0
MCC_copy=0
pos_copy=0
neg_copy=0
y_pred_prob_copy=[]
y_pred_copy=[]
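Key statement:
os.system("./mRMR/mrmr -i "+outputpath+" -n 200 >mRMR/output.mrmrout")
- ./mRMR/mrmr is the executable, i.e. the program downloaded from GitHub at the top of this post
- -i outputpath is the path of the csv file written by the script above, which mrmr reads as its input, i.e. the original feature set (its format is explained below)
- -n 200 selects 200 features, listed in order of their score
- mRMR/output.mrmrout is the file that receives the output (its contents are shown below)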
[Image: screenshot of the contents of mRMR/output.mrmrout]
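The same call can also be made through subprocess, which avoids gluing a shell string together and returns the exit code. This is only a minimal sketch: it assumes the binary sits at ./mRMR/mrmr as in the statement above, it uses only the -i and -n options shown there, and the helper names build_mrmr_command and run_mrmr are made up for illustration.

```python
import subprocess


def build_mrmr_command(csv_path, n_features, binary="./mRMR/mrmr"):
    """Return the argument list equivalent to the os.system string above."""
    return [binary, "-i", csv_path, "-n", str(n_features)]


def run_mrmr(csv_path, n_features, result_path="mRMR/output.mrmrout",
             binary="./mRMR/mrmr"):
    """Run mrmr, send its standard output to result_path, return the exit code."""
    command = build_mrmr_command(csv_path, n_features, binary)
    with open(result_path, "w") as result_file:
        # stdout=result_file plays the role of the ">" redirection
        return subprocess.call(command, stdout=result_file)


print(build_mrmr_command("XXX", 200))
```

Calling run_mrmr(outputpath, 200) would then replace the os.system line; what the returned exit code means depends on the mrmr binary itself.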
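The csv format needs special attention: the class label must be in the first column, and the file must have a header row of column labels (the header of the class column is required).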
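The layout described here can be illustrated with a small file. The sketch below is not the script from this post: the helper name build_mrmr_csv, the demo values and the temporary file name are all made up, and it simply writes a header row with "class" first, followed by one row per sample with the label in the first column.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def build_mrmr_csv(features, labels, path):
    """Write a csv with a header row and the class label as the first column."""
    features = np.asarray(features)
    # virtual feature headers 0, 1, 2, ... as in the script above
    frame = pd.DataFrame(features,
                         columns=[str(i) for i in range(features.shape[1])])
    # the class label goes in front of all feature columns
    frame.insert(0, "class", np.asarray(labels))
    frame.to_csv(path, index=False)
    return frame


# made-up demo data: 4 samples, 3 features, first half labelled 1
demo_path = os.path.join(tempfile.gettempdir(), "demo_mrmr_input.csv")
demo = build_mrmr_csv(
    features=[[1, 0, 0.5], [0, 1, 0.25], [1, 1, 0.75], [0, 0, 0.1]],
    labels=[1, 1, 0, 0],
    path=demo_path,
)
print(list(demo.columns))  # prints ['class', '0', '1', '2']
```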
[133, 135, 140, 130, 145, 110, 115, 105, 120, 125, 150, 102, 185, 190, 180, 195, 100, 160, 165, 155, 170, 175, 101, 5, 85, 95, 98, 90, 99, 200, 177, 33, 50, 14, 8, 149, 109, 94, 121, 134, 113, 84, 21, 156, 71, 31, 6, 59, 189, 158, 122, 176, 58, 46, 64, 188, 10, 1, 38, 184, 19, 138, 2, 159, 81, 181, 44, 199, 26, 63, 82, 45, 148, 114, 172, 183, 32, 7, 48, 131, 146, 163, 83, 39, 49, 171, 80, 132, 197, 77, 88, 56, 9, 157, 198, 75, 164, 147, 70, 76, 196, 27, 182, 25, 96, 127, 13, 57, 126, 65, 107, 34, 108, 60, 139, 69, 55, 89, 30, 35, 40, 106, 20, 15, 104, 97, 111, 18, 103, 41, 78, 116, 61, 192, 3, 43, 67, 23, 118, 191, 4, 11, 194, 119, 66, 17, 87, 137, 136, 167, 141, 53, 117, 154, 28, 86, 42, 151, 52, 74, 68, 193, 51, 22, 179, 153, 62, 186, 152, 169, 12, 161, 129, 112, 166, 93, 47, 79, 162, 128, 29, 16, 143, 36, 187, 168, 144, 73, 124, 91, 54, 174, 178, 24, 173, 37, 142, 72, 123, 92]
These numbers are the ranking of feature dimensions extracted from mRMR/output.mrmrout.
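The extraction step can be wrapped in a function and tried on a small text. The sample below is invented purely to exercise the parser and is not real mrmr output; it only follows what the reading loop in this post expects: a title line containing both "mRMR" and "feature", a header row whose second token is "Fea", data rows whose second token is the feature index, and a blank line that ends the block. The name parse_mrmr_ranking is hypothetical.

```python
def parse_mrmr_ranking(lines):
    """Collect the second column of the block after the 'mRMR ... feature' title."""
    ranking = []
    in_block = False
    for line in lines:
        tokens = line.split()
        if not tokens:
            in_block = False  # a blank line ends the current block
        elif in_block and len(tokens) > 1 and tokens[1] != "Fea":
            ranking.append(int(tokens[1]))  # second token is the feature index
        elif "mRMR" in line and "feature" in line:
            in_block = True  # data rows start after this title line
    return ranking


# invented sample text, shaped only to match the assumptions listed above
sample = """\
*** some other table ***
Order Fea Name Score
1 7 6 0.120
2 3 2 0.090

*** mRMR features ***
Order Fea Name Score
1 7 6 0.120
2 1 0 0.045
3 3 2 0.010

"""
print(parse_mrmr_ranking(sample.splitlines()))  # prints [7, 1, 3]
```

Rows outside the mRMR block are ignored, so only the ranking under the matching title line is returned.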
Readers can add the dimensions one at a time in this ranked order to look for the best-performing subset of dimensions.
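One way to carry out this search is to score the top 1, top 2, ... top k ranked features and keep the k with the best score. The sketch below is an illustration under several assumptions, not the evaluation used in this post: a nearest-centroid classifier stands in for the SVM so that only numpy is needed, the function names and demo data are made up, and the ranking is assumed to be 1-based (the printed list contains 200 but no 0), which should be checked against your own output.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in scorer: accuracy of a nearest-class-centroid classifier."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # distance from every test sample to every class centroid
    distances = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[distances.argmin(axis=1)]
    return float((predictions == y_test).mean())


def best_top_k(X_train, y_train, X_test, y_test, ranking, scorer):
    """Score the top-k ranked features for every k; return (best_k, scores)."""
    columns = [index - 1 for index in ranking]  # assumes 1-based mRMR indices
    scores = []
    for k in range(1, len(columns) + 1):
        chosen = columns[:k]
        scores.append(scorer(X_train[:, chosen], y_train,
                             X_test[:, chosen], y_test))
    best_k = int(np.argmax(scores)) + 1  # smallest k that reaches the best score
    return best_k, scores


# made-up demo: only the second feature separates the two classes
X_train = np.array([[0., 1.0, 5.], [1., 0.9, 5.], [0., 0.1, 5.], [1., 0.0, 5.]])
y_train = np.array([1, 1, 0, 0])
X_test = np.array([[1., 0.8, 5.], [0., 0.2, 5.]])
y_test = np.array([1, 0])
best_k, scores = best_top_k(X_train, y_train, X_test, y_test,
                            ranking=[2, 1, 3], scorer=nearest_centroid_accuracy)
print(best_k, scores)
```

Any function with the same four arguments, for example one that trains an SVM and returns its accuracy, can be passed as scorer instead.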
To restate, the addresses of the mrmr program and the feature extraction program: