KDD CUP99 Data Mining (1): Reading the Data and Saving txt as csv


Author: _曹杰 | Published 2017-12-19 20:32

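    I have been working with the KDD CUP99 data recently, and I am writing down the problems I ran into and how I solved them to share with everyone.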

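    Resources

    KDD CUP 1999 dataset download: http://kdd.ics.uci.edu/databases/kddcup99/

    Introduction to the KDD CUP 1999 dataset: the article "KDD CUP 99数据集" (KDD CUP 99 Dataset)

    KDD CUP 1999 reference project: a project you can download for reference; the downloaded code runs as-is

    WEKA learning slides (PPT): https://pan.baidu.com/s/1slTz5Bf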

    Downloading the data

    The downloaded KDD CUP99 data files look like this:

    kddcup.names: A list of features.

    kddcup.data.gz: The full data set (18M; 743M Uncompressed)

    kddcup.data_10_percent.gz: A 10% subset. (2.1M; 75M Uncompressed)

    kddcup.newtestdata_10_percent_unlabeled.gz (1.4M; 45M Uncompressed)

    kddcup.testdata.unlabeled.gz (11.2M; 430M Uncompressed)

    kddcup.testdata.unlabeled_10_percent.gz (1.4M; 45M Uncompressed)

    corrected.gz: Test data with corrected labels.

    training_attack_types: A list of intrusion types.

    For an introduction to the dataset, see link 1. Use the corrected.data file as the training set and kddcup.data_10_percent as the test set.
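The .gz archives listed above can also be unpacked from Python instead of by hand. The following is a minimal sketch using only the standard library; the helper name `gunzip_to_txt` and the file names in the usage comment are assumptions for illustration, not part of the original workflow.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_txt(gz_path, txt_path):
    """Decompress a .gz archive into a plain-text file and return its path."""
    gz_path, txt_path = Path(gz_path), Path(txt_path)
    # Copy in chunks so the large full data set never has to fit in memory
    with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return txt_path


# Hypothetical usage, assuming corrected.gz sits in the working directory:
# gunzip_to_txt("corrected.gz", "corrected.txt")
```

Writing the output with a .txt name directly gives a file that pandas can read. pandas can also usually read a .gz file as-is, because `read_csv` infers the compression from the file extension by default.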

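    Reading the data

    The downloaded files are plain text. Open them in Notepad++ and save them as .txt files so that Python can read them easily. What I do next is add the column names and then save the txt file as a csv file.

    (Figure: the plain-text file)

    Add the column names, which are taken from the article at link 1. The Python code: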


    import pandas as pd

    # 41 feature names plus the trailing "label" column: 42 columns in total
    col_names = [
        "duration", "protocol_type", "service", "flag", "src_bytes",
        "dst_bytes", "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
        "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
        "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
        "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
        "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
        "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
        "dst_host_same_srv_rate", "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
        "dst_host_srv_diff_host_rate", "dst_host_serror_rate", "dst_host_srv_serror_rate",
        "dst_host_rerror_rate", "dst_host_srv_rerror_rate", "label",
    ]

    # The file is comma-separated and has no header row, so supply the names
    data = pd.read_csv("corrected.txt", header=None, names=col_names)

    print(data.head(10))  # show the first 10 rows

    data.to_csv("corrected.csv")  # save as a csv file


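    I created an empty corrected.csv in Excel beforehand, because otherwise I got a file-not-found error (all my paths are absolute). Note that to_csv creates the output file by itself, so such an error normally means that the input path is wrong or that the output folder does not exist.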
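As a minimal sketch of the saving step: `DataFrame.to_csv` creates the output file on its own as long as the folder it goes into exists, so the helper below creates any missing parent folders first. The helper name `save_csv`, the three-column sample rows, and the temporary output folder are made up for illustration. Passing `index=False` is an optional choice that the original code does not make; it leaves out the row-number column that pandas writes by default.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd


def save_csv(df, out_path):
    """Write df to out_path, creating missing parent folders first."""
    out_path = Path(out_path)
    # to_csv creates the file itself, but not the folders above it
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False omits the row-number column pandas writes by default
    df.to_csv(out_path, index=False)
    return out_path


# Tiny in-memory stand-in for the real data (made-up rows, 3 columns only)
sample = StringIO("0,tcp,normal.\n0,udp,snmpgetattack.\n")
df = pd.read_csv(sample, header=None, names=["duration", "protocol_type", "label"])

# Hypothetical output location inside a temp dir; "new_folder" does not exist yet
written = save_csv(df, Path(tempfile.mkdtemp()) / "new_folder" / "sample.csv")
print(written.read_text())
```

Reading the file back with `pd.read_csv(written)` should give the same three named columns, with no extra unnamed index column.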

    (Figure: the csv file after adding the column names)

    The next step is data preprocessing; see the next post.
